Code for "On Learning Text Style Transfer with Direct Rewards", NAACL 2021.
We use PyTorch 1.2.0 for our experiments. The dependencies are specified in requirements.txt.
Please download the Yelp and Amazon data and put it in ./yelp and ./amazon, respectively.
Please download the IMDb data and put it in ./imdb.
Please download the GYAFC data and put it in ./formality_family. (We use the Family & Relationships category for our experiments.)
Please rename the downloaded files following the format {sentiment, formality}.{train, dev, test}.{0, 1}.txt.
For example, sentiment.test.0.txt contains the negative samples in the test set.
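The naming scheme above can be checked before training. The following is a minimal sketch and not part of the repository: the helper names expected_files and missing_files are hypothetical, and it assumes the renamed files sit directly inside the dataset directory.

```python
from itertools import product
from pathlib import Path


def expected_files(task):
    """List the file names implied by the {task}.{split}.{label}.txt format.

    `task` is "sentiment" or "formality"; splits and labels follow the README.
    """
    return [
        f"{task}.{split}.{label}.txt"
        for split, label in product(("train", "dev", "test"), (0, 1))
    ]


def missing_files(data_dir, task):
    """Return the expected file names that are not present in data_dir.

    Assumption: files are stored directly in data_dir, not in subfolders.
    """
    root = Path(data_dir)
    return [name for name in expected_files(task) if not (root / name).is_file()]
```

For instance, missing_files("./yelp", "sentiment") would list any of the six expected files that are still absent, assuming the Yelp data uses the sentiment prefix.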
We use huggingface Transformers in our experiments.
bootstrap.py - First Stage Training
To run the experiment, specify the hyperparameters in the config dict, then run
python bootstrap.py --cuda --gpuid [GPUID] -l
main.py - Second Stage Training
To run the experiment, specify the hyperparameters in the config dict, then run
python main.py --cuda --gpuid [GPUID] -l -s -r -p -u
evaluate.py - Evaluation Functions
To run the evaluation, please run
python evaluate.py --cuda --gpuid [GPUID] --file [OUTPUT_FILE_NAME] --dataset [DATASET_NAME] --model_pt [MODEL_CHECKPOINT]
classifier.py - classifiers for training
Dataloader.py - dataloaders
gpt_utils.py - modified transformers function to enable approximate word indexes
ref_sim.py, sim_models.py, sim_utils.py - code for SIM model
utils.py - utility functions
The results of our models can be found in the ./output directory.
Each line of the files contains the source, reference, and model output, separated by \t (source and model output only for the IMDb dataset).
Due to licensing restrictions, we only provide the model outputs for the GYAFC dataset.
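The tab-separated layout described above can be read with a few lines of Python. This is a minimal sketch, not code from the repository: the function name parse_output_line and the has_reference flag are hypothetical, and it assumes no field contains a tab character. It covers the three-field and two-field (IMDb) layouts only; the formality files, which hold model outputs alone, are not handled.

```python
def parse_output_line(line, has_reference=True):
    """Split one line of an output file into its fields.

    has_reference=True  -> source \t reference \t model output
    has_reference=False -> source \t model output (the IMDb layout)
    """
    fields = line.rstrip("\n").split("\t")
    expected = 3 if has_reference else 2
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    if has_reference:
        source, reference, output = fields
    else:
        source, output = fields
        reference = None  # no human reference in this layout
    return {"source": source, "reference": reference, "output": output}
```

Raising on an unexpected field count is a deliberate choice here so that a file read with the wrong layout fails early instead of silently misaligning sources and outputs.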