How to finetune with a text file? #124
-
Hey! Thanks for your amazing work on Arabic GPT-2. I was wondering: is there a way to finetune the pretrained model using a text file? I know Max Woolf already has this feature in his Colab notebook. Thank you!
Replies: 1 comment
-
Hey,

Can I suggest you try the PyTorch notebook from Hugging Face here: https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb

Or you can check the single-command training from the examples folder: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling

To run on your own training and validation files, use the following command (if you only have a single text file, see the splitting sketch after this reply):

```bash
python run_clm.py \
    --model_name_or_path gpt2 \
    --train_file path_to_train_file \
    --validation_file path_to_validation_file \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm
```

This uses the built-in Hugging Face Trainer for training. If you want to use a custom training loop, you can utilize or adapt the run_clm_no_trainer.py script. Take a look at the script for a list of supported arguments. An example is shown below:

```bash
python run_clm_no_trainer.py \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --model_name_or_path gpt2 \
    --output_dir /tmp/test-clm
```
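The `--train_file` and `--validation_file` flags accept plain text files, so if you start from one corpus file you can split it yourself before running the command above. A minimal sketch, assuming a file called `corpus.txt` and an arbitrary 90/10 split (both are illustrative choices, not anything required by `run_clm.py`):

```python
# Split one raw text file into the train/validation files expected by
# --train_file / --validation_file. The file names and the 90/10 ratio
# are arbitrary illustrative choices.
with open("corpus.txt", encoding="utf-8") as f:
    lines = f.readlines()

split = int(len(lines) * 0.9)  # 90% train, 10% validation

with open("train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:split])

with open("validation.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[split:])
```

You would then pass `train.txt` and `validation.txt` to the command, and point `--model_name_or_path` at the Hub id or local path of the pretrained checkpoint you want to fine-tune instead of `gpt2`.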
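For the custom-loop route, the full logic lives in run_clm_no_trainer.py; below is a heavily stripped-down sketch of the same idea (tokenize a text file, chunk it into fixed-size blocks, train with plain PyTorch). The file name `train.txt`, the block size, and all hyperparameters are illustrative assumptions, not values taken from the script:

```python
# Minimal sketch of a custom causal-LM fine-tuning loop, in the spirit of
# run_clm_no_trainer.py but heavily simplified.
import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load a plain text file as a dataset; "train.txt" is a placeholder path.
raw = load_dataset("text", data_files={"train": "train.txt"})["train"]

block_size = 128  # illustrative; run_clm defaults to the model's max length

def tokenize(batch):
    return tokenizer(batch["text"])

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

def group_texts(examples):
    # Concatenate all token ids, then cut them into fixed-size blocks; the
    # labels are the input ids themselves (the model shifts them internally).
    concatenated = sum(examples["input_ids"], [])
    total = (len(concatenated) // block_size) * block_size
    blocks = [concatenated[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

lm_dataset = tokenized.map(
    group_texts, batched=True, remove_columns=tokenized.column_names
)
lm_dataset.set_format("torch")

loader = DataLoader(lm_dataset, batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("/tmp/test-clm")
tokenizer.save_pretrained("/tmp/test-clm")
```

Because every block has exactly `block_size` tokens, no padding is needed, which is why the sketch gets away without a pad token or attention masks; the real script handles those details more carefully.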