How to finetune with a text file? #124
-
Hey! Thanks for your amazing work on Arabic GPT-2. I was wondering: is there a way to finetune the pretrained model using a text file? I know Max Woolf already has this feature in his Colab notebook. Thank you!
Replies: 1 comment
-
Hey,

Can I suggest you try the PyTorch notebook from Hugging Face here: https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb

Or you can check the single-command training from the examples folder: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling

To run on your own training and validation files, use the following command (if you only have a single text file, see the splitting sketch after this reply):

```bash
python run_clm.py \
    --model_name_or_path gpt2 \
    --train_file path_to_train_file \
    --validation_file path_to_validation_file \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm
```

This uses the built-in Hugging Face Trainer for training. If you want to use a custom training loop, you can utilize or adapt the run_clm_no_trainer.py script. Take a look at the script for a list of supported arguments. An example is shown below:

```bash
python run_clm_no_trainer.py \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --model_name_or_path gpt2 \
    --output_dir /tmp/test-clm
```
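The `--train_file` and `--validation_file` flags accept plain text files, so if you start from one corpus file you can split it yourself before running the command above. A minimal sketch, assuming a file called `corpus.txt` and an arbitrary 90/10 split (both are illustrative choices, not anything required by `run_clm.py`):

```python
# Split one raw text file into the train/validation files expected by
# --train_file / --validation_file. The file names and the 90/10 ratio
# are arbitrary illustrative choices.
with open("corpus.txt", encoding="utf-8") as f:
    lines = f.readlines()

split = int(len(lines) * 0.9)  # 90% train, 10% validation

with open("train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:split])

with open("validation.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[split:])
```

You would then pass `train.txt` and `validation.txt` to the command, and point `--model_name_or_path` at the Hub id or local path of the pretrained checkpoint you want to fine-tune instead of `gpt2`.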
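For the custom-loop route, the full logic lives in run_clm_no_trainer.py; below is a heavily stripped-down sketch of the same idea (tokenize a text file, chunk it into fixed-size blocks, train with plain PyTorch). The file name `train.txt`, the block size, and all hyperparameters are illustrative assumptions, not values taken from the script:

```python
# Minimal sketch of a custom causal-LM fine-tuning loop, in the spirit of
# run_clm_no_trainer.py but heavily simplified.
import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load a plain text file as a dataset; "train.txt" is a placeholder path.
raw = load_dataset("text", data_files={"train": "train.txt"})["train"]

block_size = 128  # illustrative; run_clm defaults to the model's max length

def tokenize(batch):
    return tokenizer(batch["text"])

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

def group_texts(examples):
    # Concatenate all token ids, then cut them into fixed-size blocks; the
    # labels are the input ids themselves (the model shifts them internally).
    concatenated = sum(examples["input_ids"], [])
    total = (len(concatenated) // block_size) * block_size
    blocks = [concatenated[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

lm_dataset = tokenized.map(
    group_texts, batched=True, remove_columns=tokenized.column_names
)
lm_dataset.set_format("torch")

loader = DataLoader(lm_dataset, batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("/tmp/test-clm")
tokenizer.save_pretrained("/tmp/test-clm")
```

Because every block has exactly `block_size` tokens, no padding is needed, which is why the sketch gets away without a pad token or attention masks; the real script handles those details more carefully.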