
Fine-tune a transformer model pretrained on Wikipedia on a smaller corpus for classification (99% validation accuracy on IMDB).


roberto1648/classification_using_transformers


Classification with Transformers

I modified the code from here to make it more modular and easier to understand.

Requirements

Create the conda environment:

conda env create -f requirements.yml

Settings

Edit config.py. In particular, fill in the upper (required) group of inputs (e.g., the data directory).

Data

Put your data into a TSV (tab-separated values) file that looks like the following:

[Image: example of the expected TSV layout]

An easy way to do this is to build a pandas DataFrame like the one above and save it as follows:

train_df.to_csv("/path/to/data/dir/train.tsv", sep="\t", index=False)

dev_df.to_csv("/path/to/data/dir/dev.tsv", sep="\t", index=False)
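The snippet below is a minimal sketch of that workflow for a binary task such as IMDB. The column names and their order (label, id_a, id_b, text_a, text_b) are assumptions standing in for the screenshot, which is not reproduced here; only id_b and text_b are named in the notes. make_split and write_splits are hypothetical helpers, not part of this repository.

```python
import os

import pandas as pd

# ASSUMED column layout; adjust to match the expected file shown above.
COLUMNS = ["label", "id_a", "id_b", "text_a", "text_b"]


def make_split(texts, labels) -> pd.DataFrame:
    """Build one split (train or dev) as a DataFrame in the assumed layout."""
    n = len(texts)
    # Replace tabs/newlines so each example stays on a single TSV line.
    clean = [t.replace("\t", " ").replace("\n", " ") for t in texts]
    return pd.DataFrame(
        {
            "label": labels,
            "id_a": list(range(n)),
            "id_b": list(range(n)),    # unused for classification, but filled in
            "text_a": clean,
            "text_b": ["unused"] * n,  # unused filler; do not leave it empty
        },
        columns=COLUMNS,
    )


def write_splits(train_df: pd.DataFrame, dev_df: pd.DataFrame, data_dir: str) -> None:
    """Save the splits as train.tsv and dev.tsv with the same to_csv calls as above."""
    os.makedirs(data_dir, exist_ok=True)
    train_df.to_csv(os.path.join(data_dir, "train.tsv"), sep="\t", index=False)
    dev_df.to_csv(os.path.join(data_dir, "dev.tsv"), sep="\t", index=False)


if __name__ == "__main__":
    import tempfile

    data_dir = tempfile.mkdtemp()  # stand-in for /path/to/data/dir/
    train_df = make_split(["A great film.", "Dull\tand\nslow."], [1, 0])
    dev_df = make_split(["Not bad at all."], [1])
    write_splits(train_df, dev_df, data_dir)
    with open(os.path.join(data_dir, "train.tsv")) as f:
        print(f.read())
```

Stripping tabs and newlines from the text is a precaution: it keeps one example per line, which matters if the loader reads the file line by line rather than with a quote-aware CSV parser.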

Notes:

  • Make sure that config.py sets data_dir = /path/to/data/dir/
  • id_b and text_b are not used for classification, but fill them with something anyway (as far as I recall, leaving them empty caused an error).
  • Yes, the validation split is called "dev" instead of "val", and the files are tab-separated rather than comma-separated.
  • All of this could be changed, but I wanted to reuse the existing code as-is.
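Before launching a run, the requirements in these notes can be checked with a short script. This is a minimal sketch using only the standard library; check_data_dir is a hypothetical helper, and it assumes each file has a header row containing the column names id_b and text_b.

```python
import csv
import os

REQUIRED_FILES = ("train.tsv", "dev.tsv")  # "dev", not "val"
FILLER_COLUMNS = ("id_b", "text_b")        # unused, but must not be empty


def check_data_dir(data_dir: str) -> list[str]:
    """Return a list of problems found in data_dir (an empty list means OK)."""
    problems = []
    for name in REQUIRED_FILES:
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            problems.append(f"missing {path}")
            continue
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f, delimiter="\t")  # tab-separated, with a header
            header = reader.fieldnames or []
            for col in FILLER_COLUMNS:
                if col not in header:
                    problems.append(f"{name}: no '{col}' column")
            # Data rows are numbered from 1 (the header is not counted).
            for i, row in enumerate(reader, start=1):
                for col in FILLER_COLUMNS:
                    if col in header and not (row.get(col) or "").strip():
                        problems.append(f"{name}: row {i}: empty '{col}'")
    return problems


if __name__ == "__main__":
    # Placeholder path; use the same value as data_dir in config.py.
    print(check_data_dir("/path/to/data/dir/") or "data_dir looks OK")
```

If config.py exposes data_dir as a module-level variable (an assumption), you could pass that value directly instead of repeating the path.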

Run

In a terminal, run:

python -m transformers_clf.finetune_pretrained

The -m flag is needed because transformers_clf is a package.
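To see why -m matters, the sketch below runs the same module two ways with the standard library's runpy, which mimics `python -m pkg.module` versus `python path/to/module.py`. It assumes, without confirmation from this README, that finetune_pretrained imports sibling modules relative to the package; demo_pkg is a hypothetical stand-in for transformers_clf.

```python
import runpy
import sys
import tempfile
from pathlib import Path


def demo_module_vs_script() -> tuple[int, str]:
    """Run one package module as a module and as a script; report both outcomes."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg"  # hypothetical stand-in for transformers_clf
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("VALUE = 42\n")
        # main.py uses a package-relative import.
        (pkg / "main.py").write_text("from .helper import VALUE\nRESULT = VALUE\n")

        sys.path.insert(0, root)
        try:
            # Like `python -m demo_pkg.main`: the parent package is known.
            ns = runpy.run_module("demo_pkg.main", run_name="__main__")
            result = ns["RESULT"]

            # Like `python demo_pkg/main.py`: no parent package, so the import fails.
            error = ""
            try:
                runpy.run_path(str(pkg / "main.py"))
            except ImportError as exc:
                error = str(exc)
        finally:
            sys.path.remove(root)
            for name in [m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")]:
                del sys.modules[name]
    return result, error


if __name__ == "__main__":
    result, error = demo_module_vs_script()
    print(result)  # 42
    print(error)
```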

To monitor training, run TensorBoard as usual: tensorboard --logdir runs

Example on IMDB

The model reaches 99% validation accuracy:

[Image: IMDB validation results]
