BERT revealed the great potential of the Transformer across a wide range of NLP tasks. Inspired by d2l, we pretrain a miniBERT on the WikiText-2 dataset. We then fine-tune the miniBERT for both a token-level task (SQuAD) in eg.3 and a sequence-level task (IMDb) in eg.4.
- ./model: Save model files.
- ./dataset: Cache directory for datasets.
- pretrain.py: Pretrain a miniBERT.
- WikiTextDataset.py: Dataset class for WikiText-2.
- maskDemo.py: Demo for the MLM task.
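
For context, the MLM objective masks a fraction of the input tokens and trains the model to recover them. The sketch below illustrates the standard BERT masking rule (80% mask token, 10% random token, 10% unchanged); the function and token names are illustrative, not the exact implementation in WikiTextDataset.py or pretrain.py.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """BERT-style masking: select ~15% of positions as prediction targets;
    of those, 80% become '<mask>', 10% a random token, 10% stay unchanged."""
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in ('<cls>', '<sep>'):             # never mask special tokens
            continue
        if random.random() < mask_prob:
            labels[i] = tok                       # original token becomes the label
            r = random.random()
            if r < 0.8:
                masked[i] = '<mask>'              # 80%: replace with the mask token
            elif r < 0.9:
                masked[i] = random.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token
    return masked, labels

if __name__ == '__main__':
    tokens = ['<cls>', 'the', 'movie', 'was', 'great', '<sep>']
    print(mask_tokens(tokens, vocab=['film', 'bad', 'good', 'a', 'is']))
```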
- python pretrain.py: Start the pretraining.
- python maskDemo.py: Show the MLM demo.
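
To make the demo concrete, here is a minimal sketch of what an MLM demo typically does, assuming a d2l-style BERT model whose forward pass takes (tokens, segments, valid_lens, pred_positions) and returns MLM scores, and a d2l-style vocab with idx_to_token. The names are illustrative, not the exact API of maskDemo.py.

```python
import torch

def predict_masked(net, vocab, tokens, device='cpu'):
    """Return the top-5 vocabulary candidates for each '<mask>' position."""
    token_ids = torch.tensor([vocab[tokens]], device=device)   # shape (1, seq_len)
    segments = torch.zeros_like(token_ids)                     # single-sentence input
    valid_len = torch.tensor([len(tokens)], device=device)
    mask_positions = torch.tensor(
        [[i for i, t in enumerate(tokens) if t == '<mask>']], device=device)
    # Forward pass: the MLM head produces scores over the vocabulary
    _, mlm_scores, _ = net(token_ids, segments, valid_len, mask_positions)
    top5 = mlm_scores.argsort(dim=-1, descending=True)[..., :5]
    return [[vocab.idx_to_token[i] for i in pos] for pos in top5[0].tolist()]

# Example (assuming `net` and `vocab` come from the pretraining step):
# tokens = ['<cls>', 'the', '<mask>', 'is', 'great', '<sep>']
# print(predict_masked(net, vocab, tokens))
```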
MLM demo results:
- miniBERT
- baseBERT
- Note that our BERT model cannot perform as well as the official BERT model on the MLM task, due to the limited amount of pretraining data we use (only about 10 MB of wiki text).