Example 2. Mini BERT Pretrain

BERT revealed the great potential of the Transformer across a wide range of NLP tasks. Inspired by d2l, we pretrain a miniBERT on the WikiText-2 dataset. We then fine-tune the miniBERT for a token-level task (SQuAD) in eg.3 and for a sequence-level task (IMDb) in eg.4.
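The pretraining objective is the sum of the masked language modeling (MLM) and next sentence prediction (NSP) losses. Below is a minimal sketch of that combination; the `model` interface, argument names, and batch fields are assumptions for illustration and do not reflect the actual code in pretrain.py.

```python
# Minimal sketch of the BERT pretraining objective (MLM + NSP).
# The model interface and batch fields here are assumptions, not pretrain.py's API.
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def pretrain_loss(model, tokens, segments, mlm_positions, mlm_labels, nsp_labels):
    # Hypothetical model returning MLM logits at the masked positions
    # and NSP logits computed from the [CLS] representation.
    mlm_logits, nsp_logits = model(tokens, segments, mlm_positions)
    mlm_loss = ce(mlm_logits.reshape(-1, mlm_logits.shape[-1]), mlm_labels.reshape(-1))
    nsp_loss = ce(nsp_logits, nsp_labels)
    return mlm_loss + nsp_loss  # BERT optimizes the sum of the two losses
```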

File Structure

  • ./model: Saved model files.
  • ./dataset: Cache directory for datasets.
  • pretrain.py: Pretrain the miniBERT.
  • WikiTextDataset.py: Dataset class for WikiText-2 (a masking sketch follows this list).
  • maskDemo.py: Demo for the MLM task.
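Building MLM examples requires masking some of the input tokens. The sketch below shows BERT's standard 80/10/10 masking rule (80% [MASK], 10% random token, 10% unchanged); the function name, `vocab_size`, `mask_id`, and `special_ids` are hypothetical and not taken from WikiTextDataset.py.

```python
# Sketch of BERT's MLM masking rule, assuming integer token ids.
import random

def mask_tokens(token_ids, vocab_size, mask_id, special_ids, mask_prob=0.15):
    inputs, positions, labels = list(token_ids), [], []
    # Only ordinary tokens are eligible for masking, not special tokens.
    candidates = [i for i, t in enumerate(token_ids) if t not in special_ids]
    random.shuffle(candidates)
    num_to_mask = max(1, round(mask_prob * len(candidates)))
    for i in candidates[:num_to_mask]:
        positions.append(i)
        labels.append(token_ids[i])                   # original token is the prediction target
        r = random.random()
        if r < 0.8:
            inputs[i] = mask_id                       # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.randrange(vocab_size)  # 10%: replace with a random token
        # else: 10% keep the original token
    return inputs, positions, labels
```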

Run

  • python pretrain.py: Start pretraining.
  • python maskDemo.py: Run the MLM demo (see the sketch below).
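For reference, here is a minimal sketch of what such an MLM demo typically does, assuming a hypothetical `model` that returns MLM logits for given positions and a `vocab` object that maps between tokens and ids; neither reflects the actual interfaces in this repo.

```python
# Sketch of an MLM demo: mask one token and print the model's top guesses.
import torch

def show_mlm_prediction(model, vocab, tokens, mask_pos, topk=5):
    # Replace the token at mask_pos with the mask token (assumed '<mask>').
    ids = [vocab[t] for t in tokens]
    ids[mask_pos] = vocab['<mask>']
    segments = [0] * len(ids)
    with torch.no_grad():
        mlm_logits, _ = model(torch.tensor([ids]),
                              torch.tensor([segments]),
                              torch.tensor([[mask_pos]]))
    top_ids = mlm_logits[0, 0].topk(topk).indices.tolist()
    print(f"masked {tokens[mask_pos]!r} -> {[vocab.to_tokens(i) for i in top_ids]}")
```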

Result

  • miniBERT
    • parameters: $d_{model}=128,\ d_{ff}=256,\ \text{layers}=2,\ \text{heads}=2$ (an encoder of this size is sketched after this list)
    • MLM loss: (loss curve plot)
    • NSP loss: (loss curve plot)
  • baseBERT
    • parameters: $d_{model}=768,\ d_{ff}=3072,\ \text{layers}=12,\ \text{heads}=12$
    • MLM loss: (loss curve plot)
    • NSP loss: (loss curve plot)
  • Note that our BERT model cannot perform as well as the official BERT model on MLM tasks, due to the limited pretraining data we use (only about 10 MB of wiki text).
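For orientation, an encoder of the miniBERT size above can be written directly with torch.nn.TransformerEncoder, as sketched below. This only illustrates the hyperparameters; the actual model definition in pretrain.py, plus the token/segment/position embeddings and MLM/NSP heads a full BERT needs, may differ.

```python
# Sketch of a miniBERT-sized encoder: d_model=128, d_ff=256, 2 layers, 2 heads.
# Built from stock torch.nn modules; not the model class used in pretrain.py.
import torch
import torch.nn as nn

d_model, d_ff, num_layers, num_heads = 128, 256, 2, 2
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads,
                                   dim_feedforward=d_ff, batch_first=True)
mini_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

# Example: encode a batch of 2 sequences of length 16 (token embeddings assumed given).
x = torch.randn(2, 16, d_model)
out = mini_encoder(x)  # shape: (2, 16, 128)
```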