Example 2. Mini BERT Pretrain

BERT revealed the great potential of the Transformer across a wide range of NLP tasks. Inspired by d2l, we pretrain a miniBERT on the WikiText-2 dataset. We then fine-tune the miniBERT for a token-level task (SQuAD) in eg.3 and for a sequence-level task (IMDb) in eg.4.
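The pretraining objective is the sum of the masked language modeling (MLM) and next sentence prediction (NSP) losses. Below is a minimal sketch of that combination; the `model` interface, argument names, and batch fields are assumptions for illustration and do not reflect the actual code in pretrain.py.

```python
# Minimal sketch of the BERT pretraining objective (MLM + NSP).
# The model interface and batch fields here are assumptions, not pretrain.py's API.
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def pretrain_loss(model, tokens, segments, mlm_positions, mlm_labels, nsp_labels):
    # Hypothetical model returning MLM logits at the masked positions
    # and NSP logits computed from the [CLS] representation.
    mlm_logits, nsp_logits = model(tokens, segments, mlm_positions)
    mlm_loss = ce(mlm_logits.reshape(-1, mlm_logits.shape[-1]), mlm_labels.reshape(-1))
    nsp_loss = ce(nsp_logits, nsp_labels)
    return mlm_loss + nsp_loss  # BERT optimizes the sum of the two losses
```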

File Structure

  • ./model: Saved model files.
  • ./dataset: Cache directory for datasets.
  • pretrain.py: Pretrain the miniBERT.
  • WikiTextDataset.py: Dataset class for WikiText-2 (a masking sketch follows this list).
  • maskDemo.py: Demo for the MLM task.
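Building MLM examples requires masking some of the input tokens. The sketch below shows BERT's standard 80/10/10 masking rule (80% [MASK], 10% random token, 10% unchanged); the function name, `vocab_size`, `mask_id`, and `special_ids` are hypothetical and not taken from WikiTextDataset.py.

```python
# Sketch of BERT's MLM masking rule, assuming integer token ids.
import random

def mask_tokens(token_ids, vocab_size, mask_id, special_ids, mask_prob=0.15):
    inputs, positions, labels = list(token_ids), [], []
    # Only ordinary tokens are eligible for masking, not special tokens.
    candidates = [i for i, t in enumerate(token_ids) if t not in special_ids]
    random.shuffle(candidates)
    num_to_mask = max(1, round(mask_prob * len(candidates)))
    for i in candidates[:num_to_mask]:
        positions.append(i)
        labels.append(token_ids[i])                   # original token is the prediction target
        r = random.random()
        if r < 0.8:
            inputs[i] = mask_id                       # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.randrange(vocab_size)  # 10%: replace with a random token
        # else: 10% keep the original token
    return inputs, positions, labels
```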

Run

  • python pretrain.py: Start pretraining.
  • python maskDemo.py: Run the MLM demo (see the sketch below).
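For reference, here is a minimal sketch of what such an MLM demo typically does, assuming a hypothetical `model` that returns MLM logits for given positions and a `vocab` object that maps between tokens and ids; neither reflects the actual interfaces in this repo.

```python
# Sketch of an MLM demo: mask one token and print the model's top guesses.
import torch

def show_mlm_prediction(model, vocab, tokens, mask_pos, topk=5):
    # Replace the token at mask_pos with the mask token (assumed '<mask>').
    ids = [vocab[t] for t in tokens]
    ids[mask_pos] = vocab['<mask>']
    segments = [0] * len(ids)
    with torch.no_grad():
        mlm_logits, _ = model(torch.tensor([ids]),
                              torch.tensor([segments]),
                              torch.tensor([[mask_pos]]))
    top_ids = mlm_logits[0, 0].topk(topk).indices.tolist()
    print(f"masked {tokens[mask_pos]!r} -> {[vocab.to_tokens(i) for i in top_ids]}")
```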

Result

  • miniBERT
    • parameters: $d_{model}=128,\ d_{ff}=256,\ \text{layers}=2,\ \text{heads}=2$ (an encoder of this size is sketched after this list)
    • MLM loss: (loss curve plot)
    • NSP loss: (loss curve plot)
  • baseBERT
    • parameters: $d_{model}=768,\ d_{ff}=3072,\ \text{layers}=12,\ \text{heads}=12$
    • MLM loss: (loss curve plot)
    • NSP loss: (loss curve plot)
  • Note that our BERT model cannot perform as well as the official BERT model on MLM tasks, due to the limited pretraining data we use (only about 10 MB of wiki text).
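For orientation, an encoder of the miniBERT size above can be written directly with torch.nn.TransformerEncoder, as sketched below. This only illustrates the hyperparameters; the actual model definition in pretrain.py, plus the token/segment/position embeddings and MLM/NSP heads a full BERT needs, may differ.

```python
# Sketch of a miniBERT-sized encoder: d_model=128, d_ff=256, 2 layers, 2 heads.
# Built from stock torch.nn modules; not the model class used in pretrain.py.
import torch
import torch.nn as nn

d_model, d_ff, num_layers, num_heads = 128, 256, 2, 2
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads,
                                   dim_feedforward=d_ff, batch_first=True)
mini_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

# Example: encode a batch of 2 sequences of length 16 (token embeddings assumed given).
x = torch.randn(2, 16, d_model)
out = mini_encoder(x)  # shape: (2, 16, 128)
```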