GPU (brightmart version, tiny model):
export BERT_BASE_DIR=./albert_tiny_zh
nohup python3 run_pretraining.py --input_file=./data/tf*.tfrecord \
  --output_dir=./my_new_model_path --do_train=True --do_eval=True --bert_config_file=$BERT_BASE_DIR/albert_config_tiny.json \
  --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=51 \
  --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176 \
  --save_checkpoints_steps=2000 --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &

GPU (Google version, small model):
export BERT_BASE_DIR=./albert_small_zh_google
nohup python3 run_pretraining_google.py --input_file=./data/tf*.tfrecord --eval_batch_size=64 \
  --output_dir=./my_new_model_path --do_train=True --do_eval=True --albert_config_file=$BERT_BASE_DIR/albert_config_small_google.json --export_dir=./my_new_model_path_export \
  --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=20 \
  --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176 \
  --save_checkpoints_steps=2000 --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt

For TPU, add flags like the following:
--use_tpu=True --tpu_name=grpc://10.240.1.66:8470 --tpu_zone=us-central1-a
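For reference, the TPU flags are simply appended to the GPU command; for the tiny model this might look like the command below. The TPU address and zone are the example values above and should be replaced with your own, and note that TPU training normally requires the input and output paths to live on Google Cloud Storage rather than on local disk.

export BERT_BASE_DIR=./albert_tiny_zh
nohup python3 run_pretraining.py --input_file=./data/tf*.tfrecord \
  --output_dir=./my_new_model_path --do_train=True --do_eval=True --bert_config_file=$BERT_BASE_DIR/albert_config_tiny.json \
  --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=51 \
  --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176 \
  --save_checkpoints_steps=2000 --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt \
  --use_tpu=True --tpu_name=grpc://10.240.1.66:8470 --tpu_zone=us-central1-a &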
@brightmart Hello. Looking at modeling_google and modeling, it seems the former still uses the BERT-style embedding, and only the latter adds the embedding factorization. Why, then, does pretraining the small model use the same approach as BERT? Please correct me if I have misunderstood.
modeling_google is the code for Google's version of the ALBERT model; modeling is the code we adapted ourselves.
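To make the distinction concrete: the factorization being discussed is ALBERT's factorized embedding parameterization, where tokens are first embedded into a small dimension E and then projected up to the hidden size H, so the embedding layer needs V*E + E*H parameters instead of BERT's V*H. Below is a minimal NumPy sketch for illustration only; the sizes are made-up examples and this is not the repository's actual modeling code.

import numpy as np

# Illustrative sizes only; not read from any albert_config_*.json.
V, E, H = 21128, 128, 312  # vocab size, embedding size, hidden size

rng = np.random.default_rng(0)

# BERT-style embedding: a single V x H lookup table.
bert_table = rng.normal(0.0, 0.02, size=(V, H))

# ALBERT-style factorized embedding: a V x E lookup table plus an E x H projection.
albert_table = rng.normal(0.0, 0.02, size=(V, E))
albert_proj = rng.normal(0.0, 0.02, size=(E, H))

token_ids = np.array([101, 2769, 812, 102])  # arbitrary example ids

bert_vectors = bert_table[token_ids]                    # shape (4, H)
albert_vectors = albert_table[token_ids] @ albert_proj  # shape (4, H)

print("BERT embedding parameters:  ", V * H)          # V*H
print("ALBERT embedding parameters:", V * E + E * H)  # V*E + E*H, much smaller when E << H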