How to reproduce the paper results? #6387

Open · 1 task done
StiphyJay opened this issue Dec 19, 2024 · 0 comments

Labels: pending (This problem is yet to be addressed)

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.31
  • Python version: 3.10.12
  • PyTorch version: 2.4.1+cu121 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4090
  • Bitsandbytes version: 0.45.0

Reproduction

How can I reproduce the results in Table 4 and Table 5 of the paper with the newest codebase?

Expected behavior

Could the authors provide one or two cases that reproduce the Table 4 / Table 5 results in the paper, for quick reproduction and comparison? I am currently fine-tuning and evaluating llama3-8b following the Table 5 setup, but my overall results seem much higher than those reported in the paper. Is this reasonable?

Below are my training and evaluation script settings:

SFT train:

```yaml
model_name_or_path: /data01/llama3-8b-instruct-hf  # meta-llama/Meta-Llama-3-8B-Instruct
trust_remote_code: true

# method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

# dataset
dataset: xsum_tiny
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

# output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

# train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

# eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
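
For reference, a config like the one above is launched with LLaMA-Factory's CLI (`llamafactory-cli train <config>.yaml`). A minimal sketch for launching it from Python, assuming the training config is saved as `llama3_lora_sft_xsum.yaml` (a hypothetical file name):

```python
# Launch the SFT run above via LLaMA-Factory's CLI; equivalent to running
# `llamafactory-cli train llama3_lora_sft_xsum.yaml` in a shell.
# The config file name is a hypothetical placeholder.
import subprocess

subprocess.run(
    ["llamafactory-cli", "train", "llama3_lora_sft_xsum.yaml"],
    check=True,
)
```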

SFT eval:

```yaml
model_name_or_path: /data01/llama3-8b-instruct-hf  # meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true

# method
stage: sft
do_predict: true
finetuning_type: lora

# dataset
eval_dataset: xsum_tiny
template: llama3
cutoff_len: 2048
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16

# output
output_dir: saves/llama3-8b/lora/predict_sft_xsum_llama3_8B
overwrite_output_dir: true

# eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000
```
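
As a cross-check on the scores produced by `do_predict`, one could also load the trained adapter directly and inspect a few generations by hand. This is a minimal sketch using plain transformers + peft rather than LLaMA-Factory's own inference path, with the base/adapter paths taken from the configs above; the prompt is a hypothetical placeholder and does not apply the llama3 chat template:

```python
# Load the base model plus the LoRA adapter saved by the training run above
# and generate one sample for manual inspection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "/data01/llama3-8b-instruct-hf"
adapter = "saves/llama3-8b/lora/sft"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

prompt = "Summarize the following article:\n..."  # hypothetical placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```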

The final result is:

```json
"predict_bleu-4": 53.501343999999996,
"predict_model_preparation_time": 0.0046,
"predict_rouge-1": 54.96382,
"predict_rouge-2": 33.267082,
"predict_rouge-l": 47.676412,
"predict_runtime": 52.8157,
"predict_samples_per_second": 0.947,
"predict_steps_per_second": 0.947
```

So the averaged score is (54.96 + 33.27 + 47.68) / 3 ≈ 45.3, whereas the corresponding result in the paper's Table 5 is 30.63 for LoRA + Llama3-8B. That looks like a large difference. Is it expected?
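
A minimal sketch of the comparison above, reading the metrics back from the eval output directory; the file name `predict_results.json` is an assumption about where the trainer saves its predict metrics:

```python
# Average ROUGE-1/2/L from the predict metrics and compare with the paper.
import json

metrics_path = "saves/llama3-8b/lora/predict_sft_xsum_llama3_8B/predict_results.json"  # assumed location
with open(metrics_path) as f:
    metrics = json.load(f)

rouge_avg = (
    metrics["predict_rouge-1"]
    + metrics["predict_rouge-2"]
    + metrics["predict_rouge-l"]
) / 3
print(f"averaged ROUGE: {rouge_avg:.2f}")  # ~45.30 with the numbers reported above
print("paper Table 5 (LoRA + Llama3-8B): 30.63")
```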

Others

No response

github-actions bot added the pending label on Dec 19, 2024