Unable to Reproduce LLM2Vec Training Results Using GradCache on Echo Dataset #135
Comments
Hello @viet-data, were you able to reproduce with GradCache? If you are interested, we'd like to integrate GradCache into the LLM2Vec library.
Hi @vaibhavad, I have successfully trained with GradCache using a batch size of 128 and achieved results close to those reported in llm2vec. However, I'm curious about llm2vec's performance when scaling up the data. I haven't been able to improve performance with more training data, which might be due to the smaller batch size. Could you share the llm2vec results when training on the full dataset? It would be very useful if you could integrate GradCache into llm2vec to help us train with fewer GPUs. Thank you.
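
For reference, here is a minimal sketch of the gradient-caching idea behind GradCache, written as a plain PyTorch training step rather than against the GradCache API; `encoder`, the chunk size, and the temperature are hypothetical placeholders, not the exact LLM2Vec setup:

```python
import torch
import torch.nn.functional as F

def grad_cached_step(encoder, queries, docs, optimizer, chunk_size=32, temperature=0.05):
    """One contrastive training step with gradient caching.

    `encoder` maps a chunk of inputs to (chunk, hidden) embeddings;
    `queries` and `docs` are tensors whose first dimension is the full batch.
    """
    # Stage 1: gradient-free forward pass over all chunks to get full-batch reps.
    with torch.no_grad():
        q_reps = torch.cat([encoder(c) for c in queries.split(chunk_size)])
        d_reps = torch.cat([encoder(c) for c in docs.split(chunk_size)])

    # Stage 2: full-batch in-batch-negatives loss on the detached reps; backward
    # here only produces gradients w.r.t. the representations (the "cache").
    q_reps.requires_grad_(True)
    d_reps.requires_grad_(True)
    scores = q_reps @ d_reps.T / temperature
    labels = torch.arange(scores.size(0), device=scores.device)
    loss = F.cross_entropy(scores, labels)
    loss.backward()

    # Stage 3: re-encode chunk by chunk with gradients enabled and push the
    # cached representation gradients through the encoder parameters.
    for c, g in zip(queries.split(chunk_size), q_reps.grad.split(chunk_size)):
        encoder(c).backward(gradient=g)
    for c, g in zip(docs.split(chunk_size), d_reps.grad.split(chunk_size)):
        encoder(c).backward(gradient=g)

    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```

With this scheme the effective contrastive batch is the full queries/docs tensor, while only `chunk_size` examples are ever on the autograd graph at once, which is what makes large batch sizes feasible on fewer GPUs.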
Hi @viet-data, I reproduced the Llama 3 supervised version, trained for 1000 steps on the MNTP task and 1000 steps on the E5 dataset (following the original LLM2Vec training configs). I am still running the full MTEB evaluation, but the first results already look very similar to the ones reported on Hugging Face for the model. I am currently training a Llama 3.1 version with the same training recipe.
@stefanhgm Thanks so much for sharing! I agree, llm2vec seems quite reproducible. Excited to see your results with Llama 3.1 |
Currently, the evaluation of the Llama 3.1 version on MTEB hangs on a task where 391 batches are processed repeatedly. It has already been repeating this for over a day now. I think it is the … task. @vaibhavad, any chance you observed a similar behavior when evaluating on MTEB?
Hi @stefanhgm! Would it be possible to share your reproduced numbers? I am currently following the LLM2Vec recipe, and for some benchmarks (e.g., FiQA2018) the numbers I get are way off from what was reported: 48% vs. 55%...
Hi @atutej, I still have problems running all tasks because the running time is very long, even when using multiple GPUs. I also asked a question regarding this here: #140. I did get FiQA2018 running, though, and obtained …
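
For what it's worth, a minimal sketch of running a single MTEB task such as FiQA2018, assuming the mteb package's MTEB runner and a model object that exposes an encode(sentences, **kwargs) method (as an LLM2Vec wrapper does):

```python
from mteb import MTEB

def evaluate_fiqa(model, output_folder="results/fiqa2018"):
    # `model` is any object with an encode(sentences, **kwargs) method that
    # returns one embedding per input sentence, e.g. an LLM2Vec wrapper.
    evaluation = MTEB(tasks=["FiQA2018"])
    return evaluation.run(model, output_folder=output_folder)
```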
This is strange behaviour; I haven't faced this issue. Can you share your evaluation script?
Hi @stefanhgm, I think the 391 batches are just sub-batches in the dataset. I see the same thing, but it eventually finishes evaluating. Regarding training: are you running MNTP yourself followed by supervised training? I'm starting from the MNTP checkpoint provided on Hugging Face for supervised training. @vaibhavad, is it possible there are some differences between the two approaches?
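
For context, a minimal sketch of loading the published MNTP checkpoint as the starting point for supervised training, assuming the LLM2Vec.from_pretrained helper shown in the repository README (the checkpoint name here is the published McGill-NLP Llama 3 one):

```python
import torch
from llm2vec import LLM2Vec

# Load the base model together with the published MNTP (LoRA) weights;
# supervised contrastive training would then start from this model.
l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)
```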
I have been attempting to reproduce the training results on the same echo data. Due to hardware limitations, I had to reimplement the training process using GradCache.
Although my model code can load the LLM2Vec public checkpoint and perform inference correctly, I am unable to achieve performance comparable to LLM2Vec when training a bidirectional Mistral model (without MNTP or unsupervised SimCSE) with GradCache. My training used a batch size of 512 on the echo dataset and stopped after 750 iterations.
Specifically, on the STS tasks, I have not been able to exceed 75 on SICK-R and 65 on STS-12 (other tasks also show low performance, except for BIOSSES).
Has anyone else tried to train LLM2Vec with GradCache, or has anyone successfully reproduced the LLM2Vec results using the original code? Any insights or suggestions would be greatly appreciated.
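
One way to narrow this down is to verify that the GradCache reimplementation produces the same embeddings as the reference implementation at inference time before blaming the training recipe. A minimal sketch; `encode_a` and `encode_b` are hypothetical callables (e.g. the reference LLM2Vec model's encode and the reimplementation's):

```python
import torch.nn.functional as F

def compare_encoders(encode_a, encode_b, sentences):
    """Largest absolute difference between the cosine-similarity matrices
    produced by two embedding implementations on the same sentences; it
    should be close to zero if the reimplementation is faithful."""
    a = F.normalize(encode_a(sentences).float(), dim=-1)
    b = F.normalize(encode_b(sentences).float(), dim=-1)
    return (a @ a.T - b @ b.T).abs().max().item()
```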