Out-of-memory error #27
Comments
Experiencing the same issue on a Titan V.
Hi, mustaszewski
Hey, could you solve your problem? I have the same problem.
Also having the same problem.
Same issue here, both with CUDA and without.
Dear Mikel,
first of all congratulations on this great piece of work and thank you for sharing it with the community.
I experienced out-of-memory errors when mapping pre-trained fastText embeddings trained on Wikipedia (https://fasttext.cc/docs/en/pretrained-vectors.html). For the EN-DE language pair, the embeddings are quite large, having 300 dimensions and vocabulary sizes of approx. 2.2M to 2.5M words.
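For reference, a rough back-of-the-envelope estimate (my own sketch, not vecmap code; it assumes float32 storage and that an original plus a mapped copy of each embedding matrix is kept in memory at the same time) suggests that the matrices alone already approach the available RAM:

```python
# Back-of-the-envelope memory estimate (my own sketch, not vecmap code).
# Assumes float32 and that both the original and the mapped copy of each
# embedding matrix are held in memory simultaneously.
vocab_src = 2_200_000   # approx. EN fastText Wikipedia vocabulary
vocab_trg = 2_500_000   # approx. DE fastText Wikipedia vocabulary
dim = 300
bytes_per_float = 4     # float32

one_copy_gb = (vocab_src + vocab_trg) * dim * bytes_per_float / 1024**3
print(f"one copy of both embedding matrices: ~{one_copy_gb:.1f} GB")
print(f"original + mapped copies:            ~{2 * one_copy_gb:.1f} GB")
```

This prints roughly 5 GB for a single copy and over 10 GB for two copies, which is already close to the 12.75 GB of RAM available on Colab, before counting any intermediate matrices.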
Out-of-memory errors occurred in both the supervised and unsupervised modes, both with and without --cuda. In the supervised mode (using the EN-DE training dictionary from your 2017 ACL paper), the following error occurred:
Call:
python3 vecmap/map_embeddings.py --cuda --supervised TRAIN_DICT EMB_SRC EMB_TRG EMB_SRC_MAPPED EMB_TRG_MAPPED --log log.txt --verbose
Output:
In the unsupervised mode (python3 vecmap/map_embeddings.py --cuda --unsupervised EMB_SRC EMB_TRG EMB_SRC_MAPPED EMB_TRG_MAPPED --log log.txt --verbose), an out-of-memory error occurred as well.
I was running vecmap on Google Colab with 12.75 GB RAM and GPU hardware acceleration activated.
Some more background: out-of-memory errors occurred even when the target embedding file was much smaller, with a vocabulary of approx. 0.2M. On the other hand, when both the source and target embeddings had vocabularies of around 0.2M, the mapping worked perfectly fine, both in supervised and unsupervised mode.
What is the recommended way to deal with such memory issues? Should I limit the vocabulary size of the embedding files, set the --batch_size parameter, or set the --vocabulary_cutoff parameter? By the way, when setting the --vocabulary_cutoff parameter, does vecmap draw a random sample of size n from the original vocabulary, or does it limit the vocabulary to the n most frequent entries?