When I tried to run the demo in session0, I noticed that it expects a wiki dataset, but the dataset is missing. I went through the data folder and ran the .sh file to download the europarl-v7.fr-en and newstest datasets, but I cannot find the wiki data. Could you point out how to obtain the wiki.tok.txt.gz file?
Thanks
Hi @JunjieHu. Unfortunately, the scripts provided under the data directory do not download the Wikipedia dumps because of their size (~3.5G), but you can download them manually from here.
If you just want to play with the code, you can use either the fr or the en side of europarl-v7.fr-en.
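As a rough sketch of that workaround, the snippet below copies one side of a plain-text corpus into a gzip-compressed file with one sentence per line. The function name `gzip_text_corpus` and the file names in the note after it are hypothetical, and the whitespace splitting is only a stand-in for real tokenization; whether the demo accepts such a file in place of wiki.tok.txt.gz is an assumption, not something confirmed in this thread.

```python
import gzip
from itertools import islice
from typing import Optional


def gzip_text_corpus(src_path: str, dst_path: str,
                     max_lines: Optional[int] = None) -> int:
    """Copy a plain-text corpus into a gzip file, one sentence per line.

    Reads at most `max_lines` lines from the source (all lines if None)
    and returns the number of non-empty lines written.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in islice(src, max_lines):
            # Whitespace normalization only; this is NOT a real tokenizer.
            tokens = line.split()
            if not tokens:
                continue  # skip empty lines
            dst.write(" ".join(tokens) + "\n")
            written += 1
    return written
```

For example, `gzip_text_corpus("europarl-v7.fr-en.en", "wiki.tok.txt.gz")` would write the English side under the file name the demo looks for; both paths are guesses about the local layout and may need adjusting.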
We found many Wikimedia Downloads links (e.g., Database backup dumps, Mirror Sites of the XML dumps provided above, Static HTML dumps, DVD distributions, Analytics data files, Other files, Kiwix files).
Could you please clarify which one we should use? We would appreciate it.