Tokenisation #2

Open
kritisingh24 opened this issue Aug 7, 2022 · 1 comment

Comments

@kritisingh24

The [paper](https://www.cfilt.iitb.ac.in/iitb_parallel/lrec2018_iitbparallel.pdf) mentions that the data is tokenised using the Indic NLP Library for Hindi and Moses for English, but the given example code does not do this. Are the results still reproducible if we skip this step?

@dipteshkanojia
Collaborator

The results should be reproducible using the methodology described in the paper. The example code is provided only to help a beginner in the area get started with the task. It is always recommended to use the method described in the paper.
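
For anyone who wants to add the tokenisation step to the example code, here is a minimal sketch. It is not the script used for the paper. It assumes the `sacremoses` package (a Python port of the Moses tokeniser, swapped in here for the original Moses scripts) and the `indic-nlp-library` package are installed, and the function names `tokenize_english` and `tokenize_hindi` are made up for illustration. If either package is missing, the sketch falls back to plain whitespace splitting so it still runs, but that fallback is not the tokenisation the paper describes.

```python
# Sketch only: tokenise a parallel sentence pair before training.
# Assumes `pip install sacremoses indic-nlp-library`.
# Falls back to whitespace splitting if a package is missing.

try:
    from sacremoses import MosesTokenizer
    _moses = MosesTokenizer(lang="en")
except ImportError:
    _moses = None

try:
    from indicnlp.tokenize import indic_tokenize
except ImportError:
    indic_tokenize = None


def tokenize_english(text):
    """Moses-style tokenisation for English (hypothetical helper)."""
    if _moses is None:
        return text.split()
    # escape=False keeps characters such as & and ' unescaped.
    return _moses.tokenize(text, escape=False)


def tokenize_hindi(text):
    """Indic NLP Library tokenisation for Hindi (hypothetical helper)."""
    if indic_tokenize is None:
        return text.split()
    return indic_tokenize.trivial_tokenize(text, lang="hi")


if __name__ == "__main__":
    print(" ".join(tokenize_english("Hello, world!")))
    print(" ".join(tokenize_hindi("यह एक वाक्य है।")))
```

Joining the tokens with spaces, as in the last two lines, gives the one-sentence-per-line tokenised text that most NMT toolkits expect as input.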
