Evaluate lines with newline characters #139

Open
LeonardoEmili opened this issue Jun 5, 2023 · 0 comments
Labels
bug Something isn't working

Comments


LeonardoEmili commented Jun 5, 2023

🐛 Bug

Currently, the library does not handle sentences that contain newline characters (i.e. '\n'): instead of scoring them as a single segment, it splits them into sub-sentences and scores each piece separately. This is caused by how the input files are read (e.g. see here for the scorer code). A better approach would be to read the files as binary and then decode each line individually. Happy to contribute a small PR if you think this would be useful to other users.
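To illustrate the proposal, here is a minimal sketch of the binary-read approach. It is not the library's code: the helper name `read_lines_binary` and the UTF-8 default are assumptions made for this example. The idea is that only the `b"\n"` byte ends a line, so other characters that Python treats as line boundaries in text mode (for example a bare `'\r'`) stay inside the sentence.

```python
from typing import List


def read_lines_binary(path: str, encoding: str = "utf-8") -> List[str]:
    """Hypothetical helper: split a file on b"\\n" only, then decode each line.

    Reading in binary mode avoids universal-newline translation, so a bare
    "\\r" (or any other line-boundary character) remains part of its line.
    """
    with open(path, "rb") as fp:
        raw = fp.read()

    lines = raw.split(b"\n")
    # A file that ends with "\n" produces one trailing empty chunk; drop it
    # so the count matches what `wc -l` reports.
    if lines and lines[-1] == b"":
        lines.pop()

    return [line.decode(encoding) for line in lines]
```

With this, the number of returned segments should equal the number of `'\n'` bytes in a file that ends with a newline, which is the count `wc -l` gives.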

@ricardorei

To Reproduce

Run COMET on the input files, via either scoring or compare.

Expected behaviour

If I have a file that consists of 1000 lines (i.e. as reported by wc -l output_it/src.txt), I would expect exactly 1000 sentence-level scores.
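A quick way to check whether a given file is affected is to compare the two counts directly. This is only a diagnostic sketch: it assumes the scorer reads inputs in text mode with Python's default newline handling, and both function names are hypothetical.

```python
def count_newline_bytes(path: str) -> int:
    """Count b"\\n" bytes, which is what `wc -l` reports."""
    with open(path, "rb") as fp:
        return fp.read().count(b"\n")


def count_text_mode_lines(path: str, encoding: str = "utf-8") -> int:
    """Count lines as seen by a default text-mode read.

    With the default newline=None, Python also treats "\\r" and "\\r\\n"
    as line endings, so this can exceed the `wc -l` count.
    """
    with open(path, encoding=encoding) as fp:
        return len(fp.readlines())


def has_line_count_mismatch(path: str) -> bool:
    """Return True when the two counts disagree for this file."""
    return count_newline_bytes(path) != count_text_mode_lines(path)
```

If `has_line_count_mismatch` returns True for a source, hypothesis, or reference file, the number of sentence-level scores is likely to differ from the expected line count.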

Environment

OS: Ubuntu 20.04.5 LTS (Focal Fossa)
Python 3.8.16 via Conda

LeonardoEmili added the bug label Jun 5, 2023