🐛 Bug
Currently, the library does not support sentences that contain newline characters (i.e. '\n'); instead, it splits them into sub-sentences when computing scores. This is due to how the input sentences are read (e.g. see here for the scorer code). A better approach would be to read the files as binary and then decode the individual lines. Happy to contribute a small PR if you feel this might be useful to other users.
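A minimal sketch of the binary-read approach, in case it helps the discussion. The helper name `read_lines_binary` is hypothetical and not part of COMET. It assumes UTF-8 input in which a single '\n' byte is the only record separator, so other newline-like characters (such as '\r' or '\x0b') stay inside the sentence:

```python
from pathlib import Path
from typing import List


def read_lines_binary(path: str, encoding: str = "utf-8") -> List[str]:
    """Hypothetical helper: one string per b'\\n'-terminated line of the file."""
    # Read raw bytes so that no newline translation is applied by Python.
    raw = Path(path).read_bytes()
    # Split only on the '\n' byte; in UTF-8 this byte never occurs inside
    # a multi-byte character, so splitting before decoding is safe.
    lines = raw.split(b"\n")
    # A file that ends with '\n' yields one trailing empty chunk; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    # Decode each line on its own.
    return [line.decode(encoding) for line in lines]
```

With this, the number of returned strings should match what `wc -l` reports for a file that ends with a newline. Files with Windows line endings would keep a trailing '\r' on each line, which would need to be stripped separately.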
@ricardorei
To Reproduce
Simply run COMET on the input files, either via scoring or compare.
Expected behaviour
If I have a file that consists of 1000 lines (i.e. wc -l output_it/src.txt reports 1000), I would expect exactly 1000 sentence-level scores.
Environment
OS: Ubuntu 20.04.5 LTS (Focal Fossa)
Python 3.8.16 via Conda