Evaluate lines with newline characters #139

LeonardoEmili · 2023-06-05T13:23:11Z

🐛 Bug

Currently, the library does not support sentences that contain newline characters (i.e. '\n'): it splits them into sub-sentences and scores each one separately. This comes from how the input sentences are read (e.g. see here for the scorer code). A better approach would be to read the files as binary and then decode each line individually. I'm happy to contribute a small PR if you think this would be useful to other users.
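To make the proposal concrete, here is a minimal sketch of what "read as binary, then decode each line" could look like. The helper name `read_sentences` is hypothetical and not part of COMET, and the sketch assumes UTF-8 input with one sentence per '\n'-terminated line.

```python
from typing import List


def read_sentences(path: str, encoding: str = "utf-8") -> List[str]:
    """Hypothetical helper (not COMET's API): one sentence per '\n'-terminated line."""
    sentences = []
    # Iterating over a file opened in binary mode splits on b"\n" only,
    # so characters such as "\r" or U+2028 stay inside the sentence.
    with open(path, "rb") as handle:
        for raw_line in handle:
            # Drop only the trailing line terminator, nothing else.
            if raw_line.endswith(b"\n"):
                raw_line = raw_line[:-1]
            sentences.append(raw_line.decode(encoding))
    return sentences
```

With this approach a file with Windows line endings would keep a trailing '\r' on each sentence, so a real patch would probably need to decide how to handle that case.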

@ricardorei

To Reproduce

Run COMET on the input files, using either the scoring or the compare command.

Expected behaviour

If I have a file that consists of 1000 lines (as reported by wc -l output_it/src.txt), I would expect exactly 1000 sentence-level scores.
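A quick way to check whether a given file is affected is to compare the number of '\n' bytes (what wc -l reports) with the number of pieces produced by Python's `str.splitlines()`. This is only a sketch: it assumes the current reader behaves like `str.splitlines()`, which is not confirmed here, and both function names are hypothetical.

```python
def count_newline_bytes(path: str) -> int:
    """Count b"\n" bytes, which matches what `wc -l` reports."""
    total = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            total += chunk.count(b"\n")
    return total


def count_splitlines(path: str, encoding: str = "utf-8") -> int:
    """Count pieces from str.splitlines(), which also splits on
    "\r", U+2028, U+0085 and a few other separator characters."""
    # newline="" disables newline translation so the raw text is preserved.
    with open(path, encoding=encoding, newline="") as handle:
        return len(handle.read().splitlines())
```

If the two counts differ for a file, some lines contain characters that `str.splitlines()` treats as line boundaries, and the number of scores would not match the number of lines.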

Environment

OS: Ubuntu 20.04.5 LTS (Focal Fossa)
Python 3.8.16 via Conda

LeonardoEmili added the bug label on Jun 5, 2023
