evaluation for duplicated answer choices #117

Open
KonstantinHebenstreit opened this issue Mar 8, 2023 · 1 comment

Comments

Collaborator

KonstantinHebenstreit commented Mar 8, 2023

Some datasets contain examples with either 4 or 5 answer choices. I think what has been done is to simply duplicate one of the answer choices so that there are always 5 choices.
The evaluation script does not account for this.
Since we put letters in front of the choices (A, B, C, D, E), the model can also answer with a letter. But if the right choice appears in two places, it has two letters. This can lead to wrong evaluation scores when scoring by letter.

The first example is commonsense_qa, but there might be others.
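
To illustrate, a minimal sketch with made-up choices (not taken from the dataset): when 4 unique choices are padded to 5 by duplicating one of them, the correct answer maps to two letters, so a letter-based check can mark a valid answer as wrong.

# Hypothetical example: 4 unique choices padded to 5 by repeating "bank"
choices = ["bank", "library", "department store", "mall", "bank"]
letters = ["A", "B", "C", "D", "E"]
correct = "bank"

# The correct answer now corresponds to two letters.
correct_letters = [l for l, c in zip(letters, choices) if c == correct]
print(correct_letters)  # ['A', 'E'] -- answering "E" is scored wrong if only "A" is accepted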

@KonstantinHebenstreit
Collaborator Author

Helper code to find those examples:

coll["commonsense_qa"].filter(lambda example: len(set(example["choices"]))==4)

KonstantinHebenstreit changed the title from "Med_qa evaluation for duplicated answer choices" to "evaluation for duplicated answer choices" on Mar 9, 2023