Hendrycks Math extraction rule seems too strict #2552

fzyzcjy · 2024-12-08T09:19:30Z

Hi, thanks for the library! It seems that the way the math answer is extracted may be too strict. The rule in lm-evaluation-harness/lm_eval/tasks/hendrycks_math/utils.py (lines 20 to 24 at commit bcb4cbf) is:

    indices = [pos for pos, char in enumerate(results[0]) if char == "$"]
    if len(indices) <= 1:
        answer = results[0]
    else:
        answer = results[0][indices[0] + 1 : indices[-1]]

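For example, a response such as "... some reasoning ... Thus the answer is \[ \boxed{42} \]" is not extracted, because the final answer is delimited by \[ ... \] rather than $. With fewer than two $ characters, the rule falls back to using the entire response as the answer, so it will not match the reference answer 42.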
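To make the failure concrete, here is a minimal Python sketch that reproduces the quoted rule as a standalone function. The name strict_extract is hypothetical and not part of the harness; it only mirrors the five lines above, applied to a single response string.

```python
def strict_extract(response: str) -> str:
    # Hypothetical standalone copy of the quoted rule:
    # take the text between the first and last "$" in the response.
    indices = [pos for pos, char in enumerate(response) if char == "$"]
    if len(indices) <= 1:
        # Zero or one "$": the whole response is used as the answer.
        return response
    return response[indices[0] + 1 : indices[-1]]


dollar = "Some reasoning. Thus the answer is $42$."
display = r"Some reasoning. Thus the answer is \[ \boxed{42} \]"

print(strict_extract(dollar))   # 42
print(strict_extract(display))  # prints the entire response, unchanged
```

The second call shows the problem: with no $ present, nothing is extracted and the full response is compared against the reference.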


baberabb · 2024-12-09T13:22:01Z

Hi! This is based on the original code, but we could report an additional metric alongside it by adding a flexible-extract filter, as the gsm8k task does. PR welcome!
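As a rough illustration of what a more flexible extraction could look like, here is a plain Python sketch, assuming the fallback should prefer the contents of the last \boxed{...} and otherwise use the original $ rule. The function names last_boxed_content, strict_extract and flexible_extract are hypothetical; this is not the harness's filter API, just the extraction logic such a filter might wrap.

```python
from typing import Optional


def last_boxed_content(text: str) -> Optional[str]:
    """Return the contents of the last boxed{...} command, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = content_start = start + len("\\boxed{")
    depth = 1
    while i < len(text):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
        i += 1
    return None  # unbalanced braces: treat as not found


def strict_extract(response: str) -> str:
    # Same rule as the original: text between the first and last "$".
    indices = [pos for pos, char in enumerate(response) if char == "$"]
    if len(indices) <= 1:
        return response
    return response[indices[0] + 1 : indices[-1]]


def flexible_extract(response: str) -> str:
    # Prefer the last \boxed{...}; otherwise fall back to the "$" rule.
    boxed = last_boxed_content(response)
    if boxed is not None:
        return boxed.strip()
    return strict_extract(response)


print(flexible_extract(r"Thus the answer is \[ \boxed{42} \]"))  # 42
print(flexible_extract(r"So \boxed{\frac{1}{2}}"))               # \frac{1}{2}
```

Brace counting is used instead of a regex so that nested answers like \frac{1}{2} are captured whole. Keeping the strict rule as a separate metric, as suggested above, preserves comparability with the original code.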

baberabb added the labels "good first issue" (Good for newcomers) and "validation" (For validation of task implementations) on Dec 9, 2024.
