Issues: EleutherAI/lm-evaluation-harness

Issues list

fail to evaluate piqa
#2597 opened Dec 25, 2024 by vejaxu
Evaluating model on MT-Bench and LBPP
#2590 opened Dec 23, 2024 by sorobedio
Strange memory footprint
#2589 opened Dec 22, 2024 by zxgx
Weird results for 70b models
#2584 opened Dec 19, 2024 by BeksultanSagyndyk
Wrong few_shot format of mgsm zh. (label: validation)
#2578 opened Dec 18, 2024 by timturing
train, val-test split: How to select the right split on demand (label: asking questions)
#2573 opened Dec 17, 2024 by sorobedio
CaseHOLD Task Implementation
#2571 opened Dec 16, 2024 by zolastro
When can support MATH/HumanEval datasets eval (label: asking questions)
#2564 opened Dec 12, 2024 by shawn0wang
reproduce llama 3 evals (labels: good first issue, validation)
#2557 opened Dec 10, 2024 by baberabb
fail to reproduce Deepseek-math result (labels: asking questions, validation)
#2555 opened Dec 10, 2024 by zhuqiangLu
Hendrycks Math extraction rule seems too strict (labels: good first issue, validation)
#2552 opened Dec 8, 2024 by fzyzcjy
Inquiry about the feature to continue evaluation after abnormal termination (label: asking questions)
#2548 opened Dec 6, 2024 by minimi-kei
Add Global-MMLU
#2547 opened Dec 6, 2024 by shivalika-singh
Support for squad dataset
#2538 opened Dec 4, 2024 by danielkorzekwa