Issues: EleutherAI/lm-evaluation-harness
Issues list
Passing a limit doesn't randomly sample, but rather takes dataset[:limit], introducing dataset bias
#2598
opened Dec 27, 2024 by
aalpat1
How to resolve the “Too Many Requests” issue encountered when using the OpenAI API?
#2594
opened Dec 24, 2024 by
Here1sWqW
Couldn't detect gpu when generation using ray data_parallel_size > 1
#2591
opened Dec 23, 2024 by
zhaocaibei123
How to exactly reproduce the results on the openllm leaderboard?
#2583
opened Dec 19, 2024 by
Zilinghan
Repeated Running Scripts During Perplexity Task Execution on Windows
#2581
opened Dec 19, 2024 by
zhuyuhua-v
Chat template not being used despite passing --apply_chat_template option
bug
Something isn't working.
#2579
opened Dec 18, 2024 by
juliafalcao
Wrong few_shot format of mgsm zh.
validation
For validation of task implementations.
#2578
opened Dec 18, 2024 by
timturing
train, val-test split: How to select the right split on demand
asking questions
For asking for clarification / support on library usage.
#2573
opened Dec 17, 2024 by
sorobedio
Question: Is there an easy way for me to know all the generation_until tasks?
#2569
opened Dec 14, 2024 by
Ki-Seki
When can support MATH/HumanEval datasets eval
asking questions
For asking for clarification / support on library usage.
#2564
opened Dec 12, 2024 by
shawn0wang
reproduce llama 3 evals
good first issue
Good for newcomers
validation
For validation of task implementations.
#2557
opened Dec 10, 2024 by
baberabb
fail to reproduce Deepseek-math result
asking questions
For asking for clarification / support on library usage.
validation
For validation of task implementations.
#2555
opened Dec 10, 2024 by
zhuqiangLu
Hendrycks Math extraction rule seems too strict
good first issue
Good for newcomers
validation
For validation of task implementations.
#2552
opened Dec 8, 2024 by
fzyzcjy
Inconsistent responses for the same case with different limit parameters
#2550
opened Dec 7, 2024 by
Starry-Liu1
Inquiry about the feature to continue evaluation after abnormal termination
asking questions
For asking for clarification / support on library usage.
#2548
opened Dec 6, 2024 by
minimi-kei
Answer extraction logic for Math Lvl 5 (Open LLM Leaderboard 2) may be too strict
#2539
opened Dec 5, 2024 by
suhara