EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

reproduce llama 3 evals

#2557 opened Dec 10, 2024 by baberabb

Open

344 Open 858 Closed

Issues list

Passing a limit doesn't randomly sample, but rather takes dataset[:limit], introducing dataset bias

#2598 opened Dec 27, 2024 by aalpat1

fail to evaluate piqa

#2597 opened Dec 25, 2024 by vejaxu

How to resolve the “Too Many Requests” issue encountered when using the OpenAI API?

#2594 opened Dec 24, 2024 by Here1sWqW

Couldn't detect gpu when generation using ray data_parallel_size > 1

#2591 opened Dec 23, 2024 by zhaocaibei123

Evaluating model on MT-Bench and LBPP

#2590 opened Dec 23, 2024 by sorobedio

Strange memory footprint

#2589 opened Dec 22, 2024 by zxgx

Using EleutherAI to create an interactive evaluation app

#2588 opened Dec 20, 2024 by Karan-0206

Weird results for 70b models

#2584 opened Dec 19, 2024 by BeksultanSagyndyk

How to exactly reproduce the results on the openllm leaderboard?

#2583 opened Dec 19, 2024 by Zilinghan

Repeated Running Scripts During Perplexity Task Execution on Windows

#2581 opened Dec 19, 2024 by zhuyuhua-v

Chat template not being used despite passing --apply_chat_template option

Label: bug (Something isn't working.)

#2579 opened Dec 18, 2024 by juliafalcao

Wrong few_shot format of mgsm zh.

Label: validation (For validation of task implementations.)

#2578 opened Dec 18, 2024 by timturing

train, val-test split: How to select the right split on demand

Label: asking questions (For asking for clarification / support on library usage.)

#2573 opened Dec 17, 2024 by sorobedio

CaseHOLD Task Implementation

#2571 opened Dec 16, 2024 by zolastro

Question: Is there an easy way for me to know all the generation_until tasks?

#2569 opened Dec 14, 2024 by Ki-Seki

When can support MATH/HumanEval datasets eval

Label: asking questions (For asking for clarification / support on library usage.)

#2564 opened Dec 12, 2024 by shawn0wang

reproduce llama 3 evals

Label: good first issue (Good for newcomers)

Label: validation (For validation of task implementations.)

#2557 opened Dec 10, 2024 by baberabb

fail to reproduce Deepseek-math result

Label: asking questions (For asking for clarification / support on library usage.)

Label: validation (For validation of task implementations.)

#2555 opened Dec 10, 2024 by zhuqiangLu

Hendrycks Math extraction rule seems too strict

Label: good first issue (Good for newcomers)

Label: validation (For validation of task implementations.)

#2552 opened Dec 8, 2024 by fzyzcjy

Inconsistent responses for the same case with different limit parameters

#2550 opened Dec 7, 2024 by Starry-Liu1

Inquiry about the feature to continue evaluation after abnormal termination

Label: asking questions (For asking for clarification / support on library usage.)

#2548 opened Dec 6, 2024 by minimi-kei

Add Global-MMLU

#2547 opened Dec 6, 2024 by shivalika-singh

Answer extraction logic for Math Lvl 5 (Open LLM Leaderboard 2) may be too strict

#2539 opened Dec 5, 2024 by suhara

Support for squad dataset

#2538 opened Dec 4, 2024 by danielkorzekwa

lm_eval on squadv2 and meta-llama/Meta-Llama-3.1-8B fails with TypeError: Instance.__init__() got an unexpected keyword argument 'apply_chat_template'

#2537 opened Dec 4, 2024 by danielkorzekwa
