gpt3.5 > gpt4 on pass rate? #14

Open
stanpcf opened this issue Jun 24, 2024 · 1 comment


stanpcf commented Jun 24, 2024

Great work on tool use!
However, I have some questions about the results, and I would be grateful for a reply.

1. On the StableToolBench leaderboard (https://zhichengg.github.io/stb.github.io/), the pass rate of gpt3.5 is higher than that of gpt4 with DFS. Is there any analysis of this result?

[image] In `ToolBench`, by contrast, gpt4 > gpt3.5 (for both ReAct and DFS). [image]

2. My results differ considerably from those reported in the paper.

Below are my rerun results for pass rate.
[image]

gpt-4-turbo-preview_cot (reported on GitHub): the pass rate reported in the paper.
gpt-4-turbo-preview_cot (rerun based on data_baselines): I first downloaded the inference answers from https://huggingface.co/datasets/stabletoolbench/baselines, then computed the pass rate on the gpt-4-turbo-preview_cot directory with gpt-4-turbo-2024-04-09 as the evaluation model. The result differs considerably from the reported one.
gpt-4-turbo-2024-04-09_cot: I ran inference with the script inference_chatgpt_pipeline_virtual.sh, with GPT_MODEL set to gpt-4-turbo-2024-04-09. The evaluation model is also gpt-4-turbo-2024-04-09.
gpt-4-turbo-2024-04-09_cot_rerun: the same as gpt-4-turbo-2024-04-09_cot but run again, which shows that the evaluated pass rate is stable.
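
Two runs can have a similar aggregate pass rate while disagreeing on individual queries, so it may help to compare them per query as well. The following is only a minimal sketch: it assumes each run has already been reduced to a mapping from query ID to a pass/fail verdict, and the names `pass_rate` and `agreement` are hypothetical helpers, not part of the StableToolBench code.

```python
from typing import Dict


def pass_rate(labels: Dict[str, bool]) -> float:
    """Fraction of queries whose verdict is True (passed)."""
    if not labels:
        raise ValueError("no labels given")
    return sum(labels.values()) / len(labels)


def agreement(run_a: Dict[str, bool], run_b: Dict[str, bool]) -> float:
    """Fraction of shared queries on which two runs give the same verdict."""
    shared = run_a.keys() & run_b.keys()
    if not shared:
        raise ValueError("the two runs share no query IDs")
    return sum(run_a[q] == run_b[q] for q in shared) / len(shared)


if __name__ == "__main__":
    # Made-up verdicts for illustration only; these are not real results.
    run_a = {"q1": True, "q2": False, "q3": True, "q4": True}
    run_b = {"q1": True, "q2": True, "q3": True, "q4": False}
    print(pass_rate(run_a), pass_rate(run_b), agreement(run_a, run_b))
```

If the agreement is much lower than the matching pass rates suggest, the evaluation is stable only on average, not per query.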


stanpcf commented Jun 25, 2024

In the baselines dataset (https://huggingface.co/datasets/stabletoolbench/baselines), the file data_baselines/gpt-4-turbo-preview_dfs/G1_instruction/4505_DFS_woFilter_w2.json is broken.
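
To check whether other answer files have the same problem, a scan along these lines might help. It is a rough sketch that assumes "broken" means the file does not parse as JSON and that the dataset has been downloaded to a local directory; `find_broken_json` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import List, Tuple


def find_broken_json(root: str) -> List[Tuple[str, str]]:
    """Return (path, error message) for every *.json file under root that fails to parse."""
    broken = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError) as err:
            broken.append((str(path), str(err)))
    return broken


if __name__ == "__main__":
    # "data_baselines" is assumed to be the local copy of the downloaded dataset.
    for file_path, message in find_broken_json("data_baselines"):
        print(f"{file_path}: {message}")
```

This only detects files that are not valid JSON; a file that parses but has missing or malformed fields would need a separate check.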
