Could you release the reproduction data for your result #5

Open
p1nksnow opened this issue Apr 4, 2024 · 1 comment
Comments

p1nksnow commented Apr 4, 2024

I'm testing the pass-rate evaluation. Could you release the reproduction data, as ToolBench does?
Thanks in advance for your reply.

@zhichengg
Collaborator

Hi! Thank you for your interest in our work.

We plan to publish our model inference results soon. However, OpenAI updated its gpt-4-turbo models this month, and with the new model as the evaluator, measured performance drops systematically. We used gpt-4-turbo-preview in our experiments, but the behaviour of that model has also changed considerably. We will soon update the reported model performance using gpt-4-turbo-2024-04-09. We are also training our own evaluator, based on an open-source model, to replace these closed-source models.
