Attempt to do better at estimating playouts remaining, while requiring less time to do so. #582
base: next
Currently +100 Elo after 25 games with network 245 at 1+1 (large error bars).
src/UCTSearch.cpp
// Until we reach 1 second or 100 playouts playout_rate
// is not reliable, so just return max.
} else if (elapsed_millis < 10 || playouts < 10) { | ||
// Until we reach 10 millisecond and 10 playouts playout_rate |
You changed the comment from "or" to "and", but the logic remains ||, not &&.
a or b == !(!a and !b)
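The equivalence being pointed to (De Morgan's law) can be checked mechanically. This is a minimal sketch: only the two conditions come from the diff, and the function names are hypothetical helpers for illustration. "Unreliable while elapsed < 10 OR playouts < 10" is the same as "reliable only once we reach 10 ms AND 10 playouts".

```cpp
// Condition from the diff: the playout rate is treated as unreliable
// while EITHER threshold has not been reached yet.
bool RateUnreliable(long elapsed_millis, long playouts) {
  return elapsed_millis < 10 || playouts < 10;
}

// The new comment's phrasing: reliable once we reach 10 ms AND 10
// playouts, so "unreliable" is the negation of that conjunction.
bool RateUnreliableFromComment(long elapsed_millis, long playouts) {
  return !(elapsed_millis >= 10 && playouts >= 10);
}

// Returns true if both formulations agree on a grid of values below,
// at, and above each threshold.
bool FormulationsAgree() {
  const long samples[] = {0, 9, 10, 11, 1000};
  for (long ms : samples) {
    for (long n : samples) {
      if (RateUnreliable(ms, n) != RateUnreliableFromComment(ms, n)) {
        return false;
      }
    }
  }
  return true;
}
```

So the code's `||` and the comment's "and" describe the same cutoff from opposite sides.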
Yes, the existing comment was wrong.
Final results after 150 games are not very convincing: only +15 Elo, with error bars that include no improvement.
I suspect this approach has made the estimate too accurate, and that it might need a reduction multiplier, like the one jjoshua2 has in his tuning PRs, to compensate for the unlikelihood that every remaining visit goes to one of the trailing moves.
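A reduction multiplier of this kind could be applied at the point where the remaining-playouts estimate is compared against the lead of the best move. The sketch below is hypothetical: the function name, parameters, and the exact form of the check are assumptions for illustration, not the project's code or jjoshua2's implementation.

```cpp
#include <cstdint>

// Hypothetical pruning check (names are illustrative, not the project's API).
// best_visits / second_visits: visit counts of the leading and runner-up moves.
// remaining: estimated playouts still to come in this search.
// reduction: multiplier (typically <= 1.0) that discounts the estimate,
// because in practice not every remaining visit goes to a trailing move.
bool CanStopEarly(std::int64_t best_visits, std::int64_t second_visits,
                  std::int64_t remaining, double reduction) {
  const double reachable = static_cast<double>(remaining) * reduction;
  // If the runner-up cannot catch up even if it receives all of the
  // (discounted) remaining visits, more search cannot change the move.
  return static_cast<double>(best_visits - second_visits) > reachable;
}
```

For example, with a lead of 600 visits and 800 playouts estimated to remain, a multiplier of 1.0 keeps searching, while 0.5 allows stopping early. A smaller multiplier trades some risk of premature stopping for more time saved.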
@zz tried CLOP tuning a combination of my patch and this one, and it came out with 100 for slow mover, 1.4 for time multiplier, and 1.0 for pruning factor, which I was very happy with because it makes a lot of sense.
Sounds good. I think this PR will probably be good to go once I add a multiplier equal to the number of threads to the minimum playouts. (I'm assuming no one will set the thread count to a ridiculously large value relative to their actual hardware capacity.)
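The proposed change could be as small as scaling the playout minimum in the reliability guard. This is a sketch under assumptions: the 10 ms and 10-playout base values come from the diff, while the function name, the per-thread constant, and passing `threads` in directly are hypothetical.

```cpp
#include <cstdint>

// Hypothetical reliability guard. Base values match the diff
// (10 ms, 10 playouts); multiplying the playout minimum by the thread
// count is the proposed change, wired in here as an assumption.
bool RateReliable(std::int64_t elapsed_millis, std::int64_t playouts,
                  int threads) {
  const std::int64_t kMinMillis = 10;
  const std::int64_t kMinPlayoutsPerThread = 10;
  // Guard against a zero or negative thread setting.
  const std::int64_t min_playouts =
      kMinPlayoutsPerThread * (threads > 0 ? threads : 1);
  return elapsed_millis >= kMinMillis && playouts >= min_playouts;
}
```

With 40 threads the minimum becomes 400 playouts rather than 10, so a setup with many slow threads has to accumulate proportionally more samples before the rate is trusted.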
This implements the idea in #581. The key aspect is that we shift the start time later but do not subtract the playouts already completed at that shifted start time. As a result, the estimate generally changes from an underestimate that converges upwards to an overestimate that mostly converges downwards, so inaccuracy from calculating too early is less likely to cause us to prune early.
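The shifted-start estimate can be sketched as follows. This is a minimal illustration, not the PR's code: the function name, the millisecond parameters, and how `shifted_start_ms` gets chosen are assumptions. What it reproduces is the idea described above: measure elapsed time from a later start, keep the full playout count in the numerator, and return "max" until the 10 ms / 10-playout guard from the diff is met.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch; all times are milliseconds since the search began.
std::int64_t EstimatePlayoutsRemaining(std::int64_t playouts,
                                       std::int64_t now_ms,
                                       std::int64_t shifted_start_ms,
                                       std::int64_t deadline_ms) {
  const std::int64_t elapsed_ms = now_ms - shifted_start_ms;
  // Until we reach 10 ms and 10 playouts the rate is not reliable,
  // so report "unlimited" and let nothing be pruned yet.
  if (elapsed_ms < 10 || playouts < 10) {
    return std::numeric_limits<std::int64_t>::max();
  }
  const std::int64_t remaining_ms = deadline_ms - now_ms;
  if (remaining_ms <= 0) return 0;
  // Playouts completed before shifted_start_ms are NOT subtracted, so the
  // rate is biased high at first and mostly converges downwards.
  const double rate = static_cast<double>(playouts) / elapsed_ms;
  return static_cast<std::int64_t>(rate * remaining_ms);
}
```

With 200 playouts at 100 ms and a 1000 ms deadline, measuring from the true start (0 ms) gives 1800 remaining playouts; shifting the start to 50 ms gives 3600. That upward bias is what makes an early, noisy estimate err towards searching longer rather than pruning.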
I don't yet know whether this is an Elo win, but it does seem to improve the ability to estimate playouts at short time scales. To start, I'm running a self-play tournament at 1+1.
I think the logic may not currently be sound with a large thread count and slow evals, though. If each thread can only do 50 nps but you have 40+ of them (as at TCEC), a single playout takes about 20 ms, so both the 10-playout and 10 ms thresholds are already cleared once the first batch of evaluations returns; they are basically not going to limit anything. I need to think more about how to scale those constants with thread count.