Idea: Pose additional questions to the net in order to make it smarter #11
Comments
It sounds interesting, but there is a high cost. I will use Suggestion 1 as an example. You talked about only doing it for random positions, but in the introduction you mentioned that the main point is to combine everything into a single NN with multiple outputs. I think that means you need to calculate what the net should output in those cases for all positions, not just some. So self-play games would need a full 800-node search for both sides at every position, and you would need to double the number of outputs. Combining the value and policy heads is essentially free because we were already computing both. So the question is whether it will be worth it to generate games at half the rate and have slower NN evals. I can't even guess. :)
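The "half the rate" estimate can be checked with a rough back-of-the-envelope sketch. The function name, the node budget, and the game length below are made-up illustration values, not measurements from the project; the only figure taken from the discussion is the 800-node search.

```python
def games_per_budget(node_budget, positions_per_game,
                     nodes_per_search=800, searches_per_position=1):
    """Rough number of self-play games that fit in a fixed search-node budget.

    All arguments are hypothetical illustration values; this ignores the extra
    cost of a larger network and counts search nodes only.
    """
    nodes_per_game = positions_per_game * nodes_per_search * searches_per_position
    return node_budget / nodes_per_game


# Assumed budget and game length, chosen only to make the ratio visible.
baseline = games_per_budget(80_000_000, positions_per_game=100)
both_sides = games_per_budget(80_000_000, positions_per_game=100,
                              searches_per_position=2)
print(baseline / both_sides)
```

Under these assumptions the ratio is 2 regardless of the budget or game length, because the second search simply doubles the nodes spent per position.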
I agree, killerducky: getting policy for both sides should involve twice as much calculation, or half as many games, not to mention the added net structure cost of a bigger input and output. With regard to games, I think the eventual ceiling of the net is the important thing, the way things seem to be heading now. With regard to the cost/benefit of having better but slower NN evals, I believe that aiming for better is always good; it is what drives the improvement and makes Leela viable at all. The hope is that adding more information/structure will lead to a quantum leap in strength. (As opposed to now, where we simply make the net bigger until it's not worth it anymore because it becomes too slow.)
There are two types of suggestions here:
After thinking more about it, the second part seems to be more involved and not readily facilitated by the current Leela structure. It is not clear that Leela is thinking much at all in terms of tactical logic or move logic (if I move this piece, what will then be possible in the position), as evidenced by the problem with discovered checks. Instead, maybe it is working more on very fine-tuned/balanced but shallow pattern recognition, to try to put some simple words on the difference.
The idea of making more information flow into the net by predicting the next k moves has been explored in the game of Go by Tian et al. (FB, Darkforest bot) in supervised learning mode, achieving better prediction accuracy.
While I don't have a better suggestion for where to post ideas like this (dev forum?), it would be nice for GitHub issues to be more actionable and task-oriented, so that we can mark them "done" sometimes.
I think a good place for writeups like those would be a section on our lczero.org website (even if it's already implemented or not relevant anymore).
[This goes beyond Leela 0. I am not sure where to post it, please feel free to move it somewhere relevant where these discussions are more suitable.]
Background: One of the things that helps to make Leela and Alpha Zero so smart is the idea to combine two networks into one, here policy and value, only splitting by the heads. The point is that when Leela is trained to answer and thus predict ("know") the policy question, then the net is better informed to answer the value question as well, and vice versa. Or, put in another way, the net will create decision structures that will incorporate more information for either answer. Please note: The net does actually get more information from the training in this way, because it is trained on two outputs rather than just one, so there is a genuine addition of information/knowledge.
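To make "trained on two outputs rather than just one" concrete, here is a minimal pure-Python sketch of a combined training loss: cross-entropy on the policy head plus squared error on the value head, summed so that both push gradients into the shared body of the net. The function names and the `value_weight` parameter are illustrative assumptions, not Leela's actual training code, and regularization terms are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw policy scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_loss(policy_logits, target_policy):
    """Cross-entropy between the search-derived target and the net's policy."""
    probs = softmax(policy_logits)
    return -sum(t * math.log(p) for t, p in zip(target_policy, probs) if t > 0)


def value_loss(value, target_value):
    """Squared error between the predicted value and the game outcome."""
    return (value - target_value) ** 2


def combined_loss(policy_logits, target_policy, value, target_value,
                  value_weight=1.0):
    """One scalar loss; both heads contribute training signal to the shared net.

    value_weight is a hypothetical knob for balancing the two terms.
    """
    return (policy_loss(policy_logits, target_policy)
            + value_weight * value_loss(value, target_value))
```

Each additional question proposed below would, in this picture, add one more term to the sum, with its own head and its own training target.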
General idea: Ask the net a few other fundamental, trainable questions regarding the position. Adjust the structure of the net so that it can give this output and be trained on it. Hopefully, this will lead to an even deeper understanding of any position and stronger policy and value heads.
Suggestion 1: Ask the net to produce policy probabilities and value estimates for BOTH sides to move in a given position. Training could be done on both "colors" by sometimes using one color of the net, sometimes the other, at random, during training matches, or during training afterwards. Alternatively, you could pick positions from games at random, do an 800-node search for both sides, and train both policy heads simultaneously on the same position (if there is a point to this).
Motivation: We know from human learning of chess that learning to carefully consider the opponent's options gives a quantum leap in playing strength. Currently, Leela policy can react to threats, but probably only rather superficially (queen threatened by a lower-value piece, etc.). By asking Leela the trainable question "what would the opponent do if you passed", we give Leela access to a far better appreciation of the nature of the position. Hypothetical example: To move, Leela policy may miss a tactical threat of the opponent, but it would actually find it itself if it sat on the other side of the board. By asking the question "what will the opponent do", Leela will be provoked to incorporate this threat into her decision process for her policy. In general, Leela will learn to integrate initial knowledge from both policies to determine either final policy.
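One way to obtain the "if you passed" position for the second search is to flip the side to move in the FEN string. The sketch below is an assumption about how that could be done, not existing Leela code; `pass_position` is a hypothetical helper, and it does no legality checking.

```python
def pass_position(fen):
    """Return the FEN of the same position with the other side to move.

    Hypothetical helper. Caveat: if the side to move is in check, the
    resulting position is illegal and should not be used for training.
    """
    board, side, castling, _ep, halfmove, fullmove = fen.split()
    new_side = "b" if side == "w" else "w"
    # The en passant right expires after a pass, so the field is cleared.
    # The fullmove number advances only after Black's turn.
    new_fullmove = int(fullmove) + (1 if side == "b" else 0)
    return " ".join([board, new_side, castling, "-",
                     str(int(halfmove) + 1), str(new_fullmove)])


after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 0 1"
print(pass_position(after_e4))
```

Searching both `after_e4` and `pass_position(after_e4)` would give the two policy targets for the same board, at the doubled search cost discussed in the comments.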
Suggestion 2: Ask the net to predict the opponent move after each legal move (and again, do it for both sides to move). [So: we don't only ask for a probability of a move, but also the expected response, which can be trained by inspecting what Leela actually chooses as the opponent in the training game. Of course only the response to the one move actually played can be trained at a time, unless you specifically train this with full investigations of random positions from games].
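As a sketch of the cheap variant, where only the response to the move actually played is trained, the targets can be read straight off a finished training game. `reply_targets` is a hypothetical helper name, and moves are shown as plain strings for readability.

```python
def reply_targets(moves):
    """Pair every played move with the reply that followed it in the game.

    Returns (ply_index, move, reply) triples; the position before
    moves[ply_index] is the one whose 'expected response' output is trained.
    The last move of the game has no reply and yields no target.
    """
    return [(ply, moves[ply], moves[ply + 1]) for ply in range(len(moves) - 1)]


ruy_lopez = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Ba4"]
for ply, move, reply in reply_targets(ruy_lopez):
    print(ply, move, "->", reply)
```

This yields one (move, reply) target per position. Covering the replies to all legal moves would need the separate, fuller investigations of random positions mentioned above.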
Motivation: We are not trying to "make the net do search internally" here. Rather, apart from being a "forced sanity check, did you actually LOOK at this" type question, this question is also a fundamental question to ask in order to understand the structural logic of a chess position. Bent Larsen, the great Danish Grandmaster, talked about how many moves were linked in pairs. You can think of, say, a6 Ba4 in the Ruy Lopez, or, "if I ever remove this piece, then the opponent is likely to respond with this". The full picture of all this, informed by training, is likely to provide a much richer and more sophisticated understanding of the logic of the position.
Suggestion 3: Ask the net to give a value estimate after each opponent move in suggestion 2. Train it, of course, as in 2.
Motivation: To give Leela an even deeper appreciation of the position and its possibilities, and a stronger consistency demand/sanity check. Please note that we do NOT ask Leela to "evaluate this position" here, but rather, we are asking, "from this position, after these two moves, what is your evaluation"? This forces some kind of appreciation of the impact of various mini-changes to the position.
Suggestion 4: Ask Leela to produce a 4-ply deep pv. Train it by inspecting if it actually chose the first move, then the next move if yes, until the 4th ply.
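The training rule can be read in more than one way. The sketch below takes one reading: the first PV move is always trained against the move actually played, and each later ply is trained only while every earlier prediction matched the game. The helper name and the string move format are assumptions for illustration.

```python
def pv_training_plies(predicted_pv, played_moves, depth=4):
    """Select which PV plies receive a training target under the rule above.

    Returns (ply_index, target_move) pairs. Training stops after the first
    ply where the prediction differs from the game, since later PV moves
    then refer to a line that was never played.
    """
    trained = []
    for ply in range(min(depth, len(predicted_pv), len(played_moves))):
        trained.append((ply, played_moves[ply]))
        if predicted_pv[ply] != played_moves[ply]:
            break
    return trained


predicted = ["e4", "e5", "Nf3", "Nc6"]
played = ["e4", "e5", "Bc4", "Nc6"]
print(pv_training_plies(predicted, played))
```

Here the third ply still gets a target (the move that was actually played), but the fourth does not, because the predicted line had already left the game.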
Motivation: Asking this question, and training it, will force Leela to actually compose some most likely ordered line from all the insights gathered in the above 2-3 suggestions, and thus develop an even better understanding of the balance of dynamics in the position.
To sum up: The common trait of these 4 suggestions is that we are asking Leela questions that we know from human experience are fundamental to comprehending a chess position. The beauty is that we have a means to train Leela's answers to the questions, and train her to be extremely strong at answering them. And, we can make her integrate the knowledge from all parts into the same net, just like policy and value are already integrated.
Not that it matters much for the merit of these ideas (I don't have much if any experience with neural networks and I probably overlook hundreds of things), but a humble word on my background so the ideas here can maybe more easily be taken seriously: I was on Team Rybka, worked with Noomen on the tournament book for Rybka, won one of the Freestyle tournaments solo and was runner-up twice, and I had a significant role in the creation of the IDeA tree environment in Aquarium. My FIDE Elo is about 2175.
I know these ideas won't be tested any time soon, but hopefully we can have a rich discussion of ideas like this and eventually work on such things to make Leela not just as strong as, but much, much stronger than Alpha Zero and Stockfish! Please give me all your comments and criticism!