Multiple models? Remove model compare? #120
Comments
Wow!! It's amazing we solved 5, and in general these experiments are really cool.
Not sure I understand your suggestion about increasing the interval between comparisons. With the current implementation, the trained model is allowed to keep losing for multiple iterations and comparisons: the best model continues to generate data, and the trained model is fed more and more data until it outplays the best one with at least a 55% win rate. About training multiple models: for generating training data via self-play we always exploit as much as possible, namely we use only the best model to generate data.
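For clarity, here is a minimal sketch of that gating logic, assuming hypothetical helpers (`self_play`, `train_step`, `compare`) rather than the project's actual API:

```python
# Minimal sketch of the comparison gate described above.
# self_play, train_step, compare and .clone() are hypothetical placeholders.

WIN_RATE_THRESHOLD = 0.55

def training_loop(best_model, trained_model, replay_buffer,
                  self_play, train_step, compare,
                  iterations=50, selfplay_games=100, compare_games=30):
    for _ in range(iterations):
        # Only the current best model generates self-play data (pure exploitation).
        replay_buffer.extend(self_play(best_model, games=selfplay_games))

        # The trained model keeps learning from the growing buffer, even if it
        # has lost the comparison for many iterations in a row.
        train_step(trained_model, replay_buffer)

        # Promote the trained model only if it clearly outplays the best one.
        win_rate = compare(trained_model, best_model, games=compare_games)
        if win_rate >= WIN_RATE_THRESHOLD:
            best_model = trained_model.clone()
    return best_model
```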
I trained Hex7 models, using a new feature where I always maintain multiple models. Interestingly, although the policy accuracy was roughly equal across all models, a significant value accuracy difference was achieved between model 5 and the rest of the models after about 20 iterations. It seems that continuing to train a bad model is not a great idea; I think we should try training new models from scratch during the run.
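As a rough illustration of the "train new models from scratch" idea (not the project's actual code; `evaluate_value_accuracy` and `make_fresh_model` are placeholder names):

```python
# Rough sketch: instead of continuing to train a lagging model, re-initialize it
# and let it learn from the accumulated (now higher-quality) training data.

def refresh_lagging_models(models, evaluate_value_accuracy, make_fresh_model,
                           max_gap=0.05):
    accuracies = [evaluate_value_accuracy(m) for m in models]
    best_acc = max(accuracies)
    for i, acc in enumerate(accuracies):
        # A model far behind the best keeps dragging its bad weights along;
        # a fresh model trained on the current data may catch up faster.
        if best_acc - acc > max_gap:
            models[i] = make_fresh_model()
    return models
```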
Wow, really cool!!
I'm not sure why the other models are not performing as well. 'Best model' clarification: with this whole multi-model debate, I think we should understand whether the hard part is creating good training data, or training the models.
Ahh okay, I didn't take the >0.55 threshold into account. So now, if I understand correctly, model 4 became best at around iteration 39 (row number in the Excel) but then model 5 took it back in the next iteration, and otherwise 5 was always best. Good question regarding the boost... I assume some boost will be necessary, otherwise it probably won't be good enough before the next competition (and it will be wiped again). An expensive but powerful boost could be - train
When training a Hex5 model something interesting happened:
I ran two training runs. The first used [model_compare][games_num]=30, in which the model learned the game slowly, increasing its policy accuracy from 0.11 to 0.26 and switching models every few iterations until iteration 11. From iteration 11 up to iteration 50, the trained model wasn't able to beat the best model (>55%), and the same model continued to produce the training data. The trained model increased its accuracy up to 0.39, but that may just be overfitting to the best model, which didn't change. I abandoned the run as it seemed to me it had stopped learning.
I ran a second training run and changed [model_compare][games_num]=0, meaning no model comparison is performed and the trained model is always considered to be the best model. The run was very good: the model achieved a policy accuracy of 0.63 after 75 iterations and plays very well against me. It's interesting in itself that the learning succeeded without comparison, but even more interesting is that after only 3 iterations the accuracy was already 0.48. The reason is that the new run used the training data of the previous run, and by starting from already decent-quality training data it was able to learn very quickly.
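For reference, the only setting that changed between the two runs was the comparison game count; roughly (hypothetical config excerpt, the project's real config layout may differ):

```python
# Hypothetical config excerpts illustrating the single changed setting.

first_run_config = {
    "model_compare": {"games_num": 30},  # gate promotion on a 30-game match
}

second_run_config = {
    "model_compare": {"games_num": 0},   # 0 disables comparison entirely:
                                         # the trained model is always "best"
}
```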
This raises a few questions:
Is the model comparison necessary, or does it block us from advancing in some cases? Maybe we should increase the number of games played during comparison to overcome this (see the sketch after these questions).
Does training new models from scratch in the middle of the run have benefits? Assuming the real effort is producing good-quality training data, and the training itself is not the heavy compute stage, maybe we should consider multiple models.
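On the first question, a quick binomial-noise estimate (a back-of-the-envelope sketch, not project code) shows how easily a 30-game match can promote or block the wrong model:

```python
import math

# Standard error of the measured win rate for a model whose true win rate is p,
# estimated from n comparison games (binomial approximation).
def win_rate_std_error(p, n):
    return math.sqrt(p * (1 - p) / n)

for n in (30, 100, 400):
    print(f"games={n:4d}  std error ~ {win_rate_std_error(0.55, n):.3f}")

# games=  30  std error ~ 0.091  (a genuinely better model can easily fail the gate)
# games= 100  std error ~ 0.050
# games= 400  std error ~ 0.025
```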
logs attached
1122_1459.txt
1122_1740.txt
221122_150030.csv
221122_174034.csv