Tilps

Results 21 comments of Tilps

This example currently covers only the experimental_aggregate_gradients=False scenario, a case that neither of those tutorials handles.

Currently +100 Elo after 25 games with network 245 at 1+1. (Large error bars.)

Final results after 150 games are not very convincing: only +15 Elo, with error bars that include no improvement.
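For readers who want to see why a small Elo gain over 150 games can still be consistent with no improvement, here is a minimal sketch of the usual logistic Elo conversion with a normal-approximation interval. The function names and the win/draw/loss record in the usage note are hypothetical, chosen only for illustration; they are not the actual match results from this test.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    # Clamp to avoid division by zero or log of zero at a 0% or 100% score.
    score = min(max(score, 1e-9), 1.0 - 1e-9)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high) using a normal approximation on the mean score.

    z=1.96 corresponds to a roughly 95% interval; this is an assumption,
    not necessarily the interval used for the numbers quoted above.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win=1, draw=0.5, loss=0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)
    return (
        elo_from_score(score),
        elo_from_score(score - z * std_err),
        elo_from_score(score + z * std_err),
    )


# Hypothetical 150-game record, for illustration only.
elo, low, high = elo_with_interval(wins=50, draws=56, losses=44)
print(f"Elo {elo:+.1f}, interval [{low:+.1f}, {high:+.1f}]")
```

With a record like this the point estimate is positive but the interval straddles zero, which is the situation described above.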

I suspect that this approach has improved the estimate too well, and that it might need a reduction multiplier like jjoshua2 has in his tuning PRs to compensate for the...

Sounds good. I think this PR will probably be good to go if I add a multiplier to the minimum playouts equal to the number of threads. (I assume no...

I would suggest (weakly) that this shouldn't matter if noise is applied correctly - averaged over many training games, you get some statistical averaging that I think would have a...

The proposed logic changes the minimum chance of selection from 0 (vs. 1/800 for 1 visit) to 1/835 in training scenarios (assuming 35 possible moves) - so it's not far...
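The 1/800 and 1/835 figures can be reproduced with a short sketch. It assumes one reading of the proposal: that training move selection is proportional to visit counts, and that the change effectively adds one pseudo-visit to every legal move. The function name, the `pseudo_count` parameter, and the visit distribution are hypothetical.

```python
from fractions import Fraction


def selection_probabilities(visits, pseudo_count=0):
    """Selection chance per move, proportional to visits plus a pseudo-count."""
    total = sum(visits) + pseudo_count * len(visits)
    return [Fraction(v + pseudo_count, total) for v in visits]


# Hypothetical position: 35 legal moves, 800 visits in total.
visits = [799, 1] + [0] * 33

plain = selection_probabilities(visits)
smoothed = selection_probabilities(visits, pseudo_count=1)

print(min(plain))     # an unvisited move is never selected
print(plain[1])       # a move with exactly 1 visit
print(min(smoothed))  # an unvisited move after adding 1 to every move
```

Under these assumptions an unvisited move goes from a chance of 0 to (0 + 1) / (800 + 35), slightly below the 1/800 that a single real visit would give without the change.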

I think you misinterpreted my comment - I was trying to show the connection between the original proposal and comments about it making 0 visits act like 1 visit. (I...

An idea to consider: tuning for strength might be a valid approach - but rather than tuning for strength at 800 nodes, we tune for strength at a large number of nodes....

I tried to gather some data to decide whether 1.5 was a good multiplier or not. The results are a bit strange: they seem to suggest it could be higher, maybe even...