Tilps issues

Results: 9 issues of Tilps

Update example of aggregating gradients by self

Using the optimizer's own gradient aggregator seems a better choice here - the default one handles corner cases the previous documentation does not, and if the user wants to conditionally...

stat:awaiting response from contributor

cla: yes

Have a separate test and training window folder for training.

Currently every training run re-selects the test and training sets at random from a combined window. This makes it very difficult to track persistent divergence, as any prior divergence gets nullified...
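One way to keep the split stable between runs, sketched here purely for illustration, is to assign each data chunk to a set deterministically from its name rather than drawing at random on every run. This is not the separate-folder layout the issue title proposes; the function name `assign_split`, the `test_fraction` parameter, and the chunk file names are all hypothetical.

```python
import hashlib


def assign_split(chunk_name: str, test_fraction: float = 0.1) -> str:
    """Return "test" or "train" for a chunk, identically on every run.

    Hypothetical helper: the chunk name is hashed, and the hash decides the
    set, so a chunk never migrates between sets across training runs.
    """
    digest = hashlib.sha256(chunk_name.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned 64-bit bucket value.
    bucket = int.from_bytes(digest[:8], "big")
    # Integer threshold avoids floating-point edge cases at the boundaries.
    threshold = int(test_fraction * 2**64)
    return "test" if bucket < threshold else "train"


if __name__ == "__main__":
    # Illustrative chunk names only.
    for name in ["training.1.gz", "training.2.gz", "training.3.gz"]:
        print(name, assign_split(name))
```

Because the assignment depends only on the chunk name, a test-set loss curve from one run can be compared directly with the next run, at the cost of the test set no longer being re-sampled.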

Treat UCT search configuration parameters as hyperparameters

Investigation into possible reasons for the cascading failure of value head quality after v0.8 was released suggested PUCT and fpu-reduction changes as likely causes. If we believe that these...

Training should perform a test run before step 1

Currently it's quite difficult to work out how much the graph moves because the data has changed vs. how much the current lr has moved it in the first 2k...

Attempt to do better at estimating playouts remaining, while requiring less time to do so.

Implementing the idea in #581. A key aspect here is that we shift the start time later, but we don't decrement the playouts by how many there are at that...

Hack to allow testing first order value head quality.

This is not ever for submission, but I wanted it somewhere I could get opinions.

Experiment with alternate fpu definition

Looking at the new ELF system for Go, it seems they use a very subtly different fpu definition. It's kind of like get_raw_eval(), but rather than being the weighted average...

est_playouts_left should calculate rate average using start time after first playout.

The very first playout in a move is far more expensive than any other (at least it is on my GPU). This means it takes a very long time for...
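The effect of the expensive first playout on the rate estimate can be sketched as follows. This is a minimal illustration under assumed names and signatures, not the engine's actual est_playouts_left: the "naive" version averages over the whole move, while the adjusted version starts the clock after the first playout and leaves that playout out of the rate.

```python
from typing import Optional


def naive_playouts_left(total_elapsed: float, playouts_done: int,
                        time_left: float) -> Optional[int]:
    """Rate averaged from the start of the move, first playout included."""
    if playouts_done <= 0 or total_elapsed <= 0:
        return None
    rate = playouts_done / total_elapsed
    return int(rate * time_left)


def adjusted_playouts_left(elapsed_since_first: float, playouts_done: int,
                           time_left: float) -> Optional[int]:
    """Rate averaged from the moment the first playout finished.

    Hypothetical signature: `elapsed_since_first` is the time since the first
    playout completed, and that playout is excluded from the count.
    """
    measured = playouts_done - 1
    if measured <= 0 or elapsed_since_first <= 0:
        return None  # not enough data for an estimate yet
    rate = measured / elapsed_since_first
    return int(rate * time_left)


if __name__ == "__main__":
    # Assumed numbers: first playout takes 2.0 s, the next 100 take 2.0 s
    # in total, and 3.0 s remain for the move.
    print(naive_playouts_left(4.0, 101, 3.0))     # 75
    print(adjusted_playouts_left(2.0, 101, 3.0))  # 150
```

With these assumed timings the naive average underestimates the remaining playouts by half, because the slow first playout dominates the early rate.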

Enforce a maximum visit limit to avoid numeric overflow

It was possible under go infinite to overflow the visit counter if terminal nodes get a lot of exploration, as reported in #305.