
The effective testing TC should not depend on master

Open • Alayan-stk-2 opened this issue 6 years ago • 1 comment

Why does the testing TC matter?

There is a quality-speed tradeoff. A shorter TC consumes fewer fishtest resources to complete a test. A longer TC puts less emphasis on raw nps, and its optimum is closer to the one for long analysis and tournaments, for example.

The effective TC can be viewed as the total amount of computational power available to the engine, which is what we try to optimize for. Frequent small changes in it that depend on master are unwanted.

Right now, the effective TC is calibrated on master: `base_nps = verify_signature(base_engine, run['args']['base_signature'], remote, result, games_concurrency * threads)`

  1. If some speedup (simplification, optimization) is merged in SF, the effective TC at fishtest gets shorter. This reduces the time tests take, but makes it harder for any patch with a slowdown to pass.

  2. If some Elo gainer that adds operations and reduces nps is merged, the effective TC at fishtest gets longer. We are then optimizing for a better optimum, but tests take longer to complete and fishtest may get clogged.

  3. If the effective TC did not change with each merge, regression tests run against different base versions could be compared more cleanly.

  4. There is no clear reason why over time speedups and slowdowns would precisely compensate each other. I can't properly bench SF versions on my CPU due to turbo boost behavior, but I'd expect 10+% variations in nps between main releases.

Using some constant benchmark version to evaluate a CPU's speed would remove this unwanted variation.
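To make the drift concrete, here is a minimal sketch of the scaling idea, not the actual fishtest worker code. It assumes the worker multiplies the nominal TC by `reference_nps / calibration_nps`; the names `REFERENCE_NPS` and `scaled_tc` and all nps figures are hypothetical and chosen only for illustration.

```python
# Sketch only: REFERENCE_NPS and scaled_tc are hypothetical names, not fishtest code.
REFERENCE_NPS = 1_000_000  # assumed nps at which the nominal TC is used unscaled


def scaled_tc(nominal_tc_s, calibration_nps, reference_nps=REFERENCE_NPS):
    """Wall-clock base time for a worker, given the nps of the calibration engine."""
    if calibration_nps <= 0:
        raise ValueError("calibration_nps must be positive")
    return nominal_tc_s * reference_nps / calibration_nps


# Same worker, same hardware; only the calibration engine differs (made-up numbers).
old_master_nps = 1_000_000
new_master_nps = 1_050_000  # master after a 5% speedup is merged
fixed_ref_nps = 1_000_000   # a pinned benchmark binary is unaffected by the merge

tc_before = scaled_tc(10.0, old_master_nps)
tc_after = scaled_tc(10.0, new_master_nps)
tc_fixed = scaled_tc(10.0, fixed_ref_nps)

print(f"calibrated on old master:      {tc_before:.2f}s")
print(f"calibrated on new master:      {tc_after:.2f}s")
print(f"calibrated on fixed reference: {tc_fixed:.2f}s")
```

Under these assumptions, the 5% speedup in master shrinks the wall-clock time on unchanged hardware from 10 s to about 9.52 s, while calibration against a pinned binary leaves it at 10 s.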

Depending on available resources, we may choose to increase (or reduce) the TC we use, but this should stem from a conscious decision rather than moving on its own.

Alayan-stk-2 • Sep 08 '19 09:09

To change this, one would need a reference that is more fixed than master, e.g. the latest release. I would guess this adds quite some complexity, and I'm not sure it is worth it.

The main purpose of this calibration, IMO, was to equalize the different workers, e.g. a 2 GHz vs. a 4 GHz machine. I think this is sufficiently covered.
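As a rough illustration of that equalization (a sketch under assumptions, not the worker's real code): if wall-clock time is scaled inversely to each worker's measured nps, every worker ends up with the same node budget per unit of nominal TC. The function name `wall_time_s`, the reference value, and the nps figures below are hypothetical.

```python
def wall_time_s(nominal_tc_s, worker_nps, reference_nps=1_000_000):
    """Hypothetical scaling: slower workers get proportionally more wall-clock time."""
    return nominal_tc_s * reference_nps / worker_nps


# Made-up nps figures standing in for a slow and a fast machine.
workers = {"2 GHz machine": 1_000_000, "4 GHz machine": 2_000_000}

budgets = {}
for name, nps in workers.items():
    t = wall_time_s(10.0, nps)
    budgets[name] = t * nps  # approximate nodes searchable in the scaled time
    print(f"{name}: {t:.1f}s wall clock, about {budgets[name]:.0f} nodes")
```

Both workers get the same node budget whichever engine is used for calibration, so cross-worker equalization does not by itself require calibrating on master.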

vondele • Sep 24 '19 06:09