Elias Mizan

17 comments by Elias Mizan

A comment from the training group: in inference there are separate dirs for open and closed; should we follow that for training too?

For that, we need a deterministic one-to-one relation between results dirs and implementation dirs. This is not true right now; see https://github.com/mlcommons/logging/issues/129
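The requirement can be made concrete as a check on pairs of (results dir, implementation dir). The relation is one-to-one only if no results dir points at more than one implementation dir and no implementation dir is claimed by more than one results dir. Below is a minimal Python sketch, assuming the pairs have already been collected somehow. The helpers `find_mapping_violations` and `is_one_to_one` are hypothetical and not part of the MLCommons logging tools.

```python
from collections import defaultdict


def find_mapping_violations(pairs):
    """Given (results_dir, impl_dir) pairs, return two dicts:
    results dirs mapped to more than one implementation dir, and
    implementation dirs claimed by more than one results dir."""
    impls_per_result = defaultdict(set)
    results_per_impl = defaultdict(set)
    for results_dir, impl_dir in pairs:
        impls_per_result[results_dir].add(impl_dir)
        results_per_impl[impl_dir].add(results_dir)
    ambiguous_results = {
        r: sorted(impls) for r, impls in impls_per_result.items() if len(impls) > 1
    }
    shared_impls = {
        i: sorted(results) for i, results in results_per_impl.items() if len(results) > 1
    }
    return ambiguous_results, shared_impls


def is_one_to_one(pairs):
    """True only if the pairs form a one-to-one relation."""
    ambiguous, shared = find_mapping_violations(pairs)
    return not ambiguous and not shared


# Hypothetical example: two systems' results both pointing at one implementation dir.
pairs = [
    ("Submitter/results/system-a/ssd", "Submitter/benchmarks/ssd/implementations/mxnet"),
    ("Submitter/results/system-b/ssd", "Submitter/benchmarks/ssd/implementations/mxnet"),
]
print(is_one_to_one(pairs))
```

Reporting the offending dirs, rather than only returning a boolean, would let a checker tell the submitter exactly which results cannot be traced back to a single implementation.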

On this topic, since we discussed it in the meeting: if we implement it, the plan is to check whether we start from the same checkpoint. For SSD, the only valid...
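The comment is cut off before it says how the checkpoint check would work. If "the same checkpoint" means byte-identical checkpoint files (an assumption, and one that would not hold for a checkpoint converted between frameworks), a simple check compares file digests. Below is a minimal Python sketch using SHA-256 from the standard library. `file_sha256` and `same_checkpoint` are hypothetical helper names.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checkpoint(path_a, path_b):
    """Assumes 'same checkpoint' means byte-identical files."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Publishing the reference digest alongside the benchmark would let each submission be checked without redistributing the checkpoint itself.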

Shang, can you take a look? I am swamped with all the other tasks, and this is non-trivial. However, we have not committed to implementing it for this round, so it is...

Good point. Marek, since this applies to all the benchmarks, let me know if you need any help. So, restating the problem: in the 1.0 submission, training and eval samples were...

Do you mean you are failing the RCP test?

The results dirs should have separate directories for these implementations. For example, NVDA has two implementations for SSD here: https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/benchmarks/ssd/implementations, and there are 9 sets of results, both mxnet and...

Good question. The policies doc says: "System names and implementation names may be arbitrary." (https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#561-training) Given that, it is indeed hard to fix. I guess we need to...

This can be fixed, but only if we ensure a deterministic mapping between the results directory and the source code directory. For example, the results dir https://github.com/mlcommons/training_results_v1.0/tree/master/Google/results/tpu-v4-128-TF/resnet matches the implementation dir https://github.com/mlcommons/training_results_v1.0/tree/master/Google/benchmarks/resnet/implementations/resnet-preview-TF-tpu-v4-128. This...
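One way to get deterministic behavior is a naming convention that lets a checker compute the implementation dir from the results dir path alone. The sketch below assumes a hypothetical convention in which the implementation dir reuses the results system dir name verbatim: `<submitter>/results/<system>/<benchmark>` maps to `<submitter>/benchmarks/<benchmark>/implementations/<system>`. This is not the current rule, and `implementation_dir_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def implementation_dir_for(results_dir: str) -> PurePosixPath:
    """Map <submitter>/results/<system>/<benchmark> to
    <submitter>/benchmarks/<benchmark>/implementations/<system>.

    Hypothetical convention: the implementation dir is named exactly like
    the results system dir, so no lookup table is needed.
    """
    parts = PurePosixPath(results_dir).parts
    if len(parts) != 4 or parts[1] != "results":
        raise ValueError(
            f"expected <submitter>/results/<system>/<benchmark>, got {results_dir!r}"
        )
    submitter, _, system, benchmark = parts
    return PurePosixPath(submitter, "benchmarks", benchmark, "implementations", system)


print(implementation_dir_for("Google/results/tpu-v4-128-TF/resnet"))
# Google/benchmarks/resnet/implementations/tpu-v4-128-TF
```

Under this convention, the v1.0 example would need its implementation dir named `tpu-v4-128-TF` rather than `resnet-preview-TF-tpu-v4-128`. Since the policies currently allow arbitrary implementation names, the alternative is for each results dir to record an explicit pointer to its implementation dir.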