Elias Mizan comments

Results: 17 comments of Elias Mizan

New directory structure proposal.

A comment from the training group: In inference there are separate dirs for open and closed. Should we follow that for training too?

Compliance checker should check if the log shows correct line numbers

For that we need a deterministic one-to-one relation between results dirs and implementation dirs. This is not true right now; see https://github.com/mlcommons/logging/issues/129

Pretrained model checkpoint validation

On this topic, since we discussed it in the meeting: if we implement it, the plan is to check whether we start from the same checkpoint. For SSD the only valid...
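One possible way to check "same starting checkpoint" is to compare a digest of the checkpoint file against a reference digest. This is only a sketch and not necessarily what the group has in mind: the function names and the idea of a published reference SHA-256 are assumptions, and a byte-level hash only works when submitters start from the exact same file (it would not survive a format conversion between frameworks).

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def starts_from_reference(checkpoint_path, reference_sha256):
    """True if the checkpoint file is byte-identical to the reference (hypothetical check)."""
    return sha256_of_file(checkpoint_path) == reference_sha256.lower()
```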

Use original RCP for RCP test if batch size and ALL hparams of submission match the RCP, otherwise use pruned RCPs.

Shang, can you take a look? I am swamped with all other tasks, and this is non-trivial. However, we have not committed to implementing for this round, so it is...
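The selection rule in the issue title can be sketched roughly as below. This is an illustration only, not the actual checker code: the function name and the dict-of-hparams representation are assumptions, and the hparam keys in the usage note are placeholders.

```python
def select_rcp_kind(submission_hparams, rcp_hparams):
    """Return "original" if batch size and ALL hparams match the RCP, else "pruned".

    Both arguments are plain dicts mapping hparam name to value
    (an assumed representation; batch size is treated as one more hparam).
    """
    # The key sets must be identical, so a missing or extra hparam counts as a mismatch.
    same_keys = submission_hparams.keys() == rcp_hparams.keys()
    all_match = same_keys and all(
        submission_hparams[name] == rcp_hparams[name] for name in rcp_hparams
    )
    return "original" if all_match else "pruned"
```

For example, with placeholder keys, `select_rcp_kind({"global_batch_size": 256, "lr": 0.1}, {"global_batch_size": 256, "lr": 0.1})` selects the original RCP, while changing either value selects the pruned RCPs.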

Add hparams and compliance checks for training and eval samples for all benchmarks

Good point. Marek, since it is all the benchmarks, if you need any help let me know. So, restating the problem: In the 1.0 submission training and eval samples were...

Add missing RCPs for RNN-T

You mean you are failing the RCP test?

Seed checker gets confused when handling multiple implementations

The results dirs should have separate directories for these implementations. For example, here NVDA has 2 implementations for SSD: https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/benchmarks/ssd/implementations and there are 9 sets of results, both mxnet and...

Seed checker gets confused when handling multiple implementations

Good question. The policies doc says: "System names and implementation names may be arbitrary." https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#561-training Given that, it is indeed hard to fix. I guess we need to...

Seed checker gets confused when handling multiple implementations

This can be fixed, but only if we ensure a deterministic mapping between the results directory and the source code directory. For example: Result dir: https://github.com/mlcommons/training_results_v1.0/tree/master/Google/results/tpu-v4-128-TF/resnet matches implementation dir: https://github.com/mlcommons/training_results_v1.0/tree/master/Google/benchmarks/resnet/implementations/resnet-preview-TF-tpu-v4-128 This...
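As a rough sketch of what "deterministic" could mean in practice: given a declared list of (results dir, implementation dir) pairs, a checker could flag any directory that appears in more than one pairing. How the pairs would be declared is not settled here, so the input format and the function name below are assumptions.

```python
from collections import Counter


def find_ambiguous_dirs(pairs):
    """Find directories that break a one-to-one results/implementation mapping.

    pairs: iterable of (results_dir, implementation_dir) strings (assumed input format).
    Returns (results dirs paired with more than one implementation dir,
             implementation dirs paired with more than one results dir).
    """
    unique_pairs = set(pairs)  # an exact repeat of the same pair is not a conflict
    results_counts = Counter(results for results, _ in unique_pairs)
    impl_counts = Counter(impl for _, impl in unique_pairs)
    ambiguous_results = sorted(d for d, n in results_counts.items() if n > 1)
    ambiguous_impls = sorted(d for d, n in impl_counts.items() if n > 1)
    return ambiguous_results, ambiguous_impls


# The pairing from the comment above is one-to-one, so nothing is flagged.
google_pairs = [
    (
        "Google/results/tpu-v4-128-TF/resnet",
        "Google/benchmarks/resnet/implementations/resnet-preview-TF-tpu-v4-128",
    ),
]
print(find_ambiguous_dirs(google_pairs))
```

If both lists come back empty, the seed checker can look up the source code for each results dir without guessing.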

Seed checker gets confused when handling multiple implementations

This depends on issue #156