Viraat Chandra

Results 13 comments of Viraat Chandra

> The Nvidia code actually requires running [this script](https://github.com/mlcommons/inference_results_v4.0/blob/main/closed/NVIDIA/code/dlrm-v2-99/tensorrt/scripts/gen_frequency_data.py) to generate the frequency data. But this step is not documented anywhere, and I am not sure if this script is the actual expected...

> Are we required to do "make calibrate" for Dlrmv2? We tried that, but it gave an error with the v4.0 code.

Yes, that is required after running gen frequency data...

Thanks @arjunsuresh for all the details; I will look into this now. Let us first focus on v4.0-H100x1. Assuming the default config for v4.0-H100x1-Offline, can you please follow the following...

> Can you please confirm if this is the same one Nvidia is using?

Yes, this is correct. The contents are the same as our readme at code/dlrm-v2/tensorrt/README.md.

1. some remaining concerns...

Hi @arjunsuresh, it seems we have narrowed down the problem(s):

1. The md5 for `day_23_sparse_concatenated.npy` does not match. This file is auto-generated from the downloaded files (day_23_criteo_sparse_multi_hot_unpacked/....npy):
   - expected md5s ```...
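
The md5 comparison described above can be reproduced with a short script. This is only a minimal sketch using Python's standard `hashlib`; the helper names `file_md5` and `find_md5_mismatches` are hypothetical and are not part of the Nvidia code, and the expected hashes have to be supplied by whoever holds the reference values.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks
    so that large .npy files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_md5_mismatches(expected):
    """Compare files against expected md5 values.

    `expected` maps a file path to its expected md5 hex string.
    Returns a dict of path -> actual md5 for every file that does not
    match; a missing file is reported with the value None.
    """
    mismatches = {}
    for path, want in expected.items():
        p = Path(path)
        got = file_md5(p) if p.is_file() else None
        if got != want.lower():
            mismatches[str(path)] = got
    return mismatches
```

Calling `find_md5_mismatches` with a mapping of file paths to the reference md5 values returns an empty dict when everything matches, which makes it easy to check the concatenated file and the downloaded files it was built from in one pass.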

> Further the problematic file is not produced as part of the [Nvidia README](https://github.com/mlcommons/inference_results_v4.0/tree/main/closed/NVIDIA/code/dlrm-v2/tensorrt). Concatenation must be done on the fly by the Nvidia container right?

Yes, correct. Perhaps consider...

> Was this file generated on the fly or as a separate previous step outside of the container?

It's generated when `run_harness` is invoked (or reached via `run`) and is...

Thanks @arjunsuresh for the testing, glad to see the correct sparse input file being generated. Dense and labels match on our end as well, so we can now eliminate DS mismatch...

@pgmpablo157321 please help review :)