LouisDDN

Results: 18 comments by LouisDDN

I don't think it makes sense, as the goal is to reproduce a workload, not to get maximum throughput out of the storage.

Hi, any update on this? AFAIK this is blocking for cosmo & resnet50 for MLPerfStorage v1.0.

As this PR appears to be updating the rules for v2.0, there was a recent discussion in the checkpointing subgroup about model sizes. The table below shows the memory requirements...

> > As this PR appears to be updating the rules for v2.0, there was a recent discussion in the checkpointing subgroup about model sizes. The table below shows the...

> This should be addressed already with the new PR [#278](https://github.com/argonne-lcf/dlio_benchmark/pull/278). Could you please check. [@LouisDDN](https://github.com/LouisDDN)

I tried this PR. The read performance is back to normal (not 50 ...

The issue actually persists for reads if I use LLAMA8B Zero3 and 8 MPI processes. My node has 2TB of RAM. The 10 write steps total just 1TB.

@zhenghh04 I am working on a new patch as an alternative to hash consing, based on Friday’s discussion. It will increase startup time but fully preserve the original I/O pattern...

> > @zhenghh04 I am working on a new patch as an alternative to hash consing, based on Friday’s discussion. It will increase startup time but fully preserve the original...

@zhenghh04, I just pushed the replacement for hash consing for the write operation. It is in this PR. I closed the two other PRs for simplicity. The algorithm is the...