Can I use a GPU other than the A100 or H100?
I want to measure storage performance, but I want to use the GPU model (NVIDIA GeForce RTX 2060) I currently have. Is that possible?
I ran the benchmark with the command below, but only the CPU utilization reaches 100%, and the installed GPU is confirmed to be unused. (The GPU is installed correctly with the latest NVIDIA driver.)
./benchmark.sh run --hosts 127.0.0.1 --workload unet3d --accelerator-type a100 --num-accelerators 8 --results-dir /storage/log/nvme0n1_unet3d_xfs_files500_proc8_20240329_152122_340072 --param dataset.num_files_train=500 --param dataset.data_folder=/mnt/nvme0n1/unet3d
Hi,
- I want to measure storage performance, but I want to use and measure the GPU model (NVIDIA GeForce RTX 2060) I currently have. Is that possible?
The MLPerf Storage benchmark puts the same load on the storage system that using a GPU would, but it does not actually use any GPUs; it simulates what a GPU would do. The benchmark.sh script:
- Calculates the amount of (simulated) data that must be used during the (simulated) training task, based on the amount of DRAM in the machine(s) running the benchmark, to ensure that no significant level of data caching takes place on the host running the benchmark (see the sketch after this list),
- Generates that number of data files (filled with random bytes),
- Reads those data files back from the storage system in the same patterns and at the same intervals as an actual training run of a neural network on the Unet3D workload.
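As a rough illustration of the first step, here is a minimal sketch of that sizing logic, assuming a Linux host. The 5x cache-safety multiplier and the per-file size are illustrative assumptions, not values taken from the MLPerf Storage sources:

```python
# Hypothetical sketch of the dataset-sizing logic described above.
# The 5x multiplier and the per-file size are illustrative assumptions.
import math
import os

def min_num_files(file_size_bytes: int, cache_safety_multiplier: float = 5.0) -> int:
    """Return the number of data files needed so the dataset is large
    enough that host DRAM cannot cache a significant fraction of it."""
    # Total physical DRAM on this Linux host.
    dram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    required_bytes = cache_safety_multiplier * dram_bytes
    return math.ceil(required_bytes / file_size_bytes)

if __name__ == "__main__":
    # Unet3D-style sample files of ~140 MiB each (illustrative value).
    print(min_num_files(file_size_bytes=140 * 2**20))
```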
The only supported (simulated) GPU in the v0.5 release was the V100, and the only supported (simulated) GPUs in the upcoming v1.0 release will be the A100 and the H100.
You can experiment with other “accelerator types” (other NVIDIA GPUs, or silicon from other vendors) by changing the “sleep time” in the configuration file(s) and then running the benchmark. The “sleep time” used in the v0.5 release for the Unet3D workload was the time it took a V100 GPU to compute one batch of the Unet3D workload, and similarly for the other workloads and accelerator types. To run the benchmark with a simulated RTX 2060, you would first run a real Unet3D training task on the RTX 2060 and record how long it takes to compute a single batch (averaged across all batches); you could then run the benchmark with that “sleep time” (a timing sketch follows).
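A minimal sketch of that per-batch timing measurement, assuming PyTorch and a CUDA-capable GPU. The tiny Conv3d model and batch shape below are placeholders; for a faithful “sleep time” you would time a real Unet3D training step instead:

```python
# Sketch: measure the average per-batch compute time on a local GPU
# (here an RTX 2060). The model and batch shape are stand-ins, not Unet3D.
import time
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv3d(8, 1, 3, padding=1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

batch = torch.randn(2, 1, 64, 64, 64, device=device)   # placeholder batch
target = torch.randn_like(batch)

times = []
for step in range(20):
    torch.cuda.synchronize()          # make sure the GPU is idle
    start = time.perf_counter()
    optimizer.zero_grad()
    loss = loss_fn(model(batch), target)
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()          # wait for the whole step to finish
    times.append(time.perf_counter() - start)

warmup = 5  # discard warm-up iterations
avg = sum(times[warmup:]) / len(times[warmup:])
print(f"average batch time: {avg:.4f} s  <- candidate 'sleep time'")
```

The resulting average would then replace the per-batch compute time for the Unet3D workload in the benchmark's configuration file; the exact parameter name depends on the release, so check the workload configuration that ships with the version you are running.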
Thanks,
Curtis