Benchmarking Question List #1
Hello everyone. I have been using the MLPerf benchmarks for some time, and I have a small list of questions about them. I am asking them here because I have not found answers in other sources.
- I have several video cards in my system. Can I explicitly set the number of video cards used for a test?
- This follows from the question above: do all tests use all available GPUs?
- Many tests have different profiles like "edge" and "datacenter". What is the difference between them?
- Since the space on my SSD is limited, how can I tell the benchmarks to use a different directory for their cache?
- The tests (in the profiles that I used) do not always use 100% of the video memory. Are there scenarios in which all the video memory will be used, or is this not necessary?
- Perhaps there are more fine-grained benchmark settings; is there a user guide?
Hi @Agalakdak. Some of your questions are "benchmark implementation" dependent: we currently have Nvidia, Intel, and reference implementations for most/all of the benchmarks, and other vendor implementations are available for some of them.
- "no" for most of the reference implementations except some like for llama2. "yes" for Nvidia implementation though it uses all the GPUs by default.
- For the Nvidia implementation, "yes". The reference implementation uses 1 GPU by default, and some benchmark implementations support multiple GPUs.
- Those are two different submission categories. The required scenarios to run differ between them, and the "Offline" scenario is the only one common to both.
- `export CM_REPOS=<NEW_PATH>` can be used to do this, or you can create a softlink for any folder inside the `$HOME/CM/repos/local/cache` path (a concrete sketch follows after this list).
- Many small inference models do not need a large amount of GPU memory. The parameter size given here is usually a good guide for the required GPU memory.
- Unfortunately not much currently, as most implementations by default only support the systems on which the MLPerf results were submitted. We are trying to extend this, but it is a work in progress, and the implementations and results change every 6 months.
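On the GPU-count question: one general way to limit how many GPUs a run can see, assuming the implementation respects the standard `CUDA_VISIBLE_DEVICES` variable (CUDA-based frameworks such as PyTorch do), is to restrict the visible device list before launching the benchmark. A minimal sketch for a machine with four GPUs where only two should be used:

```bash
# Expose only GPUs 0 and 1 to processes launched from this shell;
# CUDA-based code in those processes will enumerate just these two devices.
export CUDA_VISIBLE_DEVICES=0,1

# Quick check of how many devices a PyTorch-based implementation would see.
python3 -c "import torch; print(torch.cuda.device_count())"
```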
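For the cache location, a concrete sketch of the two options mentioned above; the `/mnt/bigdisk` paths and the `<large-cache-folder>` name are placeholders for your own setup:

```bash
# Option 1: point CM at a repos/cache tree on a larger drive
# (add this to your shell profile so every cm command picks it up).
export CM_REPOS=/mnt/bigdisk/CM/repos

# Option 2: keep the default location but move one large cache entry
# to the bigger drive and leave a softlink behind in its place.
mv "$HOME/CM/repos/local/cache/<large-cache-folder>" /mnt/bigdisk/cm-cache/
ln -s "/mnt/bigdisk/cm-cache/<large-cache-folder>" "$HOME/CM/repos/local/cache/<large-cache-folder>"
```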
@arjunsuresh, thank you! And the last two questions for now:
- How can benchmarking results be interpreted? Is this some abstract metric, or can the data be read as "Model A can process x requests per second"?
- How can I donate $10?
- For the Offline scenario, samples per second is the usual metric. "Requests per second" or "queries per second" may not be correct, as a single request or query can contain multiple samples. For LLM benchmarks it is usually tokens per second. So a result of, say, 400 samples per second for an image-classification benchmark does mean the system sustained processing 400 images per second over the run.
- I don't think MLCommons is taking donations, but I might be wrong. You can contact the right people here.