TCC_HIT_sum for MI210 is only half of MI100 when L2CacheHit is ~100%
Hi,
I found that TCC_READ_sum on MI210 is only about half of the actual size, as given by the MI100 result. The same is true of TCC_HIT_sum. Both have a 64-byte cache line size, so the numbers don't make sense. Could someone help check this?
To be clear, there is no TCC_READ_sum counter on MI100. This happens when L2CacheHit is ~100%: in that case, TCC_HIT_sum on MI210 is half of what it is on MI100.
I observed the same. On MI100, (TCC_HIT_sum + TCC_MISS_sum) * 32 matched the expected L2 cache data volume. On MI210, this expression results in exactly half of what is expected.
There are more counters on gfx90a. For example, (TCP_TCC_READ_REQ_sum)*32 corresponds to expected load data volume between L1 and L2 cache.
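For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the two expressions above. The 32-byte factor is taken from this thread, not from documentation, and the function names and counter values are hypothetical, chosen only to mimic the factor-of-two gap reported here.

```python
# Bytes attributed to each counted request. 32 is the factor used in this
# thread; it is an assumption, not a documented hardware constant.
BYTES_PER_REQUEST = 32


def l2_volume_bytes(tcc_hit_sum, tcc_miss_sum, bytes_per_request=BYTES_PER_REQUEST):
    """Estimate L2 data volume as (TCC_HIT_sum + TCC_MISS_sum) * bytes_per_request."""
    return (tcc_hit_sum + tcc_miss_sum) * bytes_per_request


def l1_l2_load_volume_bytes(tcp_tcc_read_req_sum, bytes_per_request=BYTES_PER_REQUEST):
    """Estimate L1<->L2 load volume as TCP_TCC_READ_REQ_sum * bytes_per_request."""
    return tcp_tcc_read_req_sum * bytes_per_request


# Hypothetical kernel that is expected to read 1 GiB through L2 with ~100% hits.
expected_bytes = 1 << 30

# Made-up counter values that mimic the behaviour described in this thread.
mi100_like = l2_volume_bytes(tcc_hit_sum=expected_bytes // 32, tcc_miss_sum=0)
mi210_like = l2_volume_bytes(tcc_hit_sum=expected_bytes // 64, tcc_miss_sum=0)

print("MI100-like measured/expected:", mi100_like / expected_bytes)
print("MI210-like measured/expected:", mi210_like / expected_bytes)
```

Substituting the real counter sums from a rocprof run for the made-up values shows directly whether the measured-to-expected ratio is 1 or 0.5 on a given device.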
@lingjiew93 Apologies for the lack of response. Can you please check if your issue still exists with the latest ROCm 6.2? If resolved, please close the ticket. Thanks!
Hi @lingjiew93, thanks for reaching out. You can use rocprof --list-derived to see a list of performance counters and how they are calculated or visit the official documentation. L2CacheHit is the "Percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache (value range: 0% (no hit) to 100% (optimal))". It does not relate to how much L2 cache is being used. So 100% hit rate does not translate to full utilization of the L2 cache volume. Hope this clarifies your question.
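To illustrate why a hit percentage says nothing about volume, here is a small Python sketch. It assumes the metric has the usual hits-over-hits-plus-misses form. The authoritative expression is whatever rocprof --list-derived prints, and the function name and counter values below are hypothetical.

```python
def l2_cache_hit_percent(tcc_hit_sum, tcc_miss_sum):
    """Hit percentage under the assumed form 100 * hit / (hit + miss)."""
    total = tcc_hit_sum + tcc_miss_sum
    if total == 0:
        return 0.0  # no L2 traffic recorded; avoid dividing by zero
    return 100.0 * tcc_hit_sum / total


# Two hypothetical runs that differ by 1000x in request count.
small_run = l2_cache_hit_percent(tcc_hit_sum=1_000, tcc_miss_sum=0)
large_run = l2_cache_hit_percent(tcc_hit_sum=1_000_000, tcc_miss_sum=0)

print(small_run, large_run)
```

Both runs report the same percentage even though the number of requests differs by three orders of magnitude, so the hit rate alone cannot be used to judge data volume.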