Improve work packets stat counter performance
At the end of Work::do_work, the stat counter code gets the type of the current work packet, maps it to a local buffer, and pushes the packet's execution time to that buffer.
The work-packet-type -> local-buffer mapping currently uses a hashmap (https://github.com/mmtk/mmtk-core/blob/master/src/scheduler/stat.rs#L141). But considering the large number of work-packet instances (probably millions), doing a hashmap lookup for each one is expensive.
Solution 1
Use a direct pointer load to get the buffer address, using the type-id as an offset/index.
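A minimal sketch of how Solution 1 might look, not the actual mmtk-core code. It assumes each work-packet type is given a small dense index up front (the hypothetical `PacketTypeIndex`), since `std::any::TypeId` is opaque and cannot be used as an array offset directly. `IndexedLocalStat` and its methods are also made-up names for illustration.

```rust
use std::time::Duration;

/// Hypothetical dense index assigned to each work-packet type.
/// This is an assumption: something has to hand out these indices,
/// because `std::any::TypeId` cannot be used as an array index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PacketTypeIndex(pub usize);

/// Hypothetical per-worker stat storage: one buffer of execution
/// times per packet type, addressed by index instead of by hashing.
pub struct IndexedLocalStat {
    buffers: Vec<Vec<Duration>>,
}

impl IndexedLocalStat {
    /// Allocate one empty buffer per known packet type.
    pub fn new(num_packet_types: usize) -> Self {
        Self {
            buffers: vec![Vec::new(); num_packet_types],
        }
    }

    /// Hot path: a bounds-checked index followed by a push, no hashing.
    #[inline]
    pub fn record(&mut self, ty: PacketTypeIndex, elapsed: Duration) {
        self.buffers[ty.0].push(elapsed);
    }

    /// Number of packets of the given type recorded so far.
    pub fn count(&self, ty: PacketTypeIndex) -> usize {
        self.buffers[ty.0].len()
    }

    /// Total execution time recorded for the given type.
    pub fn total(&self, ty: PacketTypeIndex) -> Duration {
        self.buffers[ty.0].iter().sum()
    }
}
```

The open question with this approach is where the dense index comes from; the sketch leaves that to the caller.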
Solution 2
Use a single local buffer, and push (type-id, time) pairs to the buffer. The classification and counting happen later, in harness_end.
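A rough sketch of Solution 2 under the same caveat: `SingleBufferStat`, `record`, and `summarize` are hypothetical names, not existing mmtk-core APIs. The hot path only appends a (type-id, time) pair; the hashmap work is deferred to a single pass that would run at harness_end.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-worker stat storage: a single append-only buffer
/// of (type-id, execution time) pairs.
pub struct SingleBufferStat {
    samples: Vec<(TypeId, Duration)>,
}

impl SingleBufferStat {
    pub fn new() -> Self {
        Self { samples: Vec::new() }
    }

    /// Hot path: one push per work packet, no lookup of any kind.
    /// `W` stands in for the concrete work-packet type.
    #[inline]
    pub fn record<W: 'static>(&mut self, elapsed: Duration) {
        self.samples.push((TypeId::of::<W>(), elapsed));
    }

    /// Cold path, intended to run once (e.g. at harness_end):
    /// classify the samples and return (count, total time) per type.
    pub fn summarize(&self) -> HashMap<TypeId, (usize, Duration)> {
        let mut summary = HashMap::new();
        for &(ty, elapsed) in &self.samples {
            let entry = summary.entry(ty).or_insert((0usize, Duration::ZERO));
            entry.0 += 1;
            entry.1 += elapsed;
        }
        summary
    }
}
```

The trade-off is memory: the buffer grows by one entry per work packet, so with millions of packets it may need to be flushed or summarized periodically rather than only once.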
@caizixian @wenyuzhao Can you add some information to the issue, such as whether it was resolved?
@wenyuzhao can you confirm the current implementation is what you had in mind? https://github.com/mmtk/mmtk-core/blob/a058b8c4f3dbfa625823847bb5df8e5c87f44e34/src/scheduler/stat.rs#L46
I'm reopening this until you confirm.
WorkerLocalStat solved most of the issue, but not completely. The main concern of this issue is that the following hashmap lookup is still done once per work packet, which might be expensive.
https://github.com/mmtk/mmtk-core/blob/a058b8c4f3dbfa625823847bb5df8e5c87f44e34/src/scheduler/stat.rs#L190
https://github.com/mmtk/mmtk-core/blob/a058b8c4f3dbfa625823847bb5df8e5c87f44e34/src/scheduler/stat.rs#L229-L233