donghaku issues


Results: 2 issues by donghaku

[FEATURES] How cache workers are scheduled to computing nodes

When a computing node does not have a cache worker, scheduling cache workers onto the computing nodes can speed up training. Do you have any relevant information on how this is done?

features

How to find the max BLOCK_SIZE?

In 05-layer-norm.py:

    # Less than 64KB per feature: enqueue fused kernel
    MAX_FUSED_SIZE = 65536 // x.element_size()
    BLOCK_SIZE = min(MAX_FUSED_SIZE, triton.next_power_of_2(N))

1. What is 65536? I have some guesses based on hardware specifications, but it's...
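For readers following the arithmetic in the quoted snippet, here is a minimal sketch in plain Python (no Triton or PyTorch required). It only shows that 65536 bytes equals the 64KB named in the code comment, and how the cap translates into an element count for different dtypes. The helper names `next_power_of_2` and `block_size_for` and the `byte_budget` parameter are made up for illustration; the local `next_power_of_2` is a stand-in assumed to match `triton.next_power_of_2` for positive integers. It does not explain why the tutorial picked 64KB as the limit.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (stand-in for triton.next_power_of_2, n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def block_size_for(n_features: int, element_size: int, byte_budget: int = 65536) -> int:
    """Mirror the quoted snippet: cap the block at byte_budget bytes per row.

    element_size is the size of one element in bytes (what x.element_size()
    returns in PyTorch: 2 for float16, 4 for float32).
    """
    max_fused_size = byte_budget // element_size  # max elements that fit in the budget
    return min(max_fused_size, next_power_of_2(n_features))


if __name__ == "__main__":
    print("65536 bytes =", 65536 // 1024, "KB")
    for dtype_name, size in (("float16", 2), ("float32", 4)):
        print(dtype_name, "max fused elements:", 65536 // size)
    # A row of 1000 float32 features is rounded up to the next power of two.
    print("N=1000, float32 ->", block_size_for(1000, 4))
    # A very wide row is clamped to the 64KB budget instead.
    print("N=100000, float32 ->", block_size_for(100000, 4))
```

So under this reading, `MAX_FUSED_SIZE` is a count of elements rather than bytes: 32768 for 2-byte elements and 16384 for 4-byte elements.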