Patrick Toulme

5 issues by Patrick Toulme

In some models with poor SPMD partitioning, I have found the pattern below.

```
all-gather.1 = all-gather(x)
dot.1 = dot(all-gather.1, y)
dynamic-slice.1 = dynamic-slice(all-gather.1) // can be cancelled
```
...
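
To illustrate why the dynamic-slice in this pattern might be removable, here is a minimal pure-Python sketch. It is not XLA code. It assumes the slice offset is derived from the device's partition id and the slice size equals the shard size, which the truncated snippet does not state. The helper names `all_gather`, `dynamic_slice`, and `roundtrip` are hypothetical.

```python
def all_gather(shards):
    """Simulate all-gather: every device ends up with the concatenation of all shards."""
    full = [value for shard in shards for value in shard]
    return [list(full) for _ in shards]


def dynamic_slice(values, start, size):
    """Simulate a 1-D dynamic-slice: take `size` elements beginning at `start`."""
    return values[start:start + size]


def roundtrip(shards):
    """All-gather, then have each device slice out the range it originally owned.

    Assumption: device d slices at offset d * shard_size with length shard_size.
    """
    shard_size = len(shards[0])
    gathered = all_gather(shards)
    return [
        dynamic_slice(gathered[device], device * shard_size, shard_size)
        for device in range(len(shards))
    ]


# Four simulated devices, each holding a shard of two elements.
shards = [[0, 1], [2, 3], [4, 5], [6, 7]]
print(roundtrip(shards) == shards)
```

Under those assumptions each device gets back exactly its own shard, so the slice could read `x` directly while the `dot` keeps using the gathered operand.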

This PR adds the ability to configure while loop unroll thresholds. Existing defaults are maintained. This PR also adds the option for the user to specify an HloPassPipeline that will...

I am seeing very strange sharding when combining pipeline parallelism with tensor and data parallelism. Below is the HLO immediately before partitioning: ``` while.9466 = (s32[], bf16[4,128,512]{2,1,0}, bf16[4,128,512]{2,1,0}, bf16[4,512,128]{2,1,0}, bf16[4,128]{1,0}, /*index=5*/bf16[4,3,128,32,4]{4,3,2,1,0}, bf16[4,128,32,4]{3,2,1,0},...

Add option to disable conversion of dynamic-slice to slice. Defaulting to false to maintain existing behavior.

Could someone explain, or point to a doc that explains, how MoE is implemented on Jetstream? Specifically, the all-to-all communications, static vs dynamic, sparse matmuls. I would like to understand...