Adam Louly
**Description**: Utils for federated learning. **Motivation and Context** - This PR includes utils that will be used in federated learning scenarios. - Exposing Python bindings to some utils, and added...
Adding Bfloat16 to scale op
### Description The cast inference function in symbolic shape inference cannot handle inf values and raises this error: ```python File "/usr/local/lib/python3.10/dist-packages/onnxruntime/tools/symbolic_shape_infer.py", line 607, in int_or_float return int(value) OverflowError: cannot convert float...
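The failure mode in the traceback above can be reproduced in plain Python: `int()` raises `OverflowError` when given a float infinity. The sketch below shows that behaviour and one possible guard. The helper `safe_int_or_float` and its `allow_floating_point` parameter are hypothetical names chosen for illustration; this is not necessarily the fix the PR applies.

```python
import math


def safe_int_or_float(value, allow_floating_point=False):
    """Hypothetical guard (an assumption, not the PR's actual code).

    Non-finite floats are returned unchanged instead of being passed
    to int(), which would raise for inf (and for nan).
    """
    if isinstance(value, float) and not math.isfinite(value):
        return value
    # Keep fractional values as floats when the caller allows it.
    if allow_floating_point and value % 1 != 0:
        return value
    return int(value)


# Plain int() cannot represent infinity and raises OverflowError.
try:
    int(float("inf"))
except OverflowError as exc:
    print(f"int(inf) failed: {exc}")

# The guarded helper passes the infinity through untouched.
print(safe_int_or_float(float("inf")))
print(safe_int_or_float(3.0))
```

Returning the non-finite value unchanged is only one option; a caller might instead prefer to clamp it or skip the cast entirely, depending on how downstream shape inference uses the result.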
I'm using flash attention in my code. I installed FA3 and noticed no perf gain when I ran the model again. My CUDA is 12.4 and I'm using...
I've had this issue and have also seen many people asking how to use FA3. Usually people try to import it as `flash_attn.hopper` or as `flashattn_hopper`, but it should...
**Summary** (only 94 lines of code) Adds an opt-in, per-request profiling path: clients can send `"profile": true` and mini-sglang will start a `torch.profiler` session for that request, then export a...