Patrick Toulme
> @apoorvtintin I see this PR has been quite stale for some time. If there is no objection, I'd like to have @Ruixuan, who is working on Trn from our end, port your...
> > Increases memory efficiency > > Do you have measurements on how DATA improves memory efficiency? Thanks.
There is a misconception here. RMSNorm will lower into HLO as decomposed JAX ops, but the XLA GPU compiler will fuse those ops together, and potentially with more...
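To illustrate what "decomposed ops" means here, a minimal sketch of RMSNorm as the elementwise chain a compiler would see (written in NumPy for illustration; the actual lowering emits the equivalent HLO ops, which XLA can fuse into one kernel):

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-6):
    # Decomposed into simple ops: square, mean, add-eps, rsqrt, two multiplies.
    # An XLA-style compiler can fuse this whole chain into a single kernel.
    variance = np.mean(np.square(x), axis=-1, keepdims=True)
    return x * (1.0 / np.sqrt(variance + eps)) * weight

x = np.array([[1.0, 2.0, 3.0, 4.0]])
w = np.ones(4)
out = rmsnorm(x, w)
```

The point is that seeing several small ops in the unoptimized IR does not mean several kernels at runtime; fusion happens later in the pipeline.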
I'm glad to see TPU is using the setup passes I contributed!
jnp.take causes the indices to be assumed to be in bounds, and this assumption is faster on chip. See the IR here: https://github.com/openxla/xla/issues/20899#issuecomment-2570010611 The jnp.take also seems to...
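As a rough NumPy analogy (not the XLA lowering itself), the cost of bounds handling is visible in `np.take`'s `mode` parameter: `mode='raise'` validates every index, while `mode='clip'` skips validation by clamping out-of-bounds indices into range, which is the kind of assumption that lets a compiler emit a cheaper gather:

```python
import numpy as np

a = np.array([10, 20, 30])
idx = np.array([0, 2, 5])  # 5 is out of bounds

# mode='clip' forces indices into range without a check: 5 is clamped to 2
clipped = np.take(a, idx, mode='clip')  # -> [10, 30, 30]

# mode='raise' (the default) checks every index and rejects 5
try:
    np.take(a, idx, mode='raise')
except IndexError:
    print("out-of-bounds index rejected")
```

Which semantics the JAX gather actually gets (clamp, fill, or undefined) depends on the `mode` argument and on how XLA lowers it; the linked issue shows the resulting IR.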
Why has this not been merged? vLLM has no way right now to easily access the underlying model. That is a rather basic feature.
@ezhulenev can this be merged?
@xla-rotation gentle ping
@fhoushmand can you help get this merged?