Juncheng Gu
Infiniswap fails when creating the RDMA QP under SoftRoCE. The kernel-level IB interfaces may not have been adapted to SoftRoCE yet.
@Rivendile Thanks for trying out Tiresias. The message most likely means that the cluster cannot satisfy the resource requirements (especially GPU) of the top job in the queue. Would you...
@CryptoSalamander, I would prefer to use the second option. I faced the same issue when using torch 2.0, and the `item()` method in the first option will lead torch.dynamo to...
@robertgshaw2-redhat, @wwl2755
> Thanks [@juncgu](https://github.com/juncgu), I think this should now be fixed by [#18631](https://github.com/vllm-project/vllm/pull/18631). Hi @njhill, IMHO, #18631 is not relevant to the case here (when prompt < block_size and no remote...
@richardliaw, @robertgshaw2-redhat
@yaochengji, pushed accuracy and edge case tests. But they can only be tested offline.
Thanks for the reviews, @njhill. > I'm unsure about the design of some of the abstractions. In particular: > > * Are there cases where the intermediate device would be...
Appreciate the insightful feedback, @njhill. > Thanks @juncgu! > > > I agree. Having async and layer-wise h2d/d2h copies without blocking the next forward pass is very important. It will...
> Thanks @juncgu, have added a couple more responses. Thanks, @njhill. As you suggested, I added `do_save_to_host` and `do_load_to_device` attributes in `ReqMeta/NixlConnectorMetadata` to specify the h2d/d2h copy operations when host...