Juncheng Gu

Results 11 comments of Juncheng Gu

Infiniswap fails at the stage of creating the RDMA QP when using SoftRoCE. The kernel-level IB interfaces may not have been adapted to SoftRoCE yet.

@Rivendile Thanks for trying out Tiresias. The message most likely means that the cluster cannot satisfy the resource requirements (especially GPUs) of the top job in the queue. Would you...

@CryptoSalamander, I would prefer to use the second option. I faced the same issue when using torch 2.0, and the `item()` method in the first option will lead torch.dynamo to...

> Thanks [@juncgu](https://github.com/juncgu), I think this should now be fixed by [#18631](https://github.com/vllm-project/vllm/pull/18631).

Hi @njhill, IMHO, #18631 is not relevant to the case here (when prompt < block_size and no remote...

@richardliaw, @robertgshaw2-redhat

@yaochengji, I pushed accuracy and edge-case tests, but they can only be run offline.

Thanks for the reviews, @njhill.

> I'm unsure about the design of some of the abstractions. In particular:
>
> * Are there cases where the intermediate device would be...

Appreciate the insightful feedback, @njhill.

> Thanks @juncgu!
>
> > I agree. Having async and layer-wise h2d/d2h copies without blocking the next forward pass is very important. It will...

> Thanks @juncgu, have added a couple more responses.

Thanks, @njhill. As you suggested, I added `do_save_to_host` and `do_load_to_device` attributes in `ReqMeta/NixlConnectorMetadata` to specify the h2d/d2h copy operations when host...