cailun01

Results 5 issues of cailun01

After creating the vcuda pod, running `nvidia-smi` fails with an error saying that `libnvidia-ml.so` cannot be found: ``` NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding...

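Hello! The README mentions that "the output does not yet fully use shared memory". In the source code, however, I see that after the model process produces an inference result, it passes the output to the Backend process (the result-handling thread) through one block of shared memory, and the output is then passed back to the data (business) process through another block of shared memory. Is there still room to optimize how shared memory is used for the output?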

Hello! Following the Installation Guide, I installed gpushare-scheduler-extender successfully. When I ran `kubectl inspect gpushare`, I got this: ``` NAME IPADDRESS GPU0(Allocated/Total) PENDING(Allocated) GPU Memory(GiB) abcd9 146.12.9.23 0/10 3 3/10...

### 🚀 The feature, motivation and pitch

Is there any plan to add [FSDP2](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md) for training?

### Alternatives

_No response_

### Additional context

_No response_

Hi, ThunderKittens team! Do you have any plans to support Grace Blackwell? Compiling b200.cu failed on GB200. ``` ../../../include/common/base_types.cuh(258): error: invalid narrowing conversion from "char" to "signed char" static...