Guoteng
## Description

DI-engine integrates the `torch.distributed.rpc` module.

1. CPU-P2P-RDMA: supports RDMA CPU-P2P transmission in an IB network environment.
2. GPU-P2P-RDMA: supports GPU P2P communication.

## Related Issue

## TODO

1. Dynamic communication...
## Motivation

We have `checkpoint_fraction`, but there is no interface for it in the config file; this PR adds one.

## Modification

1. Add a `checkpoint_fraction` option to the model config.
2. Add `checkpoint_fraction` sanity...
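
As a rough illustration of what a config-level check for `checkpoint_fraction` might look like, here is a minimal sketch. It is not the code from the PR: the function name `validate_checkpoint_fraction`, the dict-style model config, the default of `1.0`, and the accepted range of 0 to 1 are all assumptions made for this example.

```python
from typing import Any, Mapping


def validate_checkpoint_fraction(model_cfg: Mapping[str, Any]) -> float:
    """Read `checkpoint_fraction` from a model config and sanity-check it.

    Hypothetical helper: the default value and the [0, 1] range are
    assumptions for illustration, not taken from the PR.
    """
    value = model_cfg.get("checkpoint_fraction", 1.0)  # assumed default

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"checkpoint_fraction must be a number, got {type(value).__name__}"
        )

    # A fraction only makes sense between 0 and 1 inclusive (assumed range).
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"checkpoint_fraction must be in [0, 1], got {value}")

    return float(value)
```

Calling `validate_checkpoint_fraction({"checkpoint_fraction": 0.5})` returns the value unchanged, while an out-of-range or non-numeric value fails early instead of surfacing later during training.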
Commit:

1. Add torchrpc message queue.
2. Implement a buffer based on CUDA-shared-tensor to optimize the data path of torchrpc.
3. Add the `bypass_eventloop` arg to `Task()` and `Parallel()`.
4. Add thread...
Hi guys, I've recently been trying to use `torch.profile` to profile a large NLP model. However, I have run into some problems and would like to get some advice: 1....
Hello, my code runs in a k8s environment. I started PyTorch in two pods and tried to use torchrpc, but I encountered an error in the `torch.distributed.rpc.init_rpc` function....
If the deviceList contains multiple ibv devices, we want to select the device of the port whose port_state is active, instead of just selecting the first device in the deviceList...
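
The selection rule described here can be sketched in a few lines. This is a simplified Python model, not the actual patch: the `Device` and `Port` classes, the `select_active_port` function, and the string `"ACTIVE"` (standing in for the verbs constant `IBV_PORT_ACTIVE`) are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Stand-in for the verbs port state IBV_PORT_ACTIVE (illustrative only).
PORT_ACTIVE = "ACTIVE"


@dataclass
class Port:
    num: int
    state: str


@dataclass
class Device:
    name: str
    ports: Sequence[Port]


def select_active_port(
    device_list: Sequence[Device],
) -> Optional[Tuple[Device, Port]]:
    """Return the first (device, port) pair whose port state is active.

    Scans devices in list order rather than blindly taking device_list[0].
    Returns None when no device has an active port.
    """
    for dev in device_list:
        for port in dev.ports:
            if port.state == PORT_ACTIVE:
                return dev, port
    return None
```

With two devices where only the second has an active port, the function skips the first device and returns the second, which is the behavior the description asks for.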
Very cool work! We really hope to use Glake in our LLM training. However, I failed when trying to compile Glake against PyTorch release 2.1. My system information and error...
# InternLM Simulator

## 1. Introduction

The solver mainly consists of two components:

1. `profiling`: Collects the time consumption of each stage during the model training process in advance and...
Hello, I recently read a [blog](https://mp.weixin.qq.com/s/J-EP6ZOeLS_lFZFD3oTNtA) about Colossal-AI supporting LoRA fine-tuning of DeepSeek-V3. It is great work for the open-source community. But I have a question about the picture in...