Guoteng

Results 9 issues of Guoteng

## Description DI-engine integrates the `torch.distributed.rpc` module. 1. CPU-P2P-RDMA: in an IB network environment, supports RDMA CPU P2P transmission 2. GPU-P2P-RDMA: supports GPU P2P communication ## Related Issue ## TODO 1. Dynamic communication...

efficiency optimization
enhancement

## Motivation `checkpoint_fraction` exists but is not exposed in the config file; this PR adds support for it. ## Modification 1. Add a `checkpoint_fraction` option to the model config. 2. Add `checkpoint_fraction` sanity...

enhancement
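
As an illustration of the kind of `checkpoint_fraction` sanity check the entry above mentions, here is a minimal sketch. The valid range of [0, 1], the default of 0.0, the dictionary-style config, and the function name `validate_checkpoint_fraction` are all assumptions made for this example; the PR's actual check is not shown in the truncated summary.

```python
from typing import Any, Mapping


def validate_checkpoint_fraction(model_config: Mapping[str, Any]) -> float:
    """Return a validated checkpoint_fraction from a model config mapping.

    Hypothetical helper: the [0, 1] range and the 0.0 default are assumptions,
    not values taken from the PR itself.
    """
    value = model_config.get("checkpoint_fraction", 0.0)

    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"checkpoint_fraction must be a number, got {type(value).__name__}"
        )

    # A fraction only makes sense between 0 (nothing) and 1 (everything).
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"checkpoint_fraction must be in [0, 1], got {value}")

    return float(value)
```

Validating once at config-load time means a bad value fails fast instead of surfacing later during training.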

Commit: 1. Add torchrpc message queue. 2. Implement buffer based on CUDA-shared-tensor to optimize the data path of torchrpc. 3. Add 'bypass_eventloop' arg in Task() and Parallel(). 4. Add thread...

efficiency optimization

Hi guys, I've recently been trying to use `torch.profile` to profile a large NLP model. However, I have run into some problems and would like some advice: 1....

bug
plugin

Hello, my code is running in a k8s environment. I started PyTorch in two pods and tried to use torchrpc, but I encountered an error in the `torch.distributed.rpc.init_rpc` function....

If the deviceList contains multiple ibv devices, we want to select the device that has a port whose port_state is active, instead of just selecting the first device in the deviceList...

cla signed
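
The device selection rule described in the entry above can be sketched in a few lines. This is only an illustration in Python, not the patch itself: the `IbvDevice` record, the `"active"` state label, and the function name `select_active_device` are hypothetical stand-ins for whatever the real code queries from the ibverbs library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

PORT_ACTIVE = "active"  # hypothetical label for an active port state


@dataclass(frozen=True)
class IbvDevice:
    """Hypothetical stand-in for an ibv device and the states of its ports."""
    name: str
    port_states: Tuple[str, ...]


def select_active_device(device_list: Sequence[IbvDevice]) -> Optional[IbvDevice]:
    """Return the first device that has at least one active port.

    Returns None when no device qualifies; what the real patch does in that
    case is not stated in the truncated summary.
    """
    for device in device_list:
        if PORT_ACTIVE in device.port_states:
            return device
    return None


devices = [
    IbvDevice("dev0", ("down",)),
    IbvDevice("dev1", ("down", "active")),
]
chosen = select_active_device(devices)
```

With the sample list, taking `devices[0]` would pick a device whose only port is down, while the scan skips it and picks `dev1`.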

Very cool work; we really hope to use Glake in our LLM training. However, I failed when trying to compile Glake on PyTorch release 2.1. My system information and error...

# InternLM Simulator ## 1. Introduction The solver mainly consists of two components: 1. `profiling`: Collects the time consumption of each stage during the model training process in advance and...

Hello, I recently read a [blog](https://mp.weixin.qq.com/s/J-EP6ZOeLS_lFZFD3oTNtA) about Colossal-AI supporting LoRA fine-tuning of DeepSeek-V3. It is great work for the open-source community. But I have a question about the picture in...