chenyaofo

Results 12 comments of chenyaofo

I may have found the reason for the large tarfile size. There is a similar question on [Stack Overflow](https://stackoverflow.com/questions/69585800/what-is-the-fundamental-difference-between-tar-unix-and-tarfile-python). In short, Python's built-in `tarfile` library (>=3.8) uses `tarfile.PAX_FORMAT` as the default format when writing archives....
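To check whether the archive format explains the size difference, one can write the same members with `tarfile.PAX_FORMAT` and `tarfile.GNU_FORMAT` and compare the results. The following is a minimal sketch, not a reproduction of the original dataset: the member names, payload, and the fractional `mtime` value are made up for illustration. The fractional `mtime` is an assumption meant to mimic what `os.stat()` typically reports for real files.

```python
import io
import tarfile


def build_archive(fmt, n_files=50):
    """Build an in-memory tar archive in the given format and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        for i in range(n_files):
            payload = b"x" * 100  # hypothetical small file content
            info = tarfile.TarInfo(name=f"sample_{i:04d}.bin")  # hypothetical name
            info.size = len(payload)
            # Assumption: a fractional modification time, as on real files.
            info.mtime = 1_600_000_000.25
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


pax_bytes = build_archive(tarfile.PAX_FORMAT)
gnu_bytes = build_archive(tarfile.GNU_FORMAT)
print("PAX:", len(pax_bytes), "bytes")
print("GNU:", len(gnu_bytes), "bytes")
```

If the PAX archive comes out noticeably larger, passing `format=tarfile.GNU_FORMAT` to `tarfile.open` when writing is one option to try for datasets made of many small files.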

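Same here. The speed test with `CloudflareST` looked fine, but after I wrote the selected IP into `/etc/hosts`, requests to a Cloudflare Worker could not even establish an SSL connection. Going through a proxy at the same time works perfectly well, so it seems Cloudflare's CDN is simply unusable in my network environment. ![image](https://user-images.githubusercontent.com/32816823/170066131-8565dbf1-ae09-41c4-98fe-23dc2d591ef9.png)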
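One way to narrow this down without editing `/etc/hosts` is to attempt a TLS handshake against a specific IP while still sending the real hostname for SNI. The helper below is a rough sketch using only the Python standard library; the function name `tls_handshake_ok` is hypothetical, and the IP and hostname in the usage comment are placeholders.

```python
import socket
import ssl


def tls_handshake_ok(ip, hostname, port=443, timeout=5.0):
    """Return True if a TLS handshake with `ip` succeeds using `hostname` for SNI."""
    context = ssl.create_default_context()
    try:
        # Connect to the chosen IP directly, bypassing DNS and /etc/hosts.
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            # The certificate is still validated against `hostname`.
            with context.wrap_socket(raw, server_hostname=hostname):
                return True
    except OSError:
        # Covers refused connections, timeouts, and ssl.SSLError.
        return False


# Example usage (placeholder values, replace with your own):
# print(tls_handshake_ok("203.0.113.10", "my-worker.example.workers.dev"))
```

If the handshake fails for the selected IP but succeeds for the address that normal DNS resolution returns, the problem is more likely tied to that IP or the network path than to the Worker itself.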

Hi @stiepan, thanks for your patient reply.

> By splitting the webdatasets, do you mean that each worker process should have its subset of archives to read? Then, when each...

@stiepan Thank you, I get the idea. The current parallel external source API is better suited to map-style datasets, so the solution for iterable-style datasets looks a little bit complex...
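The idea of giving each worker its own subset of archives can be sketched in plain Python. This does not use any DALI or webdataset API; `shards_for_worker` and the shard file names are hypothetical and only illustrate a round-robin split.

```python
def shards_for_worker(shard_paths, worker_id, num_workers):
    """Return the archives assigned to one worker using a round-robin split."""
    if num_workers <= 0 or not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must satisfy 0 <= worker_id < num_workers")
    # Worker k reads shards k, k + num_workers, k + 2 * num_workers, ...
    return shard_paths[worker_id::num_workers]


# Hypothetical shard names for illustration only.
all_shards = [f"train-{i:05d}.tar" for i in range(10)]
num_workers = 4
assignment = {
    worker_id: shards_for_worker(all_shards, worker_id, num_workers)
    for worker_id in range(num_workers)
}
for worker_id, shards in assignment.items():
    print(worker_id, shards)
```

With this split, each worker can iterate over its own archives sequentially, so no two workers read the same sample, at the cost of slightly uneven loads when the shard count is not a multiple of the worker count.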

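@jqtmviyu Thank you very much for the tutorial. I have one question. As I understand it, `moon.json` is only an intermediate artifact whose ultimate purpose is to generate the `planet` file. So in step 5 of your tutorial, "Generate the moon file inside the container":

```
mkdir moons.d
cp *.moon moons.d/
```

Are these two commands actually unnecessary?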

> I was able to repro this using your Dockerfile, but I do see this error in the logs: > > ``` > #8 439.8 g++ -pthread -B /opt/conda/envs/dev/compiler_compat -shared...

> Also we do not have compatibility with triton 2.0.0 yet, so you may want to try building with `pip install triton==1.0.0` to see if that resolves any issues as...

> This should be resolved if you can try with the latest master branch

Following your suggestion, I tried to build DeepSpeed from the latest master branch. The Dockerfile is at...

> @chenyaofo - I tried making my own dockerfile to test this, and I'm able to get the below working. I'm not familiar with the needs of your system, but...

+1. I am also confused about how to train a model with both tensor parallelism and pipeline parallelism in DeepSpeed.