chenyaofo
I may have found the reason for the large tarfile size. There is a similar question on [Stack Overflow](https://stackoverflow.com/questions/69585800/what-is-the-fundamental-difference-between-tar-unix-and-tarfile-python). In short, Python's (>=3.8) built-in `tarfile` library uses `tarfile.PAX_FORMAT` as the default format when writing archives....
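A minimal sketch of how the format choice can affect archive size, assuming many small members with floating-point modification times (which, as far as I understand, is what `os.stat()` reports and what leads `tarfile` to emit an extra pax header per member). The member names, payload, and the `archive_size` helper are hypothetical and only serve the comparison; the archives are built in memory, and no specific byte counts are claimed.

```python
import io
import tarfile


def archive_size(fmt, n_members=200):
    """Return the size in bytes of an in-memory tar archive written with `fmt`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        for i in range(n_members):
            payload = b"x" * 16  # tiny hypothetical sample
            info = tarfile.TarInfo(name=f"sample_{i:05d}.txt")
            info.size = len(payload)
            # Float mtime, similar to what os.stat().st_mtime returns.
            info.mtime = 1700000000.25
            tar.addfile(info, io.BytesIO(payload))
    # The archive is finalized when the `with` block exits.
    return len(buf.getvalue())


gnu_size = archive_size(tarfile.GNU_FORMAT)
pax_size = archive_size(tarfile.PAX_FORMAT)
print(f"GNU_FORMAT: {gnu_size} bytes")
print(f"PAX_FORMAT: {pax_size} bytes")
```

If the pax overhead matters, passing `format=tarfile.GNU_FORMAT` to `tarfile.open()` when writing is one way to avoid it, at the cost of the sub-second timestamps that the pax headers carry.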
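Same here. Speed tests with `CloudflareST` looked fine, but after I wrote the preferred IP into `/etc/hosts`, I could not even establish an SSL connection when accessing a Cloudflare Worker. At the same time, going through a proxy works perfectly well. It seems Cloudflare's CDN is effectively unusable in my network environment.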
Hi, @stiepan, thanks for your patient reply. > By splitting the webdatasets, do you mean that each worker process should have its subset of archives to read? Then, when each...
@stiepan Thank you. I get the idea. The current parallel external source API is better suited to map-style datasets, so the solution for iterable-style datasets looks a little bit complex...
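A rough sketch of the "each worker reads its own subset of archives" idea discussed above, in plain Python. It does not use any DALI API; the shard names, `shards_for_worker`, `iterate_worker`, and the `read_shard` callback are all hypothetical and only illustrate a round-robin split.

```python
def shards_for_worker(shards, worker_id, num_workers):
    """Round-robin split: worker k gets shards k, k+N, k+2N, ..."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    return shards[worker_id::num_workers]


def iterate_worker(shards, worker_id, num_workers, read_shard):
    """Yield samples from this worker's shards only, one shard after another."""
    for shard in shards_for_worker(shards, worker_id, num_workers):
        yield from read_shard(shard)


# Hypothetical archive names for illustration.
all_shards = [f"shard-{i:04d}.tar" for i in range(10)]
assignment = {w: shards_for_worker(all_shards, w, 4) for w in range(4)}
for worker, subset in assignment.items():
    print(worker, subset)
```

With this split every archive is read by exactly one worker, so an iterable-style reader never has to coordinate positions inside a shared archive.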
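@jqtmviyu Thank you very much for the tutorial. I have one question. As I understand it, `moon.json` is only an intermediate artifact whose final purpose is to generate the `planet` file. So in step 5 of your tutorial, "Generate the moon file inside the container": ``` mkdir moons.d cp *.moon moons.d/ ``` are these two commands actually unnecessary?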
> I was able to repro this using your Dockerfile, but I do see this error in the logs: > > ``` > #8 439.8 g++ -pthread -B /opt/conda/envs/dev/compiler_compat -shared...
> Also we do not have compatibility with triton 2.0.0 yet, so you may want to try building with `pip install triton==1.0.0` to see if that resolves any issues as...
> This should be resolved if you can try with the latest master branch Following your suggestion, I tried to build DeepSpeed from the latest master branch; the Dockerfile is at...
> @chenyaofo - I tried making my own dockerfile to test this, and I'm able to get the below working. I'm not familiar with the needs of your system, but...
+1. I am also confused about how to train a model with both tensor parallelism and pipeline parallelism in DeepSpeed.