Jinghan Yao issues

Results: 16 issues of Jinghan Yao

About the training log?

Would you please post the whole training log of this RetinaNet, including the box loss and class loss? When I trained it from scratch, I found it took too...

Build from source without container errors

### Branch/Tag/Commit main ### Docker Image Version none ### GPU name A100 ### CUDA Driver 525.60.13 ### Reproduced Steps ```shell PATH=$(getconf PATH) module purge module load cuda/11.6 module load gcc/9.4.0...

bug

GPT-MoE support for expert parallelism

Hi, I am wondering whether the [GPT-MoE](https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md#gpt-with-moe) example supports expert parallelism. The provided examples use `nlp_gpt3_text-generation_0.35B_MoE-64`, but there are only tensor parallel and pipeline parallel options....

Are there any examples of using the offload feature in GPT/BLOOM/OPT inference?

Hi, currently only the `linear` example shows a naive use of offload; other projects such as `opt`, `bloom`, and `gpt` have no offload option. I am wondering...

[Bug]: undefined reference to `absl::lts_20211102::

### Describe the issue Hi, I have built Abseil from source, but when I compile other projects that use absl, I get a lot of `undefined reference to `absl::lts_20211102:: .....` errors. ```...

Is it possible to modify the network structure and train on it?

Thanks for your great work! I wonder whether I could modify some layers or conv params based on your code? Sorry, but I am not familiar with MATLAB or Caffe. Also,...

Could this repo reproduce the results in the original paper?

Thanks for your contribution. The original paper reports results on Penn and JHMDB; can this repo reproduce them?

Do you have a complete training log?

Could you post the whole training log of your retina-net, including the box loss and class loss? Thanks a lot!

Stable MPI.COMM_WORLD for scaling out to hundreds of nodes

Previously, `initialize()` allowed creating the MPI comm world after `import distributed` ``` from distributed import Client, Nanny, Scheduler from distributed.utils import import_term ... def initialize( ...): if comm is None: from...

Question on how to set --shape when using perf_analyzer

Hi, I am trying to use `perf_analyzer` on the predefined models in fastertransformer, such as gpt, gptj, etc. I am very confused about how to properly set the `--shape`...