ppetrushkov
I noticed that DNNL u8s8s32 single-core performance is slower than [FBGEMM](https://github.com/pytorch/FBGEMM) when m is small (m
**Describe the bug** Running inference with DeepSpeed using the GPT-NeoX 20B model produces garbage output, indicating an implementation bug. **To Reproduce** For example, this can be seen when running the example script: `deepspeed...
Currently, importing transformer_engine takes ~10s on my machine, and it also starts a background process pool because of all the JIT initialization like [here](https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/jit.py#L50-L54). It would be better if...
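One common way to avoid paying such costs at import time is to defer the expensive work until first use. As a minimal sketch (not transformer_engine's actual code, and the `LazyModule` name is hypothetical), a proxy module can delay the real import until an attribute is first accessed; the stdlib offers a similar mechanism in `importlib.util.LazyLoader`:

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Sketch: proxy that imports the real module only on first attribute access."""

    def __init__(self, name):
        super().__init__(name)
        self._real = None  # the real module, loaded lazily

    def __getattr__(self, attr):
        # Called only for attributes not found normally, i.e. anything
        # other than _real; triggers the actual import exactly once.
        if self._real is None:
            self._real = importlib.import_module(self.__name__)
        return getattr(self._real, attr)


# `json` stands in for an expensive-to-import package; nothing heavy
# happens until the first attribute access below.
json_lazy = LazyModule("json")
print(json_lazy.dumps({"a": 1}))
```

The same idea applies to JIT warm-up or spawning a process pool: wrap the work in a function that runs on first call instead of at module import.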