DeepLearningExamples issues

[Model/Framework] What is the problem?

Related to **Model/Framework(s)** *(e.g. GNMT/PyTorch or FasterTransformer/All)* **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. Install '...'...

lawchingman

[SE3-Transformer] ScanObjectNN training

Hi @milesial. Could you share ScanObjectNN training demo code or instructions to support research in the point cloud community? I don't understand point cloud processing using graphs, such as the model...

xins981

enhancement

[SpeechSynthesis/Tacotron2] Training crashes with AttributeError: module 'torch._C' has no attribute '_jit_set_autocast_mode'

1

Related to **Model/Framework(s)** https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2 **Describe the bug** The model cannot train because it hits the following error once in the docker image: ```bash root@eff550f3bccb:/workspace/tacotron2# bash scripts/train_waveglow.sh Traceback (most recent call...

mikesol

bug

Enable AutoAugment and modernize DALI pipeline for ConvNets

Update the DALI implementation to use the modern "fn" API instead of the old class-based approach. Add a codepath using AutoAugment in the DALI training pipeline. It can be easily extended to use other...

klecki

No LICENSE file for the repo

I do not see a LICENSE file for the DeepLearningExamples repo. Under what license is this repo released, in case a user wants to attribute it?

jitendra42

bug

[SE(3)-Transformer] Output embeddings collapse

Related to **Model/Framework(s)** * *SE(3)-Transformer* **Describe the bug** I train SE(3)-Transformer to predict type-0 node embeddings. I notice that with multiple training hyper-parameter setups I get the same error:...

anton-bushuiev

bug

Unable to understand throughput calculation

3

Within `run_squad.py` we have two cases based on `args.max_steps`: https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/run_squad.py#L1198 I just want to understand what we are trying to do in the else case, because I have tried 3 experiments...

Druva24

[Transformer/Translation] OSError: [Errno 28] No space left on device

2

Related to **Model/Framework(s)** PyTorch/Translation/Transformer **Describe the bug** I followed the instructions in README.md but ran into this issue. Tue Aug 8 23:35:31 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 530.30.02 Driver Version: 530.30.02...

hulihan-start

bug

8xH100 server training time higher than 8xA100 server.

Related to **Model/Framework(s)** TensorFlow/PyTorch **Describe the bug** While running [Yolox](https://github.com/Megvii-BaseDetection/YOLOX) on the servers described, the total training time on the H100 server is higher than on the A100 server. I also ran a test script on the servers...

PurvangL

bug

Quantized (QAT) EfficientNet Classification Model TensorRT engine

**Quantized (QAT) EfficientNet Classification Model TensorRT engine** I saw you have a QAT version of EfficientNet and a container to use it as a workaround. I suggest compiling this...

panahikhas

enhancement
