Jason Krone
…or arg ## Description of changes: Modify StreamingDataset to support passing process_group as a constructor argument. Currently, StreamingDataset assumes it should use the default process group; however, for certain use...
A few folks, including me, have been [trying and failing](https://github.com/EleutherAI/lm-evaluation-harness/issues/1292) to reproduce the Llama TriviaQA scores using the [EleutherAI Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). Specifically, for Llama 3 8B I'm getting an...
**Summary** I'm hitting a NaN loss issue when I use the TransformerLayer in place of a PyTorch transformer layer I wrote. **Details** I'm using the nvcr.io/nvidia/pytorch:24.04-py3 docker container. I train...
Hi there, First, really admire the work on OpenELM! Thank you for making your models and code available. Question regarding the [pre-training checkpoints linked here](https://github.com/apple/corenet/blob/main/projects/openelm/README-pretraining.md#pretraining-checkpoints-model-weights-and-logs): how can we convert these...
Added instructions for using Elastic Fabric Adapter (EFA) with a capacity reservation on AWS to faq.rst. Tested (run the relevant ones): - [ ] Code formatting: `bash format.sh`...
## Environment Enroot image built off the nvcr.io/nvidia/pytorch:24.11-py3 docker image. - OS: Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Ubuntu 22.04) 20241208 - Hardware (GPU, or instance type):...
When reduction == "mean", the current implementation sets the z_squared values to 0 instead of ignoring them. These 0 values impact the number of elements in the mean reduction and therefore...
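To illustrate the difference described in the last snippet, here is a minimal sketch in plain Python. It is not the code from the library in question: the helper names `mean_zeroed` and `mean_ignored`, the sample values, and the mask are all hypothetical, chosen only to show how zeroed entries change the denominator of a mean.

```python
def mean_zeroed(values, ignore_mask):
    """Mean where ignored entries are set to 0 but still counted in the denominator."""
    zeroed = [0.0 if ignored else v for v, ignored in zip(values, ignore_mask)]
    return sum(zeroed) / len(zeroed)


def mean_ignored(values, ignore_mask):
    """Mean over the kept entries only; ignored entries do not affect the denominator."""
    kept = [v for v, ignored in zip(values, ignore_mask) if not ignored]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical z_squared values; the last two positions are marked as ignored.
z_squared = [4.0, 2.0, 6.0, 8.0]
ignore_mask = [False, False, True, True]

print(mean_zeroed(z_squared, ignore_mask))   # 1.5 (6.0 / 4 elements)
print(mean_ignored(z_squared, ignore_mask))  # 3.0 (6.0 / 2 elements)
```

With half of the positions ignored, the zeroed variant reports half the value of the masked variant, so the gap grows with the fraction of ignored positions.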