Santosh Bhavani
Getting train_loss: nan at every step when running training for SSD. I tried varying batch_size and learning_rate, but there was still no improvement. InvalidArgumentError (see above for traceback): Nan in summary histogram for:...
Even having simple Python scripts would be helpful.
*Concise Description:* I'd like to use JAX for distributed training of LLMs. Also, the new release of Keras supports JAX as a backend alongside TF. *Describe the...
# Description Our examples are split between examples/ and docs/examples/. We also have features (e.g. inference) hidden in our examples that would be worth summarizing in a single README. I'd...
Are there any plans to integrate the embedding_modules or custom samplers back into TorchRec?
I saw MLX is adding DPO, ORPO, PPO and GRPO. Any plans to add those to AXLearn as well?
**Describe the bug** A clear and concise description of what the bug is. **Reproduction** 1. What command or script did you run? ```none pip install openmim ``` **Environment** Using Python...
### Expected behaviour Getting the version of opencv should allow a user to pip install that same version in a different environment ### Actual behaviour The version number attribute for...
## **Description** This outlines the current status of gpt-oss features that need to be implemented in Megatron Core, leveraging Transformer Engine. **✅ UPDATE: All core GPT-OSS functionality is now available...