Changjiang GOU

Results 6 issues of Changjiang GOU

Hi there, I am reproducing muNet on 8 A100 GPUs. Compared to running it on an 8-core Colab TPUv2, it takes too long to compile each child model. XLA...

### 🐛 Describe the bug I encountered this problem when running examples/language/gpt/titans/train_gpt.py with the real data provided by the example. This problem only occurs when we set the argument 'num_workers'...

bug

It's more a question than an issue. The tensor [w2](https://github.com/stanford-futuredata/megablocks/blob/main/megablocks/layers/mlp.py#L341C9-L341C50) of class SparseMLP has the same shape as w1. Is it because of the DSD operation? Like, it requires...

question

Hi dear torchrec developers. I found a fatal bug when using EmbeddingCollection. The full stack is ``` [rank0]: File "/home/admin/hippo/worker/slave/aop_418921_aop_launcher_job_temp_m_20250528093245_6524584_job.worker_0_57_12/train/test_ebd.py", line 44, in [rank0]: main() [rank0]: File "/home/admin/hippo/worker/slave/aop_418921_aop_launcher_job_temp_m_20250528093245_6524584_job.worker_0_57_12/train/test_ebd.py", line 36,...

I found an interesting phenomenon that could be enhanced when using KeyedJaggedTensor. ``` import torch from torchrec.sparse.jagged_tensor import JaggedTensor, KeyedJaggedTensor values = [ torch.Tensor([1.0]), torch.Tensor(), torch.Tensor([7.0, 8.0]), torch.Tensor([10.0, 11.0, 12.0]),...

### System Info - `transformers` version: 4.57.3 - Platform: Linux-6.6.97+-x86_64-with-glibc2.35 - Python version: 3.11.11 - Huggingface_hub version: 0.36.0 - Safetensors version: 0.7.0 - Accelerate version: 1.12.0 - Accelerate config: not...

bug