Hao Liu

Results: 10 comments of Hao Liu

Are there any updates on this? It is frustrating to find that the PyTorch data loader cannot work with Jax on TPU, despite being used in Jax's official examples.
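
A common workaround (a sketch under the assumption that the failure comes from torch tensors and forked worker processes interacting badly with the TPU runtime; `ToyDataset` and `numpy_collate` are hypothetical names) is to keep the DataLoader in the main process and hand Jax plain NumPy arrays:

```python
import numpy as np
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in dataset returning (x, y) scalar pairs."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        return np.float32(idx), np.float32(idx * 2)

def numpy_collate(batch):
    # Stack samples into NumPy arrays so Jax never sees torch tensors.
    xs, ys = zip(*batch)
    return np.stack(xs), np.stack(ys)

loader = DataLoader(
    ToyDataset(),
    batch_size=4,
    num_workers=0,          # avoid fork()-ing workers after the TPU runtime starts
    collate_fn=numpy_collate,
)

batches = [batch for batch in loader]  # each batch is a pair of NumPy arrays
```

Each batch can then be passed directly to a jitted Jax function; whether this sidesteps the TPU hang in your environment is not guaranteed.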

@levskaya there is a code snippet for the failure in [#9767](https://github.com/google/jax/issues/9767).

I ran into the same issue in several environments. Does anyone know of a solution?

We are definitely interested in replicating the 30B model, but there are no concrete plans yet, since we are currently focused on completing the 7B model training.

The code is tested on Ubuntu; we are not sure how well Jax works on Mac.

Hi, thanks for your interest! We don't have a concrete timeline to add a Mistral version yet, as we are currently occupied with other priorities. However, we will keep this...

Hi, this codebase is licensed under Apache 2.0. The models, since they are derivatives of Llama 2, are covered by the Llama 2 license. We will make this clearer in the documentation.

Sorry for the delay. The vmem OOM error is likely due to using a large chunk size. I tried v4-64 with chunk size 512; it worked well, with fast computation and...
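
A back-of-the-envelope sketch of why chunk size matters (the function name and byte counts here are illustrative assumptions, not the codebase's actual memory model): the attention-score scratch for one block pair grows with the product of the query and key chunk sizes, so halving the chunk size quarters that buffer.

```python
def score_buffer_bytes(query_chunk, key_chunk, bytes_per_element=4):
    # Rough size of one query-chunk x key-chunk attention-score buffer,
    # ignoring batch and head dimensions for simplicity.
    return query_chunk * key_chunk * bytes_per_element

# Halving the chunk size quarters the per-block scratch buffer.
for chunk in (2048, 1024, 512):
    print(chunk, score_buffer_bytes(chunk, chunk) / 2**20, "MiB")
```

Actual vmem usage depends on batch size, head count, and compiler behavior, but this scaling is why a smaller chunk size tends to avoid the OOM.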

The 6 comes from storing the key-value pair from the previous host, the key-value pair for the current computation, and the current query and output, so in total 2×2 + 1 + 1 = 6.
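
The accounting above can be written out explicitly (a sketch only; each unit stands for one chunk-sized activation buffer held at once during the ring step):

```python
prev_host_kv = 2   # key and value received from the previous host
current_kv   = 2   # key and value used in the current computation
query        = 1   # current query chunk
output       = 1   # current output chunk

total_buffers = prev_host_kv + current_kv + query + output
assert total_buffers == 2 * 2 + 1 + 1 == 6
```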

Hi, RingAttention inference is now supported in [LWM](https://github.com/LargeWorldModel/LWM/blob/3778ac1eb0b1b4cc38fc83864879ee69c3087c07/lwm/llama.py#L571-L616).