Hao Liu

Results: 10 comments of Hao Liu

Are there any updates on this? It is frustrating to find that the PyTorch data loader cannot work with Jax on TPU, despite being used in Jax's official examples.
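
A common workaround (a sketch under the assumption that the failure comes from torch tensors and forked worker processes interacting badly with the TPU runtime; `ToyDataset` and `numpy_collate` are hypothetical names) is to keep the DataLoader in the main process and hand Jax plain NumPy arrays:

```python
import numpy as np
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in dataset returning (x, y) scalar pairs."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        return np.float32(idx), np.float32(idx * 2)

def numpy_collate(batch):
    # Stack samples into NumPy arrays so Jax never sees torch tensors.
    xs, ys = zip(*batch)
    return np.stack(xs), np.stack(ys)

loader = DataLoader(
    ToyDataset(),
    batch_size=4,
    num_workers=0,          # avoid fork()-ing workers after the TPU runtime starts
    collate_fn=numpy_collate,
)

batches = [batch for batch in loader]  # each batch is a pair of NumPy arrays
```

Each batch can then be passed directly to a jitted Jax function; whether this sidesteps the TPU hang in your environment is not guaranteed.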

@levskaya there is a code snippet for the failure in [#9767](https://github.com/google/jax/issues/9767).

I ran into the same issue in several environments. Does anyone know of a solution?

We are definitely interested in replicating the 30B model, but there are no concrete plans yet, since we are currently focused on completing the 7B model training.

The code is tested on Ubuntu; we are not sure how well Jax works on Mac.

Hi, thanks for your interest! We don't have a concrete timeline to add a Mistral version yet, as we are currently occupied with other priorities. However, we will keep this...

Hi, this codebase is licensed under Apache 2.0. The models, since they are derivatives of Llama 2, are covered by the Llama 2 license. We will make this clearer in the documentation.

Sorry for the delay. The vmem OOM error is likely due to using a large chunk size. I tried v4-64 with chunk size 512; it worked well, with fast computation and...
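
A back-of-the-envelope sketch of why chunk size matters (the function name and byte counts here are illustrative assumptions, not the codebase's actual memory model): the attention-score scratch for one block pair grows with the product of the query and key chunk sizes, so halving the chunk size quarters that buffer.

```python
def score_buffer_bytes(query_chunk, key_chunk, bytes_per_element=4):
    # Rough size of one query-chunk x key-chunk attention-score buffer,
    # ignoring batch and head dimensions for simplicity.
    return query_chunk * key_chunk * bytes_per_element

# Halving the chunk size quarters the per-block scratch buffer.
for chunk in (2048, 1024, 512):
    print(chunk, score_buffer_bytes(chunk, chunk) / 2**20, "MiB")
```

Actual vmem usage depends on batch size, head count, and compiler behavior, but this scaling is why a smaller chunk size tends to avoid the OOM.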

The 6 comes from storing the key-value pair from the previous host, the key-value pair for the current computation, and the current query and output, so in total 2×2 + 1 + 1 = 6.
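
The accounting above can be written out explicitly (a sketch only; each unit stands for one chunk-sized activation buffer held at once during the ring step):

```python
prev_host_kv = 2   # key and value received from the previous host
current_kv   = 2   # key and value used in the current computation
query        = 1   # current query chunk
output       = 1   # current output chunk

total_buffers = prev_host_kv + current_kv + query + output
assert total_buffers == 2 * 2 + 1 + 1 == 6
```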

Hi, RingAttention inference is now supported in [LWM](https://github.com/LargeWorldModel/LWM/blob/3778ac1eb0b1b4cc38fc83864879ee69c3087c07/lwm/llama.py#L571-L616).