Lance Wang

Results: 12 issues by Lance Wang

Here is the minimum config required to support Llama 3.1 405B GPU training.
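A minimal sketch of the kind of settings such a config pins down. The key names follow MaxText conventions (model_name, per_device_batch_size, etc.) but are assumptions, not the PR's actual values:

```python
# Hypothetical minimal settings for a Llama 3.1 405B GPU run; key names
# and values are illustrative, not the PR's exact config.
min_config = {
    "model_name": "llama3.1-405b",   # assumed model identifier
    "hardware": "gpu",
    "per_device_batch_size": 1,      # keep per-chip memory pressure low
    "ici_fsdp_parallelism": -1,      # assumed: shard weights across all chips
    "weight_dtype": "bfloat16",
}
```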

# Description Fix BadSyntheticDataIterator for grain. The local iterator is missing, so the workload will error out when using a Grain dataset together with pdb < 1. # Tests Manually run...
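A sketch of the shape of such a fix, assuming callers expect a `local_iterator` attribute on the iterator (as Grain-backed iterators provide); the class internals here are illustrative only:

```python
import jax.numpy as jnp

class BadSyntheticDataIterator:
  """Illustrative stand-in: yields dummy batches for perf testing."""

  def __init__(self, config, mesh):
    self.config = config
    self.mesh = mesh
    # The reported bug: `local_iterator` was missing, so code paths that
    # reach through it (hit when pdb < 1) raised at runtime.
    self.local_iterator = self

  def __iter__(self):
    return self

  def __next__(self):
    batch = self.config["global_batch_size"]
    seq = self.config["max_target_length"]
    return {"inputs": jnp.zeros((batch, seq), jnp.int32),
            "targets": jnp.zeros((batch, seq), jnp.int32)}
```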

# Description The circular dependency is multihost_dataloading -> maxtext_utils -> checkpointing -> multihost_dataloading. That's why, in earlier development, we removed maxtext_utils as a dependency of multihost_dataloading to avoid the...

pull ready
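One common way to break a cycle like this is to import at call time instead of module load time; the PR itself removed the dependency outright, so the sketch below (with a hypothetical helper name) shows the alternative pattern only:

```python
# multihost_dataloading.py (sketch)
def shard_dataset(dataset, mesh):
  # A module-level `import maxtext_utils` would re-enter the cycle
  # multihost_dataloading -> maxtext_utils -> checkpointing -> here.
  # Importing inside the function sidesteps it.
  from maxtext_utils import get_shard_fn  # hypothetical helper
  return get_shard_fn(mesh)(dataset)
```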

# Description The smallest Llama model we support is 7B, which is still too slow for local development. This PR creates a much smaller toy model. For functional testing...

stale
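For scale, a hedged sketch of how toy dimensions might compare to the 7B config; the key names follow MaxText's base_* convention, but the PR's actual values may differ:

```python
# Hypothetical side-by-side: production 7B dims vs. a toy model that a
# laptop can step through quickly. All values are illustrative.
llama_7b  = {"base_emb_dim": 4096, "base_num_decoder_layers": 32,
             "base_num_query_heads": 32, "base_mlp_dim": 11008}
toy_model = {"base_emb_dim": 128,  "base_num_decoder_layers": 2,
             "base_num_query_heads": 4,  "base_mlp_dim": 512}
```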

# Description The JAX stable stack image contains JAX, CUDA, and MaxText, but lacks some GCP deps. This PR intends to add these deps and build an image that...

stale

# Description The current function only switches between bf16 and f32. It should also take f16 into consideration. # Tests No functional change. Rely on presubmit tests. # Checklist...

pull ready
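A minimal sketch of the described fix, assuming the function maps config strings to JAX dtypes (the real function name and signature may differ):

```python
import jax.numpy as jnp

def get_dtype(dtype_str: str):
  """Hypothetical dtype selector: previously only bf16 vs. f32, with f16
  silently falling through; now f16 is handled explicitly."""
  mapping = {
      "bfloat16": jnp.bfloat16,
      "float16": jnp.float16,   # the missing case this PR adds
      "float32": jnp.float32,
  }
  return mapping[dtype_str]
```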

# Description When calling the MoE megablox gmm kernel, the preferred_element_type is hardcoded to bfloat16. Replace it with the config-controlled dtype instead of hardcoding. # Tests No functional change; rely on...
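A sketch of the change, with `gmm` standing in for the megablox grouped-matmul entry point and `config.matmul_dtype` for whichever config field the PR actually reads:

```python
import jax.numpy as jnp

def run_gmm(gmm, lhs, rhs, group_sizes, config):
  # Before (the reported problem):
  #   gmm(lhs, rhs, group_sizes, preferred_element_type=jnp.bfloat16)
  # After: the output/accumulation dtype follows the config.
  return gmm(lhs, rhs, group_sizes,
             preferred_element_type=jnp.dtype(config.matmul_dtype))
```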

…ine run. # Description For the offline serving engine, the weight dtype is hardcoded to bf16, which blocks other dtype experiments. This PR passes in the weight dtype and...
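A hedged sketch of threading the dtype through instead of pinning it; the helper name is hypothetical:

```python
import jax
import jax.numpy as jnp

def cast_engine_weights(params, weight_dtype=jnp.bfloat16):
  """Hypothetical helper: the engine previously always cast weights to
  bf16; making the dtype an argument unblocks other precision runs."""
  return jax.tree_util.tree_map(lambda x: x.astype(weight_dtype), params)
```

A caller could then pass jnp.float16 or jnp.float32 to compare serving behavior across precisions.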

# Description The embedding's attend() didn't cast the query. Refactor the code to fix the issue and make __call__() and attend() consistent. # Tests Local tests passed. # Checklist Before...
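A sketch of the consistency fix in Flax terms; field names and the module body are illustrative, and MaxText's actual Embed module may differ:

```python
import flax.linen as nn
import jax.numpy as jnp

class Embed(nn.Module):
  """Both paths should compute in `self.dtype`."""
  num_embeddings: int
  features: int
  dtype: jnp.dtype = jnp.bfloat16

  def setup(self):
    self.embedding = self.param(
        "embedding", nn.initializers.normal(stddev=1.0),
        (self.num_embeddings, self.features), jnp.float32)

  def __call__(self, inputs):
    # __call__ already casts the table before the lookup.
    return jnp.asarray(self.embedding, self.dtype)[inputs]

  def attend(self, query):
    # The fix: cast the query too, so attend() matches __call__().
    query = jnp.asarray(query, self.dtype)
    return jnp.dot(query, jnp.asarray(self.embedding, self.dtype).T)
```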

Replace hard-coded bfloat16 with the config-defined activation type. # Description In this script, when creating the decoder state, the weight dtype is hardcoded to bfloat16, so the exported checkpoint...
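A sketch of the described change, assuming the export path casts the whole param tree when building the decoder state (the function and config field names are illustrative):

```python
import jax
import jax.numpy as jnp

def create_decoder_state(params, config):
  # Before: params were cast with x.astype(jnp.bfloat16) unconditionally,
  # so the exported checkpoint ignored the configured dtype.
  dtype = config.activation_dtype  # hypothetical config field, e.g. "float32"
  return jax.tree_util.tree_map(lambda x: x.astype(dtype), params)
```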