Lance Wang

Results: 12 issues by Lance Wang

Here is the minimum config required to support Llama 3.1 405B GPU training.
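A minimal sketch of the kind of settings such a config pins down. The key names follow MaxText conventions (model_name, per_device_batch_size, etc.) but are assumptions, not the PR's actual values:

```python
# Hypothetical minimal settings for a Llama 3.1 405B GPU run; key names
# and values are illustrative, not the PR's exact config.
min_config = {
    "model_name": "llama3.1-405b",   # assumed model identifier
    "hardware": "gpu",
    "per_device_batch_size": 1,      # keep per-chip memory pressure low
    "ici_fsdp_parallelism": -1,      # assumed: shard weights across all chips
    "weight_dtype": "bfloat16",
}
```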

# Description Fix BadSyntheticDataIterator for grain. The local iterator is missing, so the workload will error out when using a Grain dataset together with pdb < 1. # Tests Manually run...
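A sketch of the shape of such a fix, assuming callers expect a `local_iterator` attribute on the iterator (as Grain-backed iterators provide); the class internals here are illustrative only:

```python
import jax.numpy as jnp

class BadSyntheticDataIterator:
  """Illustrative stand-in: yields dummy batches for perf testing."""

  def __init__(self, config, mesh):
    self.config = config
    self.mesh = mesh
    # The reported bug: `local_iterator` was missing, so code paths that
    # reach through it (hit when pdb < 1) raised at runtime.
    self.local_iterator = self

  def __iter__(self):
    return self

  def __next__(self):
    batch = self.config["global_batch_size"]
    seq = self.config["max_target_length"]
    return {"inputs": jnp.zeros((batch, seq), jnp.int32),
            "targets": jnp.zeros((batch, seq), jnp.int32)}
```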

# Description The circular dependency is multihost_dataloading -> maxtext_utils -> checkpointing -> multihost_dataloading. That's why, in earlier development, we removed maxtext_utils as a dependency of multihost_dataloading to avoid the...

pull ready
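One common way to break a cycle like this is to import at call time instead of module load time; the PR itself removed the dependency outright, so the sketch below (with a hypothetical helper name) shows the alternative pattern only:

```python
# multihost_dataloading.py (sketch)
def shard_dataset(dataset, mesh):
  # A module-level `import maxtext_utils` would re-enter the cycle
  # multihost_dataloading -> maxtext_utils -> checkpointing -> here.
  # Importing inside the function sidesteps it.
  from maxtext_utils import get_shard_fn  # hypothetical helper
  return get_shard_fn(mesh)(dataset)
```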

# Description The smallest Llama model we support is 7B, which is still too slow for local development. This PR creates a much smaller toy model. For functional testing...

stale
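For scale, a hedged sketch of how toy dimensions might compare to the 7B config; the key names follow MaxText's base_* convention, but the PR's actual values may differ:

```python
# Hypothetical side-by-side: production 7B dims vs. a toy model that a
# laptop can step through quickly. All values are illustrative.
llama_7b  = {"base_emb_dim": 4096, "base_num_decoder_layers": 32,
             "base_num_query_heads": 32, "base_mlp_dim": 11008}
toy_model = {"base_emb_dim": 128,  "base_num_decoder_layers": 2,
             "base_num_query_heads": 4,  "base_mlp_dim": 512}
```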

# Description The JAX stable stack image contains JAX, CUDA, and MaxText, but lacks some GCP deps. This PR intends to add these deps and build an image that...

stale

# Description The current function only switches between bf16 and f32. It should also take f16 into consideration. # Tests No functional change. Rely on presubmit tests. # Checklist...

pull ready
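A minimal sketch of the described fix, assuming the function maps config strings to JAX dtypes (the real function name and signature may differ):

```python
import jax.numpy as jnp

def get_dtype(dtype_str: str):
  """Hypothetical dtype selector: previously only bf16 vs. f32, with f16
  silently falling through; now f16 is handled explicitly."""
  mapping = {
      "bfloat16": jnp.bfloat16,
      "float16": jnp.float16,   # the missing case this PR adds
      "float32": jnp.float32,
  }
  return mapping[dtype_str]
```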

# Description When calling the MoE megablox gmm kernel, the preferred_element_type is hardcoded to bfloat16. Replace it with the config-controlled dtype instead of hardcoding. # Tests No functional change; rely on...
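A sketch of the change, with `gmm` standing in for the megablox grouped-matmul entry point and `config.matmul_dtype` for whichever config field the PR actually reads:

```python
import jax.numpy as jnp

def run_gmm(gmm, lhs, rhs, group_sizes, config):
  # Before (the reported problem):
  #   gmm(lhs, rhs, group_sizes, preferred_element_type=jnp.bfloat16)
  # After: the output/accumulation dtype follows the config.
  return gmm(lhs, rhs, group_sizes,
             preferred_element_type=jnp.dtype(config.matmul_dtype))
```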

…ine run. # Description For the offline serving engine, the weight dtype is hardcoded to bf16, which blocks other dtype experiments. This PR passes in the weight dtype and...
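A hedged sketch of threading the dtype through instead of pinning it; the helper name is hypothetical:

```python
import jax
import jax.numpy as jnp

def cast_engine_weights(params, weight_dtype=jnp.bfloat16):
  """Hypothetical helper: the engine previously always cast weights to
  bf16; making the dtype an argument unblocks other precision runs."""
  return jax.tree_util.tree_map(lambda x: x.astype(weight_dtype), params)
```

A caller could then pass jnp.float16 or jnp.float32 to compare serving behavior across precisions.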

# Description The embedding's attend() didn't cast the query. Refactor the code to fix the issue and make __call__() and attend() consistent. # Tests Local tests passed. # Checklist Before...
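A sketch of the consistency fix in Flax terms; field names and the module body are illustrative, and MaxText's actual Embed module may differ:

```python
import flax.linen as nn
import jax.numpy as jnp

class Embed(nn.Module):
  """Both paths should compute in `self.dtype`."""
  num_embeddings: int
  features: int
  dtype: jnp.dtype = jnp.bfloat16

  def setup(self):
    self.embedding = self.param(
        "embedding", nn.initializers.normal(stddev=1.0),
        (self.num_embeddings, self.features), jnp.float32)

  def __call__(self, inputs):
    # __call__ already casts the table before the lookup.
    return jnp.asarray(self.embedding, self.dtype)[inputs]

  def attend(self, query):
    # The fix: cast the query too, so attend() matches __call__().
    query = jnp.asarray(query, self.dtype)
    return jnp.dot(query, jnp.asarray(self.embedding, self.dtype).T)
```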

Replace hard-coded bfloat16 with the config-defined activation type. # Description In this script, when creating the decoder state, the weight dtype is hardcoded to bfloat16, so the exported checkpoint...
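A sketch of the described change, assuming the export path casts the whole param tree when building the decoder state (the function and config field names are illustrative):

```python
import jax
import jax.numpy as jnp

def create_decoder_state(params, config):
  # Before: params were cast with x.astype(jnp.bfloat16) unconditionally,
  # so the exported checkpoint ignored the configured dtype.
  dtype = config.activation_dtype  # hypothetical config field, e.g. "float32"
  return jax.tree_util.tree_map(lambda x: x.astype(dtype), params)
```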