maxtext
Update attention layer for Gemma3 ViT
Description
- Following https://github.com/AI-Hypercomputer/maxtext/pull/2616, the attention module now returns a tuple `(out, kv_cache)` instead of `out`. This PR updates the output handling of the Gemma3 ViT attention layer accordingly.
- Update from `nnx.Dropout` to `linears.Dropout`.
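The calling-convention change can be sketched as follows. This is a minimal illustration, not the actual MaxText code: the class names `AttentionOp` and `VisionAttention` and the placeholder attention math are assumptions for demonstration only.

```python
class AttentionOp:
    """Stand-in for the shared attention module after PR #2616."""

    def __call__(self, x):
        out = [v * 2 for v in x]  # placeholder for the real attention math
        kv_cache = None           # ViT attention has no KV cache to reuse
        return out, kv_cache      # new contract: tuple instead of bare `out`


class VisionAttention:
    """Stand-in for the Gemma3 ViT attention layer updated by this PR."""

    def __init__(self):
        self.attn = AttentionOp()

    def __call__(self, x):
        # Before: out = self.attn(x)
        # After: unpack the tuple and discard the KV cache, which the
        # vision tower does not use.
        out, _ = self.attn(x)
        return out
```

The point of the change is that callers of the attention module must now unpack two values; a caller still expecting a single return value would silently receive the whole tuple.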
Tests
Tested by a decode forward pass:

```shell
python -m MaxText.decode MaxText/configs/base.yml model_name=gemma3-4b tokenizer_type=huggingface tokenizer_path=google/gemma-3-4b-it load_parameters_path=gs://maxtext-gemma/unified/gemma3/4b/unscanned/2025-08-09-01-17/0/items per_device_batch_size=1 run_name=ht_test max_prefill_predict_length=272 max_target_length=372 steps=1 async_checkpointing=false scan_layers=false use_multimodal=true prompt='Describe image <start_of_image>' image_path='/home/hengtaoguo_google_com/projects/maxtext/src/MaxText/test_assets/test_image.jpg' attention='dot_product' hf_access_token=xxx
```
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [x] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [x] I have necessary comments in my code, particularly in hard-to-understand areas.
- [x] I have run end-to-end tests and provided workload links above if applicable.
- [x] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.