
Update attention layer for Gemma3 ViT


Description

  • Following https://github.com/AI-Hypercomputer/maxtext/pull/2616, the attention module now returns a tuple (out, kv_cache) instead of out alone. This PR updates the Gemma3 ViT attention layer to unpack the new return value (see the sketch after this list).
  • Updates the ViT layer's dropout from nnx.Dropout to linears.Dropout.
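For context, a minimal, self-contained sketch of both changes. This is not the actual MaxText code: toy_attention is a hypothetical stand-in for the real attention module, and the linears.Dropout interface is assumed to mirror flax.nnx.Dropout.

    # Toy illustration of the interface change from PR #2616; toy_attention
    # is hypothetical and stands in for the real attention module.
    import jax.numpy as jnp
    from flax import nnx

    def toy_attention(x):
        # The module now returns a (out, kv_cache) tuple instead of out alone.
        out = jnp.tanh(x)   # placeholder for the real attention computation
        kv_cache = None     # the ViT forward pass keeps no KV cache
        return out, kv_cache

    x = jnp.ones((1, 4, 8))
    out, _ = toy_attention(x)   # new call site: unpack and discard the cache

    # Dropout swap: the layer previously built Flax's NNX dropout directly.
    drop_old = nnx.Dropout(rate=0.1, rngs=nnx.Rngs(0))
    # It now uses MaxText's wrapper instead (constructor args assumed similar):
    # from MaxText.layers import linears
    # drop_new = linears.Dropout(rate=0.1, rngs=nnx.Rngs(0))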

Tests

Tested via a decode forward pass:

python -m MaxText.decode MaxText/configs/base.yml \
  model_name=gemma3-4b \
  tokenizer_type=huggingface \
  tokenizer_path=google/gemma-3-4b-it \
  load_parameters_path=gs://maxtext-gemma/unified/gemma3/4b/unscanned/2025-08-09-01-17/0/items \
  per_device_batch_size=1 \
  run_name=ht_test \
  max_prefill_predict_length=272 \
  max_target_length=372 \
  steps=1 \
  async_checkpointing=false \
  scan_layers=false \
  use_multimodal=true \
  prompt=\'Describe\ image\ \<start_of_image\>\' \
  image_path=\'/home/hengtaoguo_google_com/projects/maxtext/src/MaxText/test_assets/test_image.jpg\' \
  attention=\'dot_product\' \
  hf_access_token=xxx

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • [x] I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • [x] I have necessary comments in my code, particularly in hard-to-understand areas.
  • [x] I have run end-to-end tests and provided workload links above if applicable.
  • [x] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

hengtaoguo · Nov 20 '25