gabeweisz

Results 4 issues of gabeweisz

MaxText uses the environment variables JAX_COORDINATOR_IP, JAX_COORDINATOR_PORT, NNODES, and NODE_RANK for multi-system GPU training, but JAX_COORDINATOR_ADDRESS, a fixed port, JAX_PROCESS_COUNT, and a combination of several environment variables, and for multi-system...

feature request
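
For readers unfamiliar with the variables named in the first issue, here is a minimal sketch of how they could be turned into arguments for `jax.distributed.initialize`. The helper name `coordinator_kwargs` is hypothetical. The mapping (IP and port joined into one address, `NNODES` as the process count, `NODE_RANK` as the process id) is an assumption based on the variable names, not a copy of MaxText's code.

```python
import os


def coordinator_kwargs(env=None):
    """Build keyword arguments for jax.distributed.initialize.

    Hypothetical helper: assumes one process per node, so NNODES is the
    process count and NODE_RANK is this process's id.
    """
    env = os.environ if env is None else env
    # The coordinator address is "<ip>:<port>".
    address = f"{env['JAX_COORDINATOR_IP']}:{env['JAX_COORDINATOR_PORT']}"
    return {
        "coordinator_address": address,
        "num_processes": int(env["NNODES"]),
        "process_id": int(env["NODE_RANK"]),
    }


# Intended use on each node (requires JAX and the variables to be set):
#   import jax
#   jax.distributed.initialize(**coordinator_kwargs())
```

Passing a plain dict instead of `os.environ` makes the mapping easy to check without a cluster.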

With the newest commit, I am seeing this error when running run_sample_image.sh: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to...

When using MaxText with Slurm, our jobs only see one GPU per node because jax.distributed assumes one GPU per process when used with Slurm (see the [JAX docs](https://qubitpi.github.io/google-jax/_autosummary/jax.distributed.initialize.html#jax.distributed.initialize)). This behavior...

good first issue
feature request
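
One possible workaround for the Slurm behavior described above is to pass `local_device_ids` explicitly to `jax.distributed.initialize`, so that a single process per node claims every GPU on that node. The sketch below is not taken from the issue or from MaxText. The helper name `slurm_local_device_ids` is hypothetical, and it assumes that Slurm exports `SLURM_GPUS_ON_NODE` for the job step, which depends on how the GPUs were requested.

```python
import os


def slurm_local_device_ids(env=None, default_gpus=1):
    """Return the device ids one process should claim on its node.

    Hypothetical helper: reads SLURM_GPUS_ON_NODE (assumed to be set by
    Slurm) and falls back to default_gpus when it is missing.
    """
    env = os.environ if env is None else env
    gpus_on_node = int(env.get("SLURM_GPUS_ON_NODE", default_gpus))
    # Local device ids are indices 0..N-1 into the devices visible on the node.
    return list(range(gpus_on_node))


# Intended use with one task per node (requires JAX under Slurm):
#   import jax
#   jax.distributed.initialize(local_device_ids=slurm_local_device_ids())
```

The fallback of one GPU mirrors the one-GPU-per-process assumption the issue describes, so the helper changes nothing when the variable is absent.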

The pure Python version of the code should add the attention bias here: https://github.com/insuhan/hyper-attn/blob/68b91e9b95ee9504d5beb9f8f6fba5792dfd9a56/models/attention/utils.py#L62