gabeweisz issues

Results: 4 issues of gabeweisz

Inconsistent environment variable names

MaxText uses the environment variables JAX_COORDINATOR_IP, JAX_COORDINATOR_PORT, NNODES, and NODE_RANK for multi-system GPU training, but JAX_COORDINATOR_ADDRESS, a fixed port, JAX_PROCESS_COUNT, and a combination of several environment variables, and for multi-system...

feature request
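The MaxText side of the naming mismatch described above can be illustrated with a small sketch. It reads the four variables the issue names and turns them into the keyword arguments accepted by `jax.distributed.initialize`. The helper name `distributed_init_kwargs` is hypothetical, and treating `NNODES` as the process count assumes one process per node, which the truncated issue text does not confirm.

```python
import os


def distributed_init_kwargs(env=os.environ):
    """Build kwargs for jax.distributed.initialize from MaxText-style variables.

    Hypothetical helper: assumes one process per node, so NNODES is used as
    the process count and NODE_RANK as the process id.
    """
    ip = env["JAX_COORDINATOR_IP"]
    port = env["JAX_COORDINATOR_PORT"]
    return {
        "coordinator_address": f"{ip}:{port}",
        "num_processes": int(env["NNODES"]),
        "process_id": int(env["NODE_RANK"]),
    }


# Possible usage (requires JAX and a reachable coordinator):
# import jax
# jax.distributed.initialize(**distributed_init_kwargs())
```

Passing the environment as a parameter keeps the mapping easy to test and makes it simple to swap in a different set of variable names.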

Asking to pad but the tokenizer does not have a padding token

With the newest commit, I am seeing this error when running run_sample_image.sh: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to...
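A common workaround for this error, when the tokenizer follows the Hugging Face convention of exposing `pad_token` and `eos_token` attributes, is to reuse the end-of-sequence token for padding. The sketch below is an assumption about how that could look, not the fix adopted in the repository; `ensure_pad_token` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a real tokenizer.

```python
from types import SimpleNamespace


def ensure_pad_token(tokenizer):
    """Fall back to the EOS token when no padding token is configured.

    Hypothetical helper: assumes the tokenizer exposes `pad_token` and
    `eos_token` attributes, as Hugging Face tokenizers do.
    """
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


# Stand-in object for illustration; a real tokenizer would be loaded elsewhere.
demo = ensure_pad_token(SimpleNamespace(pad_token=None, eos_token="</s>"))
```

Whether EOS is an appropriate padding token depends on the model, so this is best treated as a stopgap.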

Cannot see multiple GPUs when using Slurm (with proposed fix)

When using MaxText with Slurm, our jobs only see one GPU per node because jax.distributed assumes one GPU per process when used with Slurm (see the [JAX docs](https://qubitpi.github.io/google-jax/_autosummary/jax.distributed.initialize.html#jax.distributed.initialize)). This behavior...

good first issue

feature request
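The fix proposed in the issue is cut off above, so the following is only one possible approach, not necessarily the author's. `jax.distributed.initialize` accepts a `local_device_ids` argument, and a process that should own every GPU on its node could pass all local indices explicitly. The helper name `local_device_ids_from_slurm` and the fallback of one GPU are assumptions; `SLURM_GPUS_ON_NODE` is taken to be set by Slurm for the job.

```python
import os


def local_device_ids_from_slurm(env=os.environ, default_gpus=1):
    """Return device indices 0..N-1 for the GPUs Slurm allocated on this node.

    Hypothetical helper: assumes SLURM_GPUS_ON_NODE holds an integer count
    and falls back to `default_gpus` when the variable is missing or empty.
    """
    raw = env.get("SLURM_GPUS_ON_NODE")
    count = int(raw) if raw else default_gpus
    return list(range(count))


# Possible usage (requires JAX running inside a Slurm job):
# import jax
# jax.distributed.initialize(local_device_ids=local_device_ids_from_slurm())
```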

Python version of attention fails to add the bias

The pure Python version of the code should add the attention bias here: https://github.com/insuhan/hyper-attn/blob/68b91e9b95ee9504d5beb9f8f6fba5792dfd9a56/models/attention/utils.py#L62
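To show where a bias term normally enters, here is a minimal standard-library sketch of scaled dot-product attention that adds the bias to the scores before the softmax. It is not the code from the linked repository; `attention_with_bias` and its list-of-lists interface are assumptions made for illustration.

```python
import math


def attention_with_bias(q, k, v, bias=None, scale=None):
    """Scaled dot-product attention over nested lists, with an additive bias.

    q: [n_q][d], k: [n_k][d], v: [n_k][d_v], bias: [n_q][n_k] or None.
    """
    d = len(q[0])
    scale = scale if scale is not None else 1.0 / math.sqrt(d)
    out = []
    for i, qi in enumerate(q):
        # Raw attention scores for query i against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        # The bias is added to the scores before the softmax.
        if bias is not None:
            scores = [s + b for s, b in zip(scores, bias[i])]
        # Numerically stable softmax.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out
```

A large negative bias entry effectively masks the corresponding key, which is a quick way to check that the bias is actually being applied.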