Norman Mu

12 issues by Norman Mu

## Motivation

The Swin transformer backbone is currently unable to process input images differing in size from the pretraining resolution when absolute positional embedding is enabled, due to lack of...
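The usual fix for this kind of limitation is to resize the pretrained positional-embedding grid to match the new token grid. A minimal sketch, assuming an `(H, W, C)` embedding layout; `resize_pos_embed` is a name invented for this illustration, and nearest-neighbor indexing stands in for the bicubic interpolation real implementations typically use:

```python
import numpy as np

def resize_pos_embed(pos, new_h, new_w):
    """Resize an (H, W, C) absolute positional embedding grid so the
    backbone can accept inputs at a resolution other than the one it
    was pretrained at. Nearest-neighbor keeps this sketch dependency-
    free; production code would usually interpolate bicubically."""
    h, w, _ = pos.shape
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return pos[rows][:, cols]

# e.g. grow a 7x7 grid of 96-dim embeddings to 14x14 for larger inputs
resized = resize_pos_embed(np.random.randn(7, 7, 96), 14, 14)
```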

Hi, I'm trying to reproduce the YFCC100M results and would like to know how image captions were preprocessed during training. For instance, how was the caption for the following sample...

Hi, I'm trying to train a detection model with the plain ViT backbone on 8 GPUs (by scaling down batch size + lr 4x) using the 100 epoch config. Training...
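The batch-size/LR adjustment described above is the linear scaling rule: keep the ratio of learning rate to effective batch size constant. A one-function sketch (the function name and example values are illustrative, not from the config):

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: hold lr / batch_size constant when the
    effective batch size changes, so shrinking the batch 4x also
    shrinks the learning rate 4x."""
    return base_lr * new_batch_size / base_batch_size

# e.g. moving from a 64-sample batch to a 16-sample batch
scaled = scale_lr(1e-4, 64, 16)
```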

Is there a recommended way of using HuggingFace tokenizers inside ffcv pipelines? I realize I could pre-tokenize the text and store the raw ints in the dataset, but I'd like...
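The pre-tokenization workaround mentioned above amounts to fixing every sequence to a constant-length int array so it can be stored as a regular array field. A sketch under that assumption; `fake_tokenize`, `pad_or_truncate`, and the length/pad-id choices are all stand-ins, not ffcv or HuggingFace API:

```python
import numpy as np

def pad_or_truncate(ids, max_len=16, pad_id=0):
    """Fix a token-id list to a constant length so it can be stored
    as a fixed-shape int64 array field in a dataset file."""
    ids = list(ids[:max_len])
    return np.array(ids + [pad_id] * (max_len - len(ids)), dtype=np.int64)

def fake_tokenize(text):
    """Stand-in for a real tokenizer call; maps chars to small ints."""
    return [ord(c) % 100 + 1 for c in text]

row = pad_or_truncate(fake_tokenize("a caption"))
```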

This PR implements masking for left contiguous pad tokens by zeroing out intermediate state values, per the discussion at https://github.com/state-spaces/mamba/issues/66, for all three code paths: non-fused, fused without CUDA graph,...
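The zeroing trick can be illustrated on a toy linear recurrence (this is a schematic of the idea, not the Mamba kernels themselves): resetting the hidden state wherever the mask is zero means left padding cannot leak into the state seen by real tokens, so a padded sequence reproduces the unpadded outputs.

```python
import numpy as np

def masked_scan(x, mask, a=0.9):
    """Toy recurrence h_t = a * h_{t-1} + x_t over shape (B, T) input,
    zeroing the intermediate state at positions where mask == 0 so
    left-contiguous pad tokens cannot influence later real tokens."""
    h = np.zeros(x.shape[0])
    out = []
    for t in range(x.shape[1]):
        h = a * h + x[:, t]
        h = h * mask[:, t]  # zero out intermediate state at pad positions
        out.append(h.copy())
    return np.stack(out, axis=1)
```

With this reset, a batch mixing padded and unpadded sequences yields identical values on the non-pad positions.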

The instructions in the README on running `lm-evaluation-harness` set batch size > 1, and I would like to try batched generation in a standalone script. Per this previous thread (https://github.com/state-spaces/mamba/issues/49#issuecomment-1850980748)...

Hi, in the model card for the chat v1.0 model, it mentions following the "Zephyr training recipe". Does this mean using the [alignment-handbook](https://github.com/huggingface/alignment-handbook/tree/main/recipes) codebase or another reproduction of the Zephyr...

`DPOTrainer.tokenize_row` is not hashable, so the `datasets` library assigns the transformed dataset a "random" fingerprint. This fingerprint relies on the `random` library, but since the random seed is often fixed...
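One way to sidestep the nondeterministic fingerprint is to compute a stable key yourself and pass it via the `new_fingerprint` argument that `datasets`' `Dataset.map` accepts. A sketch of that idea; `stable_fingerprint` is a name invented here, and hashing the compiled bytecode plus kwargs is just one reasonable choice of stable payload:

```python
import hashlib

def stable_fingerprint(fn, **kwargs):
    """Derive a deterministic cache key from a function's compiled
    bytecode and its keyword arguments, rather than letting `datasets`
    fall back to a random fingerprint for an unhashable transform."""
    payload = fn.__code__.co_code + repr(sorted(kwargs.items())).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def tokenize_row(row):
    return row

fp = stable_fingerprint(tokenize_row, max_length=512)
# hypothetical usage: ds = ds.map(tokenize_row, new_fingerprint=fp)
```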


Thank you for releasing Llama Guard 2, it looks like a very promising model! I was wondering if it would be feasible to release precision/recall curves or numbers by harm...


The results reported in https://github.com/huggingface/alignment-handbook/pull/88 suggest that QLoRA is better for both SFT and DPO. Is this accurate, and have people seen this happen in any other settings?