Norman Mu

12 issues by Norman Mu

## Motivation

The Swin transformer backbone is currently unable to process input images differing in size from the pretraining resolution when absolute positional embedding is enabled, due to lack of...
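The usual fix for this kind of limitation is to resize the pretrained positional-embedding grid to match the new token grid. A minimal sketch, assuming an `(H, W, C)` embedding layout; `resize_pos_embed` is a name invented for this illustration, and nearest-neighbor indexing stands in for the bicubic interpolation real implementations typically use:

```python
import numpy as np

def resize_pos_embed(pos, new_h, new_w):
    """Resize an (H, W, C) absolute positional embedding grid so the
    backbone can accept inputs at a resolution other than the one it
    was pretrained at. Nearest-neighbor keeps this sketch dependency-
    free; production code would usually interpolate bicubically."""
    h, w, _ = pos.shape
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return pos[rows][:, cols]

# e.g. grow a 7x7 grid of 96-dim embeddings to 14x14 for larger inputs
resized = resize_pos_embed(np.random.randn(7, 7, 96), 14, 14)
```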

Hi, I'm trying to reproduce the YFCC100M results and would like to know how image captions were preprocessed during training. For instance, how was the caption for the following sample...

Hi, I'm trying to train a detection model with the plain ViT backbone on 8 GPUs (by scaling down batch size + lr 4x) using the 100 epoch config. Training...
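The batch-size/LR adjustment described above is the linear scaling rule: keep the ratio of learning rate to effective batch size constant. A one-function sketch (the function name and example values are illustrative, not from the config):

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: hold lr / batch_size constant when the
    effective batch size changes, so shrinking the batch 4x also
    shrinks the learning rate 4x."""
    return base_lr * new_batch_size / base_batch_size

# e.g. moving from a 64-sample batch to a 16-sample batch
scaled = scale_lr(1e-4, 64, 16)
```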

Is there a recommended way of using HuggingFace tokenizers inside ffcv pipelines? I realize I could pre-tokenize the text and store the raw ints in the dataset, but I'd like...
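The pre-tokenization workaround mentioned above amounts to fixing every sequence to a constant-length int array so it can be stored as a regular array field. A sketch under that assumption; `fake_tokenize`, `pad_or_truncate`, and the length/pad-id choices are all stand-ins, not ffcv or HuggingFace API:

```python
import numpy as np

def pad_or_truncate(ids, max_len=16, pad_id=0):
    """Fix a token-id list to a constant length so it can be stored
    as a fixed-shape int64 array field in a dataset file."""
    ids = list(ids[:max_len])
    return np.array(ids + [pad_id] * (max_len - len(ids)), dtype=np.int64)

def fake_tokenize(text):
    """Stand-in for a real tokenizer call; maps chars to small ints."""
    return [ord(c) % 100 + 1 for c in text]

row = pad_or_truncate(fake_tokenize("a caption"))
```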

This PR implements masking for left contiguous pad tokens by zeroing out intermediate state values, per the discussion at https://github.com/state-spaces/mamba/issues/66, for all three code paths: non-fused, fused without CUDA graph,...
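The zeroing trick can be illustrated on a toy linear recurrence (this is a schematic of the idea, not the Mamba kernels themselves): resetting the hidden state wherever the mask is zero means left padding cannot leak into the state seen by real tokens, so a padded sequence reproduces the unpadded outputs.

```python
import numpy as np

def masked_scan(x, mask, a=0.9):
    """Toy recurrence h_t = a * h_{t-1} + x_t over shape (B, T) input,
    zeroing the intermediate state at positions where mask == 0 so
    left-contiguous pad tokens cannot influence later real tokens."""
    h = np.zeros(x.shape[0])
    out = []
    for t in range(x.shape[1]):
        h = a * h + x[:, t]
        h = h * mask[:, t]  # zero out intermediate state at pad positions
        out.append(h.copy())
    return np.stack(out, axis=1)
```

With this reset, a batch mixing padded and unpadded sequences yields identical values on the non-pad positions.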

The instructions in the README on running `lm-evaluation-harness` set batch size > 1, and I would like to try batched generation in a standalone script. Per this previous thread (https://github.com/state-spaces/mamba/issues/49#issuecomment-1850980748)...

Hi, in the model card for the chat v1.0 model, it mentions following the "Zephyr training recipe". Does this mean using the [alignment-handbook](https://github.com/huggingface/alignment-handbook/tree/main/recipes) codebase or another reproduction of the Zephyr...

`DPOTrainer.tokenize_row` is not hashable, so the `datasets` library assigns the transformed dataset a "random" fingerprint. This fingerprint relies on the `random` library, but since the random seed is often fixed...
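One way to sidestep the nondeterministic fingerprint is to compute a stable key yourself and pass it via the `new_fingerprint` argument that `datasets`' `Dataset.map` accepts. A sketch of that idea; `stable_fingerprint` is a name invented here, and hashing the compiled bytecode plus kwargs is just one reasonable choice of stable payload:

```python
import hashlib

def stable_fingerprint(fn, **kwargs):
    """Derive a deterministic cache key from a function's compiled
    bytecode and its keyword arguments, rather than letting `datasets`
    fall back to a random fingerprint for an unhashable transform."""
    payload = fn.__code__.co_code + repr(sorted(kwargs.items())).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def tokenize_row(row):
    return row

fp = stable_fingerprint(tokenize_row, max_length=512)
# hypothetical usage: ds = ds.map(tokenize_row, new_fingerprint=fp)
```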


Thank you for releasing Llama Guard 2, it looks like a very promising model! I was wondering if it would be feasible to release precision/recall curves or numbers by harm...


The results reported in https://github.com/huggingface/alignment-handbook/pull/88 suggest that QLoRA is better for both SFT and DPO. Is this accurate, and have people seen this happen in any other settings?