Franz Louis Cesista
typos
Hello! There are typos at lines 59 and 114. I'm not sure if I'm right about the latter, though. I don't know how attention mechanisms work -- I just followed...
This speeds up the `attention_forward_kernel2` kernel by replacing the implementation with a minimal Flash Attention 2 kernel, as found in https://github.com/leloykun/flash-hyperbolic-attention-minimal/blob/main/flash_attention_2.cu. Benchmark results on an A100 (80GB): Attention...
The utility repeater is stuck here: ``` Repeater Tool for Russian AI Cup By Russian AI Cup Team [Mon Dec 31 12:58:32 PHT 2018]: Repeater has been started [token=e1fab727820078842783c6eb76f15c9670b389b2_1] [....]...
Pydantic's `.model_json_schema()` and `get_schema_from_signature` don't actually mark optional fields/arguments as optional in the JSON schema. This forces the model to output the keys even when the values are just `null`, slowing down...
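A minimal sketch of the behavior, assuming Pydantic v2 and a toy model (names are illustrative): a nullable field without a default still lands under `"required"` in the generated schema, so a constrained decoder has to emit the key with an explicit `null`.

```python
from typing import Optional

from pydantic import BaseModel


class User(BaseModel):
    name: str
    # Nullable, but with no default it is still a *required* key
    # in the schema Pydantic generates.
    nickname: Optional[str]


schema = User.model_json_schema()
print(schema["required"])
# ['name', 'nickname'] -- so constrained generation must still emit
# '"nickname": null' even when there is nothing to say.
```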
This PR auto-applies chat templates by default when using instruct/chat models. It doesn't support LlamaCPP for now, though (see the sketch below for what "applying the chat template" looks like). --- ### Why? Instruct/chat models tend to be annoyingly template-dependent (i.e., they...
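A minimal sketch of the idea, using Hugging Face `transformers`' `apply_chat_template` (the model checkpoint is only illustrative, and this is not necessarily the exact code path the PR touches):

```python
from transformers import AutoTokenizer

# Illustrative instruct model; any chat-templated checkpoint works.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "Give me a haiku about GPUs."}]

# Instead of hand-crafting "[INST] ... [/INST]"-style strings, let the
# tokenizer render the model's own template before generation.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```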
### Feature request This is a tracker issue for work on _interleaved_ in-and-out image-text generation. There are now at least 4 open-source models that can do _interleaved_ image-text generation, and many more...
# What does this PR do? Fix regression on `Processor.save_pretrained` caused by https://github.com/huggingface/transformers/pull/31691 tl;dr: a month ago, we made a change that removed `"chat_template"` from `processor_dict` when saving a processor....
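A minimal sketch of the round-trip this fix is about, assuming a multimodal processor that ships a chat template (the checkpoint name is only illustrative):

```python
from transformers import AutoProcessor

# Illustrative checkpoint; any processor that carries a chat_template works.
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

processor.save_pretrained("./saved_processor")

# With the regression, the reloaded processor lost its chat_template;
# after this fix the template round-trips.
reloaded = AutoProcessor.from_pretrained("./saved_processor")
assert reloaded.chat_template is not None
```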
# What does this PR do? - Uniformizes kwargs for processors of audio-text models. - An extension of https://github.com/huggingface/transformers/issues/31911 - NOTE: don't review or merge until this PR is complete:...
# Description This PR implements a minimal backward pass for flash attention. I got these results on my RTX 2060 ``` === profiling manual attention (backward pass) === ... Self...
## ChangeLog * **Added a UNet-style connectivity structure to the value embeddings**. This allowed us to reduce the number of value embeddings from 12 to 6 and the total...
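A rough sketch of the idea, not the actual training code (all names here are illustrative): with UNet-style connectivity, layer `i` and its mirror layer `n_layers - 1 - i` reuse the same value-embedding table, so 12 layers only need 6 tables.

```python
import torch
import torch.nn as nn


class MirroredValueEmbeddings(nn.Module):
    """Illustrative sketch: share value-embedding tables between mirrored
    layers (layer i and layer n_layers - 1 - i), UNet style."""

    def __init__(self, n_layers: int, vocab_size: int, dim: int):
        super().__init__()
        assert n_layers % 2 == 0
        # Only n_layers // 2 distinct tables instead of one per layer.
        self.tables = nn.ModuleList(
            nn.Embedding(vocab_size, dim) for _ in range(n_layers // 2)
        )
        self.n_layers = n_layers

    def forward(self, token_ids: torch.Tensor, layer_idx: int) -> torch.Tensor:
        # Mirror: layers 0 and n-1 share table 0, layers 1 and n-2 share table 1, ...
        mirrored = min(layer_idx, self.n_layers - 1 - layer_idx)
        return self.tables[mirrored](token_ids)


# 12 transformer layers, but only 6 value-embedding tables are allocated.
ve = MirroredValueEmbeddings(n_layers=12, vocab_size=50304, dim=768)
print(len(ve.tables))  # 6
```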