torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
This PR adds max-autotune support for CPU in torch.compile. It also splits first-token and next-token timings in the log output; a sketch of both pieces follows below.
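For context, a minimal sketch of max-autotune compilation with a split first-/next-token timing log. `TinyLM` and the decode loop are illustrative stand-ins, not torchchat's actual generate path; only `torch.compile(..., mode="max-autotune")` is the real API in question.

```python
import time
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in model; torchchat compiles its real transformer the same way."""
    def __init__(self, vocab=128, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.proj(self.emb(tokens)).argmax(-1)

model = TinyLM().eval()
# mode="max-autotune" asks Inductor to spend extra compile time
# benchmarking kernel variants; on CPU this picks tuned C++ GEMM paths.
compiled = torch.compile(model, mode="max-autotune")

tokens = torch.randint(0, 128, (1, 16))
with torch.no_grad():
    t0 = time.perf_counter()
    tokens = compiled(tokens)          # prefill -> first token
    first = time.perf_counter() - t0

    t1 = time.perf_counter()
    for _ in range(31):                # steady-state decode steps
        tokens = compiled(tokens)
    rest = (time.perf_counter() - t1) / 31

# Report the two phases separately, as the PR's log split does.
print(f"first token: {first:.3f}s, next tokens: {rest:.4f}s/token")
```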
This PR aims to support the Flamingo component, including the model component, input preprocessing, pipeline updates, etc.
### 🐛 Describe the bug Eval is very slow for PTE models compared with non-exported models; the opposite should be true, as can be observed in generate. I suspect this...
### 🚀 The feature, motivation and pitch First surfaced in https://github.com/pytorch/torchchat/pull/1057, the `replace_attention_with_custom_sdpa_attention` function, used when exporting models in torchchat, can be replaced with the equivalent API provided in the...
### 🐛 Describe the bug Instructions for running the API are collapsed by default, and the instructions for the browser don't clearly call out that the API needs to be...
Minor QoL change; push the formatting of text string prompts into the helper.

---

## Testing

Tested via browser:

```
python torchchat.py server llama3.2-11b
streamlit run torchchat/usages/browser.py
```

Tested in...
### 🚀 The feature, motivation and pitch Can torchchat pick up models that have already been downloaded by Ollama? Is there a way to use them without downloading them...
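Torchchat does not read Ollama's store today, but Ollama keeps its weights as GGUF blobs, so in principle torchchat's GGUF loader could be pointed at them. A rough sketch, assuming a default Ollama install on Linux/macOS; the manifest layout, the media type string, and feeding the blob to `--gguf-path` are assumptions to verify against your versions.

```python
import json
from pathlib import Path

# Ollama's default model store: blobs are GGUF files named by their
# sha256 digest, and manifests map model tags to blob digests.
store = Path.home() / ".ollama" / "models"
manifest = (store / "manifests" / "registry.ollama.ai"
            / "library" / "llama3.2" / "latest")

layers = json.loads(manifest.read_text())["layers"]
# The weights layer carries the Ollama model media type (assumed here).
digest = next(
    layer["digest"] for layer in layers
    if layer["mediaType"] == "application/vnd.ollama.image.model"
)
blob = store / "blobs" / digest.replace(":", "-")

# Hypothetically reusable via torchchat's GGUF path, e.g.:
#   python torchchat.py generate --gguf-path <blob> ...
print(f"GGUF blob for llama3.2: {blob}")
```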
### 🐛 Describe the bug https://github.com/pytorch/torchchat/blob/main/install/requirements.txt#L15 https://github.com/pytorch/pytorch/blame/main/requirements.txt#L5 This pinning complicates (I would say "prohibits", but there is probably a way) running torchchat with a locally built PyTorch. ### Versions internal devserver, python 3.12
Implement the AO API in torchchat quantization handlers and unify the logic (see the sketch after this list):

1. Implement `.quantize()` for TC quantization handlers, with args kept consistent with AO.
2. Remove...
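A sketch of what a unified handler might look like, assuming torchao's `quantize_` entry point; the `Int8WeightOnlyHandler` class below is illustrative, not torchchat's actual handler name.

```python
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

class Int8WeightOnlyHandler:
    """Hypothetical torchchat handler exposing an AO-style .quantize()."""

    def quantize(self, model: nn.Module) -> nn.Module:
        # Delegate to torchao rather than a bespoke torchchat path;
        # quantize_ swaps eligible nn.Linear weights in place.
        quantize_(model, int8_weight_only())
        return model

# Usage: model = Int8WeightOnlyHandler().quantize(model)
```

Delegating to `quantize_` keeps torchchat's handlers as thin adapters, so argument names and quantization behavior stay consistent with AO instead of drifting in a parallel implementation.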