Michael Gschwind issues


Support code gen for non-cuda targets with gpt-fast

Extend the existing device variable to support code generation for other targets. New args to generate: use_sdpa (by default, SDPA is disabled for best performance on CUDA). CPU presents...

CLA Signed

Remove dump of model IR

Please remove the IR dump shown to users. It serves no practical purpose for them and obscures the real messages they should see.

build of executorch triggers a warning related to too many arguments provided for format string

https://github.com/pytorch/torchchat/actions/runs/8768143958/job/24062159094?pr=327

```
In file included from /home/runner/work/torchchat/torchchat/etorch/executorch/extension/data_loader/../../../executorch/runtime/core/error.h:18,
                 from /home/runner/work/torchchat/torchchat/etorch/executorch/extension/data_loader/../../../executorch/runtime/core/result.h:19,
                 from /home/runner/work/torchchat/torchchat/etorch/executorch/extension/data_loader/../../../executorch/runtime/core/data_loader.h:14,
                 from /home/runner/work/torchchat/torchchat/etorch/executorch/extension/data_loader/../../../executorch/extension/data_loader/mmap_data_loader.h:11,
                 from /home/runner/work/torchchat/torchchat/etorch/executorch/extension/data_loader/mmap_data_loader.cpp:9:...
```

Rename Embedding QuantHandler

Summary: Rename Embedding QuantHandler. Differential Revision: D56230678

CLA Signed

fb-exported

Downstream users have dependencies on cmake variables and internals, making cmake a compatibility surface

x-ref: https://github.com/pytorch/torchchat/issues/655. We're resolving https://github.com/pytorch/torchchat/issues/655 by pinning, but once we're in beta, downstream projects that need to build with cmake may expect their cmake files to work reliably, making cmake...

ExecuTorch repeats this error for pages and pages: [method.cpp:939] Overriding output data pointer allocated by memory plan is not allowed.

https://github.com/pytorch/torchchat/actions/runs/8955682937/job/24596656941?pr=680. Is this just an internal message we should suppress (as a user, it makes me worry that the program has a bug), or is this a real bug...

bug

high priority

triage review

Symbol not found during dlopen (Python 3.7)

I've been following the instructions for building the pytext documentation with Python 3.7 (and 3.8) on a Mac (Catalina 10.15.6) at https://pytext.readthedocs.io/en/master/hacking_pytext.html#creating-documentation, and I'm running into the following error during "make html"...

Executorch exported model produces gibberish: stories15M --dtype fp32 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}'

stories15M produces gibberish with embedding quantization and a8w4dq on macOS/ARM. Maybe an integration issue? (This is worrying, since this configuration is the workhorse for mobile, i.e. ARM.) https://github.com/pytorch/torchchat/actions/runs/8997932498/job/24717027755?pr=718 (at the bottom) ======================================== Average tokens/sec: 19.35 Memory...
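The --quantize value in this issue's title is a JSON string that maps each quantization scheme to its options. The following minimal sketch uses only Python's standard json module to show how that string decomposes; it does not call torchchat. Our reading of the keys is an assumption not stated in the issue: "embedding" quantizes the embedding table, and "linear:a8w4dq" applies to linear layers with 8-bit dynamically quantized activations and 4-bit weights.

```python
import json

# The --quantize value quoted in the issue title, verbatim.
QUANTIZE_ARG = '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}'

# Parse the JSON string into a dict: scheme name -> dict of options.
# (Interpretation of the scheme names is an assumption, see lead-in.)
config = json.loads(QUANTIZE_ARG)

for scheme, options in config.items():
    # Print each scheme with its options, sorted by option name for stable output.
    opts = ", ".join(f"{k}={v}" for k, v in sorted(options.items()))
    print(f"{scheme}: {opts}")
```

This prints one line per scheme, showing that the failing configuration combines a 4-bit, group-size-32 embedding setting with a group size of 256 for the linear layers.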

Using a simple model for torchtune setup testing in PRs?

I'm trying to run torchtune as an on-PR test for torchchat to ensure ongoing compatibility. Alas, llama3 can't be used because on-PR tests can't use HF tokens. Can you please...

[method.cpp:825] Error setting input 0: 0x10

Per @ali-khosh, this appears to be an issue with ExecuTorch model handling. Or should we do something differently in how we use ET for eval? When running python3 torchchat.py...