[IMPORTANT] The future of torchtune
Dear Tune Community,
We started torchtune in late 2023 with the goal of increasing the accessibility of finetuning in a world where cloud GPUs were prohibitively expensive for many ML practitioners. Thanks to over 150 amazing contributors, today we have 21 recipes spanning the whole post-training stack from SFT to RLHF to QAT.
During this time the AI landscape has rapidly evolved, with ever-increasing scale, an emphasis on agents, and a reinforcement learning renaissance. We have heard your requests on these fronts loud and clear, and in line with our mission at PyTorch to democratize state-of-the-art AI, we are developing a new product - in a new repo - that captures this evolution of torchtune. The product will provide a simple native PyTorch solution for end-to-end post-training with scale as a first-class citizen, while bringing over what you loved from torchtune (hackable recipes, minimal abstraction, strong integrations across the ecosystem). This is an exciting opportunity and we expect to have more to share in the upcoming weeks.
As a result of this new strategy, we are stopping active development on torchtune, effective immediately. We remain committed to supporting our partners and the community through a smooth transition. Concretely, this means:
- Torchtune will continue to receive critical bug fixes and security patches during 2025
- Discord and GitHub issues will remain open for support
- No new features will be added to the library
- We would love to keep working with you to assist with a smooth transition and build this new solution together!
If you have any questions or concerns, please reach out on Discord or comment here. Thank you all for your support and all that you've done to build this LLM post-training community with us.
From all of us on the torchtune team, @joecummings @ebsmothers @pbontrager @felipemello1 @RdoubleA @kartikayk @rohan-varma
Our heartfelt thanks to our many contributors, including @SalmanMohammadi @krammnic @daniellepintz @andrewor14 @SLR722 @ankitageorge @svekars @NicolasHug @gokulavasan @janeyx99 @jerryzh168 @thomasjpfan @Ankur-singh @acisseJZhong @calvinpelletier @mirceamironenco @weifengpy @nathan-az @solitude-alive @songhappy @Optimox @RedTachyon @lindawangg @lucylq @hardikjshah @msaroufim @yechenzhi @mreso @IvanKobzarev @tcapelle @gau-nernst @tambulkar @bogdansalyp @albert-inflection
Thanks team for all the monumental effort and the gift of Tune. Excited for what's next!
Excited to know what the next WIP project will be called?
Thanks for torchtune! What's the new repo called?
We hope to share the name in the coming weeks :)
For torchtune itself, are there any components that would be worth upstreaming to core PyTorch or unifying with torchtitan?
Maybe the generation loop? RoPE module? Checkpointer? OffloadActivations hook?
any updates?
None that we can share publicly yet, but we are actively working on it!
E.g. I noticed that Verl, a popular framework, uses FSDP2 in the simplest way and does not use parallelize_module / TP at all... So it would be great if torchtune could somehow upstream / preserve some of its utils / recipes (a minimal sketch of that simple FSDP2 pattern follows this exchange).
E.g. would it be possible to publish single-file self-contained trainers / training recipes?
I am not sure what I can or can't share, but your requests make sense :)
Edit: Sorry, I misunderstood the first part. Since our focus is on scalability, we should go beyond simple FSDP2.
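For context, here is a minimal sketch of the "simplest FSDP2 usage" pattern referenced above. This assumes torch >= 2.6 (where `fully_shard` is public) and an already-initialized process group; `shard_model` and the `.layers` attribute are illustrative names, not torchtune's or Verl's actual API:

```python
import torch
from torch.distributed.fsdp import fully_shard  # public as of torch 2.6

def shard_model(model: torch.nn.Module) -> torch.nn.Module:
    # Assumes torch.distributed is already initialized (e.g. via torchrun).
    # Shard each transformer block individually so parameters are
    # all-gathered and resharded per block, then shard the root module.
    # No parallelize_module / TP involved: pure data-parallel sharding.
    for block in model.layers:  # assumes the model exposes its blocks as .layers
        fully_shard(block)
    fully_shard(model)
    return model
```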
It would also be great to have (interactive) animations of FSDP/TP/CP communication and computation patterns: unshards/reshards, gradient sync, etc.
The Ultra-Scale Playbook has some static visualizations, but adding animations directly to the FSDP2 docs would make a big difference for educational purposes.
That is harder to prioritize for a v1, but I agree that good docs make a big difference.
And it would be great to upstream to core PyTorch at least functional primitives for RoPE / LinearCrossEntropyLoss (maybe even under some private torch.llm._functional namespace), or some others, to prevent too much fragmentation, at least for the basic models most used on HF over the previous year (or some other objective metric like this). Currently every framework rolls its own copy: torchtune, torchtitan, vLLM, etc. :( And it's unfortunate when all of these get abandoned.
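For illustration only, a rough sketch of what such a functional RoPE primitive could look like. The torch.llm._functional namespace above is the commenter's suggestion and does not exist in PyTorch; `apply_rope` is a hypothetical name, and this shows just one common interleaved-pair formulation:

```python
import torch

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Hypothetical functional RoPE: rotate interleaved feature pairs.

    x:   (..., seq_len, dim) with dim even
    cos: (seq_len, dim // 2), cosine of the per-position, per-pair angles
    sin: (seq_len, dim // 2)
    """
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split features into even/odd pairs
    rotated = torch.stack(
        (x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1
    )
    return rotated.flatten(-2)  # re-interleave the rotated pairs
```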
Maybe upstream to core PyTorch native support for varlen flash-attention kernels (without requiring NestedTensor). Then downstream projects could drop their dependency on the flash-attention repo.
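To make the request concrete: varlen kernels replace padding or NestedTensor with one packed buffer plus cumulative sequence lengths, the cu_seqlens interface that the flash-attention repo exposes via flash_attn_varlen_func. A hedged sketch of just the packing side, with illustrative shapes:

```python
import torch

# Three sequences of different lengths, each (seq_len, n_heads, head_dim).
seqs = [torch.randn(n, 8, 64) for n in (5, 3, 7)]

packed = torch.cat(seqs, dim=0)                # (15, 8, 64), no padding tokens
lens = torch.tensor([s.shape[0] for s in seqs])
cu_seqlens = torch.nn.functional.pad(lens.cumsum(0), (1, 0)).to(torch.int32)
# cu_seqlens == [0, 5, 8, 15]: sequence i lives in packed[cu_seqlens[i]:cu_seqlens[i+1]]

# A native varlen kernel would take (q, k, v, cu_seqlens, max_seqlen=lens.max())
# and mask attention so tokens never attend across sequence boundaries.
```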
Hello, I recommend supporting these features: multimodal models, MoE architectures, long-context training, LoRA, etc. I believe these are quite necessary at present. @felipemello1 @ebsmothers
@vadimkantorov I will open an issue in core
@felipemello1 any news regarding the new repo?
Is this the new lib: https://github.com/meta-pytorch/torchforge ?
yes! :) https://pytorch.org/blog/introducing-torchforge/