[IMPORTANT] The future of torchtune
Dear Tune Community,
We started torchtune in late 2023 with the goal of increasing the accessibility of finetuning in a world where cloud GPUs were prohibitively expensive for many ML practitioners. Thanks to over 150 amazing contributors, today we have 21 recipes spanning the whole post-training stack from SFT to RLHF to QAT.
During this time the AI landscape has rapidly evolved, with ever-increasing scale, an emphasis on agents, and a reinforcement learning renaissance. We have heard your requests on these fronts loud and clear, and in line with our mission at PyTorch to democratize state-of-the-art AI, we are developing a new product - in a new repo - that captures this evolution of torchtune. The product will provide a simple native PyTorch solution for end-to-end post-training with scale as a first-class citizen, while bringing over what you loved from torchtune (hackable recipes, minimal abstraction, strong integrations across the ecosystem). This is an exciting opportunity and we expect to have more to share in the upcoming weeks.
As a result of this new strategy, we are stopping active development on torchtune, effective immediately. We remain committed to supporting our partners and the community through a smooth transition. Concretely, this means:
- Torchtune will continue to receive critical bug fixes and security patches during 2025
- Discord and GitHub issues will remain open for support
- No new features will be added to the library
- We would love to keep working with you to assist with a smooth transition and build this new solution together!
If you have any questions or concerns, please reach out on Discord or comment here. Thank you all for your support and all that you've done to build this LLM post-training community with us.
From all of us on the torchtune team, @joecummings @ebsmothers @pbontrager @felipemello1 @RdoubleA @kartikayk @rohan-varma
Our heartfelt thanks to our many contributors, including @SalmanMohammadi @krammnic @daniellepintz @andrewor14 @SLR722 @ankitageorge @svekars @NicolasHug @gokulavasan @janeyx99 @jerryzh168 @thomasjpfan @Ankur-singh @acisseJZhong @calvinpelletier @mirceamironenco @weifengpy @nathan-az @solitude-alive @songhappy @Optimox @RedTachyon @lindawangg @lucylq @hardikjshah @msaroufim @yechenzhi @mreso @IvanKobzarev @tcapelle @gau-nernst @tambulkar @bogdansalyp @albert-inflection
Thanks team for all the monumental effort and the gift of Tune. Excited for what's next!
Excited to know what the next WIP project will be called?
Thanks for torchtune! What's the new repo called?
We hope to share the name in the coming weeks :)
For torchtune itself, are there any components that would be worth upstreaming to core PyTorch or unifying with torchtitan?
Maybe the generation loop? RoPE module? Checkpointer? OffloadActivations hook?
any updates?
None that we can share publicly yet, but we are actively working on it!
E.g. I noticed that Verl, a popular framework, uses FSDP2 in the simplest way and does not use parallelize_module / TP at all... So it would be great if torchtune could somehow upstream / preserve some of its utils / recipes (a minimal sketch of that simple FSDP2 pattern follows this exchange).
E.g. would it be possible to publish single-file self-contained trainers / training recipes?
I am not sure what I can or can't share, but your requests make sense :)
Edit: Sorry, I misunderstood the first part. Since our focus is on scalability, we should go beyond simple FSDP2.
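For context, here is a minimal sketch of the "simplest FSDP2 usage" pattern referenced above. This assumes torch >= 2.6 (where `fully_shard` is public) and an already-initialized process group; `shard_model` and the `.layers` attribute are illustrative names, not torchtune's or Verl's actual API:

```python
import torch
from torch.distributed.fsdp import fully_shard  # public as of torch 2.6

def shard_model(model: torch.nn.Module) -> torch.nn.Module:
    # Assumes torch.distributed is already initialized (e.g. via torchrun).
    # Shard each transformer block individually so parameters are
    # all-gathered and resharded per block, then shard the root module.
    # No parallelize_module / TP involved: pure data-parallel sharding.
    for block in model.layers:  # assumes the model exposes its blocks as .layers
        fully_shard(block)
    fully_shard(model)
    return model
```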
It would also be great to have (interactive) animations of FSDP/TP/CP communication and computation patterns: unshards/reshards, gradient sync, etc.
The Ultra-Scale Playbook has some static visualizations, but adding animations directly to the FSDP2 docs would make a big difference for educational purposes.
That is harder to prioritize for a v1, but I agree that good docs make a big difference.
And it would be great to upstream to core PyTorch at least functional primitives for RoPE / LinearCrossEntropyLoss (maybe even under some private torch.llm._functional namespace), or some others, to prevent too much fragmentation, at least for the basic models most used on HF over the previous year (or some other objective metric like this). Currently every framework rolls its own copy: torchtune, torchtitan, vLLM, etc. :( And it's unfortunate when all of these get abandoned.
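For illustration only, a rough sketch of what such a functional RoPE primitive could look like. The torch.llm._functional namespace above is the commenter's suggestion and does not exist in PyTorch; `apply_rope` is a hypothetical name, and this shows just one common interleaved-pair formulation:

```python
import torch

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Hypothetical functional RoPE: rotate interleaved feature pairs.

    x:   (..., seq_len, dim) with dim even
    cos: (seq_len, dim // 2), cosine of the per-position, per-pair angles
    sin: (seq_len, dim // 2)
    """
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split features into even/odd pairs
    rotated = torch.stack(
        (x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1
    )
    return rotated.flatten(-2)  # re-interleave the rotated pairs
```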
Maybe upstream to core PyTorch native support for varlen flash-attention kernels (without requiring NestedTensor). Then downstream projects could drop their dependency on the flash-attention repo.
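To make the request concrete: varlen kernels replace padding or NestedTensor with one packed buffer plus cumulative sequence lengths, the cu_seqlens interface that the flash-attention repo exposes via flash_attn_varlen_func. A hedged sketch of just the packing side, with illustrative shapes:

```python
import torch

# Three sequences of different lengths, each (seq_len, n_heads, head_dim).
seqs = [torch.randn(n, 8, 64) for n in (5, 3, 7)]

packed = torch.cat(seqs, dim=0)                # (15, 8, 64), no padding tokens
lens = torch.tensor([s.shape[0] for s in seqs])
cu_seqlens = torch.nn.functional.pad(lens.cumsum(0), (1, 0)).to(torch.int32)
# cu_seqlens == [0, 5, 8, 15]: sequence i lives in packed[cu_seqlens[i]:cu_seqlens[i+1]]

# A native varlen kernel would take (q, k, v, cu_seqlens, max_seqlen=lens.max())
# and mask attention so tokens never attend across sequence boundaries.
```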
Hello, I recommend supporting these features: multimodal models, MoE architectures, long-context training, LoRA, etc. I believe these are quite necessary at present. @felipemello1 @ebsmothers
@vadimkantorov I will open an issue in core
@felipemello1 any news regarding the new repo?
Is this the new lib: https://github.com/meta-pytorch/torchforge ?
yes! :) https://pytorch.org/blog/introducing-torchforge/