Gokul

Results 13 comments of Gokul

Also, make sure there is a checkpoint with a -latest suffix for the model so that users can easily download the latest one.

@Abder88 Currently, Tigon supports the following Java object types for emitting from one flowlet to another: all primitive types and boxed types; String, enum, arrays and collections; user-defined POJO...

With the change in https://github.com/pytorch-labs/torchtune/pull/289, we won't be testing the HF dataset download in unit tests. But we do want a nightly run that tests the HF load_dataset API.

Looks like adding the index URL (for torchdata) is causing other dependencies to not get installed. Will figure out how to fix this.

@rlrs Would it be possible to test it after my latest commit ([b9b045d](https://github.com/pytorch/torchtitan/pull/279/commits/b9b045d32933c2824ae6f667e944a51c3255a2d1))? I missed adding that part.

@tianyu-l Addressed PR comments (thank you!), added a unit test, and made changes to the GitHub workflows to allow running those unit tests. Let me know if the changes look okay....

@rlrs Thank you for your great analysis here (https://github.com/pytorch/torchtitan/pull/279#issuecomment-2104797493). It helped us narrow down the issue, which basically boiled down to DCP loading the checkpoint in place. StatefulDataLoader doesn't currently return...

@johnament Please review this PR that fixes the NOTICE file for Apache Tephra when you get a chance. Thank you!

@johnament Thanks for the review, John. Please take another look when you get a chance.

Thanks for the PR! Reviewing it now