
Hg model training with torch DataLoader

Open · annasun28 opened this pull request 3 months ago · 0 comments

What does this PR do? Please describe:

  • For the initial SFT recipe, the dataloader / preprocessing logic is adapted from LLaMA-Factory, which relies on the torch DataLoader (see the sketch after this list). TODO: handle state dict saving with a stateful sampler. TBD: whether to create a new class instead of overloading the existing DataPipelineReader with two different pipeline types.
  • Updates the fairseq2 HgTokenizer with some methods required by the base transformers tokenizer (a delegation sketch follows this list). TBD: whether we should instead subclass and override methods so that we automatically fall back to the base transformers tokenizer methods.
  • Enables FSDP for the Hg model, matching the way FSDP wrapping is handled in transformers / accelerate (see the wrapping sketch below). TODO: make transformer_cls_names_to_wrap configurable.
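
For the first bullet, here is a minimal sketch (not the PR's actual code) of what a torch DataLoader paired with a sampler exposing state_dict() / load_state_dict() could look like, so that dataloader progress can be checkpointed. The class and field names below are hypothetical.

```python
from torch.utils.data import DataLoader, Sampler


class StatefulSequentialSampler(Sampler[int]):
    """Sequential sampler that remembers how many indices it has yielded."""

    def __init__(self, dataset_len: int) -> None:
        self.dataset_len = dataset_len
        self.start_index = 0  # resume point after a checkpoint restore

    def __iter__(self):
        for idx in range(self.start_index, self.dataset_len):
            self.start_index = idx + 1
            yield idx

    def __len__(self) -> int:
        return self.dataset_len

    def state_dict(self) -> dict:
        return {"start_index": self.start_index}

    def load_state_dict(self, state: dict) -> None:
        self.start_index = state["start_index"]


# Hypothetical usage inside a recipe:
# sampler = StatefulSequentialSampler(len(train_dataset))
# loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)
# on checkpoint: trainer_state["sampler"] = sampler.state_dict()
```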

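On the second bullet, one alternative to re-implementing individual methods is a thin wrapper that delegates unknown attributes to the underlying transformers tokenizer. This is only an illustration of that option, not fairseq2's actual HgTokenizer; the wrapper class and `_hg_tokenizer` attribute are hypothetical.

```python
from transformers import AutoTokenizer


class WrappedHgTokenizer:
    def __init__(self, model_name: str) -> None:
        self._hg_tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Methods the recipe needs explicitly can be defined on the wrapper...
    def encode(self, text: str) -> list[int]:
        return self._hg_tokenizer.encode(text)

    # ...while everything else (apply_chat_template, pad, decode, ...) falls
    # back to the underlying transformers tokenizer automatically.
    def __getattr__(self, name: str):
        return getattr(self._hg_tokenizer, name)
```
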
Tested with https://github.com/fairinternal/seamless_next/pull/750
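
For the third bullet, a rough sketch of FSDP wrapping by transformer layer class, in the spirit of how transformers / accelerate handle it. The config field name `transformer_cls_names_to_wrap` follows the bullet above; the helper function and class-name resolution are illustrative only.

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


def wrap_with_fsdp(model, transformer_cls_names_to_wrap: list[str]):
    # Resolve class names (e.g. "LlamaDecoderLayer") to the actual classes
    # present in the loaded Hugging Face model.
    layer_classes = set()
    for module in model.modules():
        if type(module).__name__ in transformer_cls_names_to_wrap:
            layer_classes.add(type(module))

    auto_wrap_policy = functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls=layer_classes
    )
    return FSDP(model, auto_wrap_policy=auto_wrap_policy)
```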

Does your PR introduce any breaking changes? If yes, please list them: N/A

Check list:

  • [ ] Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • [ ] Did you read the contributor guideline?
  • [ ] Did you make sure that your PR does only one thing instead of bundling different changes together?
  • [ ] Did you make sure to update the documentation with your changes? (if necessary)
  • [ ] Did you write any new necessary tests?
  • [ ] Did you verify new and existing tests pass locally with your changes?
  • [ ] Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

annasun28 · Nov 14 '25 19:11