
Hg model training with torch DataLoader

Open · annasun28 opened this pull request 3 months ago · 0 comments

What does this PR do? Please describe:

  • For the initial SFT recipe, the dataloader / preprocessing logic is adapted from LLaMA-Factory, which relies on the torch DataLoader (see the sketch after this list). TODO: handle state dict saving with a stateful sampler. TBD: whether to create a new class instead of overloading the existing DataPipelineReader with two different pipeline types.
  • Updates the fairseq2 HgTokenizer with some methods required by the base transformers tokenizer (a delegation sketch follows this list). TBD: whether we should instead subclass and override methods so that we automatically fall back to the base transformers tokenizer methods.
  • Enables FSDP for the Hg model, matching the way FSDP wrapping is handled in transformers / accelerate (see the wrapping sketch below). TODO: make transformer_cls_names_to_wrap configurable.
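
For the first bullet, here is a minimal sketch (not the PR's actual code) of what a torch DataLoader paired with a sampler exposing state_dict() / load_state_dict() could look like, so that dataloader progress can be checkpointed. The class and field names below are hypothetical.

```python
from torch.utils.data import DataLoader, Sampler


class StatefulSequentialSampler(Sampler[int]):
    """Sequential sampler that remembers how many indices it has yielded."""

    def __init__(self, dataset_len: int) -> None:
        self.dataset_len = dataset_len
        self.start_index = 0  # resume point after a checkpoint restore

    def __iter__(self):
        for idx in range(self.start_index, self.dataset_len):
            self.start_index = idx + 1
            yield idx

    def __len__(self) -> int:
        return self.dataset_len

    def state_dict(self) -> dict:
        return {"start_index": self.start_index}

    def load_state_dict(self, state: dict) -> None:
        self.start_index = state["start_index"]


# Hypothetical usage inside a recipe:
# sampler = StatefulSequentialSampler(len(train_dataset))
# loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)
# on checkpoint: trainer_state["sampler"] = sampler.state_dict()
```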

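On the second bullet, one alternative to re-implementing individual methods is a thin wrapper that delegates unknown attributes to the underlying transformers tokenizer. This is only an illustration of that option, not fairseq2's actual HgTokenizer; the wrapper class and `_hg_tokenizer` attribute are hypothetical.

```python
from transformers import AutoTokenizer


class WrappedHgTokenizer:
    def __init__(self, model_name: str) -> None:
        self._hg_tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Methods the recipe needs explicitly can be defined on the wrapper...
    def encode(self, text: str) -> list[int]:
        return self._hg_tokenizer.encode(text)

    # ...while everything else (apply_chat_template, pad, decode, ...) falls
    # back to the underlying transformers tokenizer automatically.
    def __getattr__(self, name: str):
        return getattr(self._hg_tokenizer, name)
```
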
Tested with https://github.com/fairinternal/seamless_next/pull/750
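
For the third bullet, a rough sketch of FSDP wrapping by transformer layer class, in the spirit of how transformers / accelerate handle it. The config field name `transformer_cls_names_to_wrap` follows the bullet above; the helper function and class-name resolution are illustrative only.

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


def wrap_with_fsdp(model, transformer_cls_names_to_wrap: list[str]):
    # Resolve class names (e.g. "LlamaDecoderLayer") to the actual classes
    # present in the loaded Hugging Face model.
    layer_classes = set()
    for module in model.modules():
        if type(module).__name__ in transformer_cls_names_to_wrap:
            layer_classes.add(type(module))

    auto_wrap_policy = functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls=layer_classes
    )
    return FSDP(model, auto_wrap_policy=auto_wrap_policy)
```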

Does your PR introduce any breaking changes? If yes, please list them: N/A

Check list:

  • [ ] Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • [ ] Did you read the contributor guideline?
  • [ ] Did you make sure that your PR does only one thing instead of bundling different changes together?
  • [ ] Did you make sure to update the documentation with your changes? (if necessary)
  • [ ] Did you write any new necessary tests?
  • [ ] Did you verify new and existing tests pass locally with your changes?
  • [ ] Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

annasun28 · Nov 14 '25 19:11