
TPU support

Open miguelalba96 opened this issue 1 year ago • 6 comments

🚀 Feature

TPU support

Motivation

Does litdata support TPU environments, specifically when used with Lightning Fabric?

Additional context

I have >16M image-text pairs that I am writing in the mosaic-ml streaming format to train contrastive models. I am using Lightning Fabric to train with DDP on GCP, and I want to move to TPU training. The mosaic-ml streaming dataset doesn't support TPU (afaik), which brings me to these questions:

  • Does litdata work on TPU?
  • Does it require any setup in the code beyond what the available documentation describes? (e.g. is it necessary to provide a distributed sampler, or to set different env variables?)
  • Do you have an example of how to set up TPU training with litdata?

miguelalba96 avatar Mar 26 '24 09:03 miguelalba96

Hi! Thanks for your contribution! Great first issue!

github-actions[bot] avatar Mar 26 '24 09:03 github-actions[bot]

Hey @miguelalba96,

I haven't tried with TPU. Maybe @carmocca would know more.

tchaton avatar Mar 29 '24 08:03 tchaton

litdata is meant to be used with a regular DataLoader, so there's nothing specific to do on a TPU machine. If you use Fabric or PyTorch Lightning, it will take care of enabling the DistributedSampler and performing any required XLA steps, but these are common to all TPU runs, not just those using litdata.

carmocca avatar Apr 03 '24 11:04 carmocca

I would recommend setting these env variables: DATA_OPTIMIZER_GLOBAL_RANK, DATA_OPTIMIZER_NUM_WORKERS, DATA_OPTIMIZER_NUM_NODES.

Otherwise the StreamingDataLoader will not be aware of the distribution.
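
A minimal sketch of setting the three variables named above from Python, before the dataset and dataloader are created. The helper name `export_litdata_distributed_env` is hypothetical, and the meaning given to each variable (global rank of the process, number of workers, number of nodes) is an assumption inferred from the variable names, not something confirmed in this thread. How you obtain the rank and node count on a TPU host is left to the caller.

```python
import os


def export_litdata_distributed_env(global_rank: int, num_workers: int, num_nodes: int) -> dict:
    """Export the three env variables for the current process.

    Hypothetical helper: the interpretation of each value is an assumption
    based on the variable names mentioned in this thread.
    """
    if global_rank < 0:
        raise ValueError("global_rank must be non-negative")
    if num_workers < 1 or num_nodes < 1:
        raise ValueError("num_workers and num_nodes must be at least 1")

    # Environment variables are always strings, so convert explicitly.
    values = {
        "DATA_OPTIMIZER_GLOBAL_RANK": str(global_rank),
        "DATA_OPTIMIZER_NUM_WORKERS": str(num_workers),
        "DATA_OPTIMIZER_NUM_NODES": str(num_nodes),
    }
    os.environ.update(values)
    return values
```

For example, calling `export_litdata_distributed_env(global_rank=3, num_workers=8, num_nodes=2)` in each process before building the dataloader would export the values for that process only; each process needs its own rank.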

dasoto avatar Apr 10 '24 00:04 dasoto

Yes, as @dasoto mentioned, I didn't add wiring for TPU env detection. Feel free to contribute support for it if you try litdata on TPUs.

tchaton avatar Apr 11 '24 08:04 tchaton

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 16 '25 06:04 stale[bot]