Request proper examples of how to train a diffusion model with diffusers on a large-scale dataset like LAION

Open Luciennnnnnn opened this issue 1 year ago • 6 comments

Hi, I do not see any examples in diffusers/examples showing how to train a diffusion model with diffusers on a large-scale dataset like LAION. This matters because many projects want to integrate their models into diffusers, and integration would be much easier if they could also train those models in diffusers.

Luciennnnnnn avatar Mar 08 '24 01:03 Luciennnnnnn

By the way, I am one of those people: I want to train my model in diffusers on LAION.

Luciennnnnnn avatar Mar 08 '24 01:03 Luciennnnnnn

If you aim to train a text-conditioned diffusion model, one of these scripts might suit your needs. For very large datasets, however, it is advisable to convert the data into the WebDataset format, which means modifying the data-loading code to read the new structure.
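A minimal sketch of what such a conversion could look like, using only the Python standard library. It relies on the WebDataset convention that a shard is a plain tar archive in which files sharing a basename (for example `000001.jpg` and `000001.txt`) form one sample. The function name `write_shards`, the `jpg`/`txt` extensions, and the shard naming pattern are illustrative assumptions, not part of diffusers; in practice the `webdataset` package's own `ShardWriter` would likely be the more convenient tool.

```python
import io
import tarfile


def write_shards(samples, pattern, max_per_shard=1000):
    """Write (key, image_bytes, caption) samples into numbered tar shards.

    Hypothetical helper: files that share a basename, such as
    000001.jpg and 000001.txt, are grouped into one sample by
    WebDataset-style readers.
    """
    paths = []
    tar = None
    count = 0
    for key, image_bytes, caption in samples:
        # Start a new shard when none is open or the current one is full.
        if tar is None or count >= max_per_shard:
            if tar is not None:
                tar.close()
            path = pattern % len(paths)
            tar = tarfile.open(path, "w")
            paths.append(path)
            count = 0
        # Store the image and its caption next to each other in the archive.
        for ext, payload in (("jpg", image_bytes), ("txt", caption.encode("utf-8"))):
            info = tarfile.TarInfo(name=f"{key}.{ext}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        count += 1
    if tar is not None:
        tar.close()
    return paths
```

Calling `write_shards(samples, "laion-%05d.tar", max_per_shard=10000)` would produce `laion-00000.tar`, `laion-00001.tar`, and so on. The shard size is a tuning choice: larger shards mean fewer files to manage, smaller shards make it easier to spread work across data-loading workers.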

sapkun avatar Mar 08 '24 06:03 sapkun

Hi @sapkun, I see that Hugging Face has its own dataset format backed by Apache Arrow. What is the advantage of WebDataset compared with it?

Luciennnnnnn avatar Mar 08 '24 07:03 Luciennnnnnn

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 07 '24 15:04 github-actions[bot]

> Hi @sapkun, I see that Hugging Face has its own dataset format backed by Apache Arrow. What is the advantage of WebDataset compared with it?

WebDataset is designed to support streaming and shuffling of data, which is beneficial when working with large datasets that cannot fit entirely in memory. It allows you to process data on the fly, reading and preprocessing samples as needed during training.
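To make the streaming-and-shuffling idea concrete, here is a rough standard-library sketch rather than the actual WebDataset implementation. It assumes shards are tar files in which consecutive files sharing a basename form one sample; `iter_samples` and `shuffle_buffer` are hypothetical names chosen for illustration. Shards are read sequentially, and a bounded buffer gives an approximate shuffle without ever holding the whole dataset in memory.

```python
import random
import tarfile


def iter_samples(shard_paths):
    """Yield one dict per sample by reading each tar shard sequentially."""
    for path in shard_paths:
        with tarfile.open(path, "r") as tar:
            current_key, sample = None, {}
            for member in tar:
                if not member.isfile():
                    continue
                # "000001.jpg" -> key "000001", extension "jpg"
                key, _, ext = member.name.partition(".")
                # A new basename means the previous sample is complete.
                if key != current_key and sample:
                    yield sample
                    sample = {}
                current_key = key
                sample["__key__"] = key
                sample[ext] = tar.extractfile(member).read()
            if sample:
                yield sample


def shuffle_buffer(stream, buffer_size, rng):
    """Approximate shuffle that keeps at most buffer_size samples in memory."""
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Emit a random buffered sample and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Drain whatever is left once the stream is exhausted.
    rng.shuffle(buffer)
    yield from buffer
```

A training loop could then iterate over `shuffle_buffer(iter_samples(paths), 1000, random.Random(0))`, decoding and preprocessing each sample as it arrives. The buffer size trades memory for shuffle quality: a larger buffer mixes samples more thoroughly, and shuffling the shard order as well helps further.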

sapkun avatar Apr 10 '24 01:04 sapkun

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 04 '24 15:05 github-actions[bot]

Follow https://github.com/bghira/SimpleTuner.

sayakpaul avatar Jun 30 '24 05:06 sayakpaul