Request: proper examples of how to train a diffusion model with diffusers on a large-scale dataset like LAION
Hi, I don't see any examples in diffusers/examples showing how to train a diffusion model with diffusers on a large-scale dataset like LAION. This matters because many projects want to integrate their models into diffusers; if they could also train those models in diffusers, integration would be much easier.
By the way, I am one of those people: I want to train my model in diffusers on LAION.
If you aim to train a text-conditioned diffusion model, one of these scripts might be suitable for your needs. However, when dealing with extensive datasets, it is advisable to convert the data into the webdataset format. This will necessitate modifying the code to accommodate the new data structure.
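For anyone wondering what that conversion involves, here is a rough sketch using only the Python standard library. It relies on the WebDataset convention that a shard is a plain tar file in which files sharing a basename (e.g. `000001.jpg` and `000001.txt`) form one sample. The helper names `write_shard` and `write_shards`, the shard naming pattern, and the `jpg`/`txt` extensions are my own assumptions for illustration, not part of diffusers or the webdataset library.

```python
import io
import tarfile


def write_shard(shard_path, samples):
    """Write (key, image_bytes, caption) triples into one tar shard.

    Hypothetical helper: each sample becomes two tar members that share
    a basename, <key>.jpg and <key>.txt.
    """
    with tarfile.open(shard_path, "w") as tar:
        for key, image_bytes, caption in samples:
            members = (("jpg", image_bytes), ("txt", caption.encode("utf-8")))
            for ext, payload in members:
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def write_shards(samples, pattern, samples_per_shard):
    """Split samples across numbered shards and return the shard paths.

    `pattern` is a format string such as "shard-{:05d}.tar" (assumed naming).
    """
    paths = []
    for start in range(0, len(samples), samples_per_shard):
        path = pattern.format(len(paths))
        write_shard(path, samples[start:start + samples_per_shard])
        paths.append(path)
    return paths
```

For a dataset the size of LAION you would feed this from an iterator instead of a list and pick a shard size that suits your storage, but the on-disk layout stays the same.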
Hi @sapkun, I see that Hugging Face has its own dataset format backed by Apache Arrow. What is the advantage of WebDataset over it?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
WebDataset is designed to support streaming and shuffling of data, which is beneficial when working with large datasets that cannot fit entirely in memory. It allows you to process data on the fly, reading and preprocessing samples as needed during training.
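To make the streaming-plus-shuffling idea concrete, here is a minimal standard-library sketch. It is not the webdataset library's API, only an illustration of the technique: shards are read sequentially, consecutive tar members with the same basename are grouped into one sample, and a fixed-size buffer gives approximate shuffling without loading the dataset into memory. The names `iter_samples` and `shuffled`, and the `__key__` field, are assumptions chosen for this example.

```python
import io
import random
import tarfile


def iter_samples(shard_paths):
    """Yield one dict per sample, reading each tar shard sequentially."""
    for path in shard_paths:
        with tarfile.open(path, "r") as tar:
            current_key, sample = None, {}
            for member in tar:
                if not member.isfile():
                    continue
                # "000001.jpg" -> key "000001", extension "jpg"
                key, _, ext = member.name.rpartition(".")
                if key != current_key and sample:
                    yield sample
                    sample = {}
                current_key = key
                sample["__key__"] = key
                sample[ext] = tar.extractfile(member).read()
            if sample:
                yield sample


def shuffled(stream, buffer_size, rng):
    """Approximate shuffle: hold at most buffer_size items in memory."""
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Drain whatever is left once the stream is exhausted.
    rng.shuffle(buffer)
    yield from buffer
```

Usage would look like `for sample in shuffled(iter_samples(paths), 1000, random.Random(0)): ...`, with decoding and tokenization done per sample inside the loop. A larger buffer gives a better shuffle at the cost of more memory.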
Follow https://github.com/bghira/SimpleTuner.