[Core] refactor `embeddings`
What does this PR do?
Extension of https://github.com/huggingface/diffusers/pull/7995.
Some comments are in line.
Todos
- [ ] Add licensing
@DN6 this is a point to be worked out: what criterion should decide whether a class goes in combined versus image_text? At the moment, embedding classes dealing with both images and texts are placed in image_text, while combined is for classes that take timesteps or other forms of conditioning along with {image,text} inputs.
Can't we split these into image and text?
I was trying to follow this:
https://github.com/huggingface/diffusers/pull/7995#issuecomment-2187356771
IMO, it's perhaps better to have all the embedding classes that deal with timesteps or mix modalities (image and text, for example) in combined.py, regardless of whether they have Combined in their class names. Then split image_text into image and text separately w.r.t. their types. So, if a class deals ONLY with text, we put it in embeddings/text.py.
WDYT @DN6 @yiyixuxu ?
Sorry about the back and forth here.
> IMO, it's perhaps better to have all the embedding classes in combined.py that deal with timestep or mix modalities (image and text, for example) regardless of whether they have Combined in their class names or not. And then split image_text into image and text separately w.r.t to their types. So, if a class is ONLY dealing with text, we put it in embeddings/text.py.
This makes sense to me.
Alright. Will wait for @yiyixuxu to comment as well before making changes.
combined does not need "combined" in names, of course
> regardless of whether they have Combined in their class names or not.
For this, I am not sure what you mean. Do you want to put all the timestep embeddings into combined.py? Timestep embeddings should have their own file. But a "combined" embedding that takes various inputs, including timestep, should of course go into combined.
> it's perhaps better to have all the embedding classes in combined.py that deal with timestep or mix modalities
ok with this
> And then split image_text into image and text separately w.r.t to their types. So, if a class is ONLY dealing with text, we put it in embeddings/text.py.
And in addition, I made a comment here: https://github.com/huggingface/diffusers/pull/8722#discussion_r1664447256 — we can put these attention pool layers into text.
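The placement rule the thread converges on could be sketched as a small helper (purely illustrative — this function and the modality labels are not part of the PR; file names follow the layout discussed above):

```python
def target_module(modalities: list[str]) -> str:
    """Sketch of the proposed file-placement rule for embedding classes.

    - A class mixing modalities (e.g. image + text, or timestep + anything)
      goes into embeddings/combined.py, regardless of its class name.
    - A class handling a single modality goes into its own file,
      e.g. embeddings/text.py, embeddings/image.py, embeddings/timestep.py.
    """
    unique = set(modalities)
    if len(unique) > 1:
        # Mixed inputs (including timestep combined with another modality).
        return "embeddings/combined.py"
    # Single-modality classes get their own file; pure timestep
    # embeddings keep a dedicated file per the discussion above.
    return f"embeddings/{modalities[0]}.py"
```

For example, a text-only embedding lands in embeddings/text.py, while a class taking both timestep and text conditioning lands in embeddings/combined.py.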