
[Bug]: Embedding headers incorrect with Flux LoRA + additional embedding, or ComfyUI issue?

Open Zokreb opened this issue 1 year ago • 7 comments

What happened?

OK, so I'm trying to train a Flux LoRA (LoHa to be precise) with one additional embedding. During training, the samples look excellent, much better than without the additional embedding, so it clearly has a strong effect. However, in ComfyUI I did not get results as good as in the samples (it looked very undertrained). Looking at Comfy's console output, here is what I see:

WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096

Out of curiosity, I opened the embedding in a hex editor (the files are small, so it was worth a shot). It starts with some JSON data that I assume describes the binary data structure of the embedding:

{
    "clip_g": {
        "dtype": "BF16",
        "shape": [
            4,
            768
        ],
        "data_offsets": [
            0,
            6144
        ]
    },
    "t5": {
        "dtype": "BF16",
        "shape": [
            4,
            4096
        ],
        "data_offsets": [
            6144,
            38912
        ]
    }
}
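For reference, that JSON block is just the standard safetensors header: the file starts with an 8-byte little-endian length followed by that many bytes of JSON metadata, so it can be dumped without a hex editor. A minimal sketch (the path is a placeholder for wherever OneTrainer saved the embedding):

import json
import struct

# Placeholder path; point this at the embedding OneTrainer saved.
path = "my_flux_embedding.safetensors"

with open(path, "rb") as f:
    # safetensors layout: 8-byte little-endian header length, then JSON metadata.
    header_len = struct.unpack("<Q", f.read(8))[0]
    header = json.loads(f.read(header_len))

for name, info in header.items():
    if name == "__metadata__":  # optional metadata block, not a tensor
        continue
    print(name, info["dtype"], info["shape"])

Running that against the file above just prints the clip_g and t5 entries with their dtypes and shapes.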

So this makes me wonder whether there is an issue with the way the embedding is written to disk: Flux is supposed to use CLIP-L + T5, whereas here I see CLIP-G + T5?

I trained the same dataset on SDXL to see what the clip_l section looks like, and it clearly states "clip_l". In the SDXL embedding, the clip_g shape was [4, 1280] while the clip_l shape was [4, 768].

The dtype was F32 instead of BF16 and the offsets were doubled, but that is consistent with the output data type: I forgot to change it from FLOAT32 to BF16 for this second run.

I tried changing clip_g to clip_l in the hex editor, but Comfy simply ignored the embedding. I know the embedding must have some effect, since running the same seed without it gives even worse results.
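If it helps anyone else debugging this, the key can also be renamed at the file level with the safetensors library instead of a hex editor. This is only a sketch, under the assumption that "clip_l" is the key ComfyUI looks for; the file names are placeholders:

from safetensors.torch import load_file, save_file

# Placeholder file names.
src = "my_flux_embedding.safetensors"
dst = "my_flux_embedding_renamed.safetensors"

tensors = load_file(src)
# Assumption: the [4, 768] tensor stored under "clip_g" is really the CLIP-L
# embedding, so re-save it under the "clip_l" key and leave "t5" untouched.
tensors["clip_l"] = tensors.pop("clip_g")
save_file(tensors, dst)

This does the same thing as the hex edit, but keeps the header length and data offsets consistent even when the key names differ in length.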

Now, can someone confirm whether this embedding is actually properly formatted (and therefore the bug is in ComfyUI's embedding parser), or did I actually find a bug in OneTrainer?

Thank you, this piece of software is awesome!

What did you expect would happen?

Same quality between the samples generated during training and the ComfyUI output.

Relevant log output (this is from ComfyUI, just to be clear)

WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096

Output of pip freeze

No response

Zokreb avatar Oct 16 '24 18:10 Zokreb

I am seeing the exact same problem and have gone through almost exactly the same steps, except that I hacked the Comfy code to look for the clip_g tag instead. I also had to change it to look for the "t5" tag, because Comfy was expecting "t5xxl". (Note: I've looked at Forge too, and it also expects "t5xxl".)

After making those changes it got further but crashed on "!!! Exception during processing !!! Only Tensors of floating point and complex dtype can require gradients"

This is where my knowledge ran out, so I dicked around trying different LoRA output formats (no change) and different precision values for the text encoders (no idea if that is even a worthwhile idea, but it didn't work anyway). I think I finally forced requires_grad to False in the Comfy code and got some error I can't remember, about some weight value in the 32000-ish range not matching another similarly high number.

This can be reproduced by running a Flux training run with an additional embedding turned on (I used a custom placeholder and initial embedding text) and "train embeddings" enabled for text encoder 1. I've reproduced it with "train embeddings" enabled for both text encoders as well.

I'm seeing such good results with this training in the samples that I'm literally foaming at the mouth to get this working. Keen to help if I can.

howsyourface avatar Nov 23 '24 00:11 howsyourface

Same issue with SD3.5M embeddings:

WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 1280 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 1280 != 4096
WARNING: shape mismatch when trying to apply embedding, embedding will be ignored 1280 != 4096

Koratahiu avatar Apr 23 '25 06:04 Koratahiu

I just pulled the latest and gave this one another go over the last few days, hoping it would be resolved, and I have reproduced it again :(

Gah, frustrating. I can train some amazing embeddings and LoRAs in Flux with this method, but can only produce good images through OneTrainer's sample generation, as Kohya, Comfy, etc. will all produce the error:

"shape mismatch when trying to apply embedding, embedding will be ignored 768 != 4096"

howsyourface avatar Jun 26 '25 21:06 howsyourface

Changing to enhancement. Embeddings were not supported by Comfy at the time. They still aren't, but there is some development work ongoing. @Koratahiu

dxqb avatar Sep 07 '25 20:09 dxqb

Gave this another go and it appears fixed!! I can see the correct tags in the embedding output now. Thank you very much, guys.

howsyourface avatar Sep 07 '25 20:09 howsyourface

> Gave this another go and it appears fixed!! I can see the correct tags in the embedding output now. Thank you very much, guys.

Thanks for letting us know, I'll give it a try.

> Changing to enhancement. Embeddings were not supported by Comfy at the time. They still aren't, but there is some development work ongoing. @Koratahiu

Thanks as well, I'll check that out.

Zokreb avatar Sep 09 '25 17:09 Zokreb

@Zokreb Can you confirm?

O-J1 avatar Oct 12 '25 09:10 O-J1

@Zokreb Another gentle reminder: can you please confirm whether this is solved for you? If I don't hear back from you, I will assume it is fixed and will close this issue in a week.

O-J1 avatar Dec 01 '25 15:12 O-J1

Hi @O-J1, thanks for tagging me. Please mark this as closed; I had a [redacted] a few weeks back that won't allow me to test stuff for the coming days/weeks.

My condolences, hope things get easier. Just doing my due diligence: I pinged Zokreb, and this is the first time you have interacted in this repo. Am I correct in assuming that this is your other GitHub account?

O-J1 avatar Dec 01 '25 17:12 O-J1

Oops, yes it is, I was at work. Maybe I'll delete the other post :)

Zokreb avatar Dec 01 '25 18:12 Zokreb

Thank you, best wishes. I redacted my quote reply so that it's more vague, in the event you do decide to delete it.

O-J1 avatar Dec 01 '25 18:12 O-J1

Thank you @O-J1. I appreciate it.

Zokreb avatar Dec 01 '25 18:12 Zokreb