Text-to-Image Alignment Performance of the ELLA-SDXL Model
First of all, thank you for open-sourcing ELLA, which is truly remarkable. I have been closely following your team's work, and I tested the model as soon as it was released.

I evaluated the text-to-image alignment of ELLA-SD1.5 using GenEval. Compared to the original Stable Diffusion 1.5, ELLA-SD1.5 improves the overall score by about 7.6 percentage points (49.94 vs. 42.34), far more than the roughly 0.7-point gain from Salesforce's Diffusion-DPO method (43.00). The largest gains are in the two-object, position, and color-attribution categories.

I noticed that Stable Diffusion 3 also uses GenEval to evaluate text-to-image alignment. Does your team plan to release GenEval scores for ELLA-SDXL? That would let us compare ELLA-SDXL with SD3 on a common scale. My results so far are below (all values in percent; the ELLA-SDXL row is left empty):
| model | Overall | single | two | counting | colors | position | color_attr |
|---|---|---|---|---|---|---|---|
| SD1.5 | 42.34 | 95.62 | 37.63 | 37.81 | 74.73 | 3.50 | 4.75 |
| SD1.5-DPO | 43.00 | 96.88 | 39.90 | 38.75 | 75.53 | 3.25 | 3.75 |
| ELLA-SD1.5 | 49.94 | 94.69 | 55.81 | 36.56 | 77.32 | 14.75 | 20.50 |
| SDXL | 55.63 | 98.12 | 75.25 | 43.75 | 89.63 | 11.25 | 15.75 |
| SDXL-DPO | 58.02 | 99.38 | 82.58 | 49.06 | 85.11 | 13.50 | 18.50 |
| ELLA-SDXL | | | | | | | |
| DALL-E 3 | 67.00 | 96.00 | 87.00 | 47.00 | 83.00 | 43.00 | 45.00 |
| SD3 best | 74.00 | 99.00 | 94.00 | 72.00 | 89.00 | 33.00 | 60.00 |
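
As a sanity check on the table, the Overall column appears to be the unweighted mean of the six per-task scores. This is inferred from the numbers above rather than taken from the GenEval code, so treat it as an assumption. A minimal Python sketch, with the hypothetical helpers `overall` and `gain`, that recomputes the overall scores and the gains in percentage points:

```python
# Per-task GenEval scores copied from the table above (percent).
# Order: single, two, counting, colors, position, color_attr.
TASK_SCORES = {
    "SD1.5": [95.62, 37.63, 37.81, 74.73, 3.50, 4.75],
    "SD1.5-DPO": [96.88, 39.90, 38.75, 75.53, 3.25, 3.75],
    "ELLA-SD1.5": [94.69, 55.81, 36.56, 77.32, 14.75, 20.50],
    "SDXL": [98.12, 75.25, 43.75, 89.63, 11.25, 15.75],
    "SDXL-DPO": [99.38, 82.58, 49.06, 85.11, 13.50, 18.50],
}


def overall(scores):
    """Unweighted mean of the per-task scores (assumed definition of Overall)."""
    return sum(scores) / len(scores)


def gain(model, baseline):
    """Difference in overall score between two models, in percentage points."""
    return overall(TASK_SCORES[model]) - overall(TASK_SCORES[baseline])


if __name__ == "__main__":
    for name, scores in TASK_SCORES.items():
        print(f"{name}: overall = {overall(scores):.2f}")
    print(f"ELLA-SD1.5 vs SD1.5: {gain('ELLA-SD1.5', 'SD1.5'):+.2f} points")
    print(f"SD1.5-DPO vs SD1.5: {gain('SD1.5-DPO', 'SD1.5'):+.2f} points")
```

The recomputed means agree with the Overall column to within rounding, so once per-task scores for ELLA-SDXL are available, the same helper can fill in its overall score.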
I can't wait! Thanks for the research!
Thanks a lot for your work and research! @xiexiaoshinick