Feature/save preview latent

Open ltdrdata opened this issue 2 years ago • 16 comments

add SavePreviewLatent node.

  • use .png as container of latent.
  • exif=latent_tensor, pnginfo=same as saveimage -> you can load workflow on frontend
  • image_opt is optional: use logo.png if None -> logo.png is temporary image for testing. it must be changed to proper image.

LoadLatent

  • additional support for .latent.png
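As a rough illustration of the "PNG as a container of latent" idea, here is a minimal sketch using only the Python standard library. It is not the code in this PR: the PR stores the tensor in EXIF, while this sketch swaps in a plain `tEXt` chunk with base64 text so that it stays dependency-free. The function names `save_latent_png` and `load_latent_png`, the `latent_tensor` keyword, and the grey placeholder image are all assumptions made for the example.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def save_latent_png(latent_bytes: bytes, width: int = 8, height: int = 8) -> bytes:
    """Return a grey placeholder PNG carrying latent_bytes in a tEXt chunk.

    Hypothetical helper: the real node stores the tensor in EXIF instead.
    """
    # IHDR: width, height, 8-bit depth, greyscale, default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter byte 0, followed by `width` grey pixels.
    raw = b"".join(b"\x00" + b"\x80" * width for _ in range(height))
    # tEXt payload: keyword, NUL separator, then text (base64 keeps it ASCII-safe).
    text = b"latent_tensor\x00" + base64.b64encode(latent_bytes)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"tEXt", text)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def load_latent_png(png: bytes) -> bytes:
    """Walk the chunk list and return the embedded latent bytes."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"tEXt" and data.startswith(b"latent_tensor\x00"):
            return base64.b64decode(data.split(b"\x00", 1)[1])
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    raise ValueError("no latent_tensor chunk found")
```

Because the payload lives in an ancillary chunk, ordinary image viewers should still show the placeholder picture while a loader that knows the keyword can recover the bytes.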
preview preview2

ltdrdata avatar May 18 '23 15:05 ltdrdata

Absolutely should not use PNG. It's a massively wasteful format that, even with optimization, will only save you anywhere from nothing to 10%.

One of the main points here was something that is tiny and can be shared, but not just a straight thumbnail because we're using a terrible format.

WASasquatch avatar May 18 '23 16:05 WASasquatch

If you want to keep the data with no loss, you can use the .bmp format as an uncompressed raw format. https://en.wikipedia.org/wiki/BMP_file_format

levaleureux avatar May 18 '23 16:05 levaleureux

The reason I chose PNG is not because it is lossless. From a preview perspective, it is clearly a disadvantage because it sacrifices the advantages of file size. However, it is simply because the default image format in the current ComfyUI is PNG, which makes it easier to share the codebase. For example, there is no need to do any additional code work for tasks like workflow load using PNG.

Currently, it can be seen as an implementation that is close to a PoC. We can perceive the usability of both the safetensor format alone and its usage as an image container.

ltdrdata avatar May 19 '23 02:05 ltdrdata

I'm planning to improve it by applying a method demonstrated on how to decode without a VAE that was introduced last night. Instead of using a logo, I want to generate a basic thumbnail using this approach.

ltdrdata avatar May 19 '23 02:05 ltdrdata

I added PNG support to my class earlier, and also a form of PNG compression. Loss of colors heavily impacts PNG filesize. It could be applied to latent to RGB previews for further optimization. However, the RGB previews of latents are pretty bad looking. Kinda worse than heavy jpeg compression. Probably why it hasn't been thought about for thumbnailing compression.

https://github.com/WASasquatch/ComfyLatentImage
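The point that loss of colors heavily impacts PNG filesize can be checked with a small stand-alone experiment. This is only a sketch under stated assumptions, not the compression used in the linked repository: it compresses fake RGB noise with zlib (the DEFLATE compressor PNG uses for image data) before and after snapping values to 16 levels. The helper names and the choice of 16 levels are hypothetical.

```python
import random
import zlib


def deflate_size(pixels: bytes) -> int:
    """Size in bytes after zlib/DEFLATE at maximum compression."""
    return len(zlib.compress(pixels, 9))


def quantize(pixels: bytes, levels: int) -> bytes:
    """Snap each 8-bit value down to one of `levels` evenly spaced buckets."""
    step = 256 // levels
    return bytes((p // step) * step for p in pixels)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
noisy = bytes(rng.randrange(256) for _ in range(64 * 64 * 3))  # fake 64x64 RGB data

full_size = deflate_size(noisy)
reduced_size = deflate_size(quantize(noisy, 16))
print(full_size, reduced_size)
```

Random noise is close to a worst case for DEFLATE, so real previews will behave differently; the sketch only shows the direction of the effect.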

WASasquatch avatar May 19 '23 03:05 WASasquatch

The consideration of using latent to RGB as a preview is solely intended as a convenient method for individuals who have no intention of connecting VAE for image visualization and storage. It would be much more useful than a meaningless logo, at the very least.

Furthermore, I am contemplating the idea of incorporating a marker indicating the presence of "latent" into the image, rather than simply providing it as a thumbnail.

ltdrdata avatar May 19 '23 03:05 ltdrdata

You mean connecting an optional image to store as the preview? Shouldn't need VAE there. That is probably better than a placeholder image. My idea was an overlay of some basic information, as well as branding like a link to the repo for exposure, since a1111 dominates all.

Also I hope you know you are talking to WAS, who proposed this original idea in chat.

You should also consider all the uses. The latent to RGB image is tiny, which means a lot of viewers will be upscaling it to fit within their minimum width/height containers, leading to further degraded viewing. It should be a small image, and compressed, but also not just really bad. Civitai does this, for example, to display images within their template correctly. Same with large image thumbnails in win11, which famously blurs upscales too.

WASasquatch avatar May 19 '23 05:05 WASasquatch

If the creator intentionally connects a decoded image and provides it, then it will be used as the preview. If it's not provided, then the intention is to generate a preview by simply pixelating it using latent to RGB.

The reason for making the provision of the image optional is twofold. Firstly, it allows avoiding VAE decoding unless a preview is genuinely needed for identification purposes during the intermediate process. Secondly, it enables the creator to provide high-quality previews if they desire to do so for the purpose of sharing with a large number of users.

And is the suggestion to enhance the pixelated image generated by latent to RGB for the preview by applying post-processing techniques such as blur and upscale?

ltdrdata avatar May 19 '23 06:05 ltdrdata

I think a small upscale (simple resize) is all that's needed. Just so other viewers don't apply their horrendous "optimized" upscalers that just make their upscale from thumbnails blurry and jpegy.

WASasquatch avatar May 19 '23 14:05 WASasquatch

I applied the result of latent_to_rgb as the default preview and added text at the bottom explaining the format, called "ComfyUI LATENT." Additionally, I applied the optimize option for slight size optimization. For latent_to_rgb, I limited the size to the range of 128 to 512 to prevent meaningless high-resolution previews or excessively small previews. When image_opt is intentionally provided, users are responsible for resizing it, which allows high-resolution output.

default-preview
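One way to read the 128 to 512 limit is as a clamp on the longer side of the preview. The sketch below is an assumption about how such a limit could be computed, not the code from this PR; `preview_size` and its parameters are hypothetical names.

```python
def preview_size(latent_w, latent_h, lower=128, upper=512):
    """Scale (latent_w, latent_h) so the longer side lands in [lower, upper].

    Assumption: the limit applies to the longer side and the aspect
    ratio is preserved. The PR may apply the limit differently.
    """
    longest = max(latent_w, latent_h)
    target = min(max(longest, lower), upper)  # clamp the longer side
    scale = target / longest
    # Round to whole pixels and never let a side collapse to zero.
    return max(1, round(latent_w * scale)), max(1, round(latent_h * scale))
```

For example, a 64x64 latent would be enlarged to 128x128, while a size already inside the range is returned unchanged.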

ltdrdata avatar May 19 '23 15:05 ltdrdata

ComfyUI_00004_ latent

Changed the upscale method to nearest-exact in order to achieve a more pronounced feeling of latent rawness.
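For readers unfamiliar with nearest-neighbor upscaling, the following pure-Python sketch shows why it keeps the blocky, raw look: every output pixel is a copy of exactly one source pixel, with no blending. It maps output pixel centres back to the source grid, which is my understanding of what "nearest-exact" does; the function name and the list-of-rows representation are assumptions for the example, and the node itself presumably relies on a library resize rather than a loop like this.

```python
def upscale_nearest_exact(pixels, out_w, out_h):
    """Resize a 2D grid (list of rows) by copying the nearest source pixel."""
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for y in range(out_h):
        # Map the centre of each output row back into the source grid.
        src_y = min(in_h - 1, int((y + 0.5) * in_h / out_h))
        row = pixels[src_y]
        out.append([
            row[min(in_w - 1, int((x + 0.5) * in_w / out_w))]
            for x in range(out_w)
        ])
    return out
```

Doubling a 2x2 grid this way turns each source pixel into a 2x2 block, which is the pixelated effect described above.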

ltdrdata avatar May 19 '23 15:05 ltdrdata

I was not aware latent can be extracted like that without vae, frankly image looks quite amazing.

morphles avatar May 19 '23 20:05 morphles

I was the same. It was possible with very simple code provided by Comfy.
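To give a sense of how simple a VAE-free preview can be, here is a sketch that treats each 4-channel latent value as a linear combination of R, G and B. It is not the code Comfy provided; the coefficient table is an illustrative assumption for a 4-channel latent, and the names `LATENT_RGB_FACTORS`, `latent_pixel_to_rgb` and `latent_to_rgb` are hypothetical here.

```python
# Illustrative 4x3 projection (one row per latent channel; columns are R, G, B).
# These coefficients are an assumption for the sketch, not values from this thread.
LATENT_RGB_FACTORS = [
    (0.298, 0.207, 0.208),
    (0.187, 0.286, 0.173),
    (-0.158, 0.189, 0.264),
    (-0.184, -0.271, -0.473),
]


def latent_pixel_to_rgb(latent):
    """Project one 4-channel latent value to an 8-bit (r, g, b) tuple."""
    rgb = []
    for c in range(3):
        v = sum(latent[k] * LATENT_RGB_FACTORS[k][c] for k in range(4))
        v = (v + 1.0) / 2.0        # map roughly [-1, 1] to [0, 1]
        v = min(1.0, max(0.0, v))  # clamp out-of-range values
        rgb.append(int(v * 255))
    return tuple(rgb)


def latent_to_rgb(latent_grid):
    """Apply the projection to every position of an H x W grid of 4-vectors."""
    return [[latent_pixel_to_rgb(px) for px in row] for row in latent_grid]
```

The output has the same width and height as the latent, which is why the preview is tiny and needs the upscaling discussed earlier in the thread.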

ltdrdata avatar May 20 '23 03:05 ltdrdata

This somehow just convinces me even more that my hi-res/multi-sampling idea is good :) I thought latents would be something more abstract, and not directly convertible to pixel values, thus making that multi-scale combine some weird-ass thing.

morphles avatar May 20 '23 07:05 morphles

Mind just making this a plugin that hijacks latent saving in ComfyUI? I don't think Comfy is interested unfortunately.

WASasquatch avatar May 23 '23 03:05 WASasquatch

Simplified based on https://github.com/comfyanonymous/ComfyUI/pull/713: removed image_opt from the node and reduced the size upper bound from 512 to 256.

ComfyUI_00012_ latent ComfyUI_00011_ latent

ltdrdata avatar Jun 08 '23 09:06 ltdrdata