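ComfyScript: in real mode, how should I manage memory so that nodes whose input arguments have not changed are not run through inference again?

I have no background in Python or any other kind of software development. Searching on my own, I found people using decorators such as @cached and @lru_cache for caching. Is that the right way to "manage the cache yourself", as the documentation puts it? If not, could you give me some keywords for the relevant approach so I can look it up? Thanks.

Also, thank you for making this repo. I used Hugging Face's diffusers before, and for a beginner like me the abstraction level of its classes and functions is either too monolithic or too fine-grained; it lacks the granularity of ComfyUI, which suits me well. I used to wish ComfyUI had a library, but I had no idea where to start, and I did not expect someone to actually build one.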

Apr 12 '24 11:04 binarytahr

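cache and lru_cache should be comparing the new and old arguments with ==, and some types may not support that. Currently, real mode attaches workflow information to node outputs in the _self_virtual_output attribute in order to save the workflow automatically, so in theory the cache could be customized to compare by the attached workflow. I'll implement this in the next few days; for now you can try cache or a hand-written comparison.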
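A hand-written comparison of the kind mentioned above could look roughly like the following. This is only a minimal sketch in plain Python: `LastCallCache` and `encode` are hypothetical names made up for illustration, not part of ComfyScript, and the sketch assumes the arguments support `==` with a plain boolean result.

```python
class LastCallCache:
    """Remember the most recent arguments and result of one function."""

    _EMPTY = object()  # sentinel meaning "nothing cached yet"

    def __init__(self, func):
        self.func = func
        self.last_args = self._EMPTY
        self.last_result = None
        self.calls = 0  # how many times func really ran

    def __call__(self, *args):
        # Compare with ==, so unhashable arguments such as lists also work.
        if self.last_args is self._EMPTY or self.last_args != args:
            self.last_result = self.func(*args)
            self.last_args = args
            self.calls += 1
        return self.last_result


def encode(text, weights):
    # Hypothetical stand-in for an expensive node call.
    return [len(text) * w for w in weights]


cached_encode = LastCallCache(encode)
first = cached_encode("a cat", [1, 2])
second = cached_encode("a cat", [1, 2])   # same inputs: result is reused
third = cached_encode("a dog!", [1, 2])   # different inputs: recomputed
```

Two caveats of this sketch: it stores references to the arguments, so an argument mutated in place will still compare equal to itself, and types whose `==` does not return a single boolean (tensors, for example) would need a different comparison.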

Apr 12 '24 12:04 Chaoses-Ib

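Thanks for the answer. It sounds like this would work by comparing _self_virtual_output rather than the input arguments. I'll go and learn how cache works first.

Also, I found that with the same fp16 model on Kaggle's Tesla T4 GPU, RAM and VRAM usage and inference time are all worse than with diffusers. Is that normal, or is there something I have not set up correctly?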

Apr 12 '24 12:04 binarytahr

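The discussions I've seen all say ComfyUI is faster than diffusers, so a configuration problem is more likely. Using torch 2.0 or later and installing xformers may help. You can also try the command-line arguments below; for example, if you have enough VRAM you can use --highvram: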

https://github.com/Chaoses-Ib/ComfyScript/blob/b5ddc484bcda5d7875ec9e00ffdb085fc0dfd0e3/src/comfy_script/runtime/init.py#L72-L147

Usage looks like load(args=ComfyUIArgs('--force-fp16', '--highvram')).

Apr 12 '24 12:04 Chaoses-Ib

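Thanks! Yes, when I used ComfyUI through its graphical interface it also felt faster than diffusers.

My environment has torch 2.0 or later, but I forgot to install xformers. On newer torch, diffusers automatically uses scaled dot product attention, so there is no need to enable xformers' flash attention, and when I switched to ComfyUI I forgot about xformers. Tomorrow I'll try adding the launch arguments and see what effect they have.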

Apr 12 '24 12:04 binarytahr

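I looked into cache and lru_cache and found that neither can be used to cache mutable containers such as list, and many ComfyUI nodes output a list. lru_cache can be customized, but the shallow-copy problem would still remain.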
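Both limitations can be reproduced with the standard library alone. In the sketch below, `double_all` is a hypothetical stand-in for a node; the point is that `functools.lru_cache` needs hashable arguments, and that it hands back the very same cached object on a hit rather than a copy.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def double_all(values):
    # Hypothetical stand-in for a node; returns a mutable list.
    return [v * 2 for v in values]


# 1. Unhashable arguments such as lists are rejected outright.
try:
    double_all([1, 2, 3])
    list_error = None
except TypeError as exc:
    list_error = exc

# 2. A hashable tuple works, but the cached list itself is handed back,
#    so an in-place change by the caller leaks into later cache hits.
result = double_all((1, 2, 3))
result.append(99)
again = double_all((1, 2, 3))  # same object as `result`, including the 99
```

Returning a copy from the wrapped function would avoid the second problem for flat lists, but nested containers would still be shared unless a deep copy is made.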

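Another problem is that cache and lru_cache work on functions. In real mode every line is effectively a function call, so to get ComfyUI-style automatic caching with the node as the smallest unit, it seems each line would have to be wrapped in its own function.

For now, all I can do is comment out certain lines by hand, or make certain nodes run only once inside a loop.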

May 15 '24 10:05 binarytahr

Here is a simple first implementation; I have not tested it carefully yet:

All nodes share one cache:

# Use `dict` (`{}`) for simple unbounded cache. For advanced cache, [cachetools](https://github.com/tkem/cachetools) or other libraries can be used.
cache = {}

with Workflow(cache=cache):
    model, clip, vae = CheckpointLoaderSimple(Checkpoints.sd_v1_4)
    conditioning = CLIPTextEncode('beautiful scenery nature glass bottle landscape, , purple galaxy bottle,', clip)
    conditioning2 = CLIPTextEncode('text, watermark', clip)
    latent = EmptyLatentImage(512, 512, 1)
    latent = KSampler(model, 2, 20, 8, 'euler', 'normal', conditioning, conditioning2, latent, 1)
    image = VAEDecode(latent, vae)
    image = SaveImage(image, 'ComfyUI')
    print(image)

with Workflow(cache=cache):
    model, clip, vae = CheckpointLoaderSimple(Checkpoints.sd_v1_4)
    conditioning = CLIPTextEncode('beautiful scenery nature glass bottle landscape, , purple galaxy bottle,', clip)
    conditioning2 = CLIPTextEncode('text, watermark', clip)
    latent = EmptyLatentImage(512, 512, 1)
    latent = KSampler(model, 2, 20, 8, 'euler', 'normal', conditioning, conditioning2, latent, 1)
    image = VAEDecode(latent, vae)
    image = SaveImage(image, 'ComfyUI')
    print(image)

del cache

Each node uses its own cache:

node_cache = {}
cache = lambda node: node_cache.setdefault(node, {})

with Workflow(cache=cache):
    model, clip, vae = CheckpointLoaderSimple(Checkpoints.sd_v1_4)
    conditioning = CLIPTextEncode('beautiful scenery nature glass bottle landscape, , purple galaxy bottle,', clip)
    conditioning2 = CLIPTextEncode('text, watermark', clip)
    latent = EmptyLatentImage(512, 512, 1)
    latent = KSampler(model, 2, 20, 8, 'euler', 'normal', conditioning, conditioning2, latent, 1)
    image = VAEDecode(latent, vae)
    image = SaveImage(image, 'ComfyUI')
    print(image)

print(*node_cache.keys())
# CheckpointLoaderSimple CLIPTextEncode EmptyLatentImage KSampler VAEDecode SaveImage

with Workflow(cache=cache):
    model, clip, vae = CheckpointLoaderSimple(Checkpoints.sd_v1_4)
    conditioning = CLIPTextEncode('beautiful scenery nature glass bottle landscape, , purple galaxy bottle,', clip)
    conditioning2 = CLIPTextEncode('text, watermark', clip)
    latent = EmptyLatentImage(512, 512, 1)
    latent = KSampler(model, 2, 20, 8, 'euler', 'normal', conditioning, conditioning2, latent, 1)
    image = VAEDecode(latent, vae)
    image = SaveImage(image, 'ComfyUI')
    print(image)

del cache, node_cache

Note that for node output, any changes made by user code instead of nodes will be ignored.
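To illustrate why user-side changes are ignored, here is a toy model of a cache keyed on how an output was produced rather than on its current value. It is not ComfyScript's implementation: `Output`, `describe`, `run_node` and the two stand-in node functions are hypothetical names, and the `recipe` field only loosely mirrors the idea of workflow information attached to node outputs.

```python
class Output:
    """A node result plus a record of how it was produced."""

    def __init__(self, value, recipe):
        self.value = value
        self.recipe = recipe  # hashable description: (node name, inputs)


def describe(arg):
    # Node outputs are identified by their recipe, plain values by themselves.
    return arg.recipe if isinstance(arg, Output) else arg


def run_node(cache, name, func, *args):
    recipe = (name, tuple(describe(a) for a in args))
    if recipe not in cache:
        raw = [a.value if isinstance(a, Output) else a for a in args]
        cache[recipe] = Output(func(*raw), recipe)
    return cache[recipe]


cache = {}
calls = []  # records which stand-in nodes really ran


def make_latent(width, height):
    calls.append("latent")
    return [0] * (width * height)


def total(latent):
    calls.append("total")
    return sum(latent)


latent = run_node(cache, "EmptyLatent", make_latent, 2, 2)
before = run_node(cache, "Total", total, latent).value

latent.value[0] = 5  # change made by user code, not by a node
after = run_node(cache, "Total", total, latent).value  # cache hit: still 0
```

Because the key is built from the recipe, the in-place edit does not change it, and the second `Total` call returns the stale cached result instead of recomputing.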

May 15 '24 15:05 Chaoses-Ib

❤️ Thank you so much, you responded really fast! I'll study your implementation first.

May 16 '24 02:05 binarytahr