AMD Problem

Open ghost opened this issue 3 months ago • 8 comments

Custom Node Testing

Expected Behavior

A text-to-video workflow that works, despite AMD.

Actual Behavior

I downloaded a WAN 2.2 text-to-video model from the ComfyUI browser page and then get this error: HIP error: invalid device function. HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing AMD_SERIALIZE_KERNEL=3. Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

I have the AMD portable version of ComfyUI. Please help! In the "Load Clip" field, I can only switch between "default" and "cpu".

Attachments: Clip Loader.pdf, Fehler.pdf
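
The error text itself names the first debugging steps: make HIP report kernel errors synchronously and check whether the installed PyTorch build actually ships kernels for this GPU ("invalid device function" on ROCm usually means it does not). A minimal diagnostic sketch, not part of ComfyUI, to be run in the same Python environment the portable build uses:

import os
# Set this before importing torch, as HIP reads it at initialization:
# errors are then reported at the failing call instead of asynchronously.
os.environ.setdefault("AMD_SERIALIZE_KERNEL", "3")

import torch

print("torch:", torch.__version__)        # a ROCm build ends in +rocmX.Y
print("HIP runtime:", torch.version.hip)  # None means a CUDA-only build
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name)
    print("gfx arch:", getattr(props, "gcnArchName", "n/a"))
    # A tiny kernel launch; on ROCm this raises "invalid device function"
    # when the wheel contains no kernels compiled for this gfx target.
    x = torch.ones(8, device="cuda")
    print("test op:", (x * 2).sum().item())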

Steps to Reproduce

??

Debug Logs

??

Other

No response

ghost · Nov 02 '25 15:11

I encounter the same problem. 7900XTX here.

ftwftw0 · Nov 10 '25 20:11

Yup, same problem with 7800XT.

1Mjoelnir1 · Nov 22 '25 03:11

After reinstalling for the third time (Debian, Ubuntu Server, Fedora Server) and trying Python 3.11, 3.12, and 3.13 with ROCm 6.4, then 7.1, and then the experimental nightlies from https://rocm.nightlies.amd.com/v2/gfx1151/, I give up. It doesn't work:

On ROCm 6.4, I get:
got prompt
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Requested to load FluxClipModel_
loaded completely; 95367431640625005117571072.00 MB usable, 9319.23 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
clip missing: ['text_projection.weight']
!!! Exception during processing !!! HIP error: invalid device function
Search for `hipErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__HIPRT__TYPES.html for more information.
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/home/ksmd/ComfyUI/execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/home/ksmd/ComfyUI/execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "/home/ksmd/ComfyUI/nodes.py", line 74, in encode
    return (clip.encode_from_tokens_scheduled(tokens), )
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/sd.py", line 177, in encode_from_tokens_scheduled
    pooled_dict = self.encode_from_tokens(tokens, return_pooled=return_pooled, return_dict=True)
  File "/home/ksmd/ComfyUI/comfy/sd.py", line 239, in encode_from_tokens
    o = self.cond_stage_model.encode_token_weights(tokens)
  File "/home/ksmd/ComfyUI/comfy/text_encoders/flux.py", line 53, in encode_token_weights
    t5_out, t5_pooled = self.t5xxl.encode_token_weights(token_weight_pairs_t5)
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/sd1_clip.py", line 45, in encode_token_weights
    o = self.encode(to_encode)
  File "/home/ksmd/ComfyUI/comfy/sd1_clip.py", line 291, in encode
    return self(tokens)
  File "/home/ksmd/.miniconda3/envs/comfy-rocm6.4/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy-rocm6.4/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ksmd/ComfyUI/comfy/sd1_clip.py", line 253, in forward
    embeds, attention_mask, num_tokens, embeds_info = self.process_tokens(tokens, device)
                                                      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/sd1_clip.py", line 204, in process_tokens
    tokens_embed = self.transformer.get_input_embeddings()(tokens_embed, out_dtype=torch.float32)
  File "/home/ksmd/.miniconda3/envs/comfy-rocm6.4/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy-rocm6.4/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ksmd/ComfyUI/comfy/ops.py", line 355, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/ops.py", line 347, in forward_comfy_cast_weights
    x = torch.nn.functional.embedding(input, weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse).to(dtype=output_dtype)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy-rocm6.4/lib/python3.13/site-packages/torch/nn/functional.py", line 2542, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: invalid device function
Search for `hipErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__HIPRT__TYPES.html for more information.
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

On ROCm 7.1, I get:
got prompt
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Requested to load AutoencodingEngine
loaded completely; 88162.40 MB usable, 159.87 MB loaded, full load: True
Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load FluxClipModel_
loaded completely; 95367431640625005117571072.00 MB usable, 4903.23 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
clip missing: ['text_projection.weight']
Using scaled fp8: fp8 matrix mult: False, scale input: True
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux
loaded completely; 90128.15 MB usable, 11350.09 MB loaded, full load: True
  0%|                                                                                                       | 0/20 [00:00<?, ?it/s]Kernel Name: attn_fwd
VGPU=0x7f973800de40 SWq=0x7f9941e0e000, HWq=0x7f973c400000, id=1
        Dispatch Header =0xb02 (type=2, barrier=1, acquire=1, release=1), setup=0
        grid=[16768, 24, 1], workgroup=[128, 1, 1]
        private_seg_size=640, group_seg_size=16384
        kernel_obj=0x7f973c5109c0, kernarg_address=0x0x7f973c23c800
        completion_signal=0x0, correlation_id=0
        rptr=4313, wptr=6567
:0:rocdevice.cpp            :3580: 7084711065 us:  Callback: Queue 0x7f973c400000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/functional.py:2954: UserWarning: HIP warning: an illegal memory access was encountered (Triggered internally at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:83.)
  return torch.rms_norm(input, normalized_shape, weight, eps)
  0%|                                                                                                       | 0/20 [00:02<?, ?it/s]
!!! Exception during processing !!! HIP error: an illegal memory access was encountered
Search for `hipErrorIllegalAddress' in https://rocm.docs.amd.com/projects/HIP/en/latest/index.html for more information.
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/home/ksmd/ComfyUI/execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/home/ksmd/ComfyUI/execution.py", line 286, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/nodes.py", line 1525, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/nodes.py", line 1492, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/sample.py", line 60, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 1163, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 997, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 980, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 752, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/k_diffusion/sampling.py", line 199, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 401, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 953, in __call__
    return self.outer_predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 960, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 963, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 381, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 206, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 214, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/samplers.py", line 326, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/model_base.py", line 161, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/model_base.py", line 203, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/ldm/flux/model.py", line 244, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/ldm/flux/model.py", line 290, in _forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/ldm/flux/model.py", line 199, in forward_orig
    img = block(img, vec=vec, pe=pe, attn_mask=attn_mask, transformer_options=transformer_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/ldm/flux/layers.py", line 286, in forward
    q, k = self.norm(q, k, v)
           ^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/ldm/flux/layers.py", line 78, in forward
    k = self.key_norm(k)
        ^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/ldm/flux/layers.py", line 67, in forward
    return comfy.ldm.common_dit.rms_norm(x, self.scale, 1e-6)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/ComfyUI/comfy/rmsnorm.py", line 21, in rms_norm
    return rms_norm_torch(x, weight.shape, weight=comfy.model_management.cast_to(weight, dtype=x.dtype, device=x.device), eps=eps)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/nn/functional.py", line 2954, in rms_norm
    return torch.rms_norm(input, normalized_shape, weight, eps)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: an illegal memory access was encountered
Search for `hipErrorIllegalAddress' in https://rocm.docs.amd.com/projects/HIP/en/latest/index.html for more information.
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.


Prompt executed in 11.07 seconds
Exception in thread Thread-1 (prompt_worker):
 Traceback (most recent call last):
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ksmd/ComfyUI/main.py", line 242, in prompt_worker
    comfy.model_management.soft_empty_cache()
  File "/home/ksmd/ComfyUI/comfy/model_management.py", line 1479, in soft_empty_cache
    torch.cuda.empty_cache()
  File "/home/ksmd/.miniconda3/envs/comfy/lib/python3.12/site-packages/torch/cuda/memory.py", line 280, in empty_cache
    torch._C._cuda_emptyCache()
torch.AcceleratorError: HIP error: an illegal memory access was encountered
Search for `hipErrorIllegalAddress' in https://rocm.docs.amd.com/projects/HIP/en/latest/index.html for more information.
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

My current setup:

ksmd@rararabox
--------------
OS: Fedora Linux 43 (Server Edition) x86_64
Host: NucBox_EVO-X2 (Version 1.0)
Kernel: Linux 6.17.8-300.fc43.x86_64
Uptime: 2 hours, 4 mins
Packages: 860 (rpm)
Shell: zsh 5.9
Terminal: /dev/pts/1
CPU: AMD RYZEN AI MAX+ 395 (32) @ 5.19 GHz
GPU: AMD Radeon 8060S Graphics [Integrated]
Memory: 3.63 GiB / 30.97 GiB (12%)
Swap: 2.02 GiB / 8.00 GiB (25%)
Disk (/): 331.25 GiB / 1.86 TiB (17%) - xfs
Disk (/mnt/tmp): 565.36 GiB / 5.46 TiB (10%) - exfat
Local IP (eno1): 25.10.2.190/24
Locale: en_US.UTF-8

k-mktr · Nov 24 '25 14:11
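
The Radeon 8060S in this box is gfx1151, an architecture the stable PyTorch ROCm wheels often ship no precompiled kernels for, which would fit both the "invalid device function" on ROCm 6.4 and the aperture violation on 7.1. A workaround frequently reported for such cases is to masquerade as a supported RDNA3 target via HSA_OVERRIDE_GFX_VERSION before anything touches HIP. The sketch below is a hypothetical launch wrapper, not an official fix, and the 11.0.0 (gfx1100) override value is an assumption that may or may not be valid for gfx1151:

import os, subprocess, sys

env = dict(os.environ)
env["AMD_SERIALIZE_KERNEL"] = "3"           # synchronous HIP error reporting
env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"  # assumption: present as gfx1100
# Assumes a plain checkout where ComfyUI's main.py sits next to this script.
subprocess.run([sys.executable, "main.py"] + sys.argv[1:], env=env, check=False)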

I am having the same issue, 7900XTX.

CLIPTextEncode

HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

SemoreZZ · Nov 25 '25 02:11

I have run into the same detrimental chain of events. About a week ago I started out on my regular distribution, Debian 13 Trixie, with its standard kernel. Since the docs hint that a 6.14 OEM kernel is needed (https://github.com/comfyanonymous/ComfyUI/issues/10859), and since there is no linux-oem-24.04c package to apt-get install on Debian 13 Trixie (AMD apparently assumes its Ubuntu apt packages work nicely on Debian), I followed advice from Google Gemini (free tier) and installed the Debian 13 Trixie backports kernel 6.16.12 instead, keeping everything consistent with apt and my standard repos.

After fumbling around a bit, I got ComfyUI working nicely and fast: 1024 x 1024 renders with flux1.schnell-GGUF_Q2_K in about one minute, in a basic workflow with the GGUF custom nodes for the CLIPLoader and the model, plus an empty latent image node, a KSampler, two text nodes for the positive and negative prompts, VAE Decode, and Save Image. My hardware is a Morefine M600 (AMD Ryzen 9 7940HS with Radeon 780M graphics, 64 GB RAM shared with the iGPU, 16 GB maximum GPU share according to the BIOS). Importantly, up to this point I had not installed the AMD GPU drivers at all.

Then I tried RealVisXL V5.0 and got the idea to tune my system a little by installing the AMD amdgpu and ROCm 7.1.0 drivers and libraries as needed for ComfyUI. The catch: the standard Debian /boot partition is 488 MB, which only fits the backports kernel and the previous kernel (GRUB usually keeps the last two). The amdgpu-dkms package from the Ubuntu repository listed on the AMD site (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html) did not install properly: it compiled the drivers for the backports kernel 6.16.12 and put them in the proper module directory, but it could not build the new initramfs because /boot could not accommodate it anymore ("no space left on device"). So, having installed https://repo.radeon.com/amdgpu-install/7.1/ubuntu/noble/amdgpu-install_7.1.70100-1_all.deb as instructed on the AMD website, I still had the configuration that worked for me, producing 1024 x 1024 images in about a minute on the iGPU with the basic flux1.schnell GGUF workflow.

The half-installed amdgpu-dkms package had bricked apt, though. My fix was to comment out the very last passage of the mkinitramfs script in /usr/sbin, just before the exit 0 statement, which was throwing the error that the compiled and compressed initramfs did not fit on /boot. That unbricked apt, but at the next boot I got a kernel panic from the broken initramfs setup, for BOTH kernels in GRUB. I had no choice but to boot a live image from an old 32 GB USB stick and repair the /boot partition completely. While doing so I removed the older kernel entirely and kept only the 6.16.12 backports kernel, so the 488 MB /boot partition could finally accommodate the amdgpu-dkms-generated initramfs, and I could finish the install routine suggested by the AMD site.

After that, ComfyUI was finally broken, and I let Google Gemini trick me into several days of unrewarding work trying to fix the install with Docker, conda, the nightly PyTorch build for ROCm 7.1.0, conda with ROCm 6.4, and so on. The configuration that had worked used the nightly ROCm 7.1.0 build of PyTorch, so ROCm 7.1.0 itself seems to work; either the amdgpu-dkms modules for 6.16.12 are broken, or the amdgpu driver itself is broken and ruins ROCm 7.1.0.

Summary: https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/install/installryz/native_linux/install-ryzen.html enforces the OEM 6.14 kernel, but no such package exists on Debian 13 Trixie (maybe it exists in Ubuntu noble). With the half-installed amdgpu-dkms, ComfyUI ran successfully and fast on my hardware. After I unbricked apt with my mkinitramfs hack, both kernels in /boot panicked; I repaired /boot from the standard Debian 13 Trixie live image, kept only the 6.16.12 backports kernel, and let the amdgpu-dkms package run to completion and write its drivers into the initramfs. After that I could not render a single image in ComfyUI. Despite another severe apt meltdown while trying different versions of the ROCm 7.1.0 and generic packages, I got ROCm 7.1.0 itself working perfectly. So the problem appears to lie in the official AMD GPU drivers compiled for the Debian 13 Trixie backports kernel 6.16.12 by the amdgpu-dkms package from https://repo.radeon.com/amdgpu-install/7.1/ubuntu/noble/amdgpu-install_7.1.70100-1_all.deb, while the stock in-kernel AMD GPU drivers worked, and quite efficiently. My only reason for triggering this whole amdgpu-dkms chain was the suggestion that a 6.14 OEM kernel is needed, while the official Debian 13 Trixie kernel is 6.12.3. So the AMD GPU drivers already INCLUDED by default in the backports kernel 6.16.12 may be doing the job nicely, and the amdgpu-dkms compile broke them.

Remark: Ollama has a similar problem using the ROCm 7.1.0 interface, even when ROCm is fully installed and working nicely. Ollama can still get hardware acceleration through its experimental Vulkan backend, which the Radeon 780M iGPU fully supports (https://www.techpowerup.com/gpu-specs/radeon-780m.c4020).

ThomasKorimort · Nov 26 '25 08:11

@ThomasKorimort is there any way to explain what you said in much simpler terms? It's a little above my expertise.

SemoreZZ · Nov 29 '25 00:11
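
In simpler terms, the open question in ThomasKorimort's report is which amdgpu kernel module the machine is actually running: the in-tree driver that ships inside the kernel package (which worked for him) or the DKMS-built one from AMD's repository (which appears broken for the 6.16.12 backports kernel). A small, hedged check, assuming modinfo is available on the PATH:

import subprocess

# Print which file the resolvable amdgpu module would be loaded from.
out = subprocess.run(["modinfo", "-F", "filename", "amdgpu"],
                     capture_output=True, text=True)
print(out.stdout.strip() or out.stderr.strip())
# A path under /lib/modules/<kernel>/updates/ (typically updates/dkms/)
# means the DKMS build takes precedence; a path under
# .../kernel/drivers/gpu/drm/amd/amdgpu/ means the in-tree driver is used.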

Same issue here: running an Instinct MI50. ComfyUI with Stable Video Diffusion.

davidfchuck · Dec 03 '25 19:12

There is a new portable version for AMD: https://github.com/comfyanonymous/ComfyUI. It works perfectly now on my 7800XT.

1Mjoelnir1 · Dec 04 '25 08:12