Conv3D with bfloat16 fails with an internal assertion error on Ascend

Opened by Ading163

Expected Behavior

When running the official workflow on an Ascend 910B, nn.Conv3d with bfloat16 input should execute normally. Instead, it fails at runtime with an internal assertion error.

Actual Behavior

The process aborts with the following error:

The Inner error is reported as above. The process exits for this inner error, and the current working operator name is Conv3D.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-04-25-00:49:58 (PID:755653, Device:0, RankID:-1) ERR00100 PTA call acl api failed
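The error text recommends setting ASCEND_LAUNCH_BLOCKING=1 to get an accurate stack trace. This matters here: the Python traceback in the debug logs ends in sinusoidal_embedding_1d (a torch.pow call), not in any Conv3d call, which is consistent with the asynchronous reporting the message warns about. Below is a minimal sketch for rerunning ComfyUI with blocking launches. It assumes ComfyUI lives in /apps/ComfyUI (the path seen in the traceback) and is started via main.py; blocking_launch_env and relaunch_comfyui are hypothetical helper names, not part of ComfyUI or torch_npu.

```python
import os
import subprocess
import sys


def blocking_launch_env(base=None):
    """Return a copy of `base` (default: the current environment) with ASCEND_LAUNCH_BLOCKING=1.

    With synchronous operator launches, the Python traceback should stop at the
    operator that actually failed instead of at a later, unrelated call.
    """
    env = dict(os.environ if base is None else base)  # copy, never mutate the caller's mapping
    env["ASCEND_LAUNCH_BLOCKING"] = "1"
    return env


def relaunch_comfyui(comfy_dir="/apps/ComfyUI"):
    """Hypothetical helper: start ComfyUI's main.py with blocking launches enabled.

    Assumes this script runs under the same Python interpreter (conda env) as ComfyUI.
    """
    return subprocess.run([sys.executable, "main.py"], cwd=comfy_dir, env=blocking_launch_env())
```

Setting the variable in the shell before starting ComfyUI should have the same effect; the helper just makes it explicit that the variable must be present when the process starts, not set afterwards.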

Steps to Reproduce

Load the models and the text-to-video workflow provided by the official repository, then run the workflow on the NPU.

Workflow: [text_to_video_wan.json](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/example%20workflows_Wan2.1/text_to_video_wan.json)

Debug Logs

got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: npu:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: npu:0, offload device: cpu, current: cpu, dtype: torch.float16
FETCH ComfyRegistry Data: 60/82
Requested to load WanTEModel
FETCH ComfyRegistry Data: 65/82
FETCH ComfyRegistry Data: 70/82
FETCH ComfyRegistry Data: 75/82
loaded completely 48031.92890625 10835.4765625 True
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
FETCH ComfyRegistry Data: 80/82
FETCH ComfyRegistry Data [DONE]
[ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
FETCH DATA from: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
loaded completely 37135.68952890625 2706.1788330078125 True

  0%|          | 0/10 [00:00<?, ?it/s]
/apps/ComfyUI/comfy/ldm/wan/model.py:423: UserWarning: current tensor is running as_strided, don't perform inplace operations on the returned value. If you encounter this warning and have precision issues, you can try torch.npu.config.allow_internal_format = False to resolve precision issues. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:123.)
  x = x.flatten(2).transpose(1, 2)
 [DONE]
[ComfyUI-Manager] All startup tasks have been completed.
[W425 00:26:06.132293628 compiler_depend.ts:387] Warning: EZ3002: 2025-04-25-00:26:06.885.384 Optype [Conv3D] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type Conv3D is not found in this op store.[tbe-custom]:op type Conv3D is not found in this op store.[Dynamic shape check]: data type DT_FLOAT of input [x] is not supported. All supported data type and format of tensor input0.x is: Data Type: {DT_FLOAT16,DT_INT8,DT_BFLOAT16}Format:{NDC1HWC0,NDC1HWC0,NDC1HWC0}.
        Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
        Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
        TraceBack (most recent call last):
        No supported Ops kernel and engine are found for [Conv3D1], optype [Conv3D].
        Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:144]
        build graph failed, graph id:0, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1608]
        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
 (function ExecFunc)

  0%|          | 0/10 [00:01<?, ?it/s]
!!! Exception during processing !!! The Inner error is reported as above. The process exits for this inner error, and the current working operator name is Conv3D.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-04-25-00:26:06 (PID:743850, Device:0, RankID:-1) ERR00100 PTA call acl api failed
Traceback (most recent call last):
  File "/apps/ComfyUI/execution.py", line 345, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/apps/ComfyUI/execution.py", line 220, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/apps/ComfyUI/execution.py", line 192, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/apps/ComfyUI/execution.py", line 181, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/apps/ComfyUI/nodes.py", line 1522, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/apps/ComfyUI/nodes.py", line 1489, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/apps/ComfyUI/comfy/sample.py", line 45, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/apps/ComfyUI/comfy/samplers.py", line 1133, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/apps/ComfyUI/comfy/samplers.py", line 1023, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/apps/ComfyUI/comfy/samplers.py", line 1008, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
    return self.original(*args, **kwargs)
  File "/apps/ComfyUI/comfy/samplers.py", line 976, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/apps/ComfyUI/comfy/samplers.py", line 959, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
    return self.original(*args, **kwargs)
  File "/apps/ComfyUI/comfy/samplers.py", line 738, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 868, in sample_unipc
    x = uni_pc.sample(noise, timesteps=timesteps, skip_type="time_uniform", method="multistep", order=order, lower_order_final=True, callback=callback, disable_pbar=disable)
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 715, in sample
    model_prev_list = [self.model_fn(x, vec_t)]
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 410, in model_fn
    return self.data_prediction_fn(x, t)
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 394, in data_prediction_fn
    noise = self.noise_prediction_fn(x, t)
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 388, in noise_prediction_fn
    return self.model(x, t)
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 329, in model_fn
    return noise_pred_fn(x, t_continuous)
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 297, in noise_pred_fn
    output = model(x, t_input, **model_kwargs)
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 859, in <lambda>
    lambda input, sigma, **kwargs: predict_eps_sigma(model, input, sigma, **kwargs),
  File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 843, in predict_eps_sigma
    return  (input - model(input, sigma_in, **kwargs)) / sigma
  File "/apps/ComfyUI/comfy/samplers.py", line 390, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "/apps/ComfyUI/comfy/samplers.py", line 939, in __call__
    return self.predict_noise(*args, **kwargs)
  File "/apps/ComfyUI/comfy/samplers.py", line 942, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "/apps/ComfyUI/comfy/samplers.py", line 370, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "/apps/ComfyUI/comfy/samplers.py", line 206, in calc_cond_batch
    return executor.execute(model, conds, x_in, timestep, model_options)
  File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
    return self.original(*args, **kwargs)
  File "/apps/ComfyUI/comfy/samplers.py", line 319, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "/apps/ComfyUI/comfy/model_base.py", line 138, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
  File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
    return self.original(*args, **kwargs)
  File "/apps/ComfyUI/comfy/model_base.py", line 171, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "/data/anaconda3/envs/comfyui_3.10.x/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/anaconda3/envs/comfyui_3.10.x/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/apps/ComfyUI/comfy/ldm/wan/model.py", line 474, in forward
    return self.forward_orig(x, timestep, context, clip_fea=clip_fea, freqs=freqs, transformer_options=transformer_options)[:, :, :t, :h, :w]
  File "/apps/ComfyUI/comfy/ldm/wan/model.py", line 427, in forward_orig
    sinusoidal_embedding_1d(self.freq_dim, t).to(dtype=x[0].dtype))
  File "/apps/ComfyUI/comfy/ldm/wan/model.py", line 25, in sinusoidal_embedding_1d
    position, torch.pow(10000, -torch.arange(half).to(position).div(half)))
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is Conv3D.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-04-25-00:26:06 (PID:743850, Device:0, RankID:-1) ERR00100 PTA call acl api failed
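The most specific clue in these logs is the EZ3002 warning: Conv3D rejected an input of type DT_FLOAT (float32), while the supported input types it lists are DT_FLOAT16, DT_INT8 and DT_BFLOAT16. So the tensor that reached Conv3D may have been float32 rather than bfloat16, even though the VAE was loaded with dtype torch.bfloat16. The sketch below is a hypothetical triage helper (summarize_ez3002 and the GE_TO_TORCH table are my own names, not part of torch_npu or CANN) that extracts the rejected and supported dtypes from such a message and maps them to PyTorch dtype names. The mapping only covers the four types that appear in this log.

```python
import re

# Assumed mapping from CANN/GE dtype names (as printed in EZ3002 messages) to PyTorch dtypes.
# Only the types that appear in this log are listed.
GE_TO_TORCH = {
    "DT_FLOAT": "torch.float32",
    "DT_FLOAT16": "torch.float16",
    "DT_BFLOAT16": "torch.bfloat16",
    "DT_INT8": "torch.int8",
}

# "data type DT_FLOAT of input [x] is not supported"
_REJECTED = re.compile(r"data type (DT_\w+) of input \[(\w+)\] is not supported")
# "Data Type: {DT_FLOAT16,DT_INT8,DT_BFLOAT16}"
_SUPPORTED = re.compile(r"Data Type: \{([^}]*)\}")


def summarize_ez3002(message):
    """Return the rejected input dtype and the supported dtypes from an EZ3002 dtype-mismatch warning."""
    rejected = _REJECTED.search(message)
    supported = _SUPPORTED.search(message)
    if rejected is None or supported is None:
        raise ValueError("not an EZ3002 dtype-mismatch message")
    supported_dtypes = [d.strip() for d in supported.group(1).split(",")]
    return {
        "input": rejected.group(2),
        "rejected": rejected.group(1),
        "rejected_torch": GE_TO_TORCH.get(rejected.group(1), "unknown"),
        "supported": supported_dtypes,
        "supported_torch": [GE_TO_TORCH.get(d, "unknown") for d in supported_dtypes],
    }
```

Fed the "[Dynamic shape check]" excerpt from the warning above, the helper reports DT_FLOAT (torch.float32) as the rejected type for input x. If that holds, casting the Conv3d input to a supported dtype such as bfloat16 before the call may be worth trying as a workaround, though I have not verified it.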

Other

No response

Apr 24 '25 16:04 Ading163