Conv3D with bfloat16 raises an internal assertion failure on Ascend
Expected Behavior
Running the official workflow should complete without errors.
Actual Behavior
When using nn.Conv3d with bfloat16 input on Ascend 910B, I encounter the following internal assertion error at runtime:
The Inner error is reported as above. The process exits for this inner error, and the current working operator name is Conv3D.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-04-25-00:49:58 (PID:755653, Device:0, RankID:-1) ERR00100 PTA call acl api failed
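The error message recommends setting ASCEND_LAUNCH_BLOCKING=1 to get an accurate stack trace, because the failing operator is launched asynchronously. The simplest route is to set it in the shell when starting ComfyUI (for example `ASCEND_LAUNCH_BLOCKING=1 python main.py`). If it has to be set from Python, here is a minimal sketch. It assumes that torch_npu reads the variable when it initializes, so it must be set before torch_npu is imported. The helper name `enable_launch_blocking` is hypothetical.

```python
import os
import sys


def enable_launch_blocking() -> bool:
    """Request synchronous NPU kernel launches so errors point at the real op.

    Sets ASCEND_LAUNCH_BLOCKING=1 for the current process and returns True
    if torch_npu has not been imported yet. Assumption: torch_npu reads the
    variable when it initializes, so setting it afterwards may have no effect.
    """
    os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"
    return "torch_npu" not in sys.modules


# Call this at the very top of the entry script, before torch or torch_npu
# is imported; if it returns False, restart with the variable set in the shell.
if not enable_launch_blocking():
    print("torch_npu is already loaded; set ASCEND_LAUNCH_BLOCKING=1 before starting Python")
```

With blocking launches, the Python traceback should end at the call that actually dispatched Conv3D, not at an unrelated later op.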
Steps to Reproduce
Use the models and the video workflow provided by the official repository:
Workflow: [text_to_video_wan.json](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/example%20workflows_Wan2.1/text_to_video_wan.json)
Debug Logs
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: npu:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: npu:0, offload device: cpu, current: cpu, dtype: torch.float16
FETCH ComfyRegistry Data: 60/82
Requested to load WanTEModel
FETCH ComfyRegistry Data: 65/82
FETCH ComfyRegistry Data: 70/82
FETCH ComfyRegistry Data: 75/82
loaded completely 48031.92890625 10835.4765625 True
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
FETCH ComfyRegistry Data: 80/82
FETCH ComfyRegistry Data [DONE]
[ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
FETCH DATA from: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
loaded completely 37135.68952890625 2706.1788330078125 True
0%| | 0/10 [00:00<?, ?it/s]/apps/ComfyUI/comfy/ldm/wan/model.py:423: UserWarning: current tensor is running as_strided, don't perform inplace operations on the returned value. If you encounter this warning and have precision issues, you can try torch.npu.config.allow_internal_format = False to resolve precision issues. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:123.)
x = x.flatten(2).transpose(1, 2)
[DONE]
[ComfyUI-Manager] All startup tasks have been completed.
[W425 00:26:06.132293628 compiler_depend.ts:387] Warning: EZ3002: 2025-04-25-00:26:06.885.384 Optype [Conv3D] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type Conv3D is not found in this op store.[tbe-custom]:op type Conv3D is not found in this op store.[Dynamic shape check]: data type DT_FLOAT of input [x] is not supported. All supported data type and format of tensor input0.x is: Data Type: {DT_FLOAT16,DT_INT8,DT_BFLOAT16}Format:{NDC1HWC0,NDC1HWC0,NDC1HWC0}.
Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
TraceBack (most recent call last):
No supported Ops kernel and engine are found for [Conv3D1], optype [Conv3D].
Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:144]
build graph failed, graph id:0, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1608]
[Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(function ExecFunc)
0%| | 0/10 [00:01<?, ?it/s]
!!! Exception during processing !!! The Inner error is reported as above. The process exits for this inner error, and the current working operator name is Conv3D.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-04-25-00:26:06 (PID:743850, Device:0, RankID:-1) ERR00100 PTA call acl api failed
Traceback (most recent call last):
File "/apps/ComfyUI/execution.py", line 345, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/apps/ComfyUI/execution.py", line 220, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/apps/ComfyUI/execution.py", line 192, in _map_node_over_list
process_inputs(input_dict, i)
File "/apps/ComfyUI/execution.py", line 181, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "/apps/ComfyUI/nodes.py", line 1522, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "/apps/ComfyUI/nodes.py", line 1489, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "/apps/ComfyUI/comfy/sample.py", line 45, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/apps/ComfyUI/comfy/samplers.py", line 1133, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/apps/ComfyUI/comfy/samplers.py", line 1023, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "/apps/ComfyUI/comfy/samplers.py", line 1008, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
return self.original(*args, **kwargs)
File "/apps/ComfyUI/comfy/samplers.py", line 976, in outer_sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "/apps/ComfyUI/comfy/samplers.py", line 959, in inner_sample
samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
return self.original(*args, **kwargs)
File "/apps/ComfyUI/comfy/samplers.py", line 738, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 868, in sample_unipc
x = uni_pc.sample(noise, timesteps=timesteps, skip_type="time_uniform", method="multistep", order=order, lower_order_final=True, callback=callback, disable_pbar=disable)
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 715, in sample
model_prev_list = [self.model_fn(x, vec_t)]
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 410, in model_fn
return self.data_prediction_fn(x, t)
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 394, in data_prediction_fn
noise = self.noise_prediction_fn(x, t)
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 388, in noise_prediction_fn
return self.model(x, t)
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 329, in model_fn
return noise_pred_fn(x, t_continuous)
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 297, in noise_pred_fn
output = model(x, t_input, **model_kwargs)
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 859, in <lambda>
lambda input, sigma, **kwargs: predict_eps_sigma(model, input, sigma, **kwargs),
File "/apps/ComfyUI/comfy/extra_samplers/uni_pc.py", line 843, in predict_eps_sigma
return (input - model(input, sigma_in, **kwargs)) / sigma
File "/apps/ComfyUI/comfy/samplers.py", line 390, in __call__
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
File "/apps/ComfyUI/comfy/samplers.py", line 939, in __call__
return self.predict_noise(*args, **kwargs)
File "/apps/ComfyUI/comfy/samplers.py", line 942, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
File "/apps/ComfyUI/comfy/samplers.py", line 370, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
File "/apps/ComfyUI/comfy/samplers.py", line 206, in calc_cond_batch
return executor.execute(model, conds, x_in, timestep, model_options)
File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
return self.original(*args, **kwargs)
File "/apps/ComfyUI/comfy/samplers.py", line 319, in _calc_cond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
File "/apps/ComfyUI/comfy/model_base.py", line 138, in apply_model
return comfy.patcher_extension.WrapperExecutor.new_class_executor(
File "/apps/ComfyUI/comfy/patcher_extension.py", line 111, in execute
return self.original(*args, **kwargs)
File "/apps/ComfyUI/comfy/model_base.py", line 171, in _apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
File "/data/anaconda3/envs/comfyui_3.10.x/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/anaconda3/envs/comfyui_3.10.x/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/apps/ComfyUI/comfy/ldm/wan/model.py", line 474, in forward
return self.forward_orig(x, timestep, context, clip_fea=clip_fea, freqs=freqs, transformer_options=transformer_options)[:, :, :t, :h, :w]
File "/apps/ComfyUI/comfy/ldm/wan/model.py", line 427, in forward_orig
sinusoidal_embedding_1d(self.freq_dim, t).to(dtype=x[0].dtype))
File "/apps/ComfyUI/comfy/ldm/wan/model.py", line 25, in sinusoidal_embedding_1d
position, torch.pow(10000, -torch.arange(half).to(position).div(half)))
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is Conv3D.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-04-25-00:26:06 (PID:743850, Device:0, RankID:-1) ERR00100 PTA call acl api failed
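The EZ3002 warning in the debug log is more specific than the final RuntimeError. It reports that the Conv3D kernel rejected input `x` with data type DT_FLOAT (32-bit float), and it lists DT_FLOAT16, DT_INT8 and DT_BFLOAT16 as the only supported types. This suggests that the tensor reaching Conv3D was float32 rather than bfloat16. The traceback itself ends in `sinusoidal_embedding_1d`, which is not a convolution, so it is likely the inaccurate asynchronous stack trace the message warns about. Below is a small sketch that pulls the rejected and supported types out of the message. The helper `parse_dtype_check` and the `GE_TO_TORCH` table are illustrative assumptions and are not part of any Ascend API.

```python
import re

# Excerpt of the EZ3002 warning copied from the log above.
EZ3002_MSG = (
    "[Dynamic shape check]: data type DT_FLOAT of input [x] is not supported. "
    "All supported data type and format of tensor input0.x is: "
    "Data Type: {DT_FLOAT16,DT_INT8,DT_BFLOAT16}"
    "Format:{NDC1HWC0,NDC1HWC0,NDC1HWC0}."
)

# Hand-written mapping from GE data-type names to torch dtype names,
# limited to the types that appear in this log (illustrative only).
GE_TO_TORCH = {
    "DT_FLOAT": "torch.float32",
    "DT_FLOAT16": "torch.float16",
    "DT_BFLOAT16": "torch.bfloat16",
    "DT_INT8": "torch.int8",
}


def parse_dtype_check(msg: str) -> tuple[str, list[str]]:
    """Return (rejected GE dtype, supported GE dtypes) from an EZ3002 message."""
    rejected = re.search(r"data type (DT_\w+) of input", msg)
    supported = re.search(r"Data Type: \{([^}]*)\}", msg)
    if rejected is None or supported is None:
        raise ValueError("not an EZ3002 dtype-check message")
    return rejected.group(1), supported.group(1).split(",")


bad, ok = parse_dtype_check(EZ3002_MSG)
print(f"Conv3D received {GE_TO_TORCH[bad]}; accepts {[GE_TO_TORCH[t] for t in ok]}")
```

Run on the excerpt above, this reports that Conv3D received torch.float32 while accepting only torch.float16, torch.int8 and torch.bfloat16.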