[bug]: RuntimeError: CUDA error: device-side assert triggered
Is there an existing issue for this?
- [X] I have searched the existing issues
OS
Windows
GPU
cuda
VRAM
4GB
What happened?
Running a new generation on a custom model, using the k_euler_a and k_dpmpp_2_a samplers, with a prompt of ~476 characters.
The error states the prompt is too long, but I have used this prompt before without problems.
I have updated to the latest InvokeAI version, 2.2.4; I did this using the manual git pull method and then running the reconfigure script.
Startup command: python scripts/invoke.py --web --no-nsfw_checker --model swpunk
>> Setting Sampler to k_euler_a
>> Prompt is 6 token(s) too long and has been truncated
>> Prompt is 2 token(s) too long and has been truncated
Generating: 0%| | 0/1 [00:00<?, ?it/s]>> Ksampler using model noise schedule (steps >= 30)
>> Sampling with k_euler_ancestral starting at step 0 of 32 (32 new sampling steps)
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [6,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [6,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...(the same assertion failure repeats for many other block/thread indices)...
These errors continue for 1308 lines in total, and then the following exception is thrown:
Traceback (most recent call last):
File "d:\ai\invokeai\ldm\generate.py", line 492, in prompt2image
results = generator.generate(
File "d:\ai\invokeai\ldm\invoke\generator\base.py", line 98, in generate
image = make_image(x_T)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "d:\ai\invokeai\ldm\invoke\generator\txt2img.py", line 42, in make_image
samples, _ = sampler.sample(
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "d:\ai\invokeai\ldm\models\diffusion\ksampler.py", line 226, in sample
K.sampling.__dict__[f'sample_{self.schedule}'](
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "d:\ai\invokeai\ldm\models\diffusion\ksampler.py", line 52, in forward
next_x = self.invokeai_diffuser.do_diffusion_step(x, sigma, uncond, cond, cond_scale)
File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 107, in do_diffusion_step
unconditioned_next_x, conditioned_next_x = self.apply_standard_conditioning(x, sigma, unconditioning, conditioning)
File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 123, in apply_standard_conditioning
unconditioned_next_x, conditioned_next_x = self.model_forward_callback(x_twice, sigma_twice,
File "d:\ai\invokeai\ldm\models\diffusion\ksampler.py", line 38, in <lambda>
model_forward_callback=lambda x, sigma, cond: self.inner_model(x, sigma, cond=cond))
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\k_diffusion\external.py", line 114, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\k_diffusion\external.py", line 140, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "d:\ai\invokeai\ldm\models\diffusion\ddpm.py", line 1441, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "d:\ai\invokeai\ldm\models\diffusion\ddpm.py", line 2167, in forward
out = self.diffusion_model(x, t, context=cc)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "d:\ai\invokeai\ldm\modules\diffusionmodules\openaimodel.py", line 806, in forward
h = module(h, emb, context)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "d:\ai\invokeai\ldm\modules\diffusionmodules\openaimodel.py", line 88, in forward
x = layer(x, context)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "d:\ai\invokeai\ldm\modules\attention.py", line 271, in forward
x = block(x, context=context)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "d:\ai\invokeai\ldm\modules\attention.py", line 221, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "d:\ai\invokeai\ldm\modules\diffusionmodules\util.py", line 159, in checkpoint
return func(*inputs)
File "d:\ai\invokeai\ldm\modules\attention.py", line 226, in _forward
x += self.attn2(self.norm2(x.clone()), context=context)
File "C:\Users\username\anaconda3\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "d:\ai\invokeai\ldm\modules\attention.py", line 199, in forward
r = self.get_invokeai_attention_mem_efficient(q, k, v)
File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 291, in get_invokeai_attention_mem_efficient
return self.einsum_op_cuda(q, k, v)
File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 285, in einsum_op_cuda
return self.einsum_op_tensor_mem(q, k, v, mem_free_total / 3.3 / (1 << 20))
File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 264, in einsum_op_tensor_mem
return self.einsum_lowest_level(q, k, v, None, None, None)
File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_control.py", line 229, in einsum_lowest_level
self.attention_slice_calculated_callback(attention_slice, dim, offset, slice_size)
File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 69, in <lambda>
lambda slice, dim, offset, slice_size, key=key: callback(slice, dim, offset, slice_size, key))
File "d:\ai\invokeai\ldm\models\diffusion\shared_invokeai_diffusion.py", line 61, in callback
saver.add_attention_maps(slice, key)
File "d:\ai\invokeai\ldm\models\diffusion\cross_attention_map_saving.py", line 39, in add_attention_maps
self.collated_maps[key_and_size] += maps.cpu()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
>> Could not generate image.
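As the CUDA message suggests, re-running with CUDA_LAUNCH_BLOCKING=1 makes the traceback point at the call that actually failed. A minimal sketch of setting it, assuming the variable only needs to be in the environment before torch initializes CUDA (equivalently, `set CUDA_LAUNCH_BLOCKING=1` in the Windows shell before launching invoke.py):

```python
import os

# Force synchronous CUDA kernel launches so the Python traceback points at
# the call that actually triggered the device-side assert.
# Must run before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```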
Screenshots
No response
Additional context
No response
Contact Details
No response
I have been testing it on the command line to debug the tokens; the prompt did go over (way over) 77 tokens, but apart from that I have never seen an error like the one reported above.
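For anyone who wants to check the token count themselves, here is a rough sketch using the Hugging Face CLIP tokenizer. It assumes the model uses the standard openai/clip-vit-large-patch14 tokenizer, so the count may differ slightly from what InvokeAI reports:

```python
from transformers import CLIPTokenizer

# Stable Diffusion 1.x models use the CLIP ViT-L/14 tokenizer; prompts are
# limited to 77 tokens including the begin/end-of-text markers.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "snthwve style nvinkpunk a drunk beautiful woman as delirium from sandman, ..."
token_ids = tokenizer(prompt).input_ids
print(f"{len(token_ids)} tokens (limit is 77, including BOS/EOS)")
```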

Here is where it went wrong:
"(snthwve style)+ (nvinkpunk)- a drunk beautiful woman as delirium from sandman, (hallucinating colorful soap bubbles)+, by jeremy mann, by sandra chevrier, by dave mckean and richard avedon and maciej kuciara, punk rock, tank girl, high detailed, 8k, sharp focus, natural lighting, subsurface scattering, F2, 35mm" -s 32 -W 512 -H 512 -C 7 -A k_euler_a --log_tokenization
>> Parsed prompt to FlattenedPrompt:[Fragment:'snthwve style'@1.1, Fragment:'nvinkpunk'@0.9, Fragment:'a drunk beautiful woman as delirium from sandman,'@1.0, Fragment:'hallucinating colorful soap bubbles'@1.1, Fragment:', by jeremy mann, by sandra chevrier, by dave mckean and richard avedon and maciej kuciara, punk rock, tank girl, high detailed, 8k, sharp focus, natural lighting, subsurface scattering, F2, 35mm'@1.0]
>> Parsed negative prompt to FlattenedPrompt:[Fragment:''@1.0]
>> Prompt is 3 token(s) too long and has been truncated
>> Tokens (prompt) (77):
snthwve style nvinkpunk a drunk beautiful woman as delirium from sandman , hallucinating colorful soap bubbles , by jeremy mann , by sandra chevrier , by dave mckean and richard avedon and maciej kuciara , punk rock , tank girl , high detailed , 8 k , sharp focus , natural lighting , subsurface scattering , f 2 , 3 5
>> Tokens Discarded (1):
mm
>> Tokens (unconditioning) (0):
Generating: 0%| | 0/1 [00:00<?, ?it/s]>> Ksampler using model noise schedule (steps >= 30)
>> Sampling with k_euler_ancestral starting at step 0 of 32 (32 new sampling steps)
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\IndexKernel.cu:91: block: [30,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...(same errors as shown before)...
...(stacktrace as shown before)...
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
So when the prompt is too long, tokens are discarded as expected, but when you add multiple (...) weighting groups it goes off the rails completely.
This happens with all samplers, so if I had to guess, the problem resides in the tokenization/parsing process.
This is basically the same issue as the one mentioned in #1908; there is a pull request in for a "fix", which is basically just to exit before calling the function that is failing. The instructions in that issue do essentially the same thing, as sketched below, until the pull request is merged into the main repo.
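In case it helps anyone hitting this before that lands, the workaround amounts to guarding the accumulation that crashes in cross_attention_map_saving.py. The sketch below is only illustrative: the method and attribute names come from the traceback above, but how key_and_size is built (the _key_for helper here) is a hypothetical stand-in, and the real fix in the pull request may look different:

```python
# Hypothetical guard inside the attention map saver (add_attention_maps is the
# method from the traceback; _key_for is a made-up helper for illustration).
def add_attention_maps(self, maps, key):
    key_and_size = self._key_for(key, maps)
    existing = self.collated_maps.get(key_and_size)
    # Bail out instead of indexing out of bounds when the incoming slice no
    # longer matches the pre-allocated buffer (e.g. after prompt truncation).
    if existing is None or existing.shape != maps.shape:
        return
    self.collated_maps[key_and_size] += maps.cpu()
```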
Fixed in #1999