KoboldAI-Client CUDA Error: device-side assert triggered

I just got an RTX 3060 today and have been playing with KoboldAI all day. At some point, I attempted to overclock my GPU using MSI Afterburner with reasonable settings, and now every time I try and generate, I get this error:

C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [32,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [33,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [34,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [35,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [36,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [37,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [38,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [39,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [40,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [41,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [42,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [43,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [44,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [45,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [46,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [47,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [48,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [49,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [50,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [51,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [52,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [53,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [54,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [55,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [56,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [57,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [58,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [59,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [60,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [61,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [2,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [3,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [4,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [5,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [6,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [7,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [8,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [9,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [10,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [11,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [12,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [13,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [14,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [15,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [16,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [17,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [18,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [19,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [20,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [21,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [22,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [23,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [24,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [25,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [26,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [27,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [28,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [29,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [30,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[2023-07-25 19:33:50,013] ERROR in app: Exception on /api/v1/generate [POST]
Traceback (most recent call last):
  File "B:\python\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "B:\python\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "B:\python\lib\site-packages\flask_cors\extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "B:\python\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "B:\python\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "aiserver.py", line 843, in g
    return f(*args, **kwargs)
  File "aiserver.py", line 765, in decorated
    response = f(schema, *args, **kwargs)
  File "aiserver.py", line 748, in decorated
    raise e
  File "aiserver.py", line 739, in decorated
    return f(*args, **kwargs)
  File "aiserver.py", line 8443, in post_generate
    return _generate_text(body)
  File "aiserver.py", line 8308, in _generate_text
    genout = apiactionsubmit(body.prompt, use_memory=body.use_memory, use_story=body.use_story, use_world_info=body.use_world_info, use_authors_note=body.use_authors_note)
  File "aiserver.py", line 3572, in apiactionsubmit
    genout = apiactionsubmit_generate(tokens, minimum, maximum)
  File "aiserver.py", line 3463, in apiactionsubmit_generate
    _genout, already_generated = tpool.execute(model.core_generate, txt, set())
  File "B:\python\lib\site-packages\eventlet\tpool.py", line 132, in execute
    six.reraise(c, e, tb)
  File "B:\python\lib\site-packages\six.py", line 719, in reraise
    raise value
  File "B:\python\lib\site-packages\eventlet\tpool.py", line 86, in tworker
    rv = meth(*args, **kwargs)
  File "C:\KoboldAI\modeling\inference_model.py", line 342, in core_generate
    result = self.raw_generate(
  File "C:\KoboldAI\modeling\inference_model.py", line 589, in raw_generate
    result = self._raw_generate(
  File "C:\KoboldAI\modeling\inference_models\hf_torch.py", line 328, in _raw_generate
    genout = self.model.generate(
  File "B:\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\generation\utils.py", line 1572, in generate
    return self.sample(
  File "C:\KoboldAI\modeling\inference_models\hf_torch.py", line 260, in new_sample
    return new_sample.old_sample(self, *args, **kwargs)
  File "B:\python\lib\site-packages\transformers\generation\utils.py", line 2619, in sample
    outputs = self(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 854, in forward
    transformer_outputs = self.transformer(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 689, in forward
    outputs = block(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 309, in forward
    attn_outputs = self.attn(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 233, in forward
    k_rot = apply_rotary_pos_emb(k_rot, sin, cos)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 77, in apply_rotary_pos_emb
    sin = torch.repeat_interleave(sin[:, :, None, :], 2, 3)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I reset my Afterburner to default settings and disabled it on Windows startup, updated my GPU drivers, and even reinstalled KoboldAI dependancies, and rebooted twice, but have had no luck.

Jul 26 '23 00:07 THEWHITEBOY503

Here's my kobold_debug.json on a PasteBin

Jul 26 '23 01:07 THEWHITEBOY503

I can get it to work if I turn the context below 2048. I also updated KoboldAI to the latest version. But, it works super slow (much slower than I used to be getting). When I run a prompt, I get this error before my prompt. I think it has something to do with my DSA error, since it talks about the index being out of bounds. Token indices sequence length is longer than the specified maximum sequence length for this model (1575 > 1024). Running this sequence through the model will result in indexing errors

Jul 27 '23 22:07 THEWHITEBOY503

The last error is normal for some architectures such as GPT-Neo and GPT-J based models. They use the gpt2 tokenizer which only supports up to 1024 tokens while the model supports higher context. So it warns you that its applying a workaround but its fine.

Most models do not support more than 2048 tokens.

Jul 27 '23 22:07 henk717

The last error is normal for some architectures such as GPT-Neo and GPT-J based models. They use the gpt2 tokenizer which only supports up to 1024 tokens while the model supports higher context. So it warns you that its applying a workaround but its fine.

Huh, interesting.

Most models do not support more than 2048 tokens.

Previously I was running like 2300 tokens context and IIRC generating around 20 tokens/second. Now I have to keep it below 2048 and I'm getting 2 tokens/second or less, otherwise it throws an error around 75% in.

Jul 28 '23 14:07 THEWHITEBOY503