InvokeAI test -H 1024 -W 1024 fails on macOS (Ventura) with NDArray > 2**32

Describe your environment

GPU: mps
VRAM: unified 128G
CPU arch: arm
OS: macOS VENTURA
Python: Anaconda / Python 3.9.13
Branch: main / development

Describe the bug

I just upgraded my Mac to macOS Ventura, but I don't know whether the issue is related to Ventura because I didn't run this test before the upgrade.

NOTE: with --hires_fix the generation starts, but then I get the same issue.

>> python scripts/invoke.py --model sd-1.4 --outdir ../@Stuffs/images/samples
* Initializing, be patient...
NOTE: Redirects are currently not supported in Windows or MacOs.
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type mps
>> Loading sd-1.4 from /Users/ivano/Code/Ai/@Stuffs/models/SD/SD-OFFICIAL/sd-v1-4.ckpt
   | LatentDiffusion: Running in eps-prediction mode
   | DiffusionWrapper has 859.52 M params.
   | Making attention of type 'vanilla' with 512 in_channels
   | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
   | Making attention of type 'vanilla' with 512 in_channels
   | Using more accurate float32 precision
>> Model loaded in 5.28s
>> Setting Sampler to k_lms

* Initialization done! Awaiting your command (-h for help, 'q' to quit)

invoke> test -H 1024 -W 1024
/Users/ivano/Code/Ai/invoke/ldm/modules/embedding_manager.py:155: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1659484611838/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  placeholder_idx = torch.where(
Generating:   0%|         | 0/1 [00:00<?, ?it/s]
>> Sampling with k_lms starting at step 0 of 50 (50 new sampling steps)
/AppleInternal/Library/BuildRoots/f0468ab4-4115-11ed-8edc-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:724: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'

| 0/50 [00:00<?, ?it/s]

zsh: abort      python scripts/invoke.py --model sd-1.4 --outdir ../@Stuffs/images/samples
/Users/ivano/.miniconda/envs/invoke/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

  warnings.warn('resource_tracker: There appear to be %d '

Additional context

test -H960 -W960 is OK

pytorch                   1.12.1                  py3.9_0    pytorch
pytorch-lightning         1.7.7              pyhd8ed1ab_0    conda-forge
torchvision               0.13.1                 py39_cpu    pytorch

Oct 25 '22 17:10 i3oc9i

Have the same issue, thanks for raising @i3oc9i

Oct 26 '22 09:10 pauloportella

I'm running dev + https://github.com/invoke-ai/InvokeAI/pull/1243 with 64 GB on macOS Monterey, and I can run it.

Oct 26 '22 12:10 Any-Winter-4079

@Any-Winter-4079, I will try your dev + #1234 setup tomorrow, but I guess that the issue is related to Ventura.

Oct 26 '22 14:10 i3oc9i

@pauloportella

Have the same issue, thanks for raising @i3oc9i

Can you confirm that you are running Ventura?

Oct 26 '22 14:10 i3oc9i

Yes, the latest. The only other difference is that I'm on Python 3.10 and I have only 32 GB :)

Oct 26 '22 14:10 pauloportella

@Vargol, if you upgraded to Ventura, can you confirm this issue?

Oct 26 '22 14:10 i3oc9i

assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'

I would've sworn it was 2^31 before. I guess there are some Metal changes. Edit: Yeah, see "Error: product of dimension sizes > 2**31" in https://github.com/invoke-ai/InvokeAI/issues/364

Oct 26 '22 14:10 Any-Winter-4079

I haven't upgraded yet. I use some software that had known issues with the betas, so I'm not planning to upgrade until it's confirmed that it now works.

Oct 26 '22 14:10 Vargol

@i3oc9i @pauloportella In general, as a temporary fix, I'd check attention.py and model.py (I think) and simply ensure the tensors are not larger than 2^32 bytes. Specifically here, for the slice_size(s): [Screenshot 2022-10-26 at 16 24 51]
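
To make the byte budget concrete, here is a rough sketch of how a slice size could be capped so that an attention score tensor stays under the 2**32-byte limit from the assertion. The helper names, the 8 attention heads, the float32 element size, and the 8x latent downscale are all assumptions for illustration. None of this is InvokeAI code, and the real tensors in attention.py may be shaped differently.

```python
# Hypothetical helpers, not part of InvokeAI: they only illustrate the byte budget.
MPS_NDARRAY_BYTE_LIMIT = 2**32  # limit quoted in the assertion message


def latent_tokens(height: int, width: int, downscale: int = 8) -> int:
    """Number of latent positions, assuming an 8x downscale (assumption)."""
    return (height // downscale) * (width // downscale)


def max_slice_rows(tokens: int, heads: int = 8, bytes_per_element: int = 4,
                   limit: int = MPS_NDARRAY_BYTE_LIMIT) -> int:
    """Largest number of query rows per slice such that a
    (heads, rows, tokens) score tensor stays strictly below `limit` bytes."""
    bytes_per_row = heads * tokens * bytes_per_element
    rows = (limit - 1) // bytes_per_row
    # Never return 0, and never more rows than there are tokens.
    return max(1, min(tokens, rows))


if __name__ == "__main__":
    for side in (960, 1024):
        tokens = latent_tokens(side, side)
        print(side, tokens, max_slice_rows(tokens))
```

Under these assumptions an unsliced 1024x1024 score tensor would need 8 * 16384 * 16384 * 4 bytes, which is 2**33, so it has to be processed in at least two slices to stay below the limit.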

Oct 26 '22 14:10 Any-Winter-4079

Thinking about it, you could do a PR where you check the OS version, and then use a different slice size.

import platform
platform.platform()

'macOS-12.5.1-arm64-arm-64bit'
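
A minimal sketch of that idea, assuming the standard library's `platform.mac_ver()` is used to read the macOS release. The function names, the two slice-size arguments, and the "major version 13 or later" threshold are hypothetical placeholders, not values taken from InvokeAI.

```python
import platform
from typing import Optional


def macos_major_version(release: Optional[str] = None) -> Optional[int]:
    """Return the macOS major version (e.g. 13 for Ventura), or None elsewhere."""
    if release is None:
        release = platform.mac_ver()[0]  # empty string on non-macOS systems
    if not release:
        return None
    try:
        return int(release.split(".")[0])
    except ValueError:
        return None


def pick_slice_size(default: int, conservative: int,
                    release: Optional[str] = None) -> int:
    """Hypothetical policy: use the smaller slice size on macOS 13 or later."""
    major = macos_major_version(release)
    if major is not None and major >= 13:
        return conservative
    return default


if __name__ == "__main__":
    # The two sizes are placeholders; real values would come from attention.py.
    print(pick_slice_size(default=4096, conservative=2048))
```

Passing the release string explicitly, for example `pick_slice_size(4096, 2048, release="12.5.1")`, makes the policy easy to check on a machine that is not running the OS version in question.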

Oct 28 '22 01:10 Any-Winter-4079

Also happens at 64x64.

Dec 31 '22 01:12 whosawhatsis

I'm still getting this crash. M2 Max MBP running Ventura 13.2. The crash occurs immediately on starting a 1024x1024 generation whether or not --hires-fix is on (if on, the 512x512 pass finishes first). Note that generations larger than 1024x1024 work (at least square generations; I haven't tried non-square). [Attachment: Python crash log.txt]

Jan 30 '23 12:01 Adreitz

Solved with Ventura 13.3; see the following comment:

https://github.com/invoke-ai/InvokeAI/issues/2444#issuecomment-1485891105

Mar 27 '23 21:03 i3oc9i
