test -H 1024 -W 1024 fails on macOS (Ventura) with NDArray > 2**32
Describe your environment
- GPU: mps
- VRAM: unified 128G
- CPU arch: arm
- OS: macOS VENTURA
- Python: Anaconda / Python 3.9.13
- Branch: main / development
Describe the bug
I just upgraded my Mac to Ventura, but I don't know whether the issue is related to Ventura because I didn't run this test before the upgrade.
NOTE: with --hires_fix the generation starts, but then I get the same issue.
>> python scripts/invoke.py --model sd-1.4 --outdir ../@Stuffs/images/samples
* Initializing, be patient...
NOTE: Redirects are currently not supported in Windows or MacOs.
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type mps
>> Loading sd-1.4 from /Users/ivano/Code/Ai/@Stuffs/models/SD/SD-OFFICIAL/sd-v1-4.ckpt
| LatentDiffusion: Running in eps-prediction mode
| DiffusionWrapper has 859.52 M params.
| Making attention of type 'vanilla' with 512 in_channels
| Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
| Making attention of type 'vanilla' with 512 in_channels
| Using more accurate float32 precision
>> Model loaded in 5.28s
>> Setting Sampler to k_lms
* Initialization done! Awaiting your command (-h for help, 'q' to quit)
invoke> test -H 1024 -W 1024
/Users/ivano/Code/Ai/invoke/ldm/modules/embedding_manager.py:155: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1659484611838/work/aten/src/ATen/mps/MPSFallback.mm:11.)
placeholder_idx = torch.where(
Generating: 0%| | 0/1 [00:00<?, ?it/s]
>> Sampling with k_lms starting at step 0 of 50 (50 new sampling steps)
/AppleInternal/Library/BuildRoots/f0468ab4-4115-11ed-8edc-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:724: failed
assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
| 0/50 [00:00<?, ?it/s]
zsh: abort python scripts/invoke.py --model sd-1.4 --outdir ../@Stuffs/images/samples
/Users/ivano/.miniconda/envs/invoke/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Additional context
test -H960 -W960 is OK
pytorch 1.12.1 py3.9_0 pytorch
pytorch-lightning 1.7.7 pyhd8ed1ab_0 conda-forge
torchvision 0.13.1 py39_cpu pytorch
Have the same issue, thanks for raising @i3oc9i
I'm running dev + https://github.com/invoke-ai/InvokeAI/pull/1243 w/ 64GB + macOS Monterey and I can run it.
@Any-Winter-4079, I will try your dev + #1234 tomorrow, but I guess that the issue is related to Ventura.
@pauloportella
> Have the same issue, thanks for raising @i3oc9i
Can you confirm that you are running Ventura?
Yes, the latest. The only other difference is that I'm on Python 3.10 and I have only 32 GB :)
@Vargol if you upgraded to Ventura, can you confirm this issue?
> assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
I would've sworn it was 2^31 before. I guess there are some Metal changes.
Edit: Yeah, see 'Error: product of dimension sizes > 2**31' in https://github.com/invoke-ai/InvokeAI/issues/364
I haven't upgraded yet. I use some software that had known issues with the betas, so I'm not planning to upgrade until it's confirmed that it now works.
@i3oc9i @pauloportella In general, for a temporary fix, I'd check attention.py and model.py (I think) and simply ensure the tensors do not exceed 2^32 bytes.
Specifically here, for the slice_size(s)
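To make the slice-size idea concrete, here is a rough sketch of the arithmetic. It is not the code in attention.py. The 2**32-byte limit comes from the assertion message above. The helper names, the latent downscale factor of 8, and the batch-times-heads value of 16 are illustrative assumptions.

```python
# Sketch only: the limit is read off the assertion message in this thread;
# latent_tokens and max_slice_rows are hypothetical helpers, not InvokeAI functions.
MPS_MAX_BYTES = 2**32


def latent_tokens(height, width, downscale=8):
    """Number of attention tokens for an image, assuming an 8x latent downscale."""
    return (height // downscale) * (width // downscale)


def max_slice_rows(batch_heads, tokens, bytes_per_elem=4, limit=MPS_MAX_BYTES):
    """Largest number of query rows per slice such that a score tensor of shape
    (batch_heads, rows, tokens) stays strictly below `limit` bytes."""
    bytes_per_row = batch_heads * tokens * bytes_per_elem
    rows = (limit - 1) // bytes_per_row
    if rows < 1:
        raise ValueError("a single row already exceeds the limit")
    return min(rows, tokens)


tokens = latent_tokens(1024, 1024)                     # 128 * 128 tokens
rows = max_slice_rows(batch_heads=16, tokens=tokens)   # float32 scores assumed
print(tokens, rows)
```

With these assumed numbers, the full float32 score tensor would be far larger than 2**32 bytes, so the attention would have to be computed in slices of at most `rows` query rows.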

Thinking about it, you could do a PR where you check the OS version, and then use a different slice size.
import platform
platform.platform()
'macOS-12.5.1-arm64-arm-64bit'
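A minimal sketch of that OS check follows, using `platform.mac_ver()` rather than parsing the `platform.platform()` string. Both limits are read off the two error messages quoted in this thread. The assumption that the change happened at macOS 13 is a guess from this discussion, not something confirmed by Apple or PyTorch, and `parse_major` and `describe_limit` are hypothetical names.

```python
import platform


def parse_major(release):
    """Return the major macOS version from a string like '13.2', or None."""
    head = release.split(".")[0]
    return int(head) if head.isdigit() else None


def describe_limit(release):
    """Map a macOS release string to the (unit, limit) pair assumed in this thread."""
    major = parse_major(release)
    if major is None:
        return None                  # not macOS, or version unknown
    if major >= 13:                  # assumption: Ventura reports a byte limit
        return ("bytes", 2**32)
    return ("elements", 2**31)       # assumption: older releases count elements


# platform.mac_ver()[0] is an empty string on non-macOS systems.
limit = describe_limit(platform.mac_ver()[0])
print(limit)
```

A slice-size calculation could then pick its threshold from `limit` instead of hard-coding one value for every macOS release.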
Also happens at 64x64.
I'm still getting this crash. M2 Max MBP running Ventura 13.2. The crash occurs immediately on starting a 1024x1024 generation whether or not --hires-fix is on (if on, the 512x512 pass finishes first). Note that generations larger than 1024x1024 work (at least square generations -- I haven't tried non-square). Python crash log.txt
Solved with Ventura 13.3; see the following comment:
https://github.com/invoke-ai/InvokeAI/issues/2444#issuecomment-1485891105