stabilityai/stable-diffusion-2-1: Errors generating images with Swift Package
My machine is a MacBook Air M1 with 16 GB of RAM.
When using Stable Diffusion 2.1 from stabilityai, I get the following error spammed to the console, seemingly on every step (not confirmed):
2022-12-22 17:18:26.961576-0500 Diffusor[13565:536479] [espresso] ANE Batch: 2 of the async requests being waited for returned errors. Only the first of these will be surfaced.
2022-12-22 17:18:26.961636-0500 Diffusor[13565:536479] [espresso] ANE Batch: Async request 10 returned error: code=5 err=Error Domain=com.apple.appleneuralengine Code=5 "processRequest:model:qos:qIndex:modelStringID:options:error:: 0xd: Program Inference overflow" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:error:: 0xd: Program Inference overflow}
2022-12-22 17:18:26.961655-0500 Diffusor[13565:536479] [espresso] [Espresso::overflow_error] :9
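For context, "Program Inference overflow" presumably refers to a Float16 overflow on the Neural Engine (that is my reading of the log, not an Apple-documented meaning): the ANE executes these models in Float16, whose largest finite value is 65504, and SD 2.x activations reportedly exceed that range. A minimal numpy illustration of that kind of saturation:

```python
import numpy as np

# Float16 can only represent finite values up to 65504; anything larger
# saturates to infinity. The ANE runs these models in Float16, which is
# the likely source of the "Program Inference overflow" messages.
assert np.finfo(np.float16).max == 65504.0

a = np.float16(60000)
b = np.float16(60000)
result = a + b  # the true sum, 120000, exceeds the Float16 range
assert np.isinf(result)
```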
After a while, it logs the following error and stack trace to the console:
2022-12-22 17:19:42.715811-0500 Diffusor[13565:536721] [common] processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11 lModel=_ANEModel: { modelURL=file:///Users/turnereison/Developer/stable-diffusion/ml-stable-diffusion/models/Resources/VAEDecoder.mlmodelc/model.mil : key={"isegment":0,"inputs":{"z":{"shape":[96,96,1,4,1]}},"outputs":{"image":{"shape":[768,768,1,3,1]}}} : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=443484529369 : intermediateBufferHandle=443486389195 : queueDepth=32 } : state=3 : programHandle=443484529369 : intermediateBufferHandle=443486389195 : queueDepth=32 : attr={
ANEFModelDescription = {
ANEFModelProcedures = (
{
ANEFModelInputSymbolIndexArray = (
0
);
ANEFModelOutputSymbolIndexArray = (
0
);
ANEFModelProcedureID = 0;
}
);
kANEFModelInputSymbolsArrayKey = (
z
);
kANEFModelOutputSymbolsArrayKey = (
"image@output"
);
kANEFModelProcedureNameToIDMapKey = {
net = 0;
};
};
NetworkStatusList = (
{
LiveInputList = (
{
BatchStride = 73728;
Batches = 1;
Channels = 4;
Depth = 1;
DepthStride = 73728;
Height = 96;
Interleave = 1;
Name = z;
PlaneCount = 4;
PlaneStride = 18432;
RowStride = 192;
Symbol = z;
Type = Float16;
Width = 96;
}
);
LiveOutputList = (
{
BatchStride = 3538944;
Batches = 1;
Channels = 3;
Depth = 1;
DepthStride = 3538944;
Height = 768;
Interleave = 1;
Name = "image@output";
PlaneCount = 3;
PlaneStride = 1179648;
RowStride = 1536;
Symbol = "image@output";
Type = Float16;
Width = 768;
}
);
Name = net;
}
);
} : perfStatsMask=0}
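Incidentally, the shapes and strides in that dump are exactly the planar Float16 buffer layout for a 96×96×4 latent in and a 768×768×3 image out, i.e. this is the 768-pixel VAE decoder failing. A quick arithmetic sanity check (plain Python, nothing model-specific):

```python
# Sanity-check the buffer strides reported in the ANE error dump.
# All tensors are Float16, so each element is 2 bytes.
BYTES_PER_ELEMENT = 2  # Float16

def strides(width, height, channels):
    """Row/plane/batch strides in bytes for a planar H x W x C Float16 buffer."""
    row = width * BYTES_PER_ELEMENT
    plane = height * row
    batch = channels * plane
    return row, plane, batch

# Input latent z: 96x96, 4 channels -> matches RowStride=192,
# PlaneStride=18432, BatchStride=73728 from the log.
assert strides(96, 96, 4) == (192, 18432, 73728)

# Output image: 768x768, 3 channels -> matches RowStride=1536,
# PlaneStride=1179648, BatchStride=3538944 from the log.
assert strides(768, 768, 3) == (1536, 1179648, 3538944)
```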
2022-12-22 17:19:43.720777-0500 Diffusor[13565:536479] -[NSNull featureNames]: unrecognized selector sent to instance 0x20c2556e8
2022-12-22 17:19:43.726682-0500 Diffusor[13565:536479] [General] -[NSNull featureNames]: unrecognized selector sent to instance 0x20c2556e8
2022-12-22 17:19:43.733922-0500 Diffusor[13565:536479] [General] (
0 CoreFoundation 0x00000001b019f3f8 __exceptionPreprocess + 176
1 libobjc.A.dylib 0x00000001afceaea8 objc_exception_throw + 60
2 CoreFoundation 0x00000001b0241c1c -[NSObject(NSObject) __retain_OA] + 0
3 CoreFoundation 0x00000001b0105670 ___forwarding___ + 1600
4 CoreFoundation 0x00000001b0104f70 _CF_forwarding_prep_0 + 96
5 Diffusor 0x00000001009e5968 $s15StableDiffusion7DecoderV6decodeySaySo10CGImageRefaGSay6CoreML13MLShapedArrayVySfGGKFAFSiXEfU1_ + 200
6 Diffusor 0x00000001009e70d0 $s15StableDiffusion7DecoderV6decodeySaySo10CGImageRefaGSay6CoreML13MLShapedArrayVySfGGKFAFSiXEfU1_TA + 28
7 libswiftCore.dylib 0x00000001bdde0038 $sSlsE3mapySayqd__Gqd__7ElementQzKXEKlF + 656
8 Diffusor 0x00000001009e4f24 $s15StableDiffusion7DecoderV6decodeySaySo10CGImageRefaGSay6CoreML13MLShapedArrayVySfGGKF + 792
9 Diffusor 0x00000001009ff6dc $s15StableDiffusion0aB8PipelineV14decodeToImages_13disableSafetySaySo10CGImageRefaSgGSay6CoreML13MLShapedArrayVySfGG_SbtKF + 104
10 Diffusor 0x00000001009fe540 $s15StableDiffusion0aB8PipelineV14generateImages6prompt14negativePrompt10imageCount04stepJ04seed13disableSafety9scheduler15progressHandlerSaySo10CGImageRefaSgGSS_SSS2is6UInt32VSbAA0aB9SchedulerOSbAC8ProgressVXEtKF + 3480
11 Diffusor 0x00000001009de878 $s8Diffusor11ContentViewV4bodyQrvgyycfU0_yyYaYbcfU_TY0_ + 1112
12 Diffusor 0x00000001009dfe8d $s8Diffusor11ContentViewV4bodyQrvgyycfU0_yyYaYbcfU_TATQ0_ + 1
13 Diffusor 0x00000001009e00f9 $sxIeghHr_xs5Error_pIegHrzo_s8SendableRzs5NeverORs_r0_lTRTQ0_ + 1
14 Diffusor 0x00000001009e0249 $sxIeghHr_xs5Error_pIegHrzo_s8SendableRzs5NeverORs_r0_lTRTATQ0_ + 1
15 libswift_Concurrency.dylib 0x00000002397648b5 _ZL23completeTaskWithClosurePN5swift12AsyncContextEPNS_10SwiftErrorE + 1
)
This happens with both the chunked and unchunked models, and with reduced-memory mode both on and off.
I can use the converted Stable Diffusion v2.1 models in Swift.
- MBA/M1/8GB memory, macOS 13.2, Xcode 14.2
- Xcode project: https://github.com/ynagatomo/ImgGenSD2
I converted the models with this command: % python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o sd2CoremlChunked --model-version stabilityai/stable-diffusion-2-1-base --bundle-resources-for-swift-cli --chunk-unet --attention-implementation SPLIT_EINSUM --compute-unit CPU_AND_NE
I had a similar issue using stabilityai/stable-diffusion-2-1 and then realized you need to use the base version. Using stabilityai/stable-diffusion-2-1-base, as @ynagatomo suggests, works fine on my MacBook Pro M1 Max.
What's the difference between the "base" version and normal Stable Diffusion v2.1?
- base: 512×512 image generation
- normal: 768×768 image generation (needs more working memory)
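As a back-of-the-envelope sketch (assuming the standard SD convention that latents are 1/8 of the image resolution with 4 channels, stored as Float16), the decoder's input/output buffers alone grow by (768/512)² = 2.25× between the two variants:

```python
# Rough comparison of the VAE decoder I/O buffer sizes for the two model
# variants (Float16 = 2 bytes per element). This counts only the live
# input/output tensors, not intermediate activations, which scale similarly.
def decoder_io_bytes(image_size):
    latent = image_size // 8                  # SD latents are 1/8 resolution
    z = latent * latent * 4 * 2               # 4-channel latent input
    image = image_size * image_size * 3 * 2   # RGB image output
    return z + image

base = decoder_io_bytes(512)    # stable-diffusion-2-1-base
normal = decoder_io_bytes(768)  # stable-diffusion-2-1

assert normal / base == 2.25    # 768^2 / 512^2 -> 2.25x more working memory
```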
I'm seeing this too with SD 2.1. Following @ynagatomo's suggestion above, I tried converting the model with the --chunk-unet parameter, and it does indeed work and produce an image, albeit emitting this line at every step on the console: [espresso] [Espresso::overflow_error] /Users/ptsochantaris/[redacted]/UnetChunk2.mlmodelc/model.mil
Edit: It seems that the best way to run SD 2.0 and SD 2.1 is to explicitly use the CPU_AND_GPU compute-unit option, with or without chunking; all the errors then go away. Otherwise there's something in these models that the Neural Engine doesn't like.
Same for me, with the coreml-stable-diffusion-2-1-base split_einsum model running on the Neural Engine:
[espresso] [Espresso::overflow_error] /.../StableDiffusionV2.1/UnetChunk2.mlmodelc/model.mil