Kokoro-FastAPI

GPU Docker image fails with "exec format error" on ARM64 with NVIDIA card

Open • rfhold opened this issue 3 months ago • 5 comments

Describe the bug: The kokoro-fastapi-gpu images (latest and v0.2.4) both log "exec /opt/nvidia/nvidia_entrypoint.sh: exec format error" on launch and then exit. At one point I was able to get it working by building the project against torch 2.6.0; it looks like newer PyTorch releases are less friendly to ARM.

Screenshots or console output: the docker logs show:

exec /opt/nvidia/nvidia_entrypoint.sh: exec format error

Branch / Deployment used: I tried the latest and v0.2.4 Docker images. I've also started trying to build the latest master branch, but with little luck so far.

Operating System: K8s with the NVIDIA device plugin and NVIDIA container runtime; Ampere Altra CPU; RTX A6000 GPU.

rfhold • Oct 22 '25 04:10
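
In general, "exec format error" most often means the kernel refused to execute a file built for a different CPU architecture; for a shell script such as nvidia_entrypoint.sh, what matters is the architecture of the interpreter named in its shebang line inside the image. A plausible but unconfirmed explanation here is that the published images contain x86_64 binaries while the Ampere Altra host is aarch64. The sketch below is a diagnostic written for this note, not part of Kokoro-FastAPI: it reads the e_machine field of an ELF header and compares it with platform.machine(). The function names and the small architecture table are illustrative assumptions.

```python
import platform
from typing import Optional

# A few ELF e_machine values (see the ELF specification or <elf.h>).
EM_NAMES = {0x03: "x86", 0x28: "arm", 0x3E: "x86_64", 0xB7: "aarch64"}

# Names platform.machine() may report for the same architecture.
HOST_ALIASES = {"x86_64": {"x86_64", "amd64"}, "aarch64": {"aarch64", "arm64"}}


def elf_machine(header: bytes) -> str:
    """Return the architecture that an ELF header targets."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    order = "little" if header[5] == 1 else "big"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64 headers.
    machine = int.from_bytes(header[18:20], order)
    return EM_NAMES.get(machine, hex(machine))


def matches_host(target: str, host: Optional[str] = None) -> bool:
    """True if an ELF target architecture matches the host CPU (or the given host name)."""
    host = (host or platform.machine()).lower()
    return host in HOST_ALIASES.get(target, {target})


def runs_natively(path: str, host: Optional[str] = None) -> bool:
    """Read the first 20 bytes of an ELF file and compare its architecture with the host."""
    with open(path, "rb") as f:
        return matches_host(elf_machine(f.read(20)), host)
```

To try it, one could copy a binary out of the image (for example with docker create, then docker cp -L <container>:/bin/bash ./bash-from-image) and call runs_natively("./bash-from-image") on the host; on an aarch64 host, an x86_64 binary would make it return False. The file name is hypothetical.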

I'm seeing the same issue when running this on a DGX Spark.

chris4prez • Nov 01 '25 14:11

Does my image work for you? https://github.com/users/rfhold/packages/container/package/kokoro-fastapi-gpu (built from https://github.com/remsky/Kokoro-FastAPI/pull/403)

rfhold • Nov 01 '25 21:11

It got me further, but now I'm seeing this:

ERROR | main:70 | Failed to initialize model: Warmup failed: Failed to load model: Failed to load Kokoro model: CUDA error: no kernel image is available for execution on the device
Search for `cudaErrorNoKernelImageForDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assert

chris4prez • Nov 01 '25 22:11

The DGX probably needs a newer version of PyTorch. I hope its existence helps make CUDA on ARM a little more standard.

rfhold • Nov 01 '25 22:11
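
The later error, "no kernel image is available for execution on the device" (cudaErrorNoKernelImageForDevice), is raised when the loaded binaries contain neither machine code (SASS) compiled for the GPU's compute capability nor PTX that the driver can JIT-compile for it. That is consistent with the guess that the DGX Spark needs a newer PyTorch build, though nobody in the thread confirms the cause. The sketch below is a simplified model of the CUDA compatibility rules, assuming arch entries in the sm_XY / compute_XY form that PyTorch reports: SASS for sm_XY runs on GPUs with the same major version X and a minor version of at least Y, while PTX for compute_XY can be JIT-compiled for any GPU of capability X.Y or newer. The function names are hypothetical, and architecture-specific entries such as sm_90a are conservatively treated as exact matches only.

```python
import re
from typing import Iterable, Tuple


def parse_arch(entry: str) -> Tuple[str, Tuple[int, int], str]:
    """Split 'sm_86', 'sm_90a' or 'compute_90' into (kind, (major, minor), suffix)."""
    m = re.fullmatch(r"(sm|compute)_(\d{2,})([a-z]?)", entry)
    if m is None:
        raise ValueError(f"unrecognised arch entry: {entry!r}")
    kind, digits, suffix = m.groups()
    # The last digit is the minor version; the rest is the major ('120' -> 12.0).
    return kind, (int(digits[:-1]), int(digits[-1])), suffix


def kernel_image_available(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Simplified check: can any listed SASS/PTX target run on a GPU of this capability?"""
    capability = tuple(capability)
    for entry in arch_list:
        kind, target, suffix = parse_arch(entry)
        if suffix:
            # Simplification: arch-specific targets (e.g. sm_90a) count only on an exact match.
            if target == capability:
                return True
        elif kind == "sm":
            # SASS: same major version, device minor version equal or newer.
            if target[0] == capability[0] and target[1] <= capability[1]:
                return True
        elif target <= capability:
            # PTX: the driver can JIT-compile it for any equal or newer compute capability.
            return True
    return False
```

On the affected machine, the inputs could come from torch.cuda.get_device_capability() and torch.cuda.get_arch_list(). If kernel_image_available returns False for that pair, the installed PyTorch build most likely ships no kernels usable on that GPU, which would match the error above.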

Also seeing this same issue on DGX Spark as of 2025-11-24.

Exonfang • Nov 24 '25 05:11
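
Both the Ampere Altra and the DGX Spark are arm64 hosts, so one quick check is whether a given tag (the published latest or v0.2.4 images, or the image built from PR #403) advertises a linux/arm64 variant at all. The output of docker manifest inspect <image>:<tag> is JSON; for a multi-platform tag it contains a "manifests" array whose entries carry a "platform" object. The sketch below is a hypothetical helper that collects those platforms. It assumes the manifest-list / OCI image index layout, skips the "unknown/unknown" entries that buildx attestations use, and returns an empty set for a single-platform manifest, which should be read as "not listed" rather than "no platforms".

```python
import json
from typing import Set


def published_platforms(manifest_json: str) -> Set[str]:
    """Collect os/arch[/variant] entries from a manifest list or OCI image index."""
    doc = json.loads(manifest_json)
    platforms: Set[str] = set()
    # A single-platform image manifest has no "manifests" array, so nothing is collected.
    for entry in doc.get("manifests", []):
        platform = entry.get("platform") or {}
        os_name, arch = platform.get("os"), platform.get("architecture")
        # Skip entries without platform info and buildx attestations ("unknown/unknown").
        if not os_name or not arch or arch == "unknown":
            continue
        variant = platform.get("variant")
        platforms.add(f"{os_name}/{arch}/{variant}" if variant else f"{os_name}/{arch}")
    return platforms


def has_linux_arm64(platforms: Set[str]) -> bool:
    """True if any collected platform is linux/arm64 (with or without a variant)."""
    return any(p == "linux/arm64" or p.startswith("linux/arm64/") for p in platforms)
```

If has_linux_arm64 returns False for a tag, Docker on an arm64 host can only pull another architecture's image, which would be expected to fail with an "exec format error" like the one reported above.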