[Web] YOLOv8 segmentation model with batching enabled does not run on the GPU
Describe the issue
When I try to run yolov8-seg.onnx with batching enabled, the following error appears:
ort-wasm-simd.jsep.js:54 2024-05-17 15:23:15.490199 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf
ort-wasm-simd.jsep.js:54 2024-05-17 15:23:15.492300 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ERROR_MESSAGE: Non-zero status code returned while running Softmax node. Name:'/model.22/dfl/Softmax' Status Message: Failed to run JSEP kernel
To reproduce
Export
from ultralytics import YOLO
# Load a model
model = YOLO('yolov8n.pt') # load an official model
model = YOLO('path/to/best.pt') # load a custom trained model
# Export the model
model.export(format='onnx', dynamic=True)
Script
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
<title>Yolov8 test</title>
<!-- ONNX RUNTIME -->
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.webgpu.min.js"></script>
</head>
<body>
<script>
const modelName = "yolov8n-seg-batching.onnx";
const modelInputShape = [1, 3, 640, 640];
async function testModel() {
let model = await ort.InferenceSession.create(modelName,{ executionProviders: ["webgpu","cpu"] });
const tensor = new ort.Tensor("float32",new Float32Array(modelInputShape.reduce((a, b) => a * b)),modelInputShape);
await model.run({ images: tensor });
console.log(model);
}
testModel();
</script>
</body>
</html>
Urgency
It's somewhat urgent
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.3
Execution Provider
'webgpu' (WebGPU)
Hi @xadupre,
I exported my model using the following code:
model.export(format='onnx', dynamic=True, simplify=True, opset=12)
However, the exported ONNX model currently runs on the GPU only when the batch size is 1. What should I do to run it on the GPU with a batch size greater than 1?
I need your assistance to complete this task. Thank you in advance!
If you need opset 12, I recommend using torch.onnx.export. You can set the dynamic_axes parameter to something like dynamic_axes={"x": {0: "my_custom_axis_name"}}.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Really sorry for the late reply! Currently the WebGPU EP implementation has a limitation: the axis of Softmax has to be the last dimension. We will fix this soon.