[Web] YOLOv8 segmentation model with batching enabled does not run on the GPU
Describe the issue
When I try to run yolov8-seg.onnx with batching enabled, the following error appears:
ort-wasm-simd.jsep.js:54 2024-05-17 15:23:15.490199 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf
ort-wasm-simd.jsep.js:54 2024-05-17 15:23:15.492300 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ERROR_MESSAGE: Non-zero status code returned while running Softmax node. Name:'/model.22/dfl/Softmax' Status Message: Failed to run JSEP kernel
To reproduce
Export
from ultralytics import YOLO
# Load a model
model = YOLO('yolov8n.pt') # load an official model
model = YOLO('path/to/best.pt') # load a custom trained model
# Export the model
model.export(format='onnx', dynamic=True)
Script
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
<title>Yolov8 test</title>
<!-- ONNX RUNTIME -->
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.webgpu.min.js"></script>
</head>
<body>
<script>
const modelName = "yolov8n-seg-batching.onnx";
const modelInputShape = [1, 3, 640, 640];
async function testModel() {
let model = await ort.InferenceSession.create(modelName,{ executionProviders: ["webgpu","cpu"] });
const tensor = new ort.Tensor("float32",new Float32Array(modelInputShape.reduce((a, b) => a * b)),modelInputShape);
await model.run({ images: tensor });
console.log(model);
}
testModel();
</script>
</body>
</html>
Urgency
It's somewhat urgent
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.3
Execution Provider
'webgpu' (WebGPU)
Hi @xadupre,
I exported my model using the following code:
model.export(format='onnx', dynamic=True, simplify=True, opset=12)
However, the exported ONNX model currently runs on the GPU only when the batch size is 1. What should I do to run it on the GPU with a batch size greater than 1?
I need your assistance to complete this task. Thank you in advance!
If you need opset 12, I recommend using torch.onnx.export. You can set the dynamic_axes parameter to something like dynamic_axes={"x": {0: "my_custom_axis_name"}}.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Really sorry for the late reply! Currently the WebGPU EP implementation has a limitation: the axis of Softmax has to be the last dimension. We will fix this soon.