
Too slow on a MacBook with an M3 Pro CPU: no result at all even after waiting a very long time. Can anybody tell me why?

Open hit710 opened this issue 11 months ago • 15 comments

I ran python gradio_demo.py and got a UI page. I uploaded a phone screenshot; it took a very long time but produced no output.

hit710 · Feb 21 '25

The same problem happens to me. I am using a MacBook Pro M3. An earlier version of OmniParser was working for me.

aravind-manoj · Feb 21 '25

I also encountered the same problem and waited 12 hours, but there was no result. However, I was using a Windows system.

Odland · Feb 21 '25

@Odland Have you tried running omniparserserver.py?

aravind-manoj · Feb 21 '25

Same for me, but it runs very quickly with demo.ipynb.

sincerity711 · Feb 21 '25

@Odland Have you tried running omniparserserver.py?

I followed the readme and executed python gradio_demo.py. I did not find the omniparserserver.py file.

Odland · Feb 21 '25

@Odland Have you tried running omniparserserver.py?

I followed the readme and executed python gradio_demo.py. I did not find the omniparserserver.py file.

This file: https://github.com/microsoft/OmniParser/blob/master/omnitool/omniparserserver/omniparserserver.py

aravind-manoj · Feb 21 '25

Same here on an M2 Pro. I tried the changes from @kiyokiku's PR (https://github.com/microsoft/OmniParser/pull/195) locally, but got the same result :(

dragosMC91 · Feb 22 '25

Some observations: if I uncheck the Use PaddleOCR option in the Gradio GUI, the image processing is no longer stuck (see the sketch after the logs below). Also, the changes from https://github.com/microsoft/OmniParser/pull/195 improve speed by at least an order of magnitude.

Without the PR 195 changes:

0: 736x1280 36 icons, 306.9ms
Speed: 4.7ms preprocess, 306.9ms inference, 0.7ms postprocess per image at shape (1, 3, 736, 1280)
len(filtered_boxes): 45 39
time to get parsed content: 10.686861038208008
finish processing

0: 736x1280 36 icons, 303.1ms
Speed: 4.8ms preprocess, 303.1ms inference, 0.8ms postprocess per image at shape (1, 3, 736, 1280)
len(filtered_boxes): 45 39
time to get parsed content: 10.888475894927979
finish processing

0: 736x1280 36 icons, 302.5ms
Speed: 4.7ms preprocess, 302.5ms inference, 0.9ms postprocess per image at shape (1, 3, 736, 1280)
len(filtered_boxes): 45 39
time to get parsed content: 10.936400175094604
finish processing

With the PR 195 changes:

0: 736x1280 36 icons, 334.0ms
Speed: 6.0ms preprocess, 334.0ms inference, 0.6ms postprocess per image at shape (1, 3, 736, 1280)
len(filtered_boxes): 45 39
time to get parsed content: 0.677422046661377
finish processing

0: 736x1280 36 icons, 290.7ms
Speed: 4.3ms preprocess, 290.7ms inference, 0.8ms postprocess per image at shape (1, 3, 736, 1280)
len(filtered_boxes): 45 39
time to get parsed content: 0.20275473594665527
finish processing

0: 736x1280 36 icons, 287.7ms
Speed: 5.4ms preprocess, 287.7ms inference, 0.9ms postprocess per image at shape (1, 3, 736, 1280)
len(filtered_boxes): 45 39
time to get parsed content: 0.18100881576538086
finish processing
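
If you want to skip PaddleOCR from code instead of the GUI checkbox, you can force the OCR step onto EasyOCR. A rough sketch, assuming check_ocr_box and its use_paddleocr keyword match the utils.py in the current repo (verify the signature in your checkout; the image path and easyocr_args here are just examples):

    # Sketch: run OmniParser's OCR step with PaddleOCR disabled.
    # check_ocr_box and use_paddleocr are assumed to match the repo's
    # utils.py; double-check the signature before relying on this.
    from utils import check_ocr_box

    (text, ocr_bbox), _ = check_ocr_box(
        'screenshot.png',           # example path to the uploaded screenshot
        display_img=False,
        output_bb_format='xyxy',
        use_paddleocr=False,        # fall back to EasyOCR, which does not get stuck
        easyocr_args={'paragraph': False, 'text_threshold': 0.9},
    )
    print(len(text), 'text boxes detected')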

dragosMC91 · Feb 22 '25

Yes, I did the same thing on Linux. After I unchecked the Use PaddleOCR option, the program ran and produced the expected results. However, when I checked Use PaddleOCR, the system crashed. I did not dig into the cause, but I think it has a lot to do with the hardware resource configuration.

Odland · Feb 22 '25

Same problem here; I fixed it this way. Try adding queue=False to the click call:

    submit_button_component.click(
        fn=process,
        ......
        queue=False,
    )

When using queue=True (the default), Gradio handles requests in a new process, which prevents the CUDA context from being shared between processes and causes the GPU model inference to hang. Setting queue=False processes requests directly in the main process, avoiding this issue.
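
For reference, a minimal self-contained sketch of where the flag goes; the process function here is a placeholder, not the real pipeline from gradio_demo.py:

    # Minimal Gradio sketch showing where queue=False goes.
    # process() is a stand-in; in gradio_demo.py it runs the OmniParser models.
    import gradio as gr

    def process(image):
        return image  # placeholder for the real parsing pipeline

    with gr.Blocks() as demo:
        image_input = gr.Image(type='pil')
        image_output = gr.Image(type='pil')
        submit_button_component = gr.Button('Submit')
        submit_button_component.click(
            fn=process,
            inputs=[image_input],
            outputs=[image_output],
            queue=False,  # handle the request directly instead of through the queue
        )

    demo.launch()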

JerryFanFan · Feb 23 '25

Same problem here; I fixed it this way. Try adding queue=False to the click call:

    submit_button_component.click(
        fn=process,
        ......
        queue=False,
    )

When using queue=True (the default), Gradio handles requests in a new process, which prevents the CUDA context from being shared between processes and causes the GPU model inference to hang. Setting queue=False processes requests directly in the main process, avoiding this issue.

This did not work for me: no effect, and it's still stuck.

dragosMC91 · Feb 23 '25

Same here on an M2 Pro. I tried the changes from @kiyokiku's PR (#195) locally, but got the same result :(

Instead of using torch==2.4.0 and its dependencies, you can use MPS on your Mac for GPU acceleration; you need to modify the code in utils.py. My device is a MacBook Pro with macOS 15 and an M1 chip.

[screenshot of the modified utils.py]
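
The gist of the change is a device-selection fallback, roughly like the sketch below; the names are illustrative, not the exact utils.py code, and every place that hard-codes 'cuda' or 'cpu' needs the same treatment:

    # Sketch: prefer CUDA, then Apple's MPS backend (M1/M2/M3 GPUs), then CPU.
    # Illustrative only; apply the same idea wherever utils.py picks a device.
    import torch

    def get_device() -> str:
        if torch.cuda.is_available():
            return 'cuda'
        if torch.backends.mps.is_available():
            return 'mps'
        return 'cpu'

    device = get_device()
    # then move the models onto it, e.g. model = model.to(device)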

zhenhuaplan · Feb 26 '25

Same problem here; I fixed it this way. Try adding queue=False to the click call:

    submit_button_component.click(
        fn=process,
        ......
        queue=False,
    )

When using queue=True (the default), Gradio handles requests in a new process, which prevents the CUDA context from being shared between processes and causes the GPU model inference to hang. Setting queue=False processes requests directly in the main process, avoiding this issue.

My device uses CUDA. This fix is quite effective. Thanks @JerryFanFan for solving a big problem that had wasted a lot of my time before I added this fix.

jacky-neo · Mar 01 '25

Yes, I did the same thing on Linux. After I unchecked the Use PaddleOCR option, the program ran and produced the expected results. However, when I checked Use PaddleOCR, the system crashed. I did not dig into the cause, but I think it has a lot to do with the hardware resource configuration.

I increased the memory to 48 GB, checked Use PaddleOCR, and the results were in line with expectations, which probably confirms my guess.

Odland · Mar 03 '25

Any update?

agn-7 · May 23 '25