
Picotool coprodis unusably slow with large program (W11)

Open UKTailwind opened this issue 1 year ago • 9 comments

MMBasic creates an image which is just under a megabyte (rp2040). elf2uf2 used to create the uf2 in a couple of seconds. picotool takes several minutes, with a processor running at maximum during this period (W11 PC with i7-12700). It does eventually complete and appears to create a valid uf2. I believe the problem is in the creation of the .dis file, which is 11.4 MB. If I kill picotool using the Task Manager, the UF2 is created but the .dis creation fails.

UKTailwind avatar Aug 09 '24 17:08 UKTailwind

See the C/C++ SDK book: you can pass PICO_NO_COPRO_DIS=1 to cmake, or set it in your CMakeLists.txt, though it should still not be this slow.


kilograham avatar Aug 09 '24 19:08 kilograham

@UKTailwind Can you provide your elf file, so I can run picotool myself to see where the bottleneck is?

will-v-pi avatar Aug 09 '24 20:08 will-v-pi

PicoMite.zip

UKTailwind avatar Aug 09 '24 21:08 UKTailwind

"you can pass set PICO_NO_COPRO_DIS=1 to cmake"

That fixes it, so it is definitely a problem in the creation of the .dis file.

UKTailwind avatar Aug 09 '24 22:08 UKTailwind

This appears to be both a compiler issue and a Windows vs Linux issue - on my 13900H laptop it takes:

  • ~5mins when using picotool compiled with MSVC 2022 on Windows
  • 35s using the precompiled binary at pico-sdk-tools (compiled using gcc in MSYS2) on Windows
  • 8s when compiled with gcc in WSL

So further investigation is definitely warranted, but I'd recommend switching to the precompiled binaries if you can (you can point pico-sdk at these by setting -Dpicotool_DIR=/path/to/picotool in your cmake), or even switching to WSL.

This is only necessary if you want the extra coprocessor disassembly functionality. If you're not planning on thoroughly reading the disassembly, then just setting PICO_NO_COPRO_DIS=1 is probably the best option. The extra coprocessor disassembly just turns some mcr and similar instructions, which send to or receive from the coprocessors, into more meaningful rcp, dcp or gpio instructions for readability.

will-v-pi avatar Aug 10 '24 12:08 will-v-pi

Thanks for looking at it. The strange thing is that elf2uf2 used to create the disassembly listing almost instantaneously. picotool appears to be a win32 application. Is the Linux version only 32-bit, or could that be an issue?

UKTailwind avatar Aug 10 '24 13:08 UKTailwind

elf2uf2 didn’t do any disassembly, the disassembly was performed by the objdump in your toolchain and still is - picotool just modifies the coprocessor lines in that disassembly file.

The picotool I compiled was 64-bit in Windows and Linux so I don’t think that’s the issue. From a quick investigation, it looks like the regex search is just taking far longer on MSVC Windows - taking up 75% of the execution time, whereas on Linux it only takes 8% of the execution time

will-v-pi avatar Aug 10 '24 13:08 will-v-pi

From looking into this, it seems that the MSVC regex library is just ridiculously slow on Windows, so for larger programs I think the recommendation would be to use a picotool compiled with a different regex library, such as the pre-compiled one in pico-sdk-tools. You could also use chocolatey to install gcc and use that to compile picotool, which is what the GitHub actions use to test pico-sdk and pico-examples - see this file for the shell commands.

will-v-pi avatar Aug 28 '24 15:08 will-v-pi

Yes, picotool.exe is absurdly slow! It takes 5 min to make the uf2.

liubinbj avatar Sep 11 '24 04:09 liubinbj

I hear reports that this is really slow on Pi 4 too. Let's take another look at performance, as it shouldn't take minutes to do this on anything!

kilograham avatar Feb 17 '25 16:02 kilograham