sql-server-language-extensions

Improve Python `restore-packages.cmd` execution time in pipeline + Fix Build break due to numpy 2.0+

Open seantleonard opened this issue 1 year ago • 0 comments

Why this change?

  1. Closes #52

    • which highlights the lengthy execution time of language-extensions/python/build/windows/restore-packages.cmd in pipelines (and also locally). In pipelines this step takes ~20 minutes to complete. Essentially, expanding the Boost archive takes a long time on Windows because of the number of files it contains.
  2. Closes #54

    • which highlights that Numpy v2.0+ isn't supported by the current Boost version, causing a pipeline build failure for the Python extension.

Background on zip file slowness

Anecdotal data points:

  • https://github.com/Tudat/tudatBundle/issues/10
  • https://github.com/boostorg/boost/issues/876

Boost Recommendation not to use ZIP files:

We recommend downloading boost_1_79_0.7z and using 7-Zip to decompress it. We no longer recommend .zip files for Boost because they are twice as large as the equivalent .7z files. We don't recommend using Windows' built-in decompression as it can be painfully slow for large archives. Ref

Perf improvements:

  • Before: pipelines averaged ~20 minutes of execution time.

  • Now: improved to 5 minutes.

What is this change

  1. Updates the URL used to fetch Boost 1.79.0 from SourceForge to archives.boost.io.
  2. Downloads a 7z (7-Zip) archive instead of a zip file, because extraction was drastically faster when testing locally with a 7z file on Windows. The Boost docs also recommend the 7z file.
  3. Outputs timestamps before extracting Boost (which took 20 minutes) to show that extraction was the cause of the slow execution.
  4. Adds timestamps for building Boost so that its duration can be identified in the pipeline.
  5. Hardcodes the Numpy and Pandas dependencies for Python to match the versions hardcoded in the Linux restore-packages.sh script. Support for Numpy 2.0+ in Boost hasn't made it into an official release yet, only into dev branches. More details in #54.
    • Numpy: 1.22.3
    • Pandas: 1.4.2
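
As a rough illustration of the version pinning in item 5, the sketch below builds a `pip install` argument list with exact (`==`) pins and checks that the pinned Numpy stays below 2.0. It is written in Python so it can be run on its own. The helper names (`pip_install_command`, `major_version`) are hypothetical, and the real restore-packages.cmd may invoke pip differently.

```python
import sys

# Versions pinned by this change (taken from the list above).
PINNED = {"numpy": "1.22.3", "pandas": "1.4.2"}


def pip_install_command(pins, python=sys.executable):
    """Build a `pip install` argument list with exact (==) version pins."""
    specs = [f"{name}=={version}" for name, version in pins.items()]
    return [python, "-m", "pip", "install", *specs]


def major_version(version):
    """Return the leading numeric component of a dotted version string."""
    return int(version.split(".")[0])


# Numpy 2.0+ is the unsupported range described in #54.
assert major_version(PINNED["numpy"]) < 2

print(" ".join(pip_install_command(PINNED, python="python")))
# python -m pip install numpy==1.22.3 pandas==1.4.2
```

Exact pins keep the Windows and Linux builds on the same package versions, so a new upstream release cannot change the build outcome without a deliberate edit.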

7zip command reference

%ARCHIVE_TOOL_PATH% x -y -o"%PACKAGES_ROOT%" "boost_%BOOST_VERSION_IN_UNDERSCORE%.7z"
  • %ARCHIVE_TOOL_PATH%: full path to the 7-Zip executable. On the pipeline this is a specific path; local developers will need to update it accordingly.
  • x: the 7-Zip command that extracts files from an archive with their full paths.
  • -y: automatically answers yes to any prompts (such as overwrite confirmations).
  • -o: specifies the output directory where the files will be extracted. 7-Zip expects the path immediately after -o, with no space.
  • "boost_%BOOST_VERSION_IN_UNDERSCORE%.7z": the path to the .7z archive that will be extracted.
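
The same command, together with the timestamps mentioned earlier, can be sketched in Python for anyone who wants to reproduce the timing outside the batch script. This is only an illustration: `build_extract_command` and `run_timed` are hypothetical helpers, not part of the repository, and the tool path, output directory and version passed in are placeholders supplied by the caller.

```python
import subprocess
import time
from datetime import datetime


def build_extract_command(archive_tool, packages_root, boost_version):
    """Return the 7-Zip argument list equivalent to the command above.

    boost_version uses underscores, e.g. "1_79_0".
    """
    return [
        str(archive_tool),
        "x",                    # extract files with their full paths
        "-y",                   # answer yes to any prompts
        f"-o{packages_root}",   # output directory; no space after -o
        f"boost_{boost_version}.7z",
    ]


def run_timed(command, run=subprocess.run, log=print):
    """Run a command, logging a timestamp before and after it.

    Returns the elapsed time in seconds.
    """
    log(f"[{datetime.now():%H:%M:%S}] start: {command[0]}")
    started = time.monotonic()
    run(command, check=True)  # raises CalledProcessError on a non-zero exit
    elapsed = time.monotonic() - started
    log(f"[{datetime.now():%H:%M:%S}] done in {elapsed:.1f}s")
    return elapsed
```

Passing the arguments as a list avoids the manual quoting that the batch version needs around paths containing spaces.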

How was this tested?

  • passed build pipelines

seantleonard avatar Jun 21 '24 19:06 seantleonard