Rebuild an unchanged package
I've tried the following:
- Build a CMake package (rclcpp) with colcon.
- Rebuild the same package (without changing anything in it) with colcon:
  `colcon build --packages-select rclcpp` --> the operation took roughly 5.6 s.
- Rebuild the same package by running the CMake command that colcon invoked (found in the log directory):
  `AMENT_PREFIX_PATH=.... /usr/bin/cmake --build <path_to_rclcpp> -- -j12 -l12` --> the operation took only 900 ms.
My questions are: what did colcon do in addition to invoking the cmake command? How are the values to put in `AMENT_PREFIX_PATH` found? I suspect that colcon crawls the `package.xml` files to populate `AMENT_PREFIX_PATH`. Would it be possible to cache this path?
> what did colcon do in addition to invoke the cmake command?
A fair amount of the ~5 seconds of overhead is probably spent crawling the source workspace, parsing all of the manifest files, and building the dependency graph of the discovered packages. On top of that, colcon processes any extensions such as mixins, metadata, or defaults, and invokes the necessary environment hooks in the upstream dependencies of rclcpp. If you were able to build rclcpp manually using the cmake command, it's likely that you had already sourced the environment setup, which colcon re-processes on each invocation. Note that the full set of environment variables used when invoking the build command can be found in the package's build subdirectory (i.e. `build/rclcpp/*.env`).
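Those `.env` files are essentially plain `NAME=value` lines, so it's easy to inspect what colcon exported for the build. A minimal sketch of reading one (the exact file layout may vary between colcon versions):

```python
def parse_env_file(text: str) -> dict:
    """Parse NAME=value lines, the rough format of colcon's *.env files."""
    env = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, _, value = line.partition("=")
        env[name] = value
    return env

# Synthetic snippet resembling build/rclcpp/*.env:
sample = "AMENT_PREFIX_PATH=/opt/ros/humble\nCMAKE_PREFIX_PATH=/opt/ros/humble"
env = parse_env_file(sample)
print(env["AMENT_PREFIX_PATH"].split(":"))
```

Comparing the `AMENT_PREFIX_PATH` entry there against what your sourced shell already has is a quick way to see how much of the environment setup colcon is redoing.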
> How are the arguments to pass to AMENT_PREFIX_PATH found?
I think that the ament packages themselves do this: https://github.com/ament/ament_cmake/blob/84719051ef2b26070de484e60b8f060ae7a70b06/ament_cmake_core/cmake/environment_hooks/environment/ament_prefix_path.sh
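That hook essentially prepends the package's install prefix to `AMENT_PREFIX_PATH` unless it is already listed. The shell logic amounts to something like the following Python rendering (a sketch for illustration, not the actual implementation):

```python
def prepend_unique_value(env: dict, name: str, value: str) -> None:
    """Prepend value to a colon-separated variable, skipping duplicates.

    Mirrors the behavior of ament's prepend-unique environment hook;
    this is an approximation, not the real shell code.
    """
    current = env.get(name, "")
    if value in current.split(":"):
        return  # already present; leave the variable untouched
    env[name] = value if not current else value + ":" + current

env = {"AMENT_PREFIX_PATH": "/opt/ros/humble"}
prepend_unique_value(env, "AMENT_PREFIX_PATH", "/home/user/ws/install/rclcpp")
print(env["AMENT_PREFIX_PATH"])
# /home/user/ws/install/rclcpp:/opt/ros/humble
```

Each package's hook runs when its environment is sourced, which is how the full `AMENT_PREFIX_PATH` gets assembled one prefix at a time.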
> Would it be possible to cache this path?
It might be possible to cache various parts of this process along the way, but I'm afraid that a significant amount of hashing would be necessary to guarantee that the cached results are still valid.
I don't see any way to know when we can skip crawling the source workspace, but if we could hash the cumulative set of all of the package manifests that were found, we might be able to avoid parsing them and constructing the dependency graph. I'm not convinced all of that hashing would save us any time, and we'd incur that hashing cost on nearly all colcon operations. This needs to be tested.
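As a rough illustration of what such a check could look like, here is a sketch that hashes the cumulative set of discovered `package.xml` files, so a matching digest could in principle allow reusing a previously built dependency graph (a hypothetical design, not existing colcon behavior):

```python
import hashlib
from pathlib import Path

def workspace_digest(src_root: str) -> str:
    """Hash the paths and contents of all package manifests under src_root.

    If this digest matches a previously cached one, a tool could skip
    re-parsing the manifests and rebuilding the dependency graph.
    """
    h = hashlib.sha256()
    for manifest in sorted(Path(src_root).rglob("package.xml")):
        h.update(str(manifest).encode())  # a moved manifest also invalidates
        h.update(manifest.read_bytes())
    return h.hexdigest()
```

Note that this still pays the cost of crawling the workspace and reading every manifest, which is exactly the "not convinced it saves time" concern above.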
Caching the environment hooks might be even harder because they're always layered on top of your current set of environment variables, which are fairly volatile (e.g. PWD, OLDPWD, etc.). You'd have to hash all of the hook files in addition to the set of starting variables. It might be possible to scope which variables are considered inputs to the hooks and hash only those, but you can see that this is a non-trivial operation.
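A sketch of what scoping those inputs might look like, assuming a fixed allow-list of variables (again a hypothetical design, and the allow-list here is invented for illustration):

```python
import hashlib
import os

# Hypothetical allow-list: only variables assumed to influence the hooks.
HOOK_INPUT_VARS = ("AMENT_PREFIX_PATH", "CMAKE_PREFIX_PATH", "PATH")

def hook_cache_key(hook_contents: list, env: dict = None) -> str:
    """Hash hook file contents plus a scoped subset of the environment,
    deliberately ignoring volatile variables like PWD and OLDPWD."""
    env = dict(os.environ) if env is None else env
    h = hashlib.sha256()
    for text in hook_contents:
        h.update(text.encode())
    for name in HOOK_INPUT_VARS:
        h.update(f"{name}={env.get(name, '')}".encode())
    return h.hexdigest()
```

The hard part, as noted above, is deciding which variables belong in that allow-list without missing one that a hook actually reads.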
Roughly speaking, you can get a measure of the overhead of package discovery and parsing by running `colcon list` in your workspace. This was the basis of some experiments around startup time which I posted about (with flamegraphs, yay!) in https://github.com/colcon/colcon-core/issues/398.
The short answer is that for a medium-large workspace (650 packages), various tricks were able to cut startup time by 50-60%, but it wasn't clear that the complexity, API changes, and potential small-workspace regressions associated with the changes were worth the savings. However, the commits are all still there if someone wants to give it a try and get it over the line.