
Investigate support for building RoboStack distributions for multiple Python minor versions

Open traversaro opened this issue 6 months ago • 6 comments

Until now, each RoboStack distribution (or more technically, each full rebuild/sync) has been built for a single Python version, to limit the maintenance load. While changing this requires discussion, in this issue I would like to at least investigate the technical changes required to permit (in theory) building a given RoboStack full rebuild/sync for multiple Python versions.

Why am I interested in this? In my organization we are sometimes forced to use a specific Python version, typically dictated by a piece of software that is only available for that version. Examples include:

  • Blender, for which each version is tightly coupled to a Python version (as of July 2025, Python 3.11)
  • Each version of IsaacSim/IsaacLab is tightly coupled to a release of the (closed-source) Omniverse Kit, which in turn is only available for a given Python version (as of July 2025, Python 3.11)

So, for example, at this specific time I would like to have fresh releases of ROS packages available for Python 3.11, while on the other hand some users are interested in having packages built for newer Python versions (see https://github.com/RoboStack/robostack.github.io/issues/76). Subject to maintainer load, it would be great to have at least a couple of Python versions available for each RoboStack full rebuild/sync.

traversaro avatar Jul 05 '25 09:07 traversaro

At the moment vinca generates all recipes with a Python host dependency, probably for initial simplicity. So, as a first step toward implementing this, we would need to modify vinca to add the python host dependency only when it is actually necessary.
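As a sketch of what that change could look like (hypothetical code, not vinca's actual implementation; the helper name and inputs are illustrative assumptions), the decision could be driven by whether a package installs anything under the Python site-packages directory:

```python
# Hypothetical sketch, not actual vinca code: decide whether a generated
# recipe needs a `python` host dependency, based on the files the package
# installs. The function name and inputs are illustrative assumptions.

def needs_python_host(installed_paths, py_ver="3.12"):
    """Return True if any installed file lands in site-packages."""
    site_packages = f"lib/python{py_ver}/site-packages/"
    return any(p.startswith(site_packages) for p in installed_paths)

# A C++-only package would not need the python host dependency:
print(needs_python_host(["lib/librclcpp.so", "include/rclcpp/rclcpp.hpp"]))  # False
# A package shipping a Python module would:
print(needs_python_host(["lib/python3.12/site-packages/rclpy/__init__.py"]))  # True
```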

To get a first idea of when it is actually necessary to depend on Python, I assembled a quick script to analyze the existing kilted packages.

click for script
#!/usr/bin/env python
"""
Analyze RoboStack kilted ROS 2 packages for their Python footprint.

This script fetches the list of packages from the robostack-kilted channel,
filters for those with ros2-distro-mutex dependency, downloads and extracts
each package, then classifies them based on their Python content:

* non-Python packages           - nothing in lib/python3.12/site-packages
* pure-Python ("noarch python") - only .py files there
* packages with compiled Python - at least one .so there

The classification is based on the actual file list inside each package
(from info/paths.json), not on metadata.
"""

import json
import os
import re
import tempfile
import urllib.request
import zipfile
import tarfile
from typing import Dict, List, Tuple

# Configuration
CHANNEL_URL = "https://conda.anaconda.org/robostack-kilted/linux-64/repodata.json"
BASE_URL = "https://conda.anaconda.org/robostack-kilted/linux-64/"
RESULTS_DIR = "results"
PY_VER = "3.12"
SITE_PACKAGES = f"lib/python{PY_VER}/site-packages"

# Ensure results directory exists
os.makedirs(RESULTS_DIR, exist_ok=True)

def fetch_repodata() -> Dict:
    """Fetch repodata.json from the robostack-kilted channel."""
    print("Fetching repodata...")
    with urllib.request.urlopen(CHANNEL_URL) as response:
        return json.loads(response.read())

def filter_ros2_packages(repodata: Dict) -> List[Tuple[str, Dict]]:
    """Filter packages that have ros2-distro-mutex dependency."""
    ros2_packages = []
    
    # Check both "packages" and "packages.conda" sections
    all_packages = {}
    if "packages" in repodata:
        all_packages.update(repodata["packages"])
    if "packages.conda" in repodata:
        all_packages.update(repodata["packages.conda"])
    
    for filename, pkg_info in all_packages.items():
        depends = pkg_info.get("depends", [])
        
        # Look for ros2-distro-mutex dependency with kilted variant
        ros2_mutex_pattern = r"ros2-distro-mutex\s+0\.9\.\*\s+kilted_.*"
        
        for dep in depends:
            if re.match(ros2_mutex_pattern, dep):
                ros2_packages.append((filename, pkg_info))
                break
    
    return ros2_packages

def download_and_extract_paths(filename: str) -> List[str]:
    """Download package and extract the list of files from info/paths.json."""
    url = BASE_URL + filename
    
    with tempfile.TemporaryDirectory() as temp_dir:
        # Download the package
        package_path = os.path.join(temp_dir, filename)
        urllib.request.urlretrieve(url, package_path)
        
        # Extract info/paths.json
        paths = []
        try:
            if filename.endswith('.conda'):
                # .conda files are zip archives containing tar.zst files
                with zipfile.ZipFile(package_path, 'r') as zip_file:
                    # Look for info-*.tar.zst file
                    info_files = [name for name in zip_file.namelist() if name.startswith('info-') and name.endswith('.tar.zst')]
                    if info_files:
                        # Extract the info tar.zst file
                        info_tar_path = os.path.join(temp_dir, info_files[0])
                        zip_file.extract(info_files[0], temp_dir)
                        
                        # Open the tar.zst file and extract paths.json
                        import subprocess
                        # Decompress with the external `zstd` CLI (must be on
                        # PATH), then extract with tarfile
                        decompressed_path = info_tar_path.replace('.zst', '')
                        subprocess.run(['zstd', '-d', info_tar_path, '-o', decompressed_path], check=True)
                        
                        with tarfile.open(decompressed_path, 'r') as tar_file:
                            try:
                                paths_file = tar_file.extractfile('info/paths.json')
                                if paths_file:
                                    paths_data = json.loads(paths_file.read())
                                    paths = [item.get('_path', item.get('path', '')) for item in paths_data['paths']]
                            except KeyError:
                                pass  # No info/paths.json found
                    else:
                        pass  # No info-*.tar.zst found
            else:
                # .tar.bz2 files
                with tarfile.open(package_path, 'r:bz2') as tar_file:
                    try:
                        paths_file = tar_file.extractfile('info/paths.json')
                        if paths_file:
                            paths_data = json.loads(paths_file.read())
                            paths = [item.get('_path', item.get('path', '')) for item in paths_data['paths']]
                    except KeyError:
                        pass  # No info/paths.json found
        except Exception:
            pass  # Skip packages that can't be processed
            
        return paths

def classify_package(paths: List[str]) -> str:
    """Classify package based on files in lib/python3.12/site-packages."""
    python_files = [p for p in paths if p.startswith(SITE_PACKAGES + '/')]
    
    if not python_files:
        return "non-Python"
    
    # Check if any files are .so (compiled extensions)
    has_so = any(p.endswith('.so') for p in python_files)
    
    if has_so:
        return "compiled-Python"
    else:
        return "pure-Python"

def main():
    """Main function to analyze packages."""
    print("Starting RoboStack kilted ROS 2 package analysis...")
    
    # Fetch and filter packages
    repodata = fetch_repodata()
    ros2_packages = filter_ros2_packages(repodata)
    
    print(f"Found {len(ros2_packages)} ROS 2 packages")
    print("Processing all packages...")
    
    # Initialize result counters and lists
    results = {
        "non-Python": [],
        "pure-Python": [],
        "compiled-Python": []
    }
    
    # Process each package
    for i, (filename, pkg_info) in enumerate(ros2_packages, 1):
        print(f"[{i}/{len(ros2_packages)}] {filename}", end=" -> ")
        
        try:
            paths = download_and_extract_paths(filename)
            if paths:
                classification = classify_package(paths)
                python_files_count = len([p for p in paths if p.startswith(SITE_PACKAGES + '/')])
                results[classification].append({
                    "filename": filename,
                    "name": pkg_info.get("name", "unknown"),
                    "version": pkg_info.get("version", "unknown"),
                    "python_files_count": python_files_count
                })
                print(f"{classification} ({python_files_count} files)")
            else:
                print("skipped (no file list)")
        except Exception as e:
            print(f"error: {e}")
    
    # Write results to files
    for category, packages in results.items():
        output_file = os.path.join(RESULTS_DIR, f"{category.replace('-', '_')}_packages.txt")
        with open(output_file, 'w') as f:
            f.write(f"# {category.upper()} PACKAGES ({len(packages)} total)\n\n")
            for pkg in packages:
                f.write(f"{pkg['name']} {pkg['version']} ({pkg['filename']}) - {pkg['python_files_count']} Python files\n")
        print(f"Wrote {len(packages)} {category} packages to {output_file}")
    
    # Print summary
    print(f"\n=== SUMMARY ===")
    print(f"Non-Python packages: {len(results['non-Python'])}")
    print(f"Pure-Python packages: {len(results['pure-Python'])}")
    print(f"Compiled-Python packages: {len(results['compiled-Python'])}")
    print(f"Total processed: {sum(len(packages) for packages in results.values())}")

if __name__ == "__main__":
    main()

The output shows that, out of a total of ~650 packages, they are divided as:

# NON-PYTHON PACKAGES (418 total)
# COMPILED-PYTHON PACKAGES (97 total)
# PURE-PYTHON PACKAGES (136 total)

This is good news: the vast majority do not actually install any Python files, so they do not need to be rebuilt for different Python versions. Only ~100 packages contain compiled Python extensions that require a rebuild per Python version, and most of those are -msgs packages that generate small pybind11 glue code.

The tricky part to manage is the pure-Python packages (136 total). While in theory these could be handled as noarch: python packages (to avoid rebuilding them for multiple Python versions as well), in practice this is not trivial, as:

  • While they are pure Python packages, they all also install files outside of the site-packages directory, in particular share/ament_index/resource_index/packages/<pkg_name> and share/<pkg_name>/package.xml. noarch: python packages can't install files outside of the site-packages folder.
  • If packages are converted to noarch: python, they also need to list their entry points in recipe.yaml, and I am not sure we can generate this information.
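On the entry-points question, one possible avenue (a sketch under assumptions, not a verified approach) is to recover them from the dist-info metadata the installed package already ships: entry_points.txt uses INI syntax, so configparser can read it.

```python
# Sketch (an assumption, not an existing vinca feature): recover
# console_scripts entry points from an installed package's dist-info
# entry_points.txt, which could in principle feed the recipe.yaml
# entry_points section of a noarch: python recipe.
import configparser

def read_console_scripts(entry_points_txt: str) -> list:
    cp = configparser.ConfigParser()
    cp.read_string(entry_points_txt)
    if "console_scripts" not in cp:
        return []
    return [f"{name} = {target}" for name, target in cp["console_scripts"].items()]

example = """\
[console_scripts]
ros2 = ros2cli.cli:main
"""
print(read_console_scripts(example))  # ['ros2 = ros2cli.cli:main']
```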

Anyhow, a good model of a ROS pure Python package is https://github.com/conda-forge/urdfdom-py-feedstock, so probably working on https://github.com/conda-forge/urdfdom-py-feedstock/issues/8 could be a stepping stone for solving this issue.

traversaro avatar Jul 05 '25 10:07 traversaro

A possible preliminary step, however, is simply to change vinca so that only the COMPILED-PYTHON and PURE-PYTHON packages are compiled for different Python versions, while avoiding recompiling the NON-PYTHON packages (the vast majority) for each Python version.
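In build-matrix terms, that preliminary step could be sketched like this (hypothetical helper; the version list and function name are assumptions for illustration, not vinca's actual configuration):

```python
# Hypothetical sketch: expand the build matrix only for packages that touch
# Python, using the classification computed by the script above.
# PY_VERSIONS and the helper name are illustrative assumptions.

PY_VERSIONS = ["3.11", "3.12"]

def build_variants(classification: str) -> list:
    if classification == "non-Python":
        return [None]          # a single, Python-independent build
    return list(PY_VERSIONS)   # one build per supported Python version

print(build_variants("non-Python"))       # [None]
print(build_variants("compiled-Python"))  # ['3.11', '3.12']
print(build_variants("pure-Python"))      # ['3.11', '3.12']
```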

traversaro avatar Jul 05 '25 10:07 traversaro

> Anyhow, a good model of a ROS pure Python package is https://github.com/conda-forge/urdfdom-py-feedstock, so probably working on conda-forge/urdfdom-py-feedstock#8 could be a stepping stone for solving this issue.

Actually, I am afraid it may not be possible to model ROS pure Python packages as noarch: python packages; see https://github.com/conda-forge/urdfdom-py-feedstock/issues/8#issuecomment-3038680227 .

traversaro avatar Jul 05 '25 10:07 traversaro

Hacky suggestion: how about just using sccache to cache the compile outputs, and leaving everything else as is? It's used in the pytorch feedstock, for Windows at least, and works very well there.

Tobias-Fischer avatar Jul 08 '25 04:07 Tobias-Fischer

Quick bump - did you think more about this, @traversaro?

Tobias-Fischer avatar Aug 22 '25 04:08 Tobias-Fischer

+1! Our main problem is that FLIR/Teledyne only serves wheels for their camera control software built against Python 3.8 or 3.10. We heavily depend on these cameras. Therefore, I am currently using the humble packages from robostack-staging... I hope they don't disappear soon. Where can I find the sources for the ROS packages on the channel, and could I just rebuild them without vinca using pixi, as described in https://pixi.sh/dev/build/ros/ ?

MaximilianHoffmann avatar Sep 27 '25 19:09 MaximilianHoffmann