
Publish Docker Image to Docker Hub to Support a Wider Set of Contributors

Open Naeemkh opened this issue 9 months ago • 0 comments

We frequently build the latest Docker image for this project as part of our workflow, and I’d like to suggest that we publish this image to Docker Hub to support a wider set of contributors.

Running the Docker image locally is straightforward, and users on SLURM-based clusters can easily convert it to a Singularity image. In contrast, building a Singularity image from scratch can be time-consuming and error-prone due to dependency mismatches, build tooling complexity, and GPU driver issues, especially since the Singularity image is built less frequently than the Docker image in our workflow.

Publishing a prebuilt Docker image would simplify onboarding and usage significantly.

Here is a simple workflow:

  • Push Docker Image to Docker Hub (on any dev machine):
docker tag <local_image_name> <dockerhub_username>/<image_repo_name>
docker push <dockerhub_username>/<image_repo_name>
  • On an HPC/SLURM Cluster (with fakeroot support):
singularity build --fakeroot <image_repo_name>.sif docker://<dockerhub_username>/<image_repo_name>:latest
  • Run a single workflow test:
singularity exec --nv --bind "$(pwd)":/mnt \
  --env XLA_PYTHON_CLIENT_ALLOCATOR=platform \
  <image_repo_name>.sif \
  python -m tests.reference_algorithm_tests \
    --workload=imagenet_resnet \
    --framework=jax \
    --global_batch_size=16 \
    --log_file=/tmp/jax_log.pkl \
    --submission_path=tests/modeldiffs/vanilla_sgd_jax.py \
    --identical=True \
    --tuning_search_space=None \
    --num_train_steps=10
  • Run all train_diff tests:
singularity exec --nv --bind "$(pwd)":/mnt \
  --env XLA_PYTHON_CLIENT_ALLOCATOR=platform \
  <image_repo_name>.sif \
  python -m tests.test_traindiffs
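
The publish and convert steps above could be wrapped in a small helper script. The following is only a dry-run sketch: it prints the commands instead of running them, so it can be checked before touching Docker Hub or the cluster. The variable names and their default values (`algoperf_local`, `example_user`, `algoperf`) are hypothetical placeholders, not names used by this project.

```shell
#!/bin/sh
# Dry-run sketch: prints the publish/convert commands rather than executing them.
set -eu

# Hypothetical placeholders; override via environment variables.
LOCAL_IMAGE="${LOCAL_IMAGE:-algoperf_local}"
DOCKERHUB_USER="${DOCKERHUB_USER:-example_user}"
IMAGE_REPO="${IMAGE_REPO:-algoperf}"
TAG="${TAG:-latest}"

# Compose the full Docker Hub reference, e.g. user/repo:tag.
image_ref() {
  printf '%s/%s:%s' "$DOCKERHUB_USER" "$IMAGE_REPO" "$TAG"
}

# Print the three commands from the workflow in order:
# tag the local image, push it, then build the .sif on the cluster.
print_workflow() {
  ref="$(image_ref)"
  echo "docker tag ${LOCAL_IMAGE} ${ref}"
  echo "docker push ${ref}"
  echo "singularity build --fakeroot ${IMAGE_REPO}.sif docker://${ref}"
}

print_workflow
```

Setting `TAG` to something other than `latest` (for example a commit hash) would let cluster users pin the exact image they converted; that is a suggestion, not part of the workflow as proposed.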

Naeemkh, Apr 22 '25 01:04