MNMG cuML + XGB on GCP - cloudprovider

Open zronaghi opened this issue 4 years ago • 3 comments

Extending the dask-k8s examples to cloudprovider

zronaghi avatar Apr 22 '21 16:04 zronaghi

Depends on: https://github.com/dask/dask-cloudprovider/issues/281

drobison00 avatar Apr 22 '21 21:04 drobison00

Update: I was able to set up a K8s cluster on Google Cloud, install KubeFlow with the Dask operator, and get Optuna to run a toy HPO job in parallel using the KubeCluster. See https://github.com/hcho3/xgb-hpo-k8s-notebooks/blob/main/barebones-hpo-kubecluster-dask.ipynb
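
For anyone who wants the shape of that setup without opening the notebook, here is a rough sketch of the pattern. It is not the notebook's code. The cluster name `hpo-demo`, the image tag, the worker count, and the quadratic toy objective are placeholders, and it assumes `dask-kubernetes` (operator API), `distributed`, and an Optuna installation that provides `optuna.integration.DaskStorage` (older setups used the separate `dask-optuna` package instead). `EXTRA_PIP_PACKAGES` is assumed to be honored by the worker image.

```python
def quadratic_loss(x):
    """Toy loss with its minimum at x = 2."""
    return (x - 2.0) ** 2


def toy_objective(trial):
    """Optuna objective: sample x and return the toy loss."""
    x = trial.suggest_float("x", -10.0, 10.0)
    return quadratic_loss(x)


def run_parallel_hpo(n_trials=20, n_workers=4):
    """Run one Optuna trial per Dask task on a KubeCluster (sketch)."""
    # Imported lazily so this file loads without the cluster libraries.
    import optuna
    from dask.distributed import Client, wait
    from dask_kubernetes.operator import KubeCluster
    from optuna.integration import DaskStorage
    from optuna.storages import InMemoryStorage

    cluster = KubeCluster(
        name="hpo-demo",                       # placeholder name
        image="ghcr.io/dask/dask:latest",      # placeholder image
        n_workers=n_workers,
        env={"EXTRA_PIP_PACKAGES": "optuna"},  # assumption: image installs these
    )
    client = Client(cluster)
    try:
        # The study state lives on the Dask scheduler so workers can share it.
        study = optuna.create_study(
            direction="minimize",
            storage=DaskStorage(InMemoryStorage()),
        )
        futures = [
            client.submit(study.optimize, toy_objective, n_trials=1, pure=False)
            for _ in range(n_trials)
        ]
        wait(futures)
        return study.best_params
    finally:
        client.close()
        cluster.close()
```

Submitting one trial per task keeps each task short, so the scheduler can spread trials evenly across workers.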

hcho3 avatar Sep 26 '22 17:09 hcho3

Update: I got parallel HPO working with XGBoost GPU: https://github.com/hcho3/xgb-hpo-k8s-notebooks/blob/main/barebones-hpo-kubecluster-dask.ipynb
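
As an illustration of what a GPU objective for this kind of HPO run might look like (a sketch under assumptions, not the notebook's code): the synthetic dataset, the search ranges, and the helper name `make_xgb_params` are made up for the example. `tree_method="gpu_hist"` matches XGBoost releases from around the time of this thread; XGBoost 2.0 and later prefer `device="cuda"` with `tree_method="hist"`.

```python
def make_xgb_params(max_depth, learning_rate, use_gpu=True):
    """Build an XGBoost parameter dict (hypothetical helper for this sketch)."""
    return {
        "objective": "binary:logistic",
        "eval_metric": "logloss",
        "max_depth": int(max_depth),
        "learning_rate": float(learning_rate),
        # "gpu_hist" is the pre-2.0 spelling; newer releases use
        # device="cuda" together with tree_method="hist".
        "tree_method": "gpu_hist" if use_gpu else "hist",
    }


def gpu_objective(trial):
    """Optuna objective: train on the GPU and return validation log loss."""
    # Imported lazily so this file loads on machines without XGBoost.
    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Synthetic data stands in for a real dataset.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    params = make_xgb_params(
        max_depth=trial.suggest_int("max_depth", 2, 10),
        learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    )
    dtrain = xgb.DMatrix(X_tr, label=y_tr)
    dvalid = xgb.DMatrix(X_va, label=y_va)

    evals_result = {}
    xgb.train(
        params,
        dtrain,
        num_boost_round=50,
        evals=[(dvalid, "valid")],
        evals_result=evals_result,
        verbose_eval=False,
    )
    # Lower is better, so the study should use direction="minimize".
    return evals_result["valid"]["logloss"][-1]
```

An objective like this can be handed to `study.optimize` on each Dask worker, provided every worker pod has a GPU attached.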

hcho3 avatar Sep 28 '22 07:09 hcho3

Completed in https://github.com/rapidsai/cloud-ml-examples/pull/192

hcho3 avatar Oct 19 '22 16:10 hcho3

For the record, here is the command to launch a K8s cluster on GCP:

gcloud container clusters create phcho-rapids-gpu-kubeflow \
  --accelerator type=nvidia-tesla-p100,count=2 --machine-type n1-standard-4 \
  --zone us-central1-c --release-channel stable --num-nodes 5
• Follow https://developer.nvidia.com/blog/accelerating-etl-on-kubeflow-with-rapids/ to install the NVIDIA driver and Kubeflow. Make sure to use the latest kustomize (4.5.7) and kubeflow/manifests (v1.6.0).
• When installing the Dask Kubernetes operator, use Helm: https://kubernetes.dask.org/en/latest/operator_installation.html
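
Once the operator is installed, a Dask cluster sized to the node pool above (5 nodes with 2 GPUs each, so 10 GPUs) could be requested roughly like this. This is a sketch under assumptions: the helper names, the cluster name `rapids-dask`, and the `reserved_gpus` parameter are hypothetical, the image must be a RAPIDS-capable image that you supply, and `dask-cuda-worker` is the worker command as it was spelled in 2022-era `dask-cuda` releases.

```python
def gpu_cluster_kwargs(image, num_nodes=5, gpus_per_node=2, reserved_gpus=0):
    """Build KubeCluster keyword arguments with one Dask worker per GPU.

    Defaults mirror the gcloud command above (--num-nodes 5, count=2).
    reserved_gpus is a hypothetical knob for GPUs held back for other pods,
    for example a notebook server.
    """
    total_gpus = num_nodes * gpus_per_node
    n_workers = total_gpus - reserved_gpus
    if n_workers < 1:
        raise ValueError("no GPUs left for Dask workers")
    return {
        "name": "rapids-dask",                  # placeholder name
        "image": image,                         # caller-supplied image
        "n_workers": n_workers,
        "worker_command": "dask-cuda-worker",   # assumption: 2022-era spelling
        # Each worker pod asks Kubernetes for exactly one GPU.
        "resources": {"limits": {"nvidia.com/gpu": "1"}},
    }


def start_gpu_cluster(image, **overrides):
    """Create the cluster; needs a live K8s cluster with the Dask operator."""
    # Imported lazily so the helper above works without dask-kubernetes.
    from dask_kubernetes.operator import KubeCluster

    return KubeCluster(**gpu_cluster_kwargs(image, **overrides))
```

Keeping one worker per GPU means each trial gets a whole device to itself, which avoids contention between concurrent XGBoost training runs.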

hcho3 avatar Nov 16 '22 13:11 hcho3