MNMG cuML + XGB on GCP - cloudprovider
Extending the dask-k8s examples to cloudprovider
Depends on: https://github.com/dask/dask-cloudprovider/issues/281
Update: I was able to set up a K8s cluster on Google Cloud, install Kubeflow with the Dask operator, and get Optuna to run a toy HPO job in parallel using KubeCluster. See https://github.com/hcho3/xgb-hpo-k8s-notebooks/blob/main/barebones-hpo-kubecluster-dask.ipynb
Update: I got parallel HPO working with XGBoost GPU: https://github.com/hcho3/xgb-hpo-k8s-notebooks/blob/main/barebones-hpo-kubecluster-dask.ipynb
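For anyone reading along, the parallel HPO pattern from the updates above can be sketched roughly as follows. This is a minimal sketch, not the code from the linked notebook. It assumes `optuna` (with its Dask integration), `dask.distributed`, and `dask_kubernetes` are installed on both the client and the worker image. The toy quadratic objective, the cluster name `optuna-hpo`, and the helper names `split_trials`, `run_hpo`, and `run_on_kubecluster` are all made up for illustration.

```python
def split_trials(n_trials, n_tasks):
    """Split a trial budget as evenly as possible across Dask tasks."""
    base, extra = divmod(n_trials, n_tasks)
    return [base + (1 if i < extra else 0) for i in range(n_tasks)]


def objective(trial):
    # Toy objective (an assumption): minimize (x - 2)^2. In the GPU case this
    # would train an XGBoost model and return a validation metric instead.
    x = trial.suggest_float("x", -10.0, 10.0)
    return (x - 2.0) ** 2


def run_hpo(client, n_trials=40, n_tasks=4):
    """Run one Optuna study whose trials are spread over Dask workers."""
    # Imported lazily so the helpers above work without these packages.
    import optuna
    from dask.distributed import wait
    from optuna.integration import DaskStorage

    # DaskStorage lets every worker read and write the same study.
    storage = DaskStorage(client=client)
    study = optuna.create_study(direction="minimize", storage=storage)
    futures = [
        client.submit(study.optimize, objective, n_trials=n, pure=False)
        for n in split_trials(n_trials, n_tasks)
        if n > 0
    ]
    wait(futures)
    return study.best_params, study.best_value


def run_on_kubecluster(image, n_workers=4):
    """Start a Dask cluster through the Dask Kubernetes operator and run HPO."""
    from dask.distributed import Client
    from dask_kubernetes.operator import KubeCluster

    # "optuna-hpo" is a hypothetical cluster name; `image` must contain
    # optuna and dask (plus xgboost for a real objective).
    with KubeCluster(name="optuna-hpo", image=image, n_workers=n_workers) as cluster:
        with Client(cluster) as client:
            return run_hpo(client, n_tasks=n_workers)
```

Calling `run_on_kubecluster(image)` with a suitable worker image should return the best parameters and objective value. Submitting several `study.optimize` calls against a shared storage is what lets the trials run concurrently on different workers.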
Completed in https://github.com/rapidsai/cloud-ml-examples/pull/192
For the record, here is the command to launch a K8s cluster on GCP:
gcloud container clusters create phcho-rapids-gpu-kubeflow \
--accelerator type=nvidia-tesla-p100,count=2 --machine-type n1-standard-4 \
--zone us-central1-c --release-channel stable --num-nodes 5
- Follow https://developer.nvidia.com/blog/accelerating-etl-on-kubeflow-with-rapids/ to install the NVIDIA driver and Kubeflow. Make sure to use the latest kustomize (4.5.7) and kubeflow/manifests (v1.6.0).
- When installing the Dask Kubernetes operator, use Helm: https://kubernetes.dask.org/en/latest/operator_installation.html