initialization-actions
Initialization fails when using the `mlvm.sh` initialization action
Command to create a GPU cluster
$ gcloud dataproc clusters create cluster-b9b8-gpu \
--autoscaling-policy policy-d005-gpu \
--enable-component-gateway \
--region europe-west4 \
--zone europe-west4-a \
--master-machine-type n1-standard-8 \
--master-boot-disk-type pd-ssd \
--master-boot-disk-size 1000 \
--num-workers 2 \
--worker-machine-type n1-standard-8 \
--worker-boot-disk-type pd-ssd \
--worker-boot-disk-size 1000 \
--worker-accelerator type=nvidia-tesla-v100,count=1 \
--num-secondary-workers 2 \
--secondary-worker-boot-disk-type pd-ssd \
--secondary-worker-boot-disk-size 1000 \
--num-secondary-worker-local-ssds 0 \
--secondary-worker-accelerator type=nvidia-tesla-v100,count=1 \
--image-version 2.0-debian10 \
--metadata include-gpus=true \
--metadata gpu-driver-provider=NVIDIA \
--metadata init-actions-repo=gs://goog-dataproc-initialization-actions-europe-west4 \
--initialization-actions=gs://goog-dataproc-initialization-actions-europe-west4/mlvm/mlvm.sh \
--initialization-action-timeout=45m \
--properties dataproc:efm.spark.shuffle=primary-worker \
--optional-components JUPYTER \
--project ${PROJECT}
Output log
...
+ execute_with_retries '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3'
+ local -r 'cmd=/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3'
+ (( i = 0 ))
+ (( i < 10 ))
+ eval '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3'
++ /opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3
mamba (0.22.1) supported by @QuantStack
GitHub: https://github.com/mamba-org/mamba
Twitter: https://twitter.com/QuantStack
Looking for: ['r-dplyr=1.0', 'r-essentials=4.0', 'r-sparklyr=1.7', 'scikit-learn=0.24', 'pytorch=1.9', 'torchvision=0.9', 'xgboost=1.4', 'r-xgboost=1.4']
Pinned packages:
- python 3.8.*
- conda 4.9.*
- python 3.8.*
- r-base 4.0.*
- r-recommended 4.0.*
Encountered problems while solving:
- package torchvision-0.9.1-py38h9e2e28c_1_cpu requires pytorch-cpu, but none of the providers can be installed
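For anyone reading the trace: the `local -r cmd=...`, `(( i = 0 ))`, `(( i < 10 ))` and `eval` lines suggest that `execute_with_retries` simply re-runs the same command in a bounded loop. Below is a minimal sketch of such a helper, reconstructed only from that xtrace output. The pause between attempts, the `RETRY_SLEEP_SECONDS` variable and the return codes are assumptions for illustration, not the actual source of `mlvm.sh`.

```shell
#!/usr/bin/env bash
# Sketch of a retry helper, inferred from the xtrace lines in the log.
# Assumptions (not confirmed by the log): the pause between attempts,
# the RETRY_SLEEP_SECONDS knob (hypothetical), and the return codes.
execute_with_retries() {
  local -r cmd="$1"
  local i
  # The loop bound mirrors the "(( i < 10 ))" line in the trace.
  for ((i = 0; i < 10; i++)); do
    # eval mirrors the "+ eval '...'" line in the trace.
    if eval "$cmd"; then
      return 0
    fi
    # Wait before the next attempt (5 seconds unless overridden).
    sleep "${RETRY_SLEEP_SECONDS:-5}"
  done
  return 1
}
```

If the helper works roughly like this, retrying cannot help here: the solver conflict between `torchvision=0.9` and `pytorch-cpu` does not depend on timing, so every attempt would presumably fail the same way until the attempts or the initialization-action timeout run out.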
In my case something different happens, but it also ends in an error.
Here is the command I'm using:
REGION=us-central1
CLUSTER_NAME=default
INIT_ACTIONS_REPO=gs://defaultmultiregionus/dataproc/ephemeral/initactions
gcloud dataproc clusters create ${CLUSTER_NAME} \
--project myproject \
--service-account myserviceaccount \
--region ${REGION} \
--subnet us-central1 \
--master-machine-type n2d-standard-2 \
--worker-machine-type n2d-standard-4 \
--image-version 2.1-ubuntu20 \
--metadata gpu-driver-provider=NVIDIA \
--metadata rapids-runtime=SPARK \
--metadata include-gpus=false \
--metadata spark-bigquery-connector-version=0.29.0 \
--metadata PIP_PACKAGES="numpy==1.24.2 Pillow==9.3" \
--metadata init-actions-repo=${INIT_ACTIONS_REPO} \
--metadata cuda-version=11.8 \
--metadata cudnn-version=8.6.0.163 \
--optional-components JUPYTER \
--initialization-actions ${INIT_ACTIONS_REPO}/mlvm.sh \
--initialization-action-timeout=45m \
--enable-component-gateway
And here is the error I get:
+ execute_with_retries '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3'
+ local -r 'cmd=/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3'
+ (( i = 0 ))
+ (( i < 2 ))
+ eval '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3'
++ /opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3
Invalid spec, no package name found: <NULL>
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/conda/exceptions.py", line 1124, in __call__
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 941, in exception_converter
raise e
File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 934, in exception_converter
exit_code = _wrapped_main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 892, in _wrapped_main
result = do_call(parsed_args, p)
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 754, in do_call
exit_code = install(args, parser, "install")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 560, in install
print(solver.explain_problems())
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid spec, no package name found: <NULL>
`$ /opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3`
environment variables:
CIO_TEST=<not set>
CONDA_EXE=/opt/conda/miniconda3/bin/conda
CONDA_PYTHON_EXE=/opt/conda/miniconda3/bin/python
CONDA_ROOT=/opt/conda/miniconda3/envs/mamba
CONDA_SHLVL=0
CURL_CA_BUNDLE=<not set>
LD_PRELOAD=<not set>
PATH=/opt/conda/default/bin:/opt/conda/miniconda3/condabin:/usr/local/sbin:
/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
REQUESTS_CA_BUNDLE=<not set>
SSL_CERT_FILE=<not set>
active environment : None
shell level : 0
user config file : /root/.condarc
populated config files : /root/.condarc
conda version : 23.1.0
conda-build version : not installed
python version : 3.11.0.final.0
virtual packages : __archspec=1=x86_64
__glibc=2.31=0
__linux=5.15.0=0
__unix=0=0
base environment : /opt/conda/miniconda3/envs/mamba (writable)
conda av data dir : /opt/conda/miniconda3/envs/mamba/etc/conda
conda av metadata url : None
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/pytorch/linux-64
https://conda.anaconda.org/pytorch/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /opt/conda/miniconda3/envs/mamba/pkgs
/root/.conda/pkgs
envs directories : /opt/conda/miniconda3/envs/mamba/envs
/root/.conda/envs
platform : linux-64
user-agent : conda/23.1.0 requests/2.28.2 CPython/3.11.0 Linux/5.15.0-1030-gcp ubuntu/20.04.5 glibc/2.31
UID:GID : 0:0
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
Looking for: ['r-dplyr=1.0', 'r-essentials=4.1', 'r-sparklyr=1.7', 'scikit-learn=0.24', 'pytorch=1.9', 'torchvision=0.9', 'xgboost=1.4']
Pinned packages:
- python 3.10.*
- conda 22.9.*
- python 3.10.*
- r-base 4.1.*
- r-recommended 4.1.*