
Initialization fails when using the `mlvm.sh` initialization action

Open • mwbyeon opened this issue 3 years ago • 1 comment

Command to create a GPU cluster

$ gcloud dataproc clusters create cluster-b9b8-gpu \
        --autoscaling-policy policy-d005-gpu \
        --enable-component-gateway \
        --region europe-west4 \
        --zone europe-west4-a \
        --master-machine-type n1-standard-8 \
        --master-boot-disk-type pd-ssd \
        --master-boot-disk-size 1000 \
        --num-workers 2 \
        --worker-machine-type n1-standard-8 \
        --worker-boot-disk-type pd-ssd \
        --worker-boot-disk-size 1000 \
        --worker-accelerator type=nvidia-tesla-v100,count=1 \
        --num-secondary-workers 2 \
        --secondary-worker-boot-disk-type pd-ssd \
        --secondary-worker-boot-disk-size 1000 \
        --num-secondary-worker-local-ssds 0 \
        --secondary-worker-accelerator type=nvidia-tesla-v100,count=1 \
        --image-version 2.0-debian10 \
        --metadata include-gpus=true \
        --metadata gpu-driver-provider=NVIDIA \
        --metadata init-actions-repo=gs://goog-dataproc-initialization-actions-europe-west4 \
        --initialization-actions=gs://goog-dataproc-initialization-actions-europe-west4/mlvm/mlvm.sh \
        --initialization-action-timeout=45m \
        --properties dataproc:efm.spark.shuffle=primary-worker \
        --optional-components JUPYTER \
        --project ${PROJECT}

Output log

...

+ execute_with_retries '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3'
+ local -r 'cmd=/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3'
+ (( i = 0 ))
+ (( i < 10 ))
+ eval '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3'
++ /opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.0 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 r-xgboost=1.4 -p /opt/conda/miniconda3

                  __    __    __    __
                 /  \  /  \  /  \  /  \
                /    \/    \/    \/    \
███████████████/  /██/  /██/  /██/  /████████████████████████
              /  / \   / \   / \   / \  \____
             /  /   \_/   \_/   \_/   \    o \__,
            / _/                       \_____/  `
            |/
        ███╗   ███╗ █████╗ ███╗   ███╗██████╗  █████╗
        ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗
        ██╔████╔██║███████║██╔████╔██║██████╔╝███████║
        ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║
        ██║ ╚═╝ ██║██║  ██║██║ ╚═╝ ██║██████╔╝██║  ██║
        ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝  ╚═╝

        mamba (0.22.1) supported by @QuantStack

        GitHub:  https://github.com/mamba-org/mamba
        Twitter: https://twitter.com/QuantStack

█████████████████████████████████████████████████████████████


Looking for: ['r-dplyr=1.0', 'r-essentials=4.0', 'r-sparklyr=1.7', 'scikit-learn=0.24', 'pytorch=1.9', 'torchvision=0.9', 'xgboost=1.4', 'r-xgboost=1.4']


Pinned packages:
  - python 3.8.*
  - conda 4.9.*
  - python 3.8.*
  - r-base 4.0.*
  - r-recommended 4.0.*


Encountered problems while solving:
  - package torchvision-0.9.1-py38h9e2e28c_1_cpu requires pytorch-cpu, but none of the providers can be installed
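
The trace lines above (`execute_with_retries`, `local -r 'cmd=...'`, `(( i < 10 ))`, `eval '...'`) suggest that the init action wraps the install in a retry helper. A minimal sketch of such a helper follows; it is a reconstruction from the trace, not the actual `mlvm.sh` source. The optional second argument, the sleep interval, and the `RETRY_SLEEP_SECONDS` variable are assumptions introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a retry helper consistent with the xtrace output above.
# The default of 10 attempts matches the `(( i < 10 ))` line in the log.
# Assumptions (not from the log): the optional max-attempts argument,
# the sleep between attempts, and the RETRY_SLEEP_SECONDS variable.
execute_with_retries() {
  local -r cmd="$1"
  local -r max_attempts="${2:-10}"
  local i
  for ((i = 0; i < max_attempts; i++)); do
    # eval re-parses the command string, as the `eval '...'` trace line shows.
    if eval "$cmd"; then
      return 0
    fi
    sleep "${RETRY_SLEEP_SECONDS:-5}"
  done
  # Every attempt failed.
  return 1
}
```

For example, `execute_with_retries 'false' 3` runs the command three times and then returns a non-zero status. If the real helper behaves like this, retrying would probably not help with the failure above, since a solver conflict is likely to recur identically on every attempt.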

mwbyeon • Apr 11 '22 16:04

In my case something different happens, but it also ends in an error.

Here is the command I'm using:

REGION=us-central1
CLUSTER_NAME=default
INIT_ACTIONS_REPO=gs://defaultmultiregionus/dataproc/ephemeral/initactions

gcloud dataproc clusters create ${CLUSTER_NAME} \
    --project myproject \
    --service-account myserviceaccount \
    --region ${REGION} \
    --subnet us-central1 \
    --master-machine-type n2d-standard-2 \
    --worker-machine-type n2d-standard-4 \
    --image-version 2.1-ubuntu20 \
    --metadata gpu-driver-provider=NVIDIA \
    --metadata rapids-runtime=SPARK \
    --metadata include-gpus=false \
    --metadata spark-bigquery-connector-version=0.29.0 \
    --metadata PIP_PACKAGES="numpy==1.24.2 Pillow==9.3" \
    --metadata init-actions-repo=${INIT_ACTIONS_REPO} \
    --metadata cuda-version=11.8 \
    --metadata cudnn-version=8.6.0.163 \
    --optional-components JUPYTER \
    --initialization-actions ${INIT_ACTIONS_REPO}/mlvm.sh \
    --initialization-action-timeout=45m \
    --enable-component-gateway 

And here is the error I get:

+ execute_with_retries '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3'
+ local -r 'cmd=/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3'
+ (( i = 0 ))
+ (( i < 2 ))
+ eval '/opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3'
++ /opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3
Invalid spec, no package name found: <NULL>

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/conda/exceptions.py", line 1124, in __call__
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 941, in exception_converter
        raise e
      File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 934, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 892, in _wrapped_main
        result = do_call(parsed_args, p)
                 ^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 754, in do_call
        exit_code = install(args, parser, "install")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/miniconda3/envs/mamba/lib/python3.11/site-packages/mamba/mamba.py", line 560, in install
        print(solver.explain_problems())
              ^^^^^^^^^^^^^^^^^^^^^^^^^
    RuntimeError: Invalid spec, no package name found: <NULL>

`$ /opt/conda/miniconda3/envs/mamba/bin/mamba install -y r-dplyr=1.0 r-essentials=4.1 r-sparklyr=1.7 scikit-learn=0.24 pytorch=1.9 torchvision=0.9 xgboost=1.4 -p /opt/conda/miniconda3`

  environment variables:
                 CIO_TEST=<not set>
                CONDA_EXE=/opt/conda/miniconda3/bin/conda
         CONDA_PYTHON_EXE=/opt/conda/miniconda3/bin/python
               CONDA_ROOT=/opt/conda/miniconda3/envs/mamba
              CONDA_SHLVL=0
           CURL_CA_BUNDLE=<not set>
               LD_PRELOAD=<not set>
                     PATH=/opt/conda/default/bin:/opt/conda/miniconda3/condabin:/usr/local/sbin:
                          /usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>

     active environment : None
            shell level : 0
       user config file : /root/.condarc
 populated config files : /root/.condarc
          conda version : 23.1.0
    conda-build version : not installed
         python version : 3.11.0.final.0
       virtual packages : __archspec=1=x86_64
                          __glibc=2.31=0
                          __linux=5.15.0=0
                          __unix=0=0
       base environment : /opt/conda/miniconda3/envs/mamba  (writable)
      conda av data dir : /opt/conda/miniconda3/envs/mamba/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/pytorch/linux-64
                          https://conda.anaconda.org/pytorch/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /opt/conda/miniconda3/envs/mamba/pkgs
                          /root/.conda/pkgs
       envs directories : /opt/conda/miniconda3/envs/mamba/envs
                          /root/.conda/envs
               platform : linux-64
             user-agent : conda/23.1.0 requests/2.28.2 CPython/3.11.0 Linux/5.15.0-1030-gcp ubuntu/20.04.5 glibc/2.31
                UID:GID : 0:0
             netrc file : None
           offline mode : False


An unexpected error has occurred. Conda has prepared the above report.

Looking for: ['r-dplyr=1.0', 'r-essentials=4.1', 'r-sparklyr=1.7', 'scikit-learn=0.24', 'pytorch=1.9', 'torchvision=0.9', 'xgboost=1.4']


Pinned packages:
  - python 3.10.*
  - conda 22.9.*
  - python 3.10.*
  - r-base 4.1.*
  - r-recommended 4.1.*
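
Both logs in this thread bury the actual failure among shell trace lines and environment dumps. The sketch below filters a saved log down to the lines that state the failure. The function name `summarize_mamba_failure` is hypothetical, and the patterns are taken only from the two logs shown here, so other mamba versions may word these messages differently.

```shell
#!/usr/bin/env bash
# Print the lines of an init-action log that state why mamba failed.
# Assumption: the message formats match the two logs in this thread.
summarize_mamba_failure() {
  local -r log_file="$1"
  awk '
    # Start of the solver problem list seen in the first log.
    /^Encountered problems while solving:/ { in_list = 1; print; next }
    # Bullet items that belong to that list.
    in_list && /^[ \t]*- / { print; next }
    # Any other line ends the list.
    { in_list = 0 }
    # Exception line seen in the traceback of the second log.
    /^[ \t]*RuntimeError:/ { sub(/^[ \t]+/, ""); print }
  ' "$log_file"
}
```

Usage would be `summarize_mamba_failure <path-to-log>`. On a Dataproc node the init-action output is typically written to `/var/log/dataproc-initialization-script-0.log`, though that path is an assumption here and may differ by image version.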

yamigil • Mar 28 '23 16:03