
Portable CM script failed (name = build-docker-image, return code = 256)

Agalakdak opened this issue 1 year ago · 1 comment

I followed this guide: https://access.cknowledge.org/playground/?action=install

Then I ran: cm pull repo mlcommons@cm4mlops --branch=dev

Ran this command:

cmr "run-mlperf inference _find-performance _full _r4.1" \
    --model=bert-99 \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=datacenter \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker \
    --docker_cm_repo=mlcommons@cm4mlops \
    --docker_cm_repo_flags="--branch=mlperf-inference" \
    --test_query_count=100 \
    --quiet

After about 30 minutes I got this error:

1 warning found (use docker --debug to expand):

  • SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ARG "CM_GH_TOKEN") (line 14) mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile:45

43 |
44 | # Run commands
45 | >>> RUN cm run script --tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=nvidia --env.CM_MLPERF_MODEL=bert-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=tensorrt --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=100 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=yes --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.0 --env.CM_SUT_DESC_CACHE=no --env.CM_SUT_META_EXISTS=yes --env.CM_MODEL=bert-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --v=False --print_env=False --print_deps=False --dump_version_info=True --quiet --fake_run --env.CM_RUN_STATE_DOCKER=True
46 |
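
For context, SecretsUsedInArgOrEnv is an advisory BuildKit check: values passed through ARG or ENV are recorded in the image metadata, so Docker suggests a BuildKit secret mount for tokens like CM_GH_TOKEN instead. Below is a minimal sketch of what that alternative looks like, assuming a hypothetical secret id and token file (the Dockerfile here is generated by CM, so this is illustration only, not a required change):

# Illustration only: the secret id, token file and placeholder command are hypothetical.
#
# Inside the Dockerfile, a secret mount would replace `ARG CM_GH_TOKEN`:
#   RUN --mount=type=secret,id=cm_gh_token \
#       CM_GH_TOKEN=$(cat /run/secrets/cm_gh_token) <command that needs the token>
#
# The build would then supply the secret from a local file instead of an ARG:
docker build --secret id=cm_gh_token,src=$HOME/.cm_gh_token .

Either way, the warning is only advisory; the build actually fails because the RUN command shown above exits with code 2, as reported below.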


ERROR: failed to solve: process "/bin/bash -c cm run script --tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=nvidia --env.CM_MLPERF_MODEL=bert-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=tensorrt --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=100 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=yes --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.0 --env.CM_SUT_DESC_CACHE=no --env.CM_SUT_META_EXISTS=yes --env.CM_MODEL=bert-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --v=False --print_env=False --print_deps=False --dump_version_info=True --quiet --fake_run --env.CM_RUN_STATE_DOCKER=True" did not complete successfully: exit code: 2

CM error: Portable CM script failed (name = build-docker-image, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script wrapped and unified by this CM script (automation recipe). Please re-run this script with --repro flag and report this issue with the original command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native scripts more portable, interoperable and deterministic. Thank you!
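
Concretely, the re-run the message asks for is just the original command with the --repro flag appended (all other flags unchanged); it should leave a cm-repro directory that can be attached to the report:

cmr "run-mlperf inference _find-performance _full _r4.1" \
    --model=bert-99 \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=datacenter \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker \
    --docker_cm_repo=mlcommons@cm4mlops \
    --docker_cm_repo_flags="--branch=mlperf-inference" \
    --test_query_count=100 \
    --quiet \
    --repro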

Do you need information about my system? If so, let me know here.

Agalakdak — Jul 31 '24

/home/user INFO:root: ! call "postprocess" from /home/user/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/customize.py
GPU Device ID: 0
GPU Name: Quadro RTX 5000
GPU compute capability: 7.5
CUDA driver version: 12.4
CUDA runtime version: 11.8
Global memory: 16892952576
Max clock rate: 1815.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535

Agalakdak — Jul 31 '24

Followed up here.

arjunsuresh — Sep 18 '24