[Bug]: Cannot `dstack run` if `disk` constraint's lower bound is too low
Steps to reproduce
- Create a configuration with `resources.disk` set to `5GB..`

  ```
  > cat hello.dstack.yml
  type: task
  commands:
    - python -c 'print("Hello dstack!")'
  resources:
    disk: 5GB..
  ```

- Run the configuration and continue with the offers
  ```
  > dstack run . -f hello.dstack.yml
   Configuration          hello.dstack.yml
   Project                jvstme
   User                   jvstme
   Pool name              default-pool
   Min resources          2..xCPU, 8GB.., 5GB.. (disk)
   Max price              -
   Max duration           72h
   Spot policy            auto
   Retry policy           no
   Creation policy        reuse-or-create
   Termination policy     destroy-after-idle
   Termination idle time  300s

   #  BACKEND  REGION           INSTANCE         RESOURCES               SPOT  PRICE
   1  gcp      us-west4         e2-standard-2    2xCPU, 8GB, 5GB (disk)  yes   $0.009104
   2  gcp      europe-central2  e2-standard-2    2xCPU, 8GB, 5GB (disk)  yes   $0.011424
   3  azure    westeurope       Standard_D2s_v3  2xCPU, 8GB, 5GB (disk)  yes   $0.012
      ...
   Shown 3 of 2644 offers, $56.6266 max

  Continue? [y/n]: y
  ```
Expected behaviour
dstack provisions an instance with the minimum allowed disk size that falls within the `5GB..` range. In this example with GCP that is 20 GB; for other clouds the minimum will differ.
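For illustration, here is a minimal sketch (not dstack's actual code; the function name is made up) of the clamping I would expect: take the larger of the requested lower bound and the backend's minimum, and only fail if that exceeds the requested upper bound.

```python
from typing import Optional


def resolve_disk_size_gb(
    requested_min: int,
    requested_max: Optional[int],
    backend_min: int,
) -> int:
    """Smallest disk size satisfying both the user's range and the backend minimum."""
    size = max(requested_min, backend_min)
    if requested_max is not None and size > requested_max:
        raise ValueError(
            f"backend requires at least {backend_min} GB, "
            f"but the configuration allows at most {requested_max} GB"
        )
    return size


# disk: 5GB.. on GCP, whose image needs 20 GB -> provision a 20 GB disk
assert resolve_disk_size_gb(5, None, 20) == 20
```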
Actual behaviour
Provisioning fails, and the reason is not clear from the error message shown in the dstack CLI.
```
sharp-impala-1 provisioning completed (failed)
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.
```
dstack version
0.17.0
Server logs
```
ERROR 2024-03-28T16:27:50.459 dstack._internal.server.background.tasks.process_submitted_jobs job(3b02aa)sharp-impala-1-0-0: got exception when launching e2-standard-2 in gcp/northamerica-northeast1
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/dstack/_internal/server/background/tasks/process_submitted_jobs.py", line 227, in _run_job
launched_instance_info: LaunchedInstanceInfo = await run_async(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/dstack/_internal/server/utils/common.py", line 13, in run_async
return await asyncio.get_running_loop().run_in_executor(None, func_with_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dstack/dstack_cloud/services/backends/dstack/compute.py", line 69, in run_job
launched_instance_info = backend.compute().run_job(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/dstack/_internal/core/backends/gcp/compute.py", line 167, in run_job
launched_instance_info = self.create_instance(instance_offer, instance_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/dstack/_internal/core/backends/gcp/compute.py", line 127, in create_instance
operation = self.instances_client.insert(request=request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/google/cloud/compute_v1/services/instances/client.py", line 4164, in insert
response = rpc(
^^^^
File "/usr/local/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
return wrapped_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
return callable_(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/google/cloud/compute_v1/services/instances/transports/rest.py", line 3197, in __call__
raise core_exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 POST https://compute.googleapis.com/compute/v1/projects/dstack/zones/northamerica-northeast1-a/instances: Invalid value for field 'resource.disks[0].initializeParams.diskSizeGb': '5'. Requested disk size cannot be smaller than the image size (20 GB)
```
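The GCP error suggests one possible fix on the dstack side: look up the boot image's minimum disk size before building the insert request and raise the requested size to it. A rough sketch, assuming the standard google-cloud-compute client (the project and image family below are placeholders, not necessarily what dstack uses):

```python
from google.cloud import compute_v1


def min_boot_disk_size_gb(project: str, family: str) -> int:
    """Minimum disk size (GB) required by the latest image in an image family."""
    image = compute_v1.ImagesClient().get_from_family(project=project, family=family)
    return int(image.disk_size_gb)


def effective_disk_size_gb(requested_gb: int, project: str, family: str) -> int:
    """Never request a boot disk smaller than the image itself requires."""
    return max(requested_gb, min_boot_disk_size_gb(project, family))


# For the image in the traceback above, GCP reports a 20 GB minimum,
# so effective_disk_size_gb(5, <project>, <family>) would return 20.
```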
Additional information
Note that the 5 GB disk is also shown in the offers, which is incorrect, since such instances cannot be created. However, it is not clear whether this should be fixed in gpuhunt or in dstack, because gpuhunt does not appear to set the disk size at all:
```
>>> items = gpuhunt.query(min_disk_size=5, min_memory=8)
>>> print(*items[:3], sep="\n")
CatalogItem(instance_name='e2-standard-2', location='us-west4-c', price=0.009104, cpu=2, memory=8.0, gpu_count=0, gpu_name=None, gpu_memory=None, spot=True, disk_size=None, provider='gcp')
CatalogItem(instance_name='e2-standard-2', location='us-west4-a', price=0.009104, cpu=2, memory=8.0, gpu_count=0, gpu_name=None, gpu_memory=None, spot=True, disk_size=None, provider='gcp')
CatalogItem(instance_name='e2-standard-2', location='us-west4-b', price=0.009104, cpu=2, memory=8.0, gpu_count=0, gpu_name=None, gpu_memory=None, spot=True, disk_size=None, provider='gcp')
```
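My guess (purely illustrative, not actual dstack or gpuhunt code) is that when `CatalogItem.disk_size` is `None`, the requested lower bound is used as-is when rendering offers, which would explain the 5 GB shown above:

```python
from typing import Optional


def offer_disk_size_gb(catalog_disk_size: Optional[float], requested_min_gb: int) -> int:
    """Fall back to the requested minimum when the catalog has no disk size."""
    if catalog_disk_size is not None:
        return int(catalog_disk_size)
    return requested_min_gb  # 5 GB here, even though GCP needs at least 20 GB


assert offer_disk_size_gb(None, 5) == 5  # matches "5GB (disk)" in the offers above
```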