[Bug]: Cannot `dstack run` if `disk` constraint's lower bound is too low

Open jvstme opened this issue 1 year ago • 0 comments

Steps to reproduce

  1. Create a configuration with resources.disk set to 5GB..
    > cat hello.dstack.yml 
    type: task
    
    commands:
      - python -c 'print("Hello dstack!")'
    
    resources:
      disk: 5GB..
    
  2. Run the configuration and continue with the offers
    > dstack run . -f hello.dstack.yml 
     Configuration          hello.dstack.yml             
     Project                jvstme                       
     User                   jvstme                       
     Pool name              default-pool                 
     Min resources          2..xCPU, 8GB.., 5GB.. (disk) 
     Max price              -                            
     Max duration           72h                          
     Spot policy            auto                         
     Retry policy           no                           
     Creation policy        reuse-or-create              
     Termination policy     destroy-after-idle           
     Termination idle time  300s                         
    
     #  BACKEND  REGION           INSTANCE         RESOURCES               SPOT  PRICE       
     1  gcp      us-west4         e2-standard-2    2xCPU, 8GB, 5GB (disk)  yes   $0.009104   
     2  gcp      europe-central2  e2-standard-2    2xCPU, 8GB, 5GB (disk)  yes   $0.011424   
     3  azure    westeurope       Standard_D2s_v3  2xCPU, 8GB, 5GB (disk)  yes   $0.012      
        ...                                                                                  
     Shown 3 of 2644 offers, $56.6266 max
    
    Continue? [y/n]: y
    

Expected behaviour

dstack provisions an instance with the minimum allowed disk size that falls into the 5GB.. range. In this example with GCP it is 20 GB. For other clouds it will be different.
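
The expected behaviour amounts to clamping the requested range against a per-backend minimum. The following is a minimal sketch of that idea, not dstack's actual code: `resolve_disk_size_gb` and the `MIN_DISK_GB` table are hypothetical names, and the only value taken from this report is the 20 GB image size that GCP reports.

```python
from typing import Optional

# Hypothetical per-backend minimum disk sizes in GB.
# Only the GCP value comes from the error message in this report;
# other backends would need their own entries.
MIN_DISK_GB = {"gcp": 20}


def resolve_disk_size_gb(backend: str, min_gb: int, max_gb: Optional[int] = None) -> int:
    """Return the smallest disk size in [min_gb, max_gb] that the backend accepts."""
    # Backends without a known minimum fall back to the requested lower bound.
    backend_min = MIN_DISK_GB.get(backend, min_gb)
    size = max(min_gb, backend_min)
    if max_gb is not None and size > max_gb:
        # The requested range lies entirely below what the backend can create.
        raise ValueError(
            f"{backend} requires at least {backend_min} GB, but the range ends at {max_gb} GB"
        )
    return size


# `disk: 5GB..` on GCP would resolve to 20 GB instead of failing at launch.
print(resolve_disk_size_gb("gcp", 5))
```

With a bounded range such as `5GB..10GB`, this sketch raises an error up front rather than sending a request the cloud will reject.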

Actual behaviour

Provisioning fails. The error message in the dstack CLI does not make the reason clear.

sharp-impala-1 provisioning completed (failed)
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check 
CLI and server logs for more details.

dstack version

0.17.0

Server logs

ERROR 2024-03-28T16:27:50.459 dstack._internal.server.background.tasks.process_submitted_jobs job(3b02aa)sharp-impala-1-0-0: got exception when launching e2-standard-2 in gcp/northamerica-northeast1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dstack/_internal/server/background/tasks/process_submitted_jobs.py", line 227, in _run_job
    launched_instance_info: LaunchedInstanceInfo = await run_async(
                                                   ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dstack/_internal/server/utils/common.py", line 13, in run_async
    return await asyncio.get_running_loop().run_in_executor(None, func_with_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dstack/dstack_cloud/services/backends/dstack/compute.py", line 69, in run_job
    launched_instance_info = backend.compute().run_job(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dstack/_internal/core/backends/gcp/compute.py", line 167, in run_job
    launched_instance_info = self.create_instance(instance_offer, instance_config)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dstack/_internal/core/backends/gcp/compute.py", line 127, in create_instance
    operation = self.instances_client.insert(request=request)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/compute_v1/services/instances/client.py", line 4164, in insert
    response = rpc(
               ^^^^
  File "/usr/local/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/compute_v1/services/instances/transports/rest.py", line 3197, in __call__
    raise core_exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 POST https://compute.googleapis.com/compute/v1/projects/dstack/zones/northamerica-northeast1-a/instances: Invalid value for field 'resource.disks[0].initializeParams.diskSizeGb': '5'. Requested disk size cannot be smaller than the image size (20 GB)

Additional information

Note that the offers also show a 5 GB disk, which is incorrect because such instances cannot be created. It is not clear whether this should be fixed in gpuhunt or in dstack, because gpuhunt does not seem to set the disk size at all.

>>> items = gpuhunt.query(min_disk_size=5, min_memory=8)
>>> print(*items[:3], sep="\n")
CatalogItem(instance_name='e2-standard-2', location='us-west4-c', price=0.009104, cpu=2, memory=8.0, gpu_count=0, gpu_name=None, gpu_memory=None, spot=True, disk_size=None, provider='gcp')
CatalogItem(instance_name='e2-standard-2', location='us-west4-a', price=0.009104, cpu=2, memory=8.0, gpu_count=0, gpu_name=None, gpu_memory=None, spot=True, disk_size=None, provider='gcp')
CatalogItem(instance_name='e2-standard-2', location='us-west4-b', price=0.009104, cpu=2, memory=8.0, gpu_count=0, gpu_name=None, gpu_memory=None, spot=True, disk_size=None, provider='gcp')
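
Since the catalog items carry `disk_size=None`, the 5 GB in the offers table presumably comes from the requested lower bound rather than from the catalog. One possible fix on the dstack side would be to fill in the disk size per offer before displaying it. This is a sketch under assumptions: `Offer`, `BACKEND_MIN_DISK_GB`, and `offer_disk_gb` are hypothetical and are not part of gpuhunt or dstack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    """Hypothetical stand-in for a catalog item; only the fields used here."""
    provider: str
    instance_name: str
    disk_size: Optional[float]  # None, as in the gpuhunt output above


# Hypothetical table; the GCP value is the image size from the server log.
BACKEND_MIN_DISK_GB = {"gcp": 20.0}


def offer_disk_gb(offer: Offer, requested_min_gb: float) -> float:
    """Disk size to display for an offer, never below the backend minimum."""
    if offer.disk_size is not None:
        # The catalog knows the disk size, so trust it.
        return offer.disk_size
    backend_min = BACKEND_MIN_DISK_GB.get(offer.provider, requested_min_gb)
    return max(requested_min_gb, backend_min)


offer = Offer(provider="gcp", instance_name="e2-standard-2", disk_size=None)
print(offer_disk_gb(offer, requested_min_gb=5))
```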

jvstme, Mar 28 '24 16:03