[Azure] Some issues during create
I wanted to report the following issues that occurred during the `cloudypad create azure` process:
- With a dynamic public IP, the installation fails, apparently because of mismatched parameters somewhere:
Params:
You are about to provision Azure machine with the following details:
Azure subscription: **************
Azure location: italynorth
Instance name: cloudypad
SSH key: *******
VM Size: Standard_NV6ads_A10_v5
Spot instance: true
Public IP Type: dynamic
Disk size: 50
Error:
azure-native:compute:VirtualMachine cloudydev-vm created (47s)
+ pulumi:pulumi:Stack CloudyPad-Azure-cloudydev creating (66s) error: Expected a single IP, got: [{"etag":"W/\"***********************\"","id":"/subscriptions/**********************/resourceGroups/CloudyPad-cloudydev/providers/Microsoft.Network/networkInterfaces/cloudydev-network-interface********/ipConfigurations/cloudydev-ipcfg","name":"cloudydev-ipcfg","primary":true,"privateIPAddress":"10.0.0.4","privateIPAddressVersion":"IPv4","privateIPAllocationMethod":"Dynamic","provisioningState":"Succeeded","subnet":{"id":"/subscriptions/*****************/resourceGroups/CloudyPad-cloudydev/providers/Microsoft.Network/virtualNetworks/cloudydev-vnet/subnets/cloudydev-subnet"},"type":"Microsoft.Network/networkInterfaces/ipConfigurations"}]
- When choosing NV6ads A10 v5 as instance type, NVIDIA drivers fail to install:
Params:
You are about to provision Azure machine with the following details:
Azure subscription: ***************
Azure location: italynorth
Instance name: mypad
SSH key: ***********
VM Size: Standard_NV6ads_A10_v5
Spot instance: true
Public IP Type: static
Disk size: 60
Error:
[ 105.914213] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 105.915500] nvidia 0002:00:00.0: enabling device (0000 -> 0002)
[ 105.918834] NVRM: The NVIDIA GPU 0002:00:00.0 (PCI ID: 10de:2236)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 550.127.05 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 105.919146] nvidia: probe of 0002:00:00.0 failed with error -1
[ 105.919177] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 105.919180] NVRM: None of the NVIDIA devices were initialized.
If you prefer I can create an issue for each problem. If more logs are needed I will try to replicate and share.
Thanks for reporting these issues!
- NV6ads A10 v5 uses an Nvidia A10 GPU. I'll check why the installed driver doesn't support it.
- The dynamic IP bug should be straightforward to fix. I'll look into it ASAP.
Dynamic IP bug will be fixed in next release: https://github.com/PierreBeucher/cloudypad/pull/92
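For readers hitting the same error: the IP configuration printed in the error message has no public IP address field at all, which is what a dynamically allocated public IP can look like before Azure has assigned it. The following is a minimal sketch of a lookup that tolerates that case. It is not the change made in the linked pull request; the `IpConfiguration` shape and the `resolvePublicIp` helper are hypothetical and simplified for illustration.

```typescript
// Hypothetical, simplified shape of a NIC IP configuration
// (loosely modelled on the JSON shown in the error above).
interface IpConfiguration {
    name: string;
    privateIPAddress?: string;
    // Missing from the reported output when using a dynamic public IP.
    publicIPAddress?: { ipAddress?: string };
}

// Returns the public IP if exactly one is present, undefined if none is
// assigned yet, and throws only when the result is ambiguous.
function resolvePublicIp(ipConfigs: IpConfiguration[]): string | undefined {
    const ips = ipConfigs
        .map((cfg) => cfg.publicIPAddress?.ipAddress)
        .filter((ip): ip is string => typeof ip === "string" && ip.length > 0);

    if (ips.length > 1) {
        throw new Error(`Expected at most one public IP, got: ${JSON.stringify(ips)}`);
    }
    return ips[0];
}
```

With a lookup like this, a caller could fall back to querying the address later (for example once the VM is running) instead of failing the whole stack when no address is assigned yet.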
I looked into the A10 instance issue. It looks like Cloudy Pad should use the data center driver in some situations instead of the default one. This is a bit trickier, since a different driver is needed depending on the instance type: I'll have to map supported instance types to the proper driver - not a small feat!
In the meantime I'll remove these instances from the list.
Hey @PierreBeucher, any updates on the Nvidia driver installation issue? The A10 instance still fails to install Nvidia drivers and it's still in the list.
I haven't had time to look into this driver issue yet. I should remove this instance from the list ASAP.
Same issue. I looked for the cheapest Nvidia GPU on Azure, and it seems to be the NV6ads A10 v5.
It seems that it’s necessary to install the Nvidia GRID driver for this instance type (https://learn.microsoft.com/fr-fr/azure/virtual-machines/linux/n-series-driver-setup).
I’ll test it, it could be economically interesting to use the NV6ads A10 v5 instead of the NC4as T4 v3.
Unsupported machines have been removed from the list. We now support Datacenter drivers, which allowed most Datacenter GPUs to be supported on AWS, GCP and others, except for Azure A10 instances, as they require custom Azure GRID drivers. A custom driver selection will need to be implemented specifically for Azure A10 instances.
See https://github.com/PierreBeucher/cloudypad/blob/2063b2115b27d9c74d5768a552388a44a46e3d80/src/providers/azure/cli.ts#L61 and https://github.com/PierreBeucher/cloudypad/pull/277
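As a rough illustration of what per-instance driver selection could look like, here is a minimal sketch. It is not Cloudy Pad's actual implementation: the `DriverKind` type, the `DRIVER_BY_VM_SIZE` table and both helper functions are hypothetical names, and the T4 entry is an assumption made only to have a second row. The A10 entry reflects what this thread found (Azure GRID driver required).

```typescript
// Hypothetical driver categories, named after the ones discussed in this thread.
type DriverKind = "default" | "datacenter" | "azure-grid";

// Illustrative mapping only. VM sizes missing from this table are treated
// as unsupported and would not be offered in the instance list.
const DRIVER_BY_VM_SIZE: Readonly<Record<string, DriverKind>> = {
    // Assumption for illustration: T4 instances work with the default driver.
    "Standard_NC4as_T4_v3": "default",
    // From this thread: Azure A10 instances need the Azure GRID driver.
    "Standard_NV6ads_A10_v5": "azure-grid",
};

// Returns the driver to install for a VM size, or undefined if unsupported.
function selectDriver(vmSize: string): DriverKind | undefined {
    return Object.prototype.hasOwnProperty.call(DRIVER_BY_VM_SIZE, vmSize)
        ? DRIVER_BY_VM_SIZE[vmSize]
        : undefined;
}

// VM sizes that can be offered to the user, sorted for stable display.
function listSupportedVmSizes(): string[] {
    return Object.keys(DRIVER_BY_VM_SIZE).sort();
}
```

Keeping the supported list and the driver choice in one table means an instance type cannot be listed without a known driver, which is the situation that caused the original report.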
Closing, as unsupported Azure instances were removed from the listing.