[ISSUE] Issue with `databricks_mws_ncc_private_endpoint_rule` resource
Hello,
I have a problem with the `databricks_mws_ncc_private_endpoint_rule` resource: creation fails with a timeout. Is there any way to set a timeout on the resource?
Configuration
resource "databricks_mws_ncc_private_endpoint_rule" "pe_rule" {
  network_connectivity_config_id = var.ncc_id
  resource_id                    = var.resource_id
  group_id                       = var.subresource_type

  lifecycle {
    ignore_changes = [
      connection_state
    ]
  }
}
Expected Behavior
module.datalakehouse.module.storage_account_adls_services["storage7"].module.ncc_rule_blob[0].databricks_mws_ncc_private_endpoint_rule.pe_rule: Creation complete after 56s [id=bece2dd2-4d68-4a7e-g583-4908bd57cdc0/f9c40a6d-gf5r-4e96-bfd1-b62eb9d7634t]
Actual Behavior
Error: cannot create mws ncc private endpoint rule: Post "https://accounts.azuredatabricks.net/api/2.0/accounts/xxxx/network-connectivity-configs/bedw2dd2-4d68-4a7e-a983-4908bd57cdf0/private-endpoint-rules": request timed out after 1m0s of inactivity

  with module.datalakehouse.module.storage_account_adls_services["storage5"].module.ncc_rule_blob[0].databricks_mws_ncc_private_endpoint_rule.pe_rule,
  on .terraform/modules/datalakehouse/databricks/databricks-ncc-rule/main.tf line 1, in resource "databricks_mws_ncc_private_endpoint_rule" "pe_rule":
   1: resource "databricks_mws_ncc_private_endpoint_rule" "pe_rule" {
Steps to Reproduce
Terraform and provider versions
1.9.4
Adding a defaultProvisionTimeout to the resource could solve this. Proposed change to:
https://github.com/databricks/terraform-provider-databricks/blob/main/mws/resource_mws_ncc_private_endpoint_rule.go
package mws

import (
	"context"
	"time"

	"github.com/databricks/databricks-sdk-go/service/settings"
	"github.com/databricks/terraform-provider-databricks/common"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// defaultProvisionTimeout is the proposed default for create, read and delete.
const defaultProvisionTimeout = 5 * time.Minute

func ResourceMwsNccPrivateEndpointRule() common.Resource {
	s := common.StructToSchema(settings.NccAzurePrivateEndpointRule{}, func(m map[string]*schema.Schema) map[string]*schema.Schema {
		for _, p := range []string{"network_connectivity_config_id", "group_id", "resource_id"} {
			common.CustomizeSchemaPath(m, p).SetRequired().SetForceNew()
		}
		for _, p := range []string{"rule_id", "endpoint_name", "connection_state", "creation_time", "updated_time"} {
			common.CustomizeSchemaPath(m, p).SetComputed()
		}
		return m
	})
	// The Terraform ID has the form "<network_connectivity_config_id>/<rule_id>".
	p := common.NewPairSeparatedID("network_connectivity_config_id", "rule_id", "/")
	return common.Resource{
		Schema: s,
		Create: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			var create settings.CreatePrivateEndpointRuleRequest
			common.DataToStructPointer(d, s, &create)
			create.NetworkConnectivityConfigId = d.Get("network_connectivity_config_id").(string)
			acc, err := c.AccountClient()
			if err != nil {
				return err
			}
			rule, err := acc.NetworkConnectivity.CreatePrivateEndpointRule(ctx, create)
			if err != nil {
				return err
			}
			common.StructToData(rule, s, d)
			p.Pack(d)
			return nil
		},
		Read: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			nccId, ruleId, err := p.Unpack(d)
			if err != nil {
				return err
			}
			acc, err := c.AccountClient()
			if err != nil {
				return err
			}
			rule, err := acc.NetworkConnectivity.GetPrivateEndpointRuleByNetworkConnectivityConfigIdAndPrivateEndpointRuleId(ctx, nccId, ruleId)
			if err != nil {
				return err
			}
			return common.StructToData(rule, s, d)
		},
		Delete: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			nccId, ruleId, err := p.Unpack(d)
			if err != nil {
				return err
			}
			acc, err := c.AccountClient()
			if err != nil {
				return err
			}
			_, err = acc.NetworkConnectivity.DeletePrivateEndpointRuleByNetworkConnectivityConfigIdAndPrivateEndpointRuleId(ctx, nccId, ruleId)
			return err
		},
		// Proposed addition: enable per-resource timeouts using the constant above.
		Timeouts: &schema.ResourceTimeout{
			Create: schema.DefaultTimeout(defaultProvisionTimeout),
			Read:   schema.DefaultTimeout(defaultProvisionTimeout),
			Delete: schema.DefaultTimeout(defaultProvisionTimeout),
		},
	}
}
There are three timeouts in play in the TF provider:
1. The resource timeout (this is what you added above). It ensures that the context passed to the CRUD methods has a timeout set. The default resource timeout for most of our resources is 20m, so the proposed change would actually decrease this timeout. If this timeout is exceeded, users will see a message like `context: deadline exceeded`. Resources must enable this timeout resource by resource. Once present, users can modify this timeout as needed on a per-resource basis.
2. The HTTP client timeout. This controls how long the underlying SDK's HTTP client waits for a response for a single API call. This is controlled with `http_timeout_seconds` in the provider configuration. If this timeout is exceeded, users see a message like `request timed out after 1m0s of inactivity`. Users can increase this timeout as needed, and it will apply to all API requests made by the TF provider.
3. The API proxy timeout. This is a server-side timeout that needs to be configured by each API team at Databricks for their API endpoints. If this timeout is exceeded, users see a message like `The service at /api/2.0/... is taking too long to process your request. Please try again later or try a faster operation.` Users cannot modify this timeout: it needs to be configured by a backend Databricks team.
Each individual API call must complete within both the HTTP client timeout and the API proxy timeout (2 and 3). All API calls and other work the provider does for a single CRUD operation must together complete within the resource timeout (1).
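To make the two client-side knobs concrete, here is a minimal sketch of where each one would be set in configuration. The values are illustrative only. The `timeouts` block is an assumption: it is only accepted when the provider version in use enables resource timeouts for `databricks_mws_ncc_private_endpoint_rule` (for example, if a change like the one proposed above were merged); otherwise Terraform rejects it as an unsupported block. The API proxy timeout has no client-side setting, so it does not appear here.

```hcl
# HTTP client timeout: applies to every API call made through this provider.
provider "databricks" {
  alias                = "account"
  host                 = "https://accounts.azuredatabricks.net"
  account_id           = var.DATABRICKS_ACCOUNT_ID
  http_timeout_seconds = 300 # illustrative value
}

resource "databricks_mws_ncc_private_endpoint_rule" "pe_rule" {
  provider                       = databricks.account
  network_connectivity_config_id = var.ncc_id
  resource_id                    = var.resource_id
  group_id                       = var.subresource_type

  # Resource timeout: ASSUMPTION, valid only if this resource
  # declares Timeouts in its schema in the provider version you run.
  timeouts {
    create = "30m" # illustrative value
  }
}
```

Raising `http_timeout_seconds` only helps with the `request timed out after ... of inactivity` error; it has no effect on the server-side proxy timeout.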
I had the same issue with `request timed out after 1m0s of inactivity`. As described at https://registry.terraform.io/providers/databricks/databricks/1.72.0/docs/guides/troubleshooting (problem 2 at the bottom), I increased http_timeout_seconds to 1200, and most executions then succeeded. After many attempts, however, I received a different error, `taking too long to process your request. Please try again later or try a faster operation`, which is described as problem 3 at the bottom of the same page. Issue #4559 is open for this.
How can we solve the timeout problem for the databricks_mws_ncc_private_endpoint_rule resource?
I am also encountering a similar issue despite setting http_timeout_seconds to 10 minutes at the account provider level:
provider "databricks" {
  alias                = "account"
  host                 = "https://accounts.azuredatabricks.net"
  account_id           = var.DATABRICKS_ACCOUNT_ID
  http_timeout_seconds = 600
}
I've tested the new provider version 1.74.0 and the behaviour is still the same.
The issue seems to have been fixed on the Databricks side.
I recently received confirmation from Databricks that this has been solved on their side. I've run multiple tests since and have seen no problems.
We noticed this timeout issue today. databricks_mws_ncc_private_endpoint_rule kept timing out after 1 minute 5 seconds. After retrying the deployment 3-4 times with the same result, we increased http_timeout_seconds to 120. On our latest Terraform run, both rules were successfully created after 30-60 seconds.
I don't know if http_timeout_seconds helped or not -- seems like a transient issue as we did not experience this issue in deployment of a different environment yesterday.
I'm seeing this issue again. Has there been a regression?