terraform-provider-databricks

[ISSUE] Issue with `databricks_mws_ncc_private_endpoint_rule` resource

Open emailisabu opened this issue 1 year ago • 9 comments

Hello,

I have a problem with the `databricks_mws_ncc_private_endpoint_rule` resource: creation fails with a timeout. Is there any way to set a timeout on the resource?

Configuration

resource "databricks_mws_ncc_private_endpoint_rule" "pe_rule" {
  network_connectivity_config_id = var.ncc_id
  resource_id                    = var.resource_id
  group_id                       = var.subresource_type
  lifecycle {
    ignore_changes = [
      connection_state
    ]
  }
}

Expected Behavior

module.datalakehouse.module.storage_account_adls_services["storage7"].module.ncc_rule_blob[0].databricks_mws_ncc_private_endpoint_rule.pe_rule: Creation complete after 56s [id=bece2dd2-4d68-4a7e-g583-4908bd57cdc0/f9c40a6d-gf5r-4e96-bfd1-b62eb9d7634t]

Actual Behavior

Error: cannot create mws ncc private endpoint rule: Post "https://accounts.azuredatabricks.net/api/2.0/accounts/xxxx/network-connectivity-configs/bedw2dd2-4d68-4a7e-a983-4908bd57cdf0/private-endpoint-rules": request timed out after 1m0s of inactivity
│
│   with module.datalakehouse.module.storage_account_adls_services["storage5"].module.ncc_rule_blob[0].databricks_mws_ncc_private_endpoint_rule.pe_rule,
│   on .terraform/modules/datalakehouse/databricks/databricks-ncc-rule/main.tf line 1, in resource "databricks_mws_ncc_private_endpoint_rule" "pe_rule":
│    1: resource "databricks_mws_ncc_private_endpoint_rule" "pe_rule" {

Steps to Reproduce

Terraform and provider versions

1.9.4

emailisabu avatar Oct 22 '24 09:10 emailisabu

Adding a defaultProvisionTimeout to the resource could solve this. Proposed change to the following file:

https://github.com/databricks/terraform-provider-databricks/blob/main/mws/resource_mws_ncc_private_endpoint_rule.go

package mws

import (
	"context"
	"time"

	"github.com/databricks/databricks-sdk-go/service/settings"
	"github.com/databricks/terraform-provider-databricks/common"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

const defaultProvisionTimeout = 5 * time.Minute

func ResourceMwsNccPrivateEndpointRule() common.Resource {
	s := common.StructToSchema(settings.NccAzurePrivateEndpointRule{}, func(m map[string]*schema.Schema) map[string]*schema.Schema {
		for _, p := range []string{"network_connectivity_config_id", "group_id", "resource_id"} {
			common.CustomizeSchemaPath(m, p).SetRequired().SetForceNew()
		}
		for _, p := range []string{"rule_id", "endpoint_name", "connection_state", "creation_time", "updated_time"} {
			common.CustomizeSchemaPath(m, p).SetComputed()
		}
		return m
	})
	p := common.NewPairSeparatedID("network_connectivity_config_id", "rule_id", "/")
	return common.Resource{
		Schema: s,
		Create: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			var create settings.CreatePrivateEndpointRuleRequest
			common.DataToStructPointer(d, s, &create)
			create.NetworkConnectivityConfigId = d.Get("network_connectivity_config_id").(string)
			acc, err := c.AccountClient()
			if err != nil {
				return err
			}
			rule, err := acc.NetworkConnectivity.CreatePrivateEndpointRule(ctx, create)
			if err != nil {
				return err
			}
			common.StructToData(rule, s, d)
			p.Pack(d)
			return nil
		},
		Read: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			nccId, ruleId, err := p.Unpack(d)
			if err != nil {
				return err
			}
			acc, err := c.AccountClient()
			if err != nil {
				return err
			}
			rule, err := acc.NetworkConnectivity.GetPrivateEndpointRuleByNetworkConnectivityConfigIdAndPrivateEndpointRuleId(ctx, nccId, ruleId)
			if err != nil {
				return err
			}
			return common.StructToData(rule, s, d)
		},
		Delete: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error {
			nccId, ruleId, err := p.Unpack(d)
			if err != nil {
				return err
			}
			acc, err := c.AccountClient()
			if err != nil {
				return err
			}
			_, err = acc.NetworkConnectivity.DeletePrivateEndpointRuleByNetworkConnectivityConfigIdAndPrivateEndpointRuleId(ctx, nccId, ruleId)
			return err
		},
		// Use the constant declared at the top of this file (Go identifiers are case-sensitive).
		Timeouts: &schema.ResourceTimeout{
			Create: schema.DefaultTimeout(defaultProvisionTimeout),
			Read:   schema.DefaultTimeout(defaultProvisionTimeout),
			Delete: schema.DefaultTimeout(defaultProvisionTimeout),
		},
	}
}

emailisabu avatar Oct 30 '24 09:10 emailisabu

There are three timeouts in play in the TF provider:

  1. The resource timeout (this is what you added above). It ensures that the context passed to the CRUD methods has a timeout set. The default resource timeout for most of our resources is 20m, so the proposed change would actually decrease this timeout. If this timeout is exceeded, users will see a message like `context: deadline exceeded`. This timeout has to be enabled resource by resource. Once present, users can modify it as needed on a per-resource basis.
  2. The HTTP client timeout. This controls how long the underlying SDK's HTTP client waits for a response for a single API call. This is controlled with `http_timeout_seconds` in the provider configuration. If this timeout is exceeded, users see a message like `request timed out after 1m0s of inactivity`. Users can increase this timeout as needed, and it will apply to all API requests made by the TF provider.
  3. The API proxy timeout. This is a server-side timeout that needs to be configured by each API team at Databricks for their API endpoints. If this timeout is exceeded, users see a message like `The service at /api/2.0/... is taking too long to process your request. Please try again later or try a faster operation.` Users cannot modify this timeout: it needs to be configured by a backend Databricks team.

Individual API calls need to complete within the timeouts of 2 & 3. All API calls & work done in the provider for a CRUD operation must complete within the timeout of 1.

mgyucht avatar Feb 11 '25 08:02 mgyucht

I had the same issue with `request timed out after 1m0s of inactivity`. As described at https://registry.terraform.io/providers/databricks/databricks/1.72.0/docs/guides/troubleshooting (problem 2 at the bottom), I increased http_timeout_seconds to 1200, and most runs then succeeded. After many attempts, however, I received a different error, `taking too long to process your request. Please try again later or try a faster operation`, which is described as problem 3 at the bottom of the same page. Issue #4559 has been opened for this.

How can we solve the timeout problem for the databricks_mws_ncc_private_endpoint_rule resource?

sebastianczech avatar Apr 14 '25 06:04 sebastianczech

I am also encountering a similar issue, despite setting http_timeout_seconds to 10 minutes at the account provider level:

provider "databricks" {
  alias                = "account"
  host                 = "https://accounts.azuredatabricks.net"
  account_id           = var.DATABRICKS_ACCOUNT_ID
  http_timeout_seconds = 600
}

ramnarayan-code avatar Apr 14 '25 11:04 ramnarayan-code

I've tested the new provider version 1.74.0 and the behaviour is still the same.

sebastianczech avatar Apr 17 '25 13:04 sebastianczech

The issue seems to have been fixed on the Databricks side.

antonbricks avatar May 10 '25 00:05 antonbricks

I recently received information from Databricks that this has been solved on their side. I've run multiple tests and there is no problem.

sebastianczech avatar May 21 '25 08:05 sebastianczech

We noticed this timeout issue today. databricks_mws_ncc_private_endpoint_rule kept timing out after 1 minute 5 seconds. After retrying the deployment 3-4 times with the same result, we increased http_timeout_seconds to 120. On our latest Terraform run, both rules were successfully created after 30-60 seconds.

I don't know if http_timeout_seconds helped or not -- seems like a transient issue as we did not experience this issue in deployment of a different environment yesterday.

amasover avatar Jun 25 '25 15:06 amasover

I'm seeing this issue again.. has there been a regression?

m1nkeh avatar Nov 04 '25 03:11 m1nkeh