Sources: respect third-party API quotas.

Open absorbb opened this issue 3 years ago • 1 comments

Problem

Some APIs that Jitsu uses to pull source data enforce quota limits on the number of requests allowed per period of time. If a quota limit is exceeded during a source sync, Jitsu just fails the sync job, and subsequent retries will probably fail for the same reason.

Solution

  • For applicable sources, add an advanced parameter: Requests rate limit.
  • Respect the provided rate limit during syncs. Calculate the current rate across all nodes in the cluster.
  • (Optional) For specific sources, we can detect the Quota exceeded error during a sync run and enforce a rate limit even when Requests rate limit is not explicitly provided.

Quota exceeded error for Google Analytics:

Error 429: Quota exceeded for quota metric 'Requests' and limit 'Requests per minute per user' of service 'analyticsreporting.googleapis.com' for consumer 'project_number:XXXXXXXXX'.
Details:
[
  {
    "@type": "type.googleapis.com/google.rpc.ErrorInfo",
    "domain": "googleapis.com",
    "metadata": {
      "consumer": "projects/XXXXXXXXX",
      "quota_limit": "AnalyticsDefaultRequestsPerMinutePerUser",
      "quota_limit_value": "600",
      "quota_location": "global",
      "quota_metric": "analyticsreporting.googleapis.com/analytics_default_requests",
      "service": "analyticsreporting.googleapis.com"
    },
    "reason": "RATE_LIMIT_EXCEEDED"
  },
  {
    "@type": "type.googleapis.com/google.rpc.Help",
    "links": [
      {
        "description": "Request a higher quota limit.",
        "url": "https://cloud.google.com/docs/quota#requesting_higher_quota"
      }
    ]
  }
]
, rateLimitExceeded
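
A minimal sketch of how the optional detection step could recognize this error. It assumes the driver can see the HTTP status and the parsed Details array; the field names come from the payload above, while `ErrorDetail`, `QuotaInfo` and `detectQuotaExceeded` are hypothetical names, not part of Jitsu.

```typescript
// Shape of one entry in the "Details" array shown above.
interface ErrorDetail {
  "@type": string;
  reason?: string;
  metadata?: { [key: string]: string };
}

// Hypothetical result type: what a sync job would need to throttle itself.
interface QuotaInfo {
  service: string | undefined;
  limit: number | undefined; // parsed from quota_limit_value, if present
}

// Returns quota information when the error is a 429 carrying an ErrorInfo
// entry with reason RATE_LIMIT_EXCEEDED, otherwise null.
function detectQuotaExceeded(status: number, details: ErrorDetail[]): QuotaInfo | null {
  if (status !== 429) {
    return null;
  }
  for (const detail of details) {
    const isErrorInfo = detail["@type"] === "type.googleapis.com/google.rpc.ErrorInfo";
    if (isErrorInfo && detail.reason === "RATE_LIMIT_EXCEEDED") {
      const metadata = detail.metadata || {};
      const parsed = Number(metadata["quota_limit_value"]);
      return {
        service: metadata["service"],
        limit: isNaN(parsed) ? undefined : parsed,
      };
    }
  }
  return null;
}
```

For the payload above, the returned limit would be 600, which a sync job could then use as its rate limit for the rest of the run.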

absorbb avatar Aug 05 '22 13:08 absorbb

Quotas should be managed inside the connector code, not internally by the runtime. Different APIs apply quotas to different subjects: it can be an IP address, an API key, etc. The runtime should give the connector code a lever to manage limits.

Here's an interface inspired by Guava's RateLimiter:

  /**
   * Creates a rate limiter which prohibits making more than `permits` requests per `opts.periodSeconds` seconds
   * to `resource`.
   *
   * Returns an interface with an `acquire()` method that blocks, if necessary, until a permit becomes available.
   *
   * opts.periodSeconds is optional and defaults to 1
   */
  getRateLimiter(resource: string, permits: number, opts?: { periodSeconds: number }): { acquire: () => Promise<void> };

It should be part of the StreamSink interface and implemented via Redis (maybe with this pattern?)

(In this particular case, resource should be the project_id from the configuration.)
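
To make the proposed contract concrete, here is a rough in-memory sketch. It is not the Redis-backed version suggested above: it only limits requests within a single process, and it treats `permits` as a number. The class name `InMemoryRateLimiters`, the `reserve` helper and the injectable clock are all hypothetical, added only so the behaviour can be checked without real waiting.

```typescript
// Shape of the limiter returned by the proposed getRateLimiter().
interface RateLimiter {
  acquire: () => Promise<void>;
}

// Hypothetical single-process implementation; a cluster-wide one would keep
// the same bookkeeping in shared storage such as Redis.
class InMemoryRateLimiters {
  // Earliest time (ms) at which the next permit for each resource is free.
  private nextFreeAt: Map<string, number> = new Map();
  private now: () => number;
  private sleep: (ms: number) => Promise<void>;

  constructor(
    now: () => number = () => Date.now(),
    sleep: (ms: number) => Promise<void> = (ms) =>
      new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ) {
    this.now = now;
    this.sleep = sleep;
  }

  // Books the next free slot and returns how long (ms) the caller must wait.
  // Permits are spaced evenly: one every periodSeconds / permits seconds.
  reserve(resource: string, permits: number, periodSeconds: number = 1): number {
    const intervalMs = (periodSeconds * 1000) / permits;
    const now = this.now();
    const free = this.nextFreeAt.get(resource);
    const slot = free === undefined ? now : Math.max(now, free);
    this.nextFreeAt.set(resource, slot + intervalMs);
    return slot - now;
  }

  getRateLimiter(resource: string, permits: number, opts?: { periodSeconds: number }): RateLimiter {
    const periodSeconds = opts ? opts.periodSeconds : 1;
    return {
      acquire: async () => {
        const waitMs = this.reserve(resource, permits, periodSeconds);
        if (waitMs > 0) {
          await this.sleep(waitMs);
        }
      },
    };
  }
}
```

A connector would call something like `getRateLimiter(projectId, 600, { periodSeconds: 60 })` once, then `await limiter.acquire()` before each API request.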

Alternative solution

An alternative solution would be a backoff strategy at the driver level. E.g. if a request returns 429, wait X seconds before retrying; if it fails again, wait 2X seconds.
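
A rough sketch of that backoff idea, with the delay doubling on each retry (X, 2X, 4X, ...). It assumes the driver's errors expose the HTTP status as a `status` property, which may not match the real client library; `backoffDelaysSeconds`, `isTooManyRequests` and `withBackoff` are hypothetical names.

```typescript
// Delays (in seconds) to wait before each retry: X, 2X, 4X, ...
function backoffDelaysSeconds(baseSeconds: number, maxRetries: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxRetries; i++) {
    delays.push(baseSeconds * Math.pow(2, i));
  }
  return delays;
}

// Assumption: the thrown error carries the HTTP status as `status`.
function isTooManyRequests(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

// Runs `request`, retrying only on 429 and waiting longer before each retry.
async function withBackoff<T>(
  request: () => Promise<T>,
  baseSeconds: number = 1,
  maxRetries: number = 3,
): Promise<T> {
  for (const delay of backoffDelaysSeconds(baseSeconds, maxRetries)) {
    try {
      return await request();
    } catch (err) {
      if (!isTooManyRequests(err)) {
        throw err; // not a quota error, do not retry
      }
      await new Promise<void>((resolve) => setTimeout(resolve, delay * 1000));
    }
  }
  return request(); // last attempt; any error propagates to the caller
}
```

Compared with a rate limiter, this needs no shared state between nodes, but every node still spends requests on calls that are rejected before it slows down.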

vklimontovich avatar Aug 05 '22 16:08 vklimontovich