
"Ruler API Not supported" after upgrading to v1.15.0

Open friedrichg opened this issue 2 years ago • 10 comments

Describe the bug
Grafana 8, 9, and 10 provide an alerting UI to edit alerts and recording rules in Cortex. After upgrading to https://github.com/cortexproject/cortex/commit/0a1c112048b65b832c74d6611148181b424a7126, Grafana says the Ruler API is not supported, and it is no longer possible to edit recording rules and alerts in Cortex.

Screenshot 2023-07-07 at 9 28 05 PM

To Reproduce
Steps to reproduce the behavior:

  1. Start Minio
docker run --network=host minio/minio:RELEASE.2021-10-13T00-23-17Z server /shared
  2. Create buckets "blocks" and "ruler" in Minio using the Minio admin interface with minioadmin/minioadmin
  3. Start Cortex
$ cat mini.yaml
auth_enabled: false
server:
  http_listen_port: 9009
blocks_storage:
  s3:
    access_key_id: minioadmin
    bucket_name: blocks
    endpoint: 127.0.0.1:9000
    insecure: true
    secret_access_key: minioadmin
ingester:
  lifecycler:
    address: 127.0.0.1
    join_after: 0
    min_ready_duration: 0s
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
querier:
  store_gateway_addresses: http://127.0.0.1:9009
ruler:
  enable_api: true
ruler_storage:
  s3:
    access_key_id: minioadmin
    bucket_name: ruler
    endpoint: 127.0.0.1:9000
    insecure: true
    secret_access_key: minioadmin
target: query-frontend,querier,ruler,store-gateway,compactor,ingester,distributor
$ docker run --network=host --rm -v $PWD/mini.yaml:/mini.yaml cortexproject/cortex:master-0a1c112 -config.file=/mini.yaml
  4. Start Grafana
docker run --rm --network=host -e GF_LOG_LEVEL=debug grafana/grafana:9.5.5
  5. Open Grafana at http://127.0.0.1:3000
  6. Log in with admin/admin
  7. Add a Prometheus datasource to Grafana with URL http://127.0.0.1:9009/prometheus
  8. Click Save & Test

Expected behavior
The Ruler API should be enabled with Cortex v1.15.0. After clicking Save & Test, it should say:

Screenshot 2023-07-07 at 10 03 27 PM

This is what it says when repeating the steps with the previous commit https://github.com/cortexproject/cortex/commit/b1cd7b2abfe0eb6ac9b1efb5162ab781069086ad:

docker run --network=host --rm -v $PWD/mini.yaml:/mini.yaml cortexproject/cortex:master-b1cd7b2 -config.file=/mini.yaml

Additional Context
This request used to return 404:

GET path=/api/datasources/proxy/uid/b98430e5-c3bc-4be3-a3ed-22e908c01b82/api/v1/status/buildinfo status=404 remote_addr=[::1] time_ms=6 duration=6.483347ms size=19 referer=http://127.0.0.1:3000/datasources/edit/b98430e5-c3bc-4be3-a3ed-22e908c01b82 handler=/api/datasources/proxy/uid/:uid/*

friedrichg avatar Jul 07 '23 20:07 friedrichg

Oh.. nice find.

Do you know what Cortex was returning with that commit? 404?

alanprot avatar Jul 07 '23 20:07 alanprot

@friedrichg Can you check the browser console and let me know what the response content of the build info API is?

yeya24 avatar Jul 07 '23 20:07 yeya24

Do you know what Cortex was returning with that commit? 404?

Yes, it was returning 404. And yes, if we return 404, everything works again.

Can you check the browser console and let me know what the response content of the build info API is?

Screenshot 2023-07-10 at 11 05 45 AM

> Referer: http://127.0.0.1:3000/connections/your-connections/datasources/edit/efe8729b-3094-4483-91e5-9fcd5b47a981
> Sec-Fetch-Dest: empty
> Sec-Fetch-Mode: cors
> Sec-Fetch-Site: same-origin
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
> accept: application/json, text/plain, */*
> sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
> sec-ch-ua-mobile: ?0
> sec-ch-ua-platform: "macOS"
> x-grafana-nocache: true
> x-grafana-org-id: 1
>
< HTTP/1.1 200 OK
< Content-Encoding: deflate
< Content-Length: 120
< Content-Security-Policy: sandbox
< Content-Type: application/json
< Date: Mon, 10 Jul 2023 09:07:02 GMT
< Results-Cache-Gen-Number:
< X-Content-Type-Options: nosniff
< X-Frame-Options: deny
< X-Xss-Protection: 1; mode=block
<
* Connection #0 to host 127.0.0.1 left intact
{"status":"success","data":{"version":"1.13.0","revision":"0a1c112","branch":"master","buildUser":"","buildDate":"","goVersion":"go1.19"}}
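The response body above can be inspected programmatically. The following is a minimal sketch that parses the exact payload from the curl output and checks whether it carries a `features` field. The interface and the `reportsFeatures` helper are hypothetical names introduced for illustration, and the type of `features` is an assumption.

```typescript
// Shape of the build info payload as seen in the response above.
// `features` is marked optional; its value type here is an assumption.
interface BuildInfoResponse {
  status: string;
  data: {
    version: string;
    revision: string;
    branch: string;
    buildUser: string;
    buildDate: string;
    goVersion: string;
    features?: Record<string, unknown>;
  };
}

// Response body copied from the curl output above.
const rawBody =
  '{"status":"success","data":{"version":"1.13.0","revision":"0a1c112","branch":"master","buildUser":"","buildDate":"","goVersion":"go1.19"}}';

// Hypothetical helper: true when the payload contains a `features` object.
function reportsFeatures(body: BuildInfoResponse): boolean {
  return body.data.features !== undefined;
}

const parsed = JSON.parse(rawBody) as BuildInfoResponse;
console.log(reportsFeatures(parsed)); // false
```

So the endpoint now answers with 200 and a valid payload, but the payload reports no features.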

friedrichg avatar Jul 10 '23 09:07 friedrichg

+1 This is blocking me from upgrading. Any timeline on a fix or a known workaround?

philiptrovato avatar Aug 18 '23 14:08 philiptrovato

Would a simpler solution be to have a config option to enable the build info API?

It seems Grafana enables a feature flag if we return 2xx?

@yeya24

alanprot avatar Aug 18 '23 18:08 alanprot

This is the code used by Grafana to detect whether the Prometheus datasource supports specific APIs https://github.com/grafana/grafana/blob/main/public/app/features/alerting/unified/api/buildInfo.ts#L54.

  // we are dealing with a Cortex or Loki datasource since the response for buildinfo came up empty
  if (!hasBuildInfo) {
    // check if we can fetch rules via the prometheus compatible api
    const promRulesSupported = await hasPromRulesSupport(name);
    if (!promRulesSupported) {
      throw new Error(`Unable to fetch alert rules. Is the ${name} data source properly configured?`);
    }

    // check if the ruler is enabled
    const rulerSupported = await hasRulerSupport(name);

    return {
      application: PromApplication.Cortex,
      features: {
        rulerApiEnabled: rulerSupported,
      },
    };
  }

  // if no features are reported but buildinfo was returned we're talking to Prometheus
  const { features } = buildInfoResponse.data;
  if (!features) {
    return {
      application: PromApplication.Prometheus,
      features: {
        rulerApiEnabled: false,
      },
    };
  }

Grafana still assumes Cortex doesn't support the buildInfo API. If there is no buildInfo response, it queries the rules endpoint to check whether the ruler is supported. But now that Cortex supports the buildInfo API and its response reports no features, Grafana thinks it is talking to Prometheus and simply sets the ruler API as disabled.
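To make the two code paths concrete, here is a simplified, synchronous model of the excerpt above. It is a sketch, not Grafana's actual code: `detect`, `Detection`, and the `rulerProbe` callback (standing in for `hasRulerSupport(name)`) are hypothetical names, the `hasPromRulesSupport` check is left out, and the branch for a response that does report features is not shown in the excerpt, so it is only stubbed.

```typescript
type Application = "Cortex" | "Prometheus" | "NotShownInExcerpt";

interface BuildInfo {
  features?: Record<string, unknown>;
}

interface Detection {
  application: Application;
  rulerApiEnabled: boolean | undefined;
}

// `buildInfo === undefined` models a buildinfo request that produced no
// usable response (for example a 404).
function detect(buildInfo: BuildInfo | undefined, rulerProbe: () => boolean): Detection {
  if (buildInfo === undefined) {
    // No buildinfo: treated as Cortex (or Loki), and the ruler is probed.
    return { application: "Cortex", rulerApiEnabled: rulerProbe() };
  }
  if (buildInfo.features === undefined) {
    // Buildinfo without features: treated as Prometheus, ruler never probed.
    return { application: "Prometheus", rulerApiEnabled: false };
  }
  // The excerpt does not show what happens when features are reported.
  return { application: "NotShownInExcerpt", rulerApiEnabled: undefined };
}

const rulerIsUp = () => true;

// Before the change: buildinfo returns 404, so the ruler probe decides.
console.log(detect(undefined, rulerIsUp));

// After the change: buildinfo returns 200 without features.
console.log(detect({}, rulerIsUp));
```

In this model the ruler probe is never consulted once buildinfo succeeds, which matches the reported behavior: the ruler is running, yet Grafana marks the Ruler API as disabled.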

I think we need to update Grafana so that it perhaps also checks the rules endpoint. Even if I manually override the datasource to Cortex, it still shows Ruler API not supported.

image

I also tried removing the buildinfo handler, and I think what I got was still "Ruler API not supported". @friedrichg How did you test it and get the ruler API enabled? If this works, maybe we can add a flag in Cortex to disable the build info API.

yeya24 avatar Aug 18 '23 20:08 yeya24

Yeah, returning 404 again fixes the problem. But why not make an issue in grafana to have them fix it?

friedrichg avatar Aug 18 '23 20:08 friedrichg

I opened https://github.com/grafana/grafana/issues/73526. I'm not sure why returning 404 doesn't work for me, but if it works, I propose we add a flag to disable the build info API and make disabling it the default so we don't break the UX.

yeya24 avatar Aug 18 '23 20:08 yeya24

Added https://github.com/cortexproject/cortex/pull/5533 to fix the issue. I will probably cut a new release for 1.15.x

yeya24 avatar Aug 30 '23 19:08 yeya24

@yeya24 you can also mark the issue as a known issue in https://github.com/cortexproject/cortex/blob/master/CHANGELOG.md#known-issues and do the release for v1.16.0 instead.

friedrichg avatar Sep 05 '23 11:09 friedrichg

Let me close this issue as I believe it is already solved in v1.16.0

yeya24 avatar Mar 11 '24 08:03 yeya24