"Ruler API Not supported" after upgrading to v1.15.0
Describe the bug
Grafana 8, 9 and 10 provide an alerting UI to edit alerts and recording rules in Cortex. After upgrading to https://github.com/cortexproject/cortex/commit/0a1c112048b65b832c74d6611148181b424a7126, Grafana says the Ruler API is not supported, and it is no longer possible to edit recording rules and alerts in Cortex.
To Reproduce
Steps to reproduce the behavior:
- Start Minio
docker run --network=host minio/minio:RELEASE.2021-10-13T00-23-17Z server /shared
- Create buckets "blocks" and "ruler" in MinIO using the MinIO admin interface (log in with minioadmin/minioadmin)
- Start cortex
$ cat mini.yaml
auth_enabled: false
server:
  http_listen_port: 9009
blocks_storage:
  s3:
    access_key_id: minioadmin
    bucket_name: blocks
    endpoint: 127.0.0.1:9000
    insecure: true
    secret_access_key: minioadmin
ingester:
  lifecycler:
    address: 127.0.0.1
    join_after: 0
    min_ready_duration: 0s
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
querier:
  store_gateway_addresses: http://127.0.0.1:9009
ruler:
  enable_api: true
ruler_storage:
  s3:
    access_key_id: minioadmin
    bucket_name: ruler
    endpoint: 127.0.0.1:9000
    insecure: true
    secret_access_key: minioadmin
target: query-frontend,querier,ruler,store-gateway,compactor,ingester,distributor
$ docker run --network=host --rm -v $PWD/mini.yaml:/mini.yaml cortexproject/cortex:master-0a1c112 -config.file=/mini.yaml
- Start Grafana
docker run --rm --network=host -e GF_LOG_LEVEL=debug grafana/grafana:9.5.5
- Open Grafana at http://127.0.0.1:3000
- Log in with admin/admin
- Add a Prometheus datasource to Grafana with the URL http://127.0.0.1:9009/prometheus
- Click Save & Test
Expected behavior
The Ruler API should be enabled with Cortex v1.15.0. After clicking Save & Test, it should say the same as it does when repeating the steps using the previous commit https://github.com/cortexproject/cortex/commit/b1cd7b2abfe0eb6ac9b1efb5162ab781069086ad
docker run --network=host --rm -v $PWD/mini.yaml:/mini.yaml cortexproject/cortex:master-b1cd7b2 -config.file=/mini.yaml
Additional Context
This used to return 404:
GET path=/api/datasources/proxy/uid/b98430e5-c3bc-4be3-a3ed-22e908c01b82/api/v1/status/buildinfo status=404 remote_addr=[::1] time_ms=6 duration=6.483347ms size=19 referer=http://127.0.0.1:3000/datasources/edit/b98430e5-c3bc-4be3-a3ed-22e908c01b82 handler=/api/datasources/proxy/uid/:uid/*
Oh.. nice find.
Do you know what Cortex was returning with that commit? 404?
@friedrichg Can you check browser console and let me know what's the response content of the build info API?
Do you know what Cortex was returning with that commit? 404?
Yes, it was returning 404. And yes, if we return 404, everything works again.
Can you check browser console and let me know what's the response content of the build info API?
> Referer: http://127.0.0.1:3000/connections/your-connections/datasources/edit/efe8729b-3094-4483-91e5-9fcd5b47a981
> Sec-Fetch-Dest: empty
> Sec-Fetch-Mode: cors
> Sec-Fetch-Site: same-origin
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
> accept: application/json, text/plain, */*
> sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
> sec-ch-ua-mobile: ?0
> sec-ch-ua-platform: "macOS"
> x-grafana-nocache: true
> x-grafana-org-id: 1
>
< HTTP/1.1 200 OK
< Content-Encoding: deflate
< Content-Length: 120
< Content-Security-Policy: sandbox
< Content-Type: application/json
< Date: Mon, 10 Jul 2023 09:07:02 GMT
< Results-Cache-Gen-Number:
< X-Content-Type-Options: nosniff
< X-Frame-Options: deny
< X-Xss-Protection: 1; mode=block
<
* Connection #0 to host 127.0.0.1 left intact
{"status":"success","data":{"version":"1.13.0","revision":"0a1c112","branch":"master","buildUser":"","buildDate":"","goVersion":"go1.19"}}
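To make the relevant detail of this response easier to see, here is a minimal TypeScript sketch that summarizes a buildinfo reply. The helper name `summarizeBuildInfo` and its result shape are hypothetical, introduced only for illustration; the sample body is the one captured above. The point is that the 200 response carries no `features` field.

```typescript
// Hypothetical helper (not part of Grafana or Cortex): summarizes what a
// /api/v1/status/buildinfo reply contains.
interface BuildInfoSummary {
  status: number;
  version: string | undefined;
  hasFeatures: boolean;
}

function summarizeBuildInfo(status: number, bodyText: string): BuildInfoSummary {
  // A non-200 reply (e.g. the old 404) has no JSON body worth parsing.
  if (status !== 200) {
    return { status, version: undefined, hasFeatures: false };
  }
  const parsed = JSON.parse(bodyText) as {
    data?: { version?: string; features?: unknown };
  };
  const data = parsed.data;
  return {
    status,
    version: data?.version,
    hasFeatures: data?.features !== undefined && data?.features !== null,
  };
}

// Response body captured above from cortex:master-0a1c112.
const capturedBody =
  '{"status":"success","data":{"version":"1.13.0","revision":"0a1c112","branch":"master","buildUser":"","buildDate":"","goVersion":"go1.19"}}';

console.log(summarizeBuildInfo(200, capturedBody));
console.log(summarizeBuildInfo(404, "404 page not found\n"));
```

To run it against a live instance, the status and body text could be taken from a request to http://127.0.0.1:9009/prometheus/api/v1/status/buildinfo, assuming the setup from the reproduction steps.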
+1 This is blocking me from upgrading. Any timeline on a fix or known workaround?
A simpler solution would be to have a config to enable the build API?
It seems Grafana enables a feature flag if we return 2xx?
@yeya24
This is the code used by Grafana to detect whether the Prometheus datasource supports specific APIs https://github.com/grafana/grafana/blob/main/public/app/features/alerting/unified/api/buildInfo.ts#L54.
  // we are dealing with a Cortex or Loki datasource since the response for buildinfo came up empty
  if (!hasBuildInfo) {
    // check if we can fetch rules via the prometheus compatible api
    const promRulesSupported = await hasPromRulesSupport(name);
    if (!promRulesSupported) {
      throw new Error(`Unable to fetch alert rules. Is the ${name} data source properly configured?`);
    }
    // check if the ruler is enabled
    const rulerSupported = await hasRulerSupport(name);
    return {
      application: PromApplication.Cortex,
      features: {
        rulerApiEnabled: rulerSupported,
      },
    };
  }
  // if no features are reported but buildinfo was returned we're talking to Prometheus
  const { features } = buildInfoResponse.data;
  if (!features) {
    return {
      application: PromApplication.Prometheus,
      features: {
        rulerApiEnabled: false,
      },
    };
  }
Grafana still assumes Cortex doesn't support the buildInfo API. If there is no buildInfo response, it queries the rules endpoints to check whether the ruler is supported. But now that Cortex supports the buildInfo API and returns a response without a features field, Grafana thinks it is talking to Prometheus and simply marks the Ruler API as disabled.
I think we need to update Grafana, perhaps to check the rules endpoint as well. Even if I manually override the datasource type to Cortex, it still shows Ruler API not supported.
I also tried removing the buildinfo handler, and I believe I still got Ruler API not supported. @friedrichg How did you test it and get the Ruler API enabled? If this works, maybe we can add a flag in Cortex to disable the build info API.
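The effect of that branching can be sketched as a small pure function. This is a simplified model of the excerpt above, not Grafana's actual implementation: the function name `detectRulerSupport`, the `BuildInfoProbe` shape, and the assumption that any non-2xx status counts as "no buildinfo" are all illustrative. The final branch is a placeholder, since the excerpt stops before showing what happens when `features` is present.

```typescript
// Simplified, hypothetical model of the detection flow quoted above.
type Application = "Cortex" | "Prometheus" | "Unknown";

interface BuildInfoProbe {
  status: number; // HTTP status of /api/v1/status/buildinfo
  data?: { features?: Record<string, string> }; // parsed "data" object, if any
}

interface Detection {
  application: Application;
  rulerApiEnabled: boolean;
}

function detectRulerSupport(
  probe: BuildInfoProbe,
  promRulesSupported: boolean,
  rulerSupported: boolean
): Detection {
  // Assumption: a non-2xx reply is treated as "buildinfo came up empty".
  const hasBuildInfo =
    probe.status >= 200 && probe.status < 300 && probe.data !== undefined;

  if (!hasBuildInfo) {
    if (!promRulesSupported) {
      throw new Error("Unable to fetch alert rules");
    }
    // Cortex/Loki path: the ruler is probed explicitly.
    return { application: "Cortex", rulerApiEnabled: rulerSupported };
  }

  if (!probe.data?.features) {
    // buildinfo without features: classified as Prometheus, ruler never probed.
    return { application: "Prometheus", rulerApiEnabled: false };
  }

  // Not covered by the excerpt; placeholder result only.
  return { application: "Unknown", rulerApiEnabled: false };
}

// Before the change: buildinfo returns 404, so the ruler probe decides.
console.log(detectRulerSupport({ status: 404 }, true, true));
// After the change: 200 without features, so the ruler is reported as disabled.
console.log(detectRulerSupport({ status: 200, data: {} }, true, true));
```

Under this model the ruler probe result is ignored entirely once buildinfo returns 200 without features, which matches the behavior reported in this issue.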
Yeah, returning 404 again fixes the problem. But why not make an issue in grafana to have them fix it?
I opened https://github.com/grafana/grafana/issues/73526. I'm not sure why returning 404 doesn't work for me, but if it works, I propose we add a flag to disable the build info API and make that the default so we don't break the UX.
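As a rough illustration of that proposal, a flag-gated handler could look like the sketch below. It is written in TypeScript only to match the Grafana excerpt earlier in the thread; Cortex itself is written in Go, and the option name `buildInfoEnabled`, the `makeBuildInfoHandler` helper, and the response shapes are all hypothetical rather than taken from the actual fix.

```typescript
// Hypothetical sketch of a build info endpoint that is disabled by default.
interface HttpResponse {
  status: number;
  body: string;
}

interface BuildInfo {
  version: string;
  revision: string;
  branch: string;
  goVersion: string;
}

interface HandlerOptions {
  buildInfoEnabled?: boolean; // hypothetical flag; off unless explicitly set
}

function makeBuildInfoHandler(
  info: BuildInfo,
  options: HandlerOptions = {}
): () => HttpResponse {
  const enabled = options.buildInfoEnabled ?? false;
  return () => {
    if (!enabled) {
      // Same observable behavior as before the endpoint existed.
      return { status: 404, body: "404 page not found\n" };
    }
    return {
      status: 200,
      body: JSON.stringify({ status: "success", data: info }),
    };
  };
}

const sampleInfo: BuildInfo = {
  version: "1.13.0",
  revision: "0a1c112",
  branch: "master",
  goVersion: "go1.19",
};

console.log(makeBuildInfoHandler(sampleInfo)().status);
console.log(makeBuildInfoHandler(sampleInfo, { buildInfoEnabled: true })().status);
```

Keeping the endpoint off by default would preserve the 404 that the reporter found Grafana relies on, while still letting operators opt in.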
Added https://github.com/cortexproject/cortex/pull/5533 to fix the issue. I will probably cut a new release for 1.15.x
@yeya24 You could also mark this as a known issue at https://github.com/cortexproject/cortex/blob/master/CHANGELOG.md#known-issues and do the release for v1.16.0 instead.
Let me close this issue as I believe it is already solved in v1.16.0