gcp_stackdriver_logs: 401 Unauthorised each hour

Open garethpelly opened this issue 2 years ago • 4 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Roughly every hour we observe "Http status: 401 Unauthorized" in our Vector logs, coming from the gcp_stackdriver_logs sink. Vector runs in GKE as a headless service and consumes logs from Kafka. We do not use credentials_path and instead rely on the Service Account to authenticate. The problem resolves itself after 2 to 3 minutes; however, my understanding is that the failed requests are not retried, so those logs are discarded.

Configuration

type = "gcp_stackdriver_logs"
inputs = ["final_cleanup"]
log_id = "{{ type }}"
project_id = "centralized-logging"
severity_key = "log_level"

batch.max_events = 1000
batch.max_bytes = 9900000

resource.type = "{{ resource_type }}"
resource.project_id = "{{ gcp_project_id }}"
resource.instance_id = "{{ hostname }}"

Version

0.34.1-distroless-libc

Debug Output

No response

Example Data

No response

Additional Context

No response

References

  • https://github.com/vectordotdev/vector/issues/17559
  • https://github.com/vectordotdev/vector/issues/8616

garethpelly avatar Jan 12 '24 18:01 garethpelly

You are correct, it seems like those requests are not retried. I'd argue they should be (per https://github.com/vectordotdev/vector/issues/10870), in addition to refreshing the token before it expires.

Retry logic:

https://github.com/vectordotdev/vector/blob/131ab453d4611699e6f6989546c4b5d289e8768a/src/sinks/util/http.rs#L517-L531
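To illustrate the two ideas above (refresh the token before it expires, and retry a request that comes back 401), here is a minimal Rust sketch. It is not Vector's implementation: `TokenCache`, `CachedToken`, `send_with_reauth`, the `fetch` closure standing in for a credentials lookup, and the `margin` parameter are all hypothetical names chosen for this example, and the code is synchronous to keep it self-contained.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cached access token together with its expiry time.
struct CachedToken {
    value: String,
    expires_at: Instant,
}

/// Hypothetical token holder that refreshes ahead of expiry.
/// `fetch` is a stand-in for whatever call obtains a new token and its lifetime.
struct TokenCache<F: FnMut() -> (String, Duration)> {
    fetch: F,
    margin: Duration, // refresh this long before the token would expire
    current: Option<CachedToken>,
}

impl<F: FnMut() -> (String, Duration)> TokenCache<F> {
    fn new(fetch: F, margin: Duration) -> Self {
        Self { fetch, margin, current: None }
    }

    /// Return a token, fetching a new one if nothing is cached or the
    /// cached token is within `margin` of expiring at time `now`.
    fn token(&mut self, now: Instant) -> String {
        let stale = match &self.current {
            Some(t) => now + self.margin >= t.expires_at,
            None => true,
        };
        if stale {
            self.refresh(now);
        }
        self.current.as_ref().unwrap().value.clone()
    }

    /// Unconditionally fetch a new token and record when it expires.
    fn refresh(&mut self, now: Instant) {
        let (value, ttl) = (self.fetch)();
        self.current = Some(CachedToken { value, expires_at: now + ttl });
    }
}

/// Send a request; if the server answers 401, force one token refresh
/// and retry once instead of dropping the payload.
fn send_with_reauth<F, S>(cache: &mut TokenCache<F>, now: Instant, mut send: S) -> u16
where
    F: FnMut() -> (String, Duration),
    S: FnMut(&str) -> u16,
{
    let first = cache.token(now);
    let status = send(first.as_str());
    if status != 401 {
        return status;
    }
    cache.refresh(now);
    let second = cache.token(now);
    send(second.as_str())
}
```

With a one-hour token lifetime and a five-minute margin, `token` starts handing out a fresh token 55 minutes after the last fetch, so a request should rarely be sent with a token that is about to expire. The single retry in `send_with_reauth` covers the remaining case where the server rejects a token that the client still considers valid.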

jszwedko avatar Jan 12 '24 22:01 jszwedko

Coming back to this: the root issue appears to relate to running more than one gcp_stackdriver_logs sink (we had a separate sink sending a subset of logs to a different GCP project). Vector's handling of authentication token refreshes may have a timing/race issue when more than one sink is in play. When we removed the additional sink, the 401s were no longer observed.

garethpelly avatar May 10 '24 09:05 garethpelly

Update: The 401s have returned since we scaled back to a single gcp_stackdriver_logs sink.

garethpelly avatar May 23 '24 09:05 garethpelly

@jszwedko I've taken a stab at changing how the token is refreshed in https://github.com/vectordotdev/vector/pull/20574.

garethpelly avatar May 30 '24 11:05 garethpelly

Closing. Fixed in https://github.com/vectordotdev/vector/pull/20574

garethpelly avatar Jul 03 '24 08:07 garethpelly