"skipping update of position for a file which does not currently exist": Grafana Agent consuming a lot of memory
What's wrong?
I have deployed the static mode Kubernetes operator, and I am using it to push pod logs to Loki.
Recently I found that the grafana-agent-logs pod is consuming a lot of memory and is also unable to push logs.
After checking the logs of the config-reloader container I found some error messages.
level=error ts=2024-03-26T06:33:34.799302016Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://127.0.0.1:8080/-/reload\": context deadline exceeded"
level=error ts=2024-03-26T06:33:34.799311483Z caller=reloader.go:384 msg="Failed to trigger reload. Retrying." err="trigger reload: reload request failed: Post \"http://127.0.0.1:8080/-/reload\": context deadline exceeded"
level=error ts=2024-03-26T06:34:04.803619827Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://127.0.0.1:8080/-/reload\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
After force-deleting the pod, the error messages in the config-reloader container are gone, the grafana-agent-logs pod's memory usage comes down, and it starts pushing logs to Loki again.
Can you help me find the cause of this issue and suggest a better solution? Force-deleting the pod is not a great workaround.
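For diagnosis, the failing reload request can be reproduced outside the config-reloader container. A minimal sketch in Python (the URL and timeout here are assumptions matching the log lines above, not the reloader's actual code; you would typically run this via `kubectl exec` or a port-forward into the pod):

```python
import urllib.request

def trigger_reload(url="http://127.0.0.1:8080/-/reload", timeout=5.0):
    """POST to the agent's reload endpoint, mimicking what config-reloader does.

    Returns True on HTTP 200, False on timeout or connection errors
    (the "context deadline exceeded" case in the logs above).
    """
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, ConnectionRefusedError, socket timeouts
        return False
```

If this call also hangs past the timeout while the pod's memory is high, it supports the theory that the agent's main goroutine is too busy (or blocked) to serve the reload endpoint, rather than the reloader itself being at fault.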
Steps to reproduce
I have not found a way to reproduce this bug.
System information
No response
Software version
grafana agent v0.39.0
Configuration
No response
Logs
level=error ts=2024-03-26T06:33:34.799302016Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://127.0.0.1:8080/-/reload\": context deadline exceeded"
level=error ts=2024-03-26T06:33:34.799311483Z caller=reloader.go:384 msg="Failed to trigger reload. Retrying." err="trigger reload: reload request failed: Post \"http://127.0.0.1:8080/-/reload\": context deadline exceeded"
level=error ts=2024-03-26T06:34:04.803619827Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://127.0.0.1:8080/-/reload\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
I observed the same issue again. Checking the Grafana Agent logs, I see:
ts=2024-04-02T06:49:12.155205805Z caller=tailer.go:207 level=info component=logs logs_config=namespace/kubernetes-pod-logs component=tailer msg="skipping update of position for a file which does not currently exist" path=/var/log/pods/0.log
ts=2024-04-02T06:49:18.380674964Z caller=entrypoint.go:339 level=info msg="reload of config file requested"
ts=2024-04-02T06:49:18.988442727Z caller=tailer.go:207 level=info component=logs logs_config=namespace/kubernetes-pod-logs component=tailer msg="skipping update of position for a file which does not currently exist" path=/var/log/pods/6.log
It seems this is the same problem as issue 3884. I am running Grafana Agent v0.39.0.
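The tailer warnings above suggest the agent's positions file has accumulated stale entries for pod log files that no longer exist. As a stopgap before restarting the agent, such entries could be pruned. A minimal sketch, assuming the static-mode positions file format of indented `path: "offset"` pairs under a `positions:` key (verify this against your own deployment's positions file before using anything like it):

```python
import os

def prune_positions(lines):
    """Keep only position entries whose log file still exists on disk.

    `lines` is the body of a positions.yaml-style file: a 'positions:'
    header followed by '  <path>: "<offset>"' entries. Entries pointing
    at deleted files (the "does not currently exist" case in the logs
    above) are dropped.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        # Preserve the header and blank lines unchanged.
        if not stripped or stripped == "positions:":
            kept.append(line)
            continue
        # The path is everything before the first colon.
        path = stripped.split(":", 1)[0]
        if os.path.exists(path):
            kept.append(line)
    return kept
```

This only addresses the symptom; if the underlying bug keeps re-adding stale entries, they will return.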
This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!
Any update on this? This was supposed to be fixed by #3885, but it still occurs in grafana/agent:v0.39.1.
I'm having the same issue after upgrading from Helm chart v5 to v6 (after migrating from boltdb-shipper to a new TSDB store). No idea what the problem is.