Runs of scheduled-feeds-srv are running out of memory - investigate possible memory leak
I saw OOM errors occurring at 256MB and bumped the memory limit to 512MB to see if that would mitigate the issue.
The errors are less frequent at 512MB, but they still occur, which suggests an underlying problem.
Example error I can see in the GCP Console:
{
  "textPayload": "The request failed because either the HTTP response was malformed or connection to the instance had an error.\nWhile handling this request, the container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision. If you see this message frequently, you may have a memory leak in your code or may need more memory. Consider creating a new revision with more memory.",
  "insertId": "...",
  "httpRequest": {
    "requestMethod": "POST",
    "requestUrl": "...",
    "requestSize": "1255",
    "status": 503,
    "responseSize": "976",
    "userAgent": "Google-Cloud-Scheduler",
    "remoteIp": "...",
    "serverIp": "...",
    "latency": "7.576963527s",
    "protocol": "HTTP/1.1"
  },
  "resource": {
    "type": "cloud_run_revision",
    "labels": {
      "project_id": "ossf-malware-analysis",
      "service_name": "scheduled-feeds-srv",
      "revision_name": "scheduled-feeds-srv-00052-hol",
      "location": "us-central1",
      "configuration_name": "scheduled-feeds-srv"
    }
  },
  "timestamp": "2021-11-05T02:40:08.268263Z",
  "severity": "ERROR",
  "labels": {
    "instanceId": "..."
  },
  "logName": "projects/ossf-malware-analysis/logs/run.googleapis.com%2Frequests",
  "trace": "projects/ossf-malware-analysis/traces/...",
  "receiveTimestamp": "2021-11-05T02:40:08.276603966Z"
}
After profiling the scheduler running on my local machine with net/http/pprof, I can see that memory usage jumps substantially while pulling feeds, largely from io.ReadAll and JSON decoding.
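For reference, this is roughly how the heap profile was captured locally (a minimal sketch, assuming the pprof endpoints are exposed on a side port; the actual wiring in the scheduler may differ):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Hypothetical: serve the pprof endpoints on a local side port while the
	// scheduler handles its normal traffic elsewhere.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start the feeds scheduler as usual ...
	select {}
}
```

A heap profile can then be grabbed mid-poll with `go tool pprof http://localhost:6060/debug/pprof/heap`, which is where the io.ReadAll and JSON decoding allocations show up.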
I suspect the goroutines are running a large number of HTTP requests and JSON/XML deserialization simultaneously.
For instance:
- `FeedGroupsHandler.ServeHTTP()` calls `FeedGroup.pollAndPublish()` simultaneously.
- `FeedGroup.pollAndPublish()` calls `FeedGroup.poll()`, which triggers simultaneous calls to `feed.Latest()` for each feed.
- `feed.Latest()` depends on the feed implementation, but `npm` and `pypi` both fan out goroutines for processing feeds.
The I/O buffers and decoded JSON objects are also escaping onto the heap, so the memory is not reclaimed until the next GC cycle.
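One possible mitigation (a sketch only; `feedEntry` and `fetchEntries` are illustrative names, not the real feed types) is to decode straight from the response body with json.NewDecoder instead of buffering the whole payload via io.ReadAll, so a second full copy of each response never sits on the heap:

```go
package feeds

import (
	"context"
	"encoding/json"
	"net/http"
)

// feedEntry is a hypothetical stand-in for the real per-feed result type.
type feedEntry struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

func fetchEntries(ctx context.Context, url string) ([]feedEntry, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Stream-decode the body instead of io.ReadAll + json.Unmarshal, so the
	// raw bytes are never held in a separate buffer alongside the decoded structs.
	var entries []feedEntry
	if err := json.NewDecoder(resp.Body).Decode(&entries); err != nil {
		return nil, err
	}
	return entries, nil
}
```

The decoded slice still lives until the feed is published, but the intermediate buffer goes away.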
I will mitigate this issue for the time being by bumping the memory limit in GCP to 1GB.
#153 has helped with memory consumption somewhat, and I haven't seen the memory limit hit for over a day now.
I still think it is worth trying to lower the number of concurrent requests in flight, as a way to further limit the amount of memory being consumed.
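Something like the following could work (a sketch under assumed names; `Feed`, `pollFeed`, `pollAll`, and `maxConcurrent` are illustrative, not the actual API): bound the fan-out with a counting semaphore so only a handful of feeds are being fetched and decoded at any one time.

```go
package feeds

import (
	"context"
	"sync"
)

// Feed and pollFeed are hypothetical placeholders for the real feed types
// and the per-feed fetch/decode/publish work.
type Feed struct{ Name string }

func pollFeed(ctx context.Context, f Feed) {
	// ... call the feed's Latest() and publish the results ...
}

// pollAll limits how many feeds are polled concurrently using a buffered
// channel as a counting semaphore, instead of one unbounded goroutine per feed.
func pollAll(ctx context.Context, feeds []Feed) {
	const maxConcurrent = 4
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup

	for _, f := range feeds {
		f := f
		wg.Add(1)
		sem <- struct{}{} // blocks until one of the maxConcurrent slots is free
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			pollFeed(ctx, f)
		}()
	}
	wg.Wait()
}
```

Tuning maxConcurrent trades poll latency for peak memory, since it keeps the per-request buffers described above from all being live at once.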