Subject: Stackdriver-nozzle stops sending logs after exactly two hours
We have installed the Stackdriver Nozzle tile for PCF, version 2.0.1, on our GCP foundation, running on stemcell version 3468.71. We are using PAS v2.1.17. Immediately after installation, we can see logs going to Stackdriver in the GCP Logs Viewer. However, exactly 2 hours later, they stop. Eventually (around 18 hours later), monit restarts the daemon and we get another 2 hours of logs.
As soon as the logs stop, we see this in stackdriver-nozzle.stderr.log:
{"timestamp":"1544765647.654373646","source":"stackdriver-nozzle","message":"stackdriver-nozzle.firehose","log_level":2,"data":{"error":"Error getting token: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"unauthorized\",\"error_description\":\"No client authentication found. Remember to put a filter upstream of the TokenEndpointAuthenticationFilter.\"}"}}
{"timestamp":"1544765648.812379837","source":"stackdriver-nozzle","message":"stackdriver-nozzle.firehose","log_level":2,"data":{"error":"Error getting token: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"unauthorized\",\"error_description\":\"No client authentication found. Remember to put a filter upstream of the TokenEndpointAuthenticationFilter.\"}"}}
{"timestamp":"1544765650.957725286","source":"stackdriver-nozzle","message":"stackdriver-nozzle.firehose","log_level":2,"data":{"error":"Error getting token: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"unauthorized\",\"error_description\":\"No client authentication found. Remember to put a filter upstream of the TokenEndpointAuthenticationFilter.\"}"}}
{"timestamp":"1544765655.124458313","source":"stackdriver-nozzle","message":"stackdriver-nozzle.firehose","log_level":2,"data":{"error":"Error getting token: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"unauthorized\",\"error_description\":\"No client authentication found. Remember to put a filter upstream of the TokenEndpointAuthenticationFilter.\"}"}}
This continues for the entire time it is broken.
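To pin down when the failures begin, the stderr entries can be parsed for their timestamps. The following is a minimal sketch, assuming each line is a JSON record shaped like the ones above; the function name `error_window` is hypothetical and is not part of the nozzle.

```python
import json
from datetime import datetime, timezone


def error_window(lines):
    """Return (first, last, count) for token-fetch errors, or None if none are found.

    Assumes each line is a JSON record like the stderr entries above, with a
    string "timestamp" in epoch seconds and the error text under data.error.
    """
    stamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if not isinstance(entry, dict):
            continue
        data = entry.get("data")
        error = data.get("error", "") if isinstance(data, dict) else ""
        if "cannot fetch token" in error and "timestamp" in entry:
            stamps.append(float(entry["timestamp"]))
    if not stamps:
        return None

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return to_utc(min(stamps)), to_utc(max(stamps)), len(stamps)
```

Feeding it the contents of stackdriver-nozzle.stderr.log and comparing the first timestamp with the time the nozzle was last restarted would confirm whether the gap really is two hours.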
This exact thing is happening on all 3 of our identical foundations in 3 different GCP projects. For now, I have put in an hourly cron job to restart the nozzle, and that is keeping us from losing logs, but of course this is not a great solution.
We could use help in narrowing down where this error is coming from, and what we can try in order to remedy it.
Thanks, @ryanaross. Matt's working on an issue w/ similar symptoms. I know a major component of the resolution there involves a dependency on PAS 2.3. Do you have plans to do the 2.1->2.3 upgrade soonish?
Thanks for the reply, @evandbrown. Unfortunately, we cannot upgrade right now, but maybe this will help push that along.