aws-lambda-web-adapter

Advise on how to debug Lambda metric errors that don't show up in Logs

Open andreas-venturini opened this issue 2 years ago • 8 comments

This is following a discussion over at https://github.com/awslabs/aws-lambda-rust-runtime/issues/786 where we were initially advised to enable response streaming for our Lambda function url to work around the bug from that issue.

After changing the function invoke mode to response streaming (and setting AWS_LWA_INVOKE_MODE to response_stream) our Lambda function continued to work normally and there were no errors reported in either CloudWatch or X-Ray. However, Lambda metric reports suddenly started showing an error count (on the chart one can clearly see when buffered mode was changed to response streaming and back).

image

Nothing but the invoke mode was changed. Also, these errors are not related to the problematic source file(s) that triggered the bug reported in the linked issue.

We were advised to open an issue about this here.

Any pointers on how we might gain visibility into these errors would be appreciated. We searched our CloudWatch logs using multiple regex patterns, e.g. filter @message LIKE /ERROR/ etc. to no avail.

andreas-venturini avatar Jan 29 '24 11:01 andreas-venturini

Some info from my side:

I'm digging the same issue. I've done some load testing using a single source file. I've configured JSON logs for my function and set the most verbose log levels for both application and system logs. I didn't find any errors in the log yet the function's monitoring reports errors.

The only error-like log records I see are readiness check failures during the function's initialization. Yet these records aren't treated as errors by my monitoring, and, in fact, this is normal behavior.
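For anyone repeating this check on exported logs, here is a minimal Go sketch that counts error-like records in newline-delimited JSON. The field names it inspects (`level` for application logs, `record.status` for platform records) are assumptions about the JSON log layout, and the sample lines in `main` are made up for illustration, not taken from this thread.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logRecord holds the only fields this sketch inspects.
// Both field names are assumptions about the JSON log layout.
type logRecord struct {
	Level  string `json:"level"`
	Record struct {
		Status string `json:"status"`
	} `json:"record"`
}

// errorLike reports whether one JSON log line looks like an error record.
// Lines that cannot be decoded into logRecord are ignored.
func errorLike(line string) bool {
	var rec logRecord
	if err := json.Unmarshal([]byte(line), &rec); err != nil {
		return false
	}
	level := strings.ToUpper(rec.Level)
	if level == "ERROR" || level == "FATAL" {
		return true
	}
	// A platform record with a status other than "success" is also counted.
	return rec.Record.Status != "" && rec.Record.Status != "success"
}

// countErrorLike scans newline-delimited JSON logs and counts error-like lines.
func countErrorLike(logs string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if errorLike(sc.Text()) {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical sample records for illustration only.
	sample := `{"level":"INFO","message":"request served"}
{"level":"error","message":"readiness check failed"}
not json at all
{"type":"platform.report","record":{"status":"success"}}`
	fmt.Println("error-like records:", countErrorLike(sample))
}
```

If a scan like this reports zero while the `Errors` metric is non-zero, that points the same way as the observation above: the failures are not being recorded in the function's own logs.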

DarthSim avatar Jan 30 '24 14:01 DarthSim

I made a couple more tests.

  1. I removed the Lambda adapter from the Docker image and added experimental native Lambda support to the software. The errors didn't disappear.

  2. I built a Docker image with a sample program that just answers OK to every request. The errors didn't disappear. The whole test program code is:

package main

import (
	"net/http"
	"time"

	"github.com/aws/aws-lambda-go/lambdaurl"
)

func main() {
	lambdaurl.Start(http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
		time.Sleep(100 * time.Millisecond)
		rw.Header().Set("Content-Type", "text/plain")
		rw.WriteHeader(200)
		rw.Write([]byte("OK"))
	}))
}

Hence, neither the Lambda adapter nor our software is causing these errors.

DarthSim avatar Jan 30 '24 16:01 DarthSim

Thanks for this information. I will do some tests to verify.

In the meantime, from your test results, it seems like a Lambda service issue. Could you please open a ticket with AWS support?

bnusunny avatar Jan 31 '24 01:01 bnusunny

@andreas-venturini @DarthSim It almost recovered. I got 1 or 2 errors out of thousands of invokes. Could you please check if you see the same?

image

bnusunny avatar Feb 01 '24 14:02 bnusunny

Unfortunately, nothing changed in my case. I noticed that the bigger the response, the larger the error rate. A function with the code I posted above indeed causes only a couple of errors for thousands of requests. Yet the software that responds with images of a few kilobytes causes tons of errors.

DarthSim avatar Feb 01 '24 15:02 DarthSim

Indeed. I see the same. I'm following up with Lambda team.

bnusunny avatar Feb 02 '24 14:02 bnusunny

@bnusunny has there been any feedback from the Lambda team so far? Thanks

andreas-venturini avatar Apr 30 '24 13:04 andreas-venturini

The Lambda team has identified the cause. This should be fixed soon. I will update here when the fixes are rolled out.

bnusunny avatar Apr 30 '24 15:04 bnusunny

@bnusunny any more information on this? I'm experiencing a similar issue.

henriwoodcock avatar Jan 08 '25 12:01 henriwoodcock

@henriwoodcock There was some issue with the rollout.

But this is actually an issue with Lambda Function URL, not with this project. Could you please open a ticket with AWS support? That is the right channel to get this issue fixed.

I will close this one.

bnusunny avatar Jan 08 '25 13:01 bnusunny

@bnusunny thanks for the update but isn't there already an internal ticket for the Lambda team? Or did they decide not to fix it after the rollout issue?
I’m not sure I understand why we should open a new ticket with AWS Support or how that would change the status quo if the Lambda team is already aware.

andreas-venturini avatar Jan 08 '25 13:01 andreas-venturini

As I mentioned before, this is actually a Lambda Function URL issue, not a problem with this repo. A support ticket is the right process to solve it, and customer voice will help the Lambda team prioritize the work.

bnusunny avatar Jan 08 '25 13:01 bnusunny

I've opened a support ticket for our issue. If the answer is relevant to this issue, I'll make sure to update here too.

henriwoodcock avatar Jan 08 '25 14:01 henriwoodcock

This seems to have been fixed on or around Feb 4th.

Image

andreas-venturini avatar Mar 18 '25 12:03 andreas-venturini