dashboard-plugin icon indicating copy to clipboard operation
dashboard-plugin copied to clipboard

TimeoutError False Positive? - Function execution duration going to exceeded configured timeout limit

Open ericmillsio opened this issue 4 years ago • 8 comments

I'm hitting a strange issue where my functions execute quickly (<100ms) and on the Lambda side is successful, but it advertises as a TimeoutError in the Serverless Dashboard shown below. My max timeout is set to 15 mins.

"Function execution duration going to exceeded configured timeout limit." Invocation ID: 05f8ca9c-6803-425b-81d3-ac828c428907

Screen Shot 2021-06-03 at 12 00 30 AM

Screen Shot 2021-06-02 at 11 51 59 PM

Only thing I could find was the sigterm_handler might be picking up. I did already look in see if there was any DB connections still open at the end of processing, but all of that seemed fine. Not sure what else it could be there.

What was more strange was the item was initially listed as "No errors" and then transitioned later to a "TimeoutError". This only occurs on my functions that are invoked from "schedule" and "sqs" events, not my "http" API functions. I also messed with what I was returning and nothing seemed to affect it.

Running a Python application

Serverless Versions Framework Core: 2.44.0 Plugin: 5.2.0 SDK: 4.2.3 Components: 3.11.0

Thanks for any help!

ericmillsio avatar Jun 03 '21 05:06 ericmillsio

Thanks for reporting @crearc. Looking into this now!

austencollins avatar Jun 04 '21 21:06 austencollins

@crearc do you observe the error constantly, or from time to time?

Do you have an example of python lambda for which error can be reproduced (I've done some local tests, and could not reproduce the issue)

medikoo avatar Jun 09 '21 10:06 medikoo

I see it constantly for scheduled and sqs jobs that run. Here's one example of one that shows the TimeoutError. Pulled some relevant bits.

  processor:
    handler: src/processor.process
    layers:
      - ${env:LAMBDA_INSIGHTS_ARN}
    events:
      - schedule: rate(1 day)

  runtime: python3.7
  lambdaHashingVersion: 20201221

custom:
  pythonRequirements:
    dockerizePip: non-linux

plugins:
  - serverless-python-requirements
  - serverless-offline
  - serverless-dotenv-plugin
  - serverless-prune-plugin
  - serverless-disable-functions
import json
import logging
from datetime import datetime, timedelta

import boto3
import jwt

logger = logging.getLogger()
logger.setLevel(logging.INFO)

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table('tokens')


def generate_token():
    # Obfuscated, just builds a token, nothing special
    return 'token_here'


def process(event, context):
    new_token = generate_token()

    response = table.update_item(
        Key={'service': 'a'},
        UpdateExpression="SET val = :val",
        ExpressionAttributeValues={
            ":val": new_token
        }
    )
    logger.info("UpdateItem succeeded:")
    logger.info(json.dumps(response, indent=4))

ericmillsio avatar Jun 09 '21 19:06 ericmillsio

Do you see it also without following plugins:

  - serverless-offline
  - serverless-dotenv-plugin
  - serverless-prune-plugin
  - serverless-disable-functions

?

medikoo avatar Jun 09 '21 19:06 medikoo

@crearc I've created a similar service: https://github.com/medikoo/test-serverless/tree/issue-plugin-592 I've deployed it, and I do not observe the issue.

Can you deploy it and confirm if it represents the issue on your side? If issue is not reproduced, can you update the service and push changes which are needed for reproduction, so I'm able to reproduce it on my side

medikoo avatar Jun 10 '21 12:06 medikoo

I am experiencing the same false positive on the dashboard. However, I know what the culprit is in my case...

When using the datadog agent extension (which does an occasional flush prior to lambda execution) the logs for the invocation will extend beyond the normal lambda timeout. I think this is confusing the TimeoutError since the total lambda time is well beyond the timeout, but the lambda reports that it ran well under the timeout.

I would not be surprised if something is doing the same for @ac360

takamuffin avatar Sep 30 '21 19:09 takamuffin

Was wondering if any progress has been made on this front, or if anybody has found a solution. We also use the Datadog plugin, and we also see a bunch of false positive timeouts that make the invocation log in the UI really hard to parse.

hnandiwada avatar Oct 26 '22 18:10 hnandiwada

@hnandiwada, in a few days, will roll out a new version of the Serverless Console, which should be 100% accurate on that. Please observe the announcements.

medikoo avatar Oct 26 '22 18:10 medikoo