powertools-lambda-python icon indicating copy to clipboard operation
powertools-lambda-python copied to clipboard

feat(event-handler): add appsync batch invoke

Open mploski opened this issue 2 years ago • 46 comments

Issue Number: https://github.com/awslabs/aws-lambda-powertools-python/issues/1303

Summary

Support Appsync batching mechanism for Lambda Resolvers

Changes

  • Modify Appsync BaseRouter to accept List of AppSyncResolverEventT events alongside singular event
  • Store List of AppSyncResolverEventT events in BaseRouter.current_event
  • Add Functional test covering batch processing
  • Add aws-cdk-aws-appsync-alpha cdk package
  • Add end2end tests that
    • deploys AppSync API with lambda resolver and specific schema
    • Validates different resolvers including batching mechanism

User experience

If batch processing is used for specific method, user can access AppSyncResolver events directly in their annotated method and do appropriate processing. Appsync resolver calls method for every event from the coming list of events:

from typing import List, Optional

from pydantic import BaseModel

from aws_lambda_powertools.event_handler import AppSyncResolver
from aws_lambda_powertools.utilities.data_classes import AppSyncResolverEvent
from aws_lambda_powertools.utilities.typing import LambdaContext

app = AppSyncResolver()


posts = {
    "1": {
        "post_id": "1",
        "title": "First book",
        "author": "Author1",
        "url": "https://amazon.com/",
        "content": "SAMPLE TEXT AUTHOR 1",
        "ups": "100",
        "downs": "10",
    },
    "2": {
        "post_id": "2",
        "title": "Second book",
        "author": "Author2",
        "url": "https://amazon.com",
        "content": "SAMPLE TEXT AUTHOR 2",
        "ups": "100",
        "downs": "10",
    },
    "3": {
        "post_id": "3",
        "title": "Third book",
        "author": "Author3",
        "url": None,
        "content": None,
        "ups": None,
        "downs": None,
    }
}

posts_related = {
    "1": [posts["2"]],
    "2": [posts["3"], posts["1"]],
    "3": [posts["2"], posts["1"]],
}


class Post(BaseModel):
    post_id: str
    author: str
    title: str
    url: str
    content: str
    ups: str
    downs: str

@app.batch_resolver(type_name="Post", field_name="relatedPosts")
def related_posts(event: AppSyncResolverEvent) -> Optional[list]:
    return posts_related[event.source["post_id"]] if event.source else None


def lambda_handler(event, context: LambdaContext) -> dict:
    return app.resolve(event, context)

Under the hood, Router:

  1. Recognize we got a list of events so it assumes request coming from appsync batch operations.
  2. It registers methods that we want to use to handle batch events. We use for it brand new resolver called batch_resolver.
  3. Router finds this method bases on type_name and field_name and calls it for every event in the list. It passes the event itself to it as parameter argument. Optionally it pass user specified parameters if they are present - they are taken from the inside of the event itself. We pass user specified parameters to keep similar UX as with base resolver.
  4. Once all events are processed by decorated method, router combines the output from multiple method calls in a form of a list and send it back to the appsync.

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • [ ] Migration process documented
  • [ ] Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

mploski avatar Mar 10 '23 21:03 mploski

Codecov Report

Attention: Patch coverage is 99.25373% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.45%. Comparing base (e14e768) to head (a612e38). Report is 657 commits behind head on develop.

Files Patch % Lines
aws_lambda_powertools/event_handler/appsync.py 98.78% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1998      +/-   ##
===========================================
+ Coverage    96.38%   96.45%   +0.07%     
===========================================
  Files          214      223       +9     
  Lines        10030    10723     +693     
  Branches      1846     1995     +149     
===========================================
+ Hits          9667    10343     +676     
- Misses         259      268       +9     
- Partials       104      112       +8     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Mar 10 '23 22:03 codecov-commenter

Thank you so much for the change, that's much better! Made one comment about favouring composition instead of multiple inheritance (it'll bite us!).

One super helpful change would be to update PR description with a sample UX, so that customers could easily spot what's coming, we could reuse in our docs, and release notes :-)

I'll do a proper review next week

@heitorlessa Did further refactor to address things we discussed (UX + design). Let me know if that's ok

mploski avatar Mar 31 '23 14:03 mploski

Hello @mploski! I'm still working on validating this PR and creating tests with as many variations as we have to make sure it's ok. So far I haven't found any problems. We're releasing a release today to fix an emergency bug we have in Codepipeline's EventSource, but I hope to include this PR in the next PR.

Thanks as always for putting in the effort to make this project even better. :rocket:

leandrodamascena avatar Apr 14 '23 08:04 leandrodamascena

Hi @mploski!!

I tested the code in different ways and with different resolvers and fields and everything worked exactly as expected. I spent a few days doing exploratory tests and found nothing that failed. I also ran the end2end tests and checked the features and everything was as expected.

My questions/next steps at this point are:

1 - Whether we will have a problem in the end2end test when running the different versions of Python in parallel. I haven't used the aws-cdk.aws-appsync-alpha constructor yet, and I don't know if there are fields that need to be unique across stacks. I know we may have two APIs in AppSync with the same name, but we need to check.

2 - For me the code makes sense and we don't have a break in the current functionality, but I'd like a double-check here @heitorlessa! Just to make sure I'm not missing anything.

3 - We need to adjust the documentation to explain that we now support batch calling and how the user can use it.

But again, great job @mploski. Every PR I interact with you, I learn a lot of new things and improve my skills! Thank you for your patience and we will merge this PR soon :)

leandrodamascena avatar Apr 27 '23 00:04 leandrodamascena

Based on my understanding of the User Experience description of this PR, I am not fond of using AppSyncResolver to iteratively call my python code for every event passed in the batch request. In the proposed setup, I no longer see the benefits of using AppSync Batch Resolvers. Please correct me if I am mistaken.

To illustrate, the Lambda function related_posts() in the provided example uses the "post_id" field from the posts_related dictionary for each event passed in AppSyncResolverEvent. In real-world scenarios, the in-memory lookup of posts_related dictionary is substituted with a database call. We use GraphQL to batch these database calls, particularly if the "post_id" field is expected to be repeated across multiple resolver calls. While the caching ability of AppSync is helpful, the TTL is not very effective (and expensive), and the initial call to populate the AppSync cache can be very slow.

As a user of AppSync Batch Resolvers, I would like to have more control over how my lambda resolvers group responses to Batch Events. This allows us to solve the GraphQL n+1 problem

The example provided shows how we use Lambda functions to respond to AppSync Batch requests, with the posts_batch_resolver() function being a key part of the setup.

@router.resolver(type_name="Post", field_name="relatedPosts")
@cdk_gql.appsync_batch(2000)
@tracer.capture_method
def posts_batch_resolver() -> List[List[Optional[Post]]]:
   # Extract IDs from each request in Batch Call. This will be used for the DB Batch call
    post_ids = batch_util.get_required_ids(event_list=router.current_event.event_list, fieldname="post_id")   

    # Remove duplicate IDs across all events. This reduces redundant lookups for the DB call
    unique_ids = list(set(post_ids))

    # Make single DB call to retrieve data for all IDs
    unique_dict = dao.get_relatedposts_from_id_list(id_list= unique_ids)

    # Build response
    to_return = []
    for _id in post_ids:
        to_return.append({'data':unique_dict[_id] if _id in unique_dict else None})

    return to_return

This is the base Lambda function that sets up the Router and resolvers the AppSync Request..


@logger.inject_lambda_context(correlation_id_path=correlation_paths.APPSYNC_RESOLVER)
@tracer.capture_lambda_handler
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    """ This is the base Lambda handler for all GraphQL reoslvers. """
    if isinstance(event, list):
        return app.resolve(event, context, data_model=batch_util.BatchRequest)
    return app.resolve(event, context)

This is the batch_util file

from typing import List
from aws_lambda_powertools.utilities.data_classes.appsync_resolver_event import (
    AppSyncResolverEvent,
)

class BatchRequest(AppSyncResolverEvent):

    def __init__(self, event_list):
        super().__init__(event_list[0])

        self._event_list : List[AppSyncResolverEvent] = []
        for event in event_list:
            self._event_list.append(AppSyncResolverEvent(event))

    @property
    def event_list(self) -> dict:
        return self._event_list

def get_required_ids(event_list: List[AppSyncResolverEvent], fieldname: str) -> List[str]:
    ids = []
    for event in event_list:
        ids.append(event.source[fieldname])
    return ids

Please let me know if this makes sense.

Btw @cdk_gql.appsync_batch(2000) is a custom decorator we use to auto-generate aws-cdk-lib/aws-appsync CDK code for creating resolvers. I would love to discuss this use case in a separate ticket.

thenamanpatwari avatar Apr 27 '23 02:04 thenamanpatwari

Thanks for the feedback to improve the user experience on this PR @thenamanpatwari. I believe that a better way to resolve this would be to schedule a meeting to discuss this point and understand how to implement this feature.

Do you have an agenda for a meeting this coming week? @mploski @heitorlessa @thenamanpatwari.

Thank you

leandrodamascena avatar May 04 '23 07:05 leandrodamascena

I'd love to connect and figure this out. I am available either on May 11 or May 18 around noon CST

thenamanpatwari avatar May 05 '23 20:05 thenamanpatwari

Thanks for the feedback to improve the user experience on this PR @thenamanpatwari. I believe that a better way to resolve this would be to schedule a meeting to discuss this point and understand how to implement this feature.

Do you have an agenda for a meeting this coming week? @mploski @heitorlessa @thenamanpatwari.

Thank you

Happy to connect and discuss it in details. Im on PTO 11-18 May. Im ready to jump in the call before or after.

mploski avatar May 06 '23 18:05 mploski

Let us talk after your PTO on May 22. Does noon CST work? Can you please setup a Chime room? I am not really sure how I should share my email address with you.

thenamanpatwari avatar May 08 '23 20:05 thenamanpatwari

Naman - could you send an email to aws-lambda-powertools-feedback at amazon dot com?

We can then reply with a link with multiple slots you can choose to make this easier.

Apologies for the back and forth on this

On Mon, 8 May 2023 at 23:25, Naman Patwari @.***> wrote:

Let us talk after your PTO on May 22. Does noon CST work? Can you please setup a Chime room? I am not really sure how I should share my email address with you.

— Reply to this email directly, view it on GitHub https://github.com/awslabs/aws-lambda-powertools-python/pull/1998#issuecomment-1539015298, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZPQBES2FURGZEU4QBEZDLXFFQDTANCNFSM6AAAAAAVW7KZBE . You are receiving this because you were mentioned.Message ID: @.***>

heitorlessa avatar May 09 '23 05:05 heitorlessa

Email sent. No issues on the back-and-forth, I am just glad this is moving forward. I want to maintain as little code as possible haha.

thenamanpatwari avatar May 09 '23 21:05 thenamanpatwari

Hi @thenamanpatwari! Thank you so much for meeting and for providing useful information and data points to help us deliver a better experience for customers. For now, it's our homework to review all the details you've given us and think about how to take the next steps. Probably next week we will have news about this and we will update here so we can decide the next steps in this PR.

Thanks for supporting Powertools and taking the time to tell us about your experience.

leandrodamascena avatar May 25 '23 11:05 leandrodamascena

Very nice stuff! We're going to begin using this library to speed up our Lambda work.

I implemented my own appsync-router package a few years ago. Some take aways from the batch processing portion (we use batch invocation quite heavily):

  • For most batch invocations, as the datasource developer I don't (and shouldn't) care that It is a batch invocation. The resolver/router library should handle this for me.
  • I should be able to control whether the methods fulfilling the invocations are run concurrently or sequentially (my stuff only does this at the router level, an improvement would be at the field level). Sequentially can be pretty bad for large batches that require individual DB or HTTP resolution.
  • For some batch invocations, the ability to deliver the whole batch at once (as detailed above) is really nice. That was not part of what I built, but I can definitely see a use, especially if you can make one data access call to resolve the whole batch.

I'm thinking about using your resolver in going-forward projects. Also thinking about contributing, if you'll have me. We do a LOT of Lambda development in Python (actually, most of our work).

joseph-wortmann avatar Jun 01 '23 16:06 joseph-wortmann

Just had a call with @mploski and @leandrodamascena ... @leandrodamascena will comment with an actionable plan here soon.

After reading this conversation and learning what Michal and Leandro discussed with @thenamanpatwari, there's another way to improve this with Partial failure + all-at-once processing (Asyncio) while giving power users more flexibility to prevent a N+1.

heitorlessa avatar Jun 08 '23 09:06 heitorlessa

Hello everyone. The next steps in this PR are as follows:

1 - We will enhance the user experience by handling partial failures in batch processing. This improvement will simplify the handling of failed records and reduce the need for manual implementation of error-handling records.

2 - We will introduce the ability to process records async, similar to BatchProcessor utility. For batch resolvers defined as async functions, we'll call with all items at the same time (asyncio.gather), since the intent is clear it can be processed concurrently safely.

@mploski we need to work to add these features and create the documentation.

@thenamanpatwari, your use case is highly valuable to us, and we don't want to miss the opportunity to implement this advanced feature for batch processing with AppSync. We kindly request you open a Feature Request issue so that we can prioritize and support this feature. If possible, please consider submitting a PR with the code for us to merge into the repository. It is very important for us to have contribs/PR from our users.

Thank you all for your patience. Indeed, this implementation has involved complex discussions and considerations.

leandrodamascena avatar Jun 08 '23 15:06 leandrodamascena

Hey @leandrodamascena

Thank you for the update. I have been playing around with SQS Batch Processing feature of Lambda Power Tools and I understand the reasoning behind your use-case. I think there are situations when this feature would be useful for AppSync. However this feature does not solve the GraphQL n+1 problem.

I accept your reasoning of implementing the advanced "batch" batch processing feature as an add-on after your initial implementation for batch processing. I will wait for this PR to get merged and then will either raise a new ticket or consider doing a PR myself along with the help of @joseph-wortmann

thenamanpatwari avatar Jul 03 '23 15:07 thenamanpatwari

Sounds like a plan!

@mploski @leandrodamascena whats the latest here?

What can I do to get this over to the finish line?

heitorlessa avatar Jul 07 '23 04:07 heitorlessa

We have identified some really cool use-cases for this feature. Eagerly looking forward to its successful implementation.

thenamanpatwari avatar Jul 25 '23 21:07 thenamanpatwari

Hey everyone! I had a meeting with @mploski to complete the changes to this PR and we agree to add the Partial Error Handling to a separate PR. The reason we decided this is because returning the error to AppSync requires the customer to change the mapping template to handle the error and it's done.

We'll split the PRs to make code review easier and allow us to create consistents examples of this. This PR has been open for a long time and adding the Partial Error Handling now would take much longer to merge.

Missing in this PR:

  • [ ] Documentation

leandrodamascena avatar Sep 04 '23 15:09 leandrodamascena

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

No Coverage information No Coverage information
0.4% 0.4% Duplication

sonarqubecloud[bot] avatar Sep 27 '23 21:09 sonarqubecloud[bot]

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 6 Code Smells

No Coverage information No Coverage information
0.3% 0.3% Duplication

sonarqubecloud[bot] avatar Oct 31 '23 13:10 sonarqubecloud[bot]

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

No Coverage information No Coverage information
2.8% 2.8% Duplication

sonarqubecloud[bot] avatar Dec 07 '23 10:12 sonarqubecloud[bot]

Hey team, any updates? Building a feature currently where this capability would be insanely helpful

thenamanpatwari avatar Dec 21 '23 07:12 thenamanpatwari

Hey team, any updates? Building a feature currently where this capability would be insanely helpful

Hi @thenamanpatwari thank you for the ping! We are currently working on two major features that will be released in the beginning of January, and immediately after that we'll look at finalising this one. So our current target to release this feature is the end of January.

rubenfonseca avatar Dec 21 '23 13:12 rubenfonseca

Hey everyone! I appreciate your patience in waiting for this new feature. I'm working on refactoring some parts of this code and we hope to release it in the next version scheduled for February 2nd.

In case of any update, I'll let you know.

leandrodamascena avatar Jan 25 '24 12:01 leandrodamascena

Hey @rubenfonseca! This PR is ready for review!

Thanks

leandrodamascena avatar Jan 25 '24 17:01 leandrodamascena

Hey, everyone! We will have an internal meeting with the AppSync Service team to discuss error handling when consuming an AppSync batch process. We want to hear from them about use cases and what challenges customers are facing.

After this meeting, I will update this PR. The expectation of having this feature in the next release remains, but I will update this PR if we have any additional news from this meeting.

leandrodamascena avatar Jan 29 '24 17:01 leandrodamascena

The meeting with the AppSync team has been rescheduled for Friday the 2nd and we will not have time to include this feature in this release. We plan to include it in the other release in 1 or 2 weeks.

leandrodamascena avatar Feb 01 '24 10:02 leandrodamascena

Hey @heitorlessa! Thanks for your suggestion about implementing fine-grained exception control per route. It makes a lot more sense and add flexibility when handling exceptions in specific routes.

I made the changes and pls you can review now.

leandrodamascena avatar Feb 07 '24 01:02 leandrodamascena

Thank you Le! self-assigning and looking into the afternoon to get the final doc touches.

heitorlessa avatar Feb 07 '24 09:02 heitorlessa