flow-go

Events pagination when requesting events for a transaction or a block

Open • vishalchangrani opened this issue 3 years ago • 8 comments

Problem Definition

What problem is this solving?

The access node requests events from the Execution node to serve the GetTransactionResult call. The number of events that a transaction can emit is unbounded, and some transactions emit a very large number of them. One example is the rewards payout transaction, which emits one event for every delegator and staker in the network. A transaction's events can therefore make the response message from the EN to the AN, and from the AN to the client, very large. Recently, the payload size exceeded the 16MB default gRPC limit for the rewards payout transaction aa890ff09415b12005a0c233d09282abec26aaa8c42990259f3b4fc30d50f0d2, and we had to bump up the gRPC limit to 20MB on the AN, the EN, and the Flow CLI.
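To make the size problem concrete, here is a minimal sketch that groups a transaction's encoded event payloads into batches whose total size stays under a byte budget, defaulting to the 16 MiB (16777216-byte) limit. The function name `split_by_size` and the `overhead_per_item` parameter are hypothetical; a real implementation would measure the actual protobuf-encoded message size rather than summing raw payload lengths.

```python
from typing import Iterable, Iterator, List

# 16 MiB (16777216 bytes), the default gRPC message limit described in this issue.
DEFAULT_MAX_MSG_BYTES = 16 * 1024 * 1024


def split_by_size(
    payloads: Iterable[bytes],
    max_bytes: int = DEFAULT_MAX_MSG_BYTES,
    overhead_per_item: int = 0,
) -> Iterator[List[bytes]]:
    """Yield batches of payloads whose estimated total size is at most max_bytes.

    overhead_per_item is a hypothetical per-event framing cost; the estimate
    ignores the real protobuf encoding overhead.
    """
    batch: List[bytes] = []
    batch_size = 0
    for payload in payloads:
        item_size = len(payload) + overhead_per_item
        if item_size > max_bytes:
            # Batching cannot help: this single event is already over the limit.
            raise ValueError(
                f"event of {item_size} bytes exceeds the {max_bytes}-byte limit"
            )
        if batch and batch_size + item_size > max_bytes:
            # Adding this event would overflow the current batch, so emit it first.
            yield batch
            batch, batch_size = [], 0
        batch.append(payload)
        batch_size += item_size
    if batch:
        yield batch
```

For example, with a 12-byte budget, three 6-byte events come back as one batch of two events and one batch of one. A single event larger than the budget raises an error, since no amount of batching can make it fit.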

Proposed Solution

What are the proposed solutions to this problem?

Instead of sending all the events back for a transaction or a block, a pagination scheme should be introduced between the Access node and the Execution node such that when the number of events is beyond a certain threshold, the access node can ask for the events piecemeal.
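A minimal sketch of the proposed count-based pagination, assuming a hypothetical EN call that takes a transaction ID, a start index, and a page size, and returns one page of events plus the index to request next (or `None` when done). The in-memory `make_execution_node` stands in for the EN; none of these names are part of the actual flow-go or Access API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    tx_id: str
    index: int  # position of the event within the transaction
    payload: bytes


# (tx_id, start_index, page_size) -> (events in this page, next start index or None)
PageFn = Callable[[str, int, int], Tuple[List[Event], Optional[int]]]


def make_execution_node(events_by_tx: Dict[str, List[Event]]) -> PageFn:
    """In-memory stand-in for a hypothetical paginated EN endpoint."""

    def get_events_page(tx_id: str, start: int, page_size: int) -> Tuple[List[Event], Optional[int]]:
        events = events_by_tx.get(tx_id, [])
        page = events[start:start + page_size]
        next_start = start + len(page)
        # None signals that the client has now received every event.
        return page, (next_start if next_start < len(events) else None)

    return get_events_page


def fetch_all_events(get_page: PageFn, tx_id: str, page_size: int = 1000) -> List[Event]:
    """AN side: request pages until the EN reports there are no more events."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    collected: List[Event] = []
    start: Optional[int] = 0
    while start is not None:
        page, start = get_page(tx_id, start, page_size)
        collected.extend(page)
    return collected
```

With 2500 events and `page_size=1000`, the AN makes three requests, starting at indices 0, 1000, and 2000. An index cursor keeps the EN stateless between requests; if individual events vary widely in size, a size-aware page limit would be safer than a fixed count.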

The pagination between the client and the access node will be a good-to-have extension of this implementation.

Definition of Done

What tests must pass for this issue to be considered solved?

A transaction that has a large number of events should be retrievable from the Access node.


vishalchangrani • Dec 01 '22

Streaming may be a better solution than pagination (as discussed earlier in the post-mortem meeting).
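To contrast streaming with pagination: in a streaming design the server pushes a sequence of messages over one call instead of the client requesting each page. The sketch below uses Python generators as a stand-in for a gRPC server-streaming RPC; `event_stream` and `consume` are hypothetical names, and a real implementation would send each chunk as a separate protobuf message.

```python
from typing import Iterable, Iterator, List, Tuple


def event_stream(events: Iterable[bytes], chunk_size: int = 500) -> Iterator[List[bytes]]:
    """Server side: push events as a sequence of bounded chunks (stream messages)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk: List[bytes] = []
    for event in events:
        chunk.append(event)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Flush the final, partially filled chunk.
        yield chunk


def consume(stream: Iterable[List[bytes]]) -> Tuple[int, int, int]:
    """Client side: handle chunks as they arrive, keeping only running totals.

    Returns (messages received, events received, payload bytes received).
    """
    messages = events = total_bytes = 0
    for chunk in stream:
        messages += 1
        events += len(chunk)
        total_bytes += sum(len(e) for e in chunk)
    return messages, events, total_bytes
```

Because the client keeps only running totals, its memory use does not grow with the number of events, and each message is bounded by `chunk_size` rather than by the transaction's total event count.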

vishalchangrani • Dec 06 '22

We hit this problem again on 25 Jan 2023: https://dapperlabs.pagerduty.com/incidents/Q1UZFW140O1LUM, https://dapperlabs.slack.com/archives/CEEGK3HGC/p1674663706807779

j1010001 • Jan 25 '23

Mainnet ran into this issue again today for transaction 03aa46047cdadfcf7ee23ee86cd53064e05f8b5f8a6f570e9f53b2744eddbee4:

error: code = Internal desc = failed to retrieve result from execution node: 3 errors occurred:
  * rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (17689076 vs. 16777216)
  * rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (17689076 vs. 16777216)
  * rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (17689076 vs. 16777216)

vishalchangrani • Jan 25 '23

There is a limit on the event payload for transactions that do not bypass these limits (i.e. transactions not sent by the service account, SA), so I think this is a problem that will pretty much never happen.

If we do this, can we set the pagination limit pretty high?

bjartek • Jan 25 '23

FYI there is an ongoing discussion on Flow's Slack about how the new event streaming API could potentially solve this issue.

diaswrd • Jan 26 '23

Meeting notes from today's discussion

Short term solution - Will this be a problem in the upcoming multi-sign?

  • rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (17689076 vs. 16777216)
  • So the upcoming multi-sign may be ok.
  • Could leverage Envoy to bump up the limits: a high limit on the server, and a small, configurable limit on Envoy.
  • PR from Peter to make the limit configurable - https://github.com/onflow/flow-go/pull/3855

Mid term solution -

  • When CCF (Cadence Compact Format) is ready, the payload transferred on the wire will shrink dramatically.
  • CCF will require AN changes.
  • May be possible to deploy it using a height-coordinated upgrade, or else at the next spork (April 2023).
  • Even with CCF we will still need a solution to scale.
  • Another solution -
    • AN already has the event data from execution sync.
    • AN could serve the data using a different API with higher limits or pagination.
    • This has some overhead, and execution sync will eventually get the data anyway. If the short-term and CCF solutions are not enough, we may pursue this.

Long term solution -

  • Splitting the rewards payout transaction into multiple transactions to comply with the general limit.
    • Could be split in a way that makes it easier to figure out the rewards for a node.
    • Josh, Kshitij, Jerome
  • Access API changes to include streaming API.
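The split of the rewards payout into multiple transactions could look roughly like the following minimal sketch, which groups payout recipients into batches, one transaction per batch. It assumes, as described at the top of the issue, one event per staker and per delegator; the data shapes, the `max_events_per_tx` cap, and keeping each node's staker and delegators together (to make per-node rewards easier to reconstruct) are illustrative assumptions, not the actual core-contracts implementation.

```python
from typing import Dict, List, Tuple

# (node_id, recipient_id). The node operator's own reward is modeled
# as recipient_id == node_id (a hypothetical convention for this sketch).
Entry = Tuple[str, str]


def plan_payout_batches(
    delegators_by_node: Dict[str, List[str]], max_events_per_tx: int
) -> List[List[Entry]]:
    """Group payout entries into batches, one transaction per batch.

    Assumes one event per entry. A node's staker and delegators are kept in
    the same batch when they fit; a node that alone exceeds the cap is split
    across consecutive dedicated batches.
    """
    if max_events_per_tx <= 0:
        raise ValueError("max_events_per_tx must be positive")
    batches: List[List[Entry]] = []
    current: List[Entry] = []
    for node_id in sorted(delegators_by_node):  # deterministic ordering
        entries = [(node_id, node_id)] + [(node_id, d) for d in delegators_by_node[node_id]]
        if len(entries) > max_events_per_tx:
            # Oversized node: close the open batch, then slice this node by the cap.
            if current:
                batches.append(current)
                current = []
            for i in range(0, len(entries), max_events_per_tx):
                batches.append(entries[i:i + max_events_per_tx])
            continue
        if len(current) + len(entries) > max_events_per_tx:
            # The whole node does not fit in the open batch; start a new one.
            batches.append(current)
            current = []
        current.extend(entries)
    if current:
        batches.append(current)
    return batches
```

Packing whole nodes greedily can leave some batches below the cap, trading a few extra transactions for the property that most nodes' rewards appear in a single transaction.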

Need a separate issue to change the rewards payout to use multiple transactions instead of a single one.

vishalchangrani • Jan 26 '23

@vishalchangrani: rewards payout to be a multiple transactions instead of a single one - https://github.com/onflow/flow-core-contracts/issues/326

franklywatson • Jan 26 '23

The events streaming API is now live, CCF is deployed, and the default max message sizes were updated to 1 GB. Since those changes, we haven't run into issues with oversized payloads. However, the number of events in the rewards tx will continue to grow, so we will need a solution that scales beyond the current limits.

peterargue • Apr 23 '24

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] • Sep 03 '24