App icon indicating copy to clipboard operation
App copied to clipboard

[Performance] Create PING for Pusher Connection

Open muttmuure opened this issue 1 year ago • 4 comments

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!


What performance issue do we need to solve?

Intermittently, Pusher becomes unreachable for reasons out of our control and the client needs to reconnect to it, which causes delays or failure to get realtime updates to chat data.

We can't count on Pusher's internal PING to detect liveness -- we need our own application layer Pusher-ping.

What is the impact of this on end-users?

A more reliable Pusher-PING means users get realtime updates to their New Expensify data more reliably.

List any benchmarks that show the severity of the issue

This is a recurring issue: https://expensify.slack.com/archives/C05LX9D6E07/p1731299637345689

Proposed solution (if any)

Develop our own application layer Pusher-PING that works in the same way as our existing PING code, but instead tests the Pusher connection.

List any benchmarks after implementing the changes to show impacts of the proposed solution (if any)

Note: These should be the same as the benchmarks collected before any changes.

Platforms:

Which of our officially supported platforms is this issue occurring on?

  • [ ] Android: Native
  • [ ] Android: mWeb Chrome
  • [ ] iOS: Native
  • [ ] iOS: mWeb Safari
  • [ ] MacOS: Chrome / Safari
  • [ ] MacOS: Desktop

Version Number: Reproducible in staging?: Y Reproducible in production?: Y Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation Expensify/Expensify Issue URL: Issue reported by: @quinthar Slack conversation: https://expensify.slack.com/archives/C05LX9D6E07/p1731299637345689

View all open jobs on Upwork

muttmuure avatar Dec 10 '24 10:12 muttmuure

Auto-assigning issues to engineers is no longer supported. If you think this issue should receive engineering attention, please raise it in #whatsnext.

melvin-bot[bot] avatar Dec 10 '24 10:12 melvin-bot[bot]

Triggered auto assignment to @mallenexpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.

melvin-bot[bot] avatar Dec 10 '24 10:12 melvin-bot[bot]

https://github.com/Expensify/App/issues/52437

muttmuure avatar Dec 10 '24 10:12 muttmuure

Checking to see if Deeter wants assignment here. If not I'll add AutoAssignerNewDotQuality to assign an engineer since this is Quality, High status and the other issue is held on this.

Guessing it needs to be Internal

mallenexpensify avatar Dec 10 '24 21:12 mallenexpensify

Triggered auto assignment to @cristipaval (AutoAssignerNewDotQuality)

melvin-bot[bot] avatar Dec 13 '24 23:12 melvin-bot[bot]

@cristipaval , what's your level of knowledge with Pusher?!??! :) Added AutoAssignerNewDotQuality cuz it's Quality and status is HIGH

mallenexpensify avatar Dec 13 '24 23:12 mallenexpensify

I don't know much about how Pusher works under the hood. Feel free to recruit someone else if you don't want to wait for me to gather knowledge before I start implementing.

cristipaval avatar Dec 16 '24 08:12 cristipaval

Thanks @cristipaval , checking in #quality and #engineering to try to recruit someone.

mallenexpensify avatar Dec 18 '24 00:12 mallenexpensify

I'd like to take this, so I will assign it to myself.

tgolen avatar Dec 18 '24 15:12 tgolen

I'm actually going to put this one on HOLD until I finish my investigation in https://github.com/Expensify/App/issues/52437

tgolen avatar Dec 18 '24 20:12 tgolen

Still on HOLD, no update.

tgolen avatar Dec 30 '24 18:12 tgolen

Weekly Update

  • This is off HOLD now since I haven't gotten any help from Pusher to resolve the connection errors. I'll gather my wits and hopefully begin on this later in the week.

Next Steps

  • @tgolen formulate a plan for what this will look like
  • Begin implementing the ping

ETA

  • Friday, Jan 31

tgolen avatar Jan 14 '25 20:01 tgolen

Daily Update

  • I'm bumping this up to Daily and actively working on it
  • I've begun the initial frontend work to put a simple Ping together and created a draft App PR

Next Steps

  • @tgolen complete the frontend code for the PING event
  • @tgolen start work on the backend code to reply with a PONG event

ETA

  • Everything finished up by Tuesday, Jan 21

tgolen avatar Jan 15 '25 22:01 tgolen

Daily Update

  • I wasn't able to make progress on this today due to working on allocations and following the interview

tgolen avatar Jan 16 '25 22:01 tgolen

@tgolen, @mallenexpensify Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

melvin-bot[bot] avatar Jan 20 '25 09:01 melvin-bot[bot]

Daily Update

  • I still have not made any further progress on this
  • I also got assigned an API improve performance chore which has got to take priority over this one

Next Steps

  • @tgolen complete the frontend code for the PING event
  • @tgolen start work on the backend code to reply with a PONG event

ETA

  • All done by Friday, Jan. 31

tgolen avatar Jan 21 '25 16:01 tgolen

Daily Update

  • I created a Web-E PR to implement the necessary API
  • I put the App PR into full review with the completed solution

Next Steps

  • @arosiclair review and merge the Web-E PR
  • @tgolen get the App PR tested and reviewed

ETA

  • Everything merged by Friday, Jan 24

tgolen avatar Jan 21 '25 23:01 tgolen

Daily Update

  • The Web-E PR has been merged and deployed to staging
  • I have bumped the App PR to get it reviewed as it has not been reviewed yet

Next Steps

  • Get the App PR merged

ETA

  • Still Friday, Jan 24 (tomorrow)

tgolen avatar Jan 23 '25 23:01 tgolen

Daily Update

  • The App PR is in final stages of review and should be merged momentarily

Next Steps

  • Wait for the PR to be deployed
  • Check logs to ensure the ping is operated as I expect it to
  • Create a grafana chart to see how long the PINGs take

ETA

  • Not sure when the deploy would happen, but maybe by EOW

tgolen avatar Jan 27 '25 22:01 tgolen

Daily Update

  • The App PR is still in review and just needs a final review by an internal engineer

Next Steps

  • @MariaHCD review and merge the PR

ETA

  • Merged tomorrow, Jan. 29

tgolen avatar Jan 28 '25 22:01 tgolen

Daily Update

Next Steps

  • Wait for it to deploy to production
  • @tgolen look at some logs to ensure it's working like I expect it to
  • @tgolen create a grafana chart to expose the ping latency
  • @tgolen create a grafana chart to expose how often the pusher ping causes clients to go offline

ETA

  • Depending on the timeline of deploys, I imagine this can be closed out early next week.

tgolen avatar Jan 29 '25 23:01 tgolen

Reviewing label has been removed, please complete the "BugZero Checklist".

melvin-bot[bot] avatar Jan 31 '25 06:01 melvin-bot[bot]

The solution for this issue has been :rocket: deployed to production :rocket: in version 9.0.92-6 and is now subject to a 7-day regression period :calendar:. Here is the list of pull requests that resolve this issue:

  • https://github.com/Expensify/App/pull/55326

If no regressions arise, payment will be issued on 2025-02-07. :confetti_ball:

For reference, here are some details about the assignees on this issue:

  • @ishpaul777 requires payment (Needs manual offer from BZ)

melvin-bot[bot] avatar Jan 31 '25 06:01 melvin-bot[bot]

@ishpaul777 @mallenexpensify @ishpaul777 The PR fixing this issue has been merged! The following checklist (instructions) will need to be completed before the issue can be closed. Please copy/paste the BugZero Checklist from here into a new comment on this GH and complete it. If you have the K2 extension, you can simply click: [this button]

melvin-bot[bot] avatar Jan 31 '25 06:01 melvin-bot[bot]

Daily Update

  • This PR caused a deploy blocker because it was marking clients as being offline more than normal
  • I submitted a PR that only turned off the part of the ping that forces the client into offline mode

Next Steps

  • I need to dig in and analyze the logs to understand what was going on, then formulate a plan for how to prevent false alarms from happening

ETA

  • Friday, Feb. 7

tgolen avatar Jan 31 '25 20:01 tgolen

Update

  • Submitted a new E/App PR to relax some of the PONG requirements to try and dial in the PING PONG mechanism further
  • It still doesn't change the online/offline state of the client yet (that's currently disabled)
  • My goal is to keep an eye on logs and graphs to see if the behavior is less extreme

Next Steps

  • Have that PR deployed
  • @tgolen monitor logs and graphs

ETA

  • Next Friday, Feb. 14

tgolen avatar Feb 06 '25 15:02 tgolen

Payment Summary

Upwork Job

  • ROLE: @ishpaul777 paid $(AMOUNT) via Upwork (LINK)

BugZero Checklist (@mallenexpensify)

  • [ ] I have verified the correct assignees and roles are listed above and updated the neccesary manual offers
  • [ ] I have verified that there are no duplicate or incorrect contracts on Upwork for this job (https://www.upwork.com/ab/applicants//hired)
  • [ ] I have paid out the Upwork contracts or cancelled the ones that are incorrect
  • [ ] I have verified the payment summary above is correct

melvin-bot[bot] avatar Feb 07 '25 09:02 melvin-bot[bot]

I see payment was due a few days ago but I feel like there's more work to do. @ishpaul777 , Is there more than just this PR?

  • https://github.com/Expensify/App/pull/55326

mallenexpensify avatar Feb 10 '25 22:02 mallenexpensify

This is https://github.com/Expensify/App/pull/55326 the only one i reviewed,. @tgolen can you confirm if there will be more E/App PR planned where I can help? if no I think we can release the payment

ishpaul777 avatar Feb 11 '25 12:02 ishpaul777

There might be a couple more frontend PRs, but I think they should all be separate from this. I think it's OK to release payment here and close this out.

On Tue, Feb 11, 2025 at 5:52 AM Ishpaul Singh @.***> wrote:

This is #55326 https://github.com/Expensify/App/pull/55326 the only one i reviewed,. @tgolen https://github.com/tgolen can you confirm if there will be more E/App PR planned where I can help? if no I think we can release the payment

— Reply to this email directly, view it on GitHub https://github.com/Expensify/App/issues/53826#issuecomment-2650731727, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJMAB3F7XM37RNUIOQMTYD2PHW7NAVCNFSM6AAAAABTKZJW3KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNJQG4ZTCNZSG4 . You are receiving this because you were mentioned.Message ID: @.***>

tgolen avatar Feb 11 '25 16:02 tgolen