OneSignal-Android-SDK

WIP: [Improvement] make part of operationrepo initialization async

Open jinliu9508 opened this issue 1 year ago • 2 comments

Description

One Line Summary

Make part of the initialization of OperationRepo asynchronous so that previously saved operations can be added asynchronously, preventing long-loading operations from blocking the main thread.

Details

Motivation

We have observed numerous ANRs during the initialization phase, with OperationRepo.init being the top cause. The issue does not occur consistently, and we suspect it is related to the device's state or to a problem accessing the device's disk. To address this, we plan to make part of OperationRepo's initialization asynchronous. By moving the loading step to a background thread, we aim to keep the main thread from being blocked when initialization unexpectedly takes a long time.

Scope

Saved operations from the previous session will not be executed until they have loaded successfully. Their order relative to new operations may be incorrect, depending on when loading completes. This change tries to insert the saved operations starting at the beginning of the queue, and any operation enqueued later is added to the end of the queue.
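The queueing behavior described in this scope can be sketched as follows. This is a minimal illustration, not the SDK's code: `QueuedOp`, `AsyncLoadingQueue`, and the `loadSaved` callback are hypothetical names, and a plain thread plus a `CountDownLatch` stand in for whatever threading mechanism the PR actually uses.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

// Hypothetical stand-in for a queued operation; not the SDK's Operation class.
data class QueuedOp(val name: String)

// Hypothetical queue whose saved operations are loaded off the calling thread.
class AsyncLoadingQueue(private val loadSaved: () -> List<QueuedOp>) {
    private val queue = mutableListOf<QueuedOp>()
    private val loaded = CountDownLatch(1)

    // Returns immediately; the (possibly slow) disk read runs on a background thread.
    fun start() {
        thread(name = "op-repo-load") {
            val saved = loadSaved()
            synchronized(queue) {
                // Insert saved operations at the front, keeping their saved order,
                // so they sit ahead of anything enqueued while loading was in progress.
                queue.addAll(0, saved)
            }
            loaded.countDown()
        }
    }

    // New operations always go to the end of the queue.
    fun enqueue(op: QueuedOp) {
        synchronized(queue) { queue.add(op) }
    }

    // Blocks until the saved operations have been inserted.
    fun awaitLoaded() = loaded.await()

    // Copy of the current queue contents, for inspection.
    fun snapshot(): List<QueuedOp> = synchronized(queue) { queue.toList() }
}
```

In this sketch, an operation enqueued before loading finishes still ends up behind the saved ones in the list. If an executor had already started processing that new operation, though, it would run ahead of the saved operations, which is the ordering caveat noted above.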

Testing

Unit testing

OPTIONAL - Explain unit tests added, if not clear in the code.

Manual testing

RECOMMEND - OPTIONAL - Explain what scenarios were tested and the environment. Example: Tested opening a notification while the app was foregrounded, app build with Android Studio 2020.3 with a fresh install of the OneSignal example app on a Pixel 6 with Android 12.

Affected code checklist

  • [ ] Notifications
    • [ ] Display
    • [ ] Open
    • [ ] Push Processing
    • [ ] Confirm Deliveries
  • [ ] Outcomes
  • [ ] Sessions
  • [ ] In-App Messaging
  • [ ] REST API requests
  • [ ] Public API changes

Checklist

Overview

  • [ ] I have filled out all REQUIRED sections above
  • [ ] PR does one thing
    • If it is hard to explain how any codes changes are related to each other then it most likely needs to be more than one PR
  • [ ] Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • [ ] I have included test coverage for these changes, or explained why they are not needed
  • [ ] All automated tests pass, or I explained why that is not possible
  • [ ] I have personally tested this on my device, or explained why that is not possible

Final pass

  • [ ] Code is as readable as possible.
    • Simplify with less code, followed by splitting up code into well named functions and variables, followed by adding comments to the code.
  • [ ] I have reviewed this PR myself, ensuring it meets each checklist item
    • WIP (Work In Progress) is ok, but explain what is still in progress and what you would like feedback on. Start the PR title with "WIP" to indicate this.


jinliu9508, Apr 29 '24 21:04

We need to delay OperationModelStore.load() as well, as this is what does the disk read. See this ANR stack trace:

       at com.onesignal.common.modeling.Model.initializeFromJson(Model.kt:98)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:68)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:30)
       at com.onesignal.common.modeling.ModelStore.load(ModelStore.kt:162)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.<init>(OperationModelStore.kt:32)
       at java.lang.reflect.Constructor.newInstance0(Native method)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
       at com.onesignal.common.services.ServiceRegistrationReflection.resolve(ServiceRegistration.kt:89)
       at com.onesignal.common.services.ServiceProvider.getServiceOrNull(ServiceProvider.kt:79)
       at com.onesignal.common.services.ServiceProvider.getService(ServiceProvider.kt:67)
       at com.onesignal.common.services.ServiceRegistrationReflection.resolve(ServiceRegistration.kt:82)
       at com.onesignal.common.services.ServiceProvider.getServiceOrNull(ServiceProvider.kt:79)
       at com.onesignal.common.services.ServiceProvider.getService(ServiceProvider.kt:67)
       at com.onesignal.internal.OneSignalImp.initWithContext(OneSignalImp.kt:510)
       at com.onesignal.OneSignal.initWithContext(OneSignal.kt:135)

ServiceProvider creates class instances depth first: it resolves the deepest dependency and then works its way back up. In this case, since OperationRepo requires an instance of ConfigModelStore as part of its constructor, an instance of ConfigModelStore is created before OperationRepo.
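To illustrate the depth-first creation order, here is a toy resolver. It is a sketch under assumptions: `TinyProvider` and its string-keyed factories are hypothetical and far simpler than the SDK's reflection-based ServiceProvider. It only shows that a dependency's constructor runs, on the calling thread, before the constructor of the service that needs it.

```kotlin
// Hypothetical miniature service provider; not the SDK's ServiceProvider.
class TinyProvider {
    private val factories = mutableMapOf<String, (TinyProvider) -> Any>()
    private val instances = mutableMapOf<String, Any>()

    // Records the order in which instances finish construction.
    val creationOrder = mutableListOf<String>()

    fun register(name: String, factory: (TinyProvider) -> Any) {
        factories[name] = factory
    }

    fun get(name: String): Any {
        instances[name]?.let { return it }
        val factory = factories[name] ?: error("No service registered for $name")
        // The factory resolves its own dependencies first, so they are
        // fully constructed before this instance is.
        val instance = factory(this)
        instances[name] = instance
        creationOrder += name
        return instance
    }
}

// Example wiring: the repo's factory asks for the store before building itself.
fun buildExampleProvider(): TinyProvider {
    val provider = TinyProvider()
    provider.register("ConfigModelStore") { Any() }
    provider.register("OperationRepo") { p ->
        p.get("ConfigModelStore") // dependency is created here, first
        Any()
    }
    return provider
}
```

Because any work done in a dependency's constructor happens inside `get`, a disk read in a store's constructor would block whichever thread first requested the top-level service.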

Since load() is a generic function from ModelStore, should we delay all model stores or limit the change to OperationModelStore only?

Also, both load() and persist() may be locking the models for longer than needed, especially since they access the preference service inside the synchronized block. Do you think we can also introduce a small optimization along with this issue?
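One possible shape for that optimization, as a hedged sketch: copy the models while holding the lock, then do the slow write after releasing it. `NarrowLockStore` and its `write` callback are hypothetical; `write` stands in for the preference service call and is not the SDK's API.

```kotlin
// Hypothetical store; `write` stands in for the preference service call.
class NarrowLockStore(private val write: (String) -> Unit) {
    private val models = mutableListOf<String>()

    fun add(model: String) {
        synchronized(models) { models.add(model) }
    }

    fun persist() {
        // Hold the lock only long enough to copy the current state.
        val snapshot = synchronized(models) { models.toList() }
        // Serialize and perform the slow I/O without blocking other callers.
        write(snapshot.joinToString(","))
    }
}
```

The trade-off is that two concurrent persist() calls could reach `write` out of order, so a real change would likely need a separate mechanism (for example a single writer thread) to keep the last write current.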

jinliu9508, Apr 30 '24 19:04

Since load() is a generic function from ModelStore, should we delay all model stores or limit the change to OperationModelStore only?

Longer term we probably want to change ModelStore so that none of the models read from disk in the constructor, or ensure we never create these instances on the main thread. In the short term, to get a quick fix out, scoping it to only OperationModelStore is probably what we should do for now.

Also, both load() and persist() may be locking the models for longer than needed, especially since they access the preference service inside the synchronized block. Do you think we can also introduce a small optimization along with this issue?

Ya we could make those changes in this PR as well.

jkasten2, Apr 30 '24 19:04

@jinliu9508 I believe this PR will break RecoverFromDroppedLoginBug.kt: when it calls OperationRepo.containsInstanceOf(), it assumes all the saved operations have already been loaded from disk. Can you address this in a follow-up PR?
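One way such a follow-up could handle the assumption, sketched under stated assumptions: make the query wait until the saved operations are in the queue. `LoadAwareRepo` and `loadSaved` are hypothetical, and its `containsInstanceOf` takes a `Class` parameter for illustration only; it is not the SDK's signature.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

// Hypothetical repo that loads saved operations on a background thread.
class LoadAwareRepo(private val loadSaved: () -> List<Any>) {
    private val queue = mutableListOf<Any>()
    private val loaded = CountDownLatch(1)

    fun start() {
        thread {
            val saved = loadSaved()
            synchronized(queue) { queue.addAll(0, saved) }
            loaded.countDown()
        }
    }

    // Waits for the saved operations before answering, so callers never
    // observe the queue in its not-yet-loaded state.
    fun containsInstanceOf(type: Class<*>): Boolean {
        loaded.await()
        return synchronized(queue) { queue.any { type.isInstance(it) } }
    }
}
```

Since `containsInstanceOf` blocks in this sketch, calling it from the main thread would bring back the stall this PR is trying to remove, so a caller would need to run it off the main thread.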

jkasten2, May 02 '24 21:05