firebase-functions-python icon indicating copy to clipboard operation
firebase-functions-python copied to clipboard

Firestore updates cause intermittent freezing of event triggers

Open DominicOrga opened this issue 1 year ago • 4 comments

I am currently testing the 2nd gen on-write Firestore cloud triggers. For some reason, the trigger freezes and stops executing after a few hours from deployment.

It appears that updating a Firestore document is causing the trigger to freeze.

firestore_bug

Environment details

  • OS type and version: MacOS Sonoma 14.3.
  • Python version: 3.12.2
  • pip version: 24.0
  • google-cloud-firestore version: 2.15.0

Steps to reproduce

Note that this bug is difficult to reproduce and would require some repetition to occur.

  1. Deploy the test_document_write HTTP request function and test_on_document_write on-write Firestore trigger.
  2. Run the test_document_write function.

This will create a test Firestore collection and will create or update 10 fixed Firestore documents that will trigger the test_on_document_write cloud trigger.

  1. Observe that the test_on_document_write will make the changes in the updated Firestore docs.
  2. Wait a couple of hours.
  3. Re-run the test_document_write function.
  4. Observe that the Firestore docs will not update due to the function freezing (as shown in the image above). If it did, repeat step 4.

Code example

test_document_write HTTP Request Function
  • This function creates 10 documents in the test collection, and increments their field a value.
from app.https.common.http_responses import success_response
from firebase_admin import firestore
from firebase_functions import https_fn, options
from google.cloud.firestore import Client, Increment


@https_fn.on_request(cors=options.CorsOptions(cors_methods=['post']))
def test_document_write(request: https_fn.Request) -> https_fn.Response:
    firestore_client: Client = firestore.client()

    batch = firestore_client.batch()

    for i in range(0, 10):
        batch.set(
            firestore_client.document(f'test/doc_{i}'),
            {'a': Increment(1)},
            merge=True,
        )

    batch.commit()

    return success_response()
test_on_document_write on-write Firestore trigger
  • This trigger ensures that field b always matches the value of field a.
from firebase_functions import logger
from firebase_functions.firestore_fn import (Change, DocumentSnapshot, Event,
                                             on_document_written)


@on_document_written(document='test/{id}')
def test_on_document_write(event: Event[Change[DocumentSnapshot]]):

    after = event.data.after

    if not after:
        return

    after_data = event.data.after.to_dict()
    a = after_data.get('a', 0)
    b = after_data.get('b', 0)

    if a != b:
        updated_data = {'a': a, 'b': a}

        logger.info(
            'Updating document: {}, Old data:{}, New data: {}'.format(
                event.document,
                after_data,
                updated_data,
            ),
        )

        # Occasionally, it appears that the code is freezing here, preventing the final logs from being printed
        after.reference.update(updated_data)

        logger.info(
            'Document updated: {}, Old data:{}, New data: {}'.format(
                event.document,
                after_data,
                updated_data,
            ),
        )

Stack trace

Here's the logs extracted from the GCP Logs Explorer. downloaded-logs-20240228-171306.json

DominicOrga avatar Mar 14 '24 06:03 DominicOrga

Update: This is also occurring on Firebase Storage triggers.

DominicOrga avatar Apr 02 '24 15:04 DominicOrga

I've also encountered the very same issue on a @https_fn.on_request function when calling it inside an event trigger: Screenshot 2024-04-10 at 5 13 51 PM

DominicOrga avatar Apr 10 '24 09:04 DominicOrga

I am observing this issue as well in a number of my cloud functions, but as stated, it's intermittent so I haven't found a reliable way to reproduce it. Generally happens within 1-3 days of deployment. A redeploy resolves it. Additional observations:

  1. I see the PUBSUB message in the log, indicating that the framework is attempting to invoke the function:

INFO 2024-04-10T15:25:38.904447Z [httpRequest.requestMethod: POST] [httpRequest.status: 200] [httpRequest.responseSize: 772 B] [httpRequest.latency: 59.996 s] [httpRequest.userAgent: APIs-Google; (+https://developers.google.com/webmasters/APIs-Google.html)] https://on-event-created-txulcecjpa-uc.a.run.app/?__GCP_CloudEventsMode=CE_PUBSUB_BINDING

  1. The log message above will have an httpRequest.latency of roughly whatever the timeout_sec of the function is set to, as if the framework is waiting on a deadlocked Python process and then gives up.

  2. Once in this state, even if the first line of the function is a simple debug message, it won't get invoked:

@firestore_fn.on_document_created(document="events/{doc_id}", timeout_sec=60, memory=options.MemoryOption.GB_1) def on_event_created(event: Event[DocumentSnapshot]) -> None: _log.info(f'Received event: {event.document}') handle_event(event)

andy-tmpt-me avatar Apr 11 '24 12:04 andy-tmpt-me

I have not found a "smoking gun" yet or a test scenario that easily reproduces the issue. I can say though that I was able to reduce the occurrence of deadlock in my own functions by reducing calls that GET the document that is the subject of the trigger. For example, rather than passing the document by path and calling:

path = event.document
...many layers of code...
doc = db.document(path).get()

...I am using event.data.to_dict() (or event.data.after.to_dict()) and passing that data through the stack instead. Am continuing to try to reproduce using test cases that attempt to re-read the document from within the trigger.

andy-tmpt-me avatar Apr 17 '24 15:04 andy-tmpt-me