Firestore updates cause intermittent freezing of event triggers
I am currently testing the 2nd gen on-write Firestore cloud triggers. For some reason, the trigger freezes and stops executing after a few hours from deployment.
It appears that updating a Firestore document is causing the trigger to freeze.
Environment details
- OS type and version: MacOS Sonoma 14.3.
- Python version: 3.12.2
- pip version: 24.0
-
google-cloud-firestoreversion: 2.15.0
Steps to reproduce
Note that this bug is difficult to reproduce and would require some repetition to occur.
- Deploy the
test_document_writeHTTP request function andtest_on_document_writeon-write Firestore trigger. - Run the
test_document_writefunction.
This will create a
testFirestore collection and will create or update 10 fixed Firestore documents that will trigger thetest_on_document_writecloud trigger.
- Observe that the
test_on_document_writewill make the changes in the updated Firestore docs. - Wait a couple of hours.
- Re-run the
test_document_writefunction. - Observe that the Firestore docs will not update due to the function freezing (as shown in the image above). If it did, repeat step 4.
Code example
test_document_write HTTP Request Function
- This function creates 10 documents in the
testcollection, and increments their fieldavalue.
from app.https.common.http_responses import success_response
from firebase_admin import firestore
from firebase_functions import https_fn, options
from google.cloud.firestore import Client, Increment
@https_fn.on_request(cors=options.CorsOptions(cors_methods=['post']))
def test_document_write(request: https_fn.Request) -> https_fn.Response:
firestore_client: Client = firestore.client()
batch = firestore_client.batch()
for i in range(0, 10):
batch.set(
firestore_client.document(f'test/doc_{i}'),
{'a': Increment(1)},
merge=True,
)
batch.commit()
return success_response()
test_on_document_write on-write Firestore trigger
- This trigger ensures that field
balways matches the value of fielda.
from firebase_functions import logger
from firebase_functions.firestore_fn import (Change, DocumentSnapshot, Event,
on_document_written)
@on_document_written(document='test/{id}')
def test_on_document_write(event: Event[Change[DocumentSnapshot]]):
after = event.data.after
if not after:
return
after_data = event.data.after.to_dict()
a = after_data.get('a', 0)
b = after_data.get('b', 0)
if a != b:
updated_data = {'a': a, 'b': a}
logger.info(
'Updating document: {}, Old data:{}, New data: {}'.format(
event.document,
after_data,
updated_data,
),
)
# Occasionally, it appears that the code is freezing here, preventing the final logs from being printed
after.reference.update(updated_data)
logger.info(
'Document updated: {}, Old data:{}, New data: {}'.format(
event.document,
after_data,
updated_data,
),
)
Stack trace
Here's the logs extracted from the GCP Logs Explorer. downloaded-logs-20240228-171306.json
Update: This is also occurring on Firebase Storage triggers.
I've also encountered the very same issue on a @https_fn.on_request function when calling it inside an event trigger:
I am observing this issue as well in a number of my cloud functions, but as stated, it's intermittent so I haven't found a reliable way to reproduce it. Generally happens within 1-3 days of deployment. A redeploy resolves it. Additional observations:
- I see the PUBSUB message in the log, indicating that the framework is attempting to invoke the function:
INFO 2024-04-10T15:25:38.904447Z [httpRequest.requestMethod: POST] [httpRequest.status: 200] [httpRequest.responseSize: 772 B] [httpRequest.latency: 59.996 s] [httpRequest.userAgent: APIs-Google; (+https://developers.google.com/webmasters/APIs-Google.html)] https://on-event-created-txulcecjpa-uc.a.run.app/?__GCP_CloudEventsMode=CE_PUBSUB_BINDING
-
The log message above will have an httpRequest.latency of roughly whatever the timeout_sec of the function is set to, as if the framework is waiting on a deadlocked Python process and then gives up.
-
Once in this state, even if the first line of the function is a simple debug message, it won't get invoked:
@firestore_fn.on_document_created(document="events/{doc_id}", timeout_sec=60, memory=options.MemoryOption.GB_1) def on_event_created(event: Event[DocumentSnapshot]) -> None: _log.info(f'Received event: {event.document}') handle_event(event)
I have not found a "smoking gun" yet or a test scenario that easily reproduces the issue. I can say though that I was able to reduce the occurrence of deadlock in my own functions by reducing calls that GET the document that is the subject of the trigger. For example, rather than passing the document by path and calling:
path = event.document
...many layers of code...
doc = db.document(path).get()
...I am using event.data.to_dict() (or event.data.after.to_dict()) and passing that data through the stack instead. Am continuing to try to reproduce using test cases that attempt to re-read the document from within the trigger.