spark
Cannot obtain resource count from API (`_summary=count`) when using MongoDB if matched resource count is moderately large (roughly 170k+)
Describe the bug
Spark returns an internal server error when a `?_summary=count` request would find a moderately large number of resources. (The exact threshold is unknown, but the failure was observed when the number of applicable resources was between 170,000 and 225,000.)
Captured Message from OperationOutcome:
Operation was unsuccessful, and returned status InternalServerError. OperationOutcome: Overall result: FAILURE (2 errors and 0 warnings)
[ERROR] (no details)(further diagnostics: BsonSerializationException: An error occurred while serializing the Keys property of class Spark.Engine.Core.Snapshot: Size 17094593 is larger than MaxDocumentSize 16777216.)
[ERROR] (no details)(further diagnostics: FormatException: Size 17094593 is larger than MaxDocumentSize 16777216.)
To Reproduce
Steps to reproduce the behavior:
- Seed a Spark server using MongoDB with 225,000 `Patient` resources. (The exact count needed is unknown, but the minimum is > 150,000.)
- Submit a FHIR `GET` request to the server's `Patient` endpoint to obtain the `Patient` count (i.e., `<fhirRootUri>/Patient?_summary=count`).
- Observe the failure.
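The failing request can be scripted for repeated testing. A minimal sketch of building the count query URL (the base URL is a placeholder; substitute your server's root URI):

```python
from urllib.parse import urlencode, urljoin

def count_url(fhir_root_uri: str, resource_type: str) -> str:
    """Build a FHIR count query URL (?_summary=count) for a resource type."""
    base = fhir_root_uri.rstrip("/") + "/"
    return urljoin(base, resource_type) + "?" + urlencode({"_summary": "count"})

# Hypothetical server root; replace with your <fhirRootUri>.
url = count_url("https://fhir.example.org/fhir", "Patient")
print(url)  # https://fhir.example.org/fhir/Patient?_summary=count
```

Issuing a `GET` against the printed URL reproduces the error once enough resources are seeded.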
Expected behavior
Expected to receive a Bundle response containing only the count of Patient resources.
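For reference, a successful `_summary=count` search returns a `searchset` Bundle whose `total` carries the match count and which contains no entries. A sketch of the expected response shape (the total shown is illustrative):

```python
import json

# Minimal sketch of the expected response body; only the count is returned.
expected_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "total": 225000,  # number of matching Patient resources, no entry elements
}
print(json.dumps(expected_bundle, indent=2))
```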
Spark version
- Version: 2.3.4
Operating system + Database
- OS: Linux
- Database: MongoDB
Container service / Cloud infrastructure:
- Container service: Elastic Container Service
- Cloud provider: AWS
- Cloud infrastructure: Docker container hosted on AWS EC2 in ECS
- Database as a service: AWS DocumentDB w/ MongoDB API
Additional context
This error appears directly related to search snapshot persistence (the Spark.Engine.Core.Snapshot named in the exception). It does not prevent inserting resources, nor does it break searches that return small result sets.
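The reported sizes are consistent with the snapshot storing every matched resource key in a single MongoDB document. A back-of-the-envelope check, assuming the reported 17,094,593 bytes corresponds to roughly the full 225,000 matches (the per-key cost is inferred from the error message, not measured):

```python
MAX_DOCUMENT_SIZE = 16 * 1024 * 1024  # MongoDB's 16 MiB BSON document limit
observed_size = 17_094_593            # size reported by BsonSerializationException
observed_matches = 225_000            # approximate matched resource count (assumed)

bytes_per_key = observed_size / observed_matches
max_matches = MAX_DOCUMENT_SIZE // bytes_per_key
print(f"~{bytes_per_key:.0f} bytes per key; limit reached near {max_matches:,.0f} matches")
```

Under these assumptions each key costs roughly 76 bytes, putting the breaking point in the low 200-thousands of matches, which fits the observed failure range.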