spark
Cannot obtain resource count from API (`_summary=count`) when using MongoDB if matched resource count is moderately large (roughly 170k+)
Describe the bug
Spark returns an internal server error when a `?_summary=count` request would find a moderately large number of resources. (The exact threshold is unknown, but the failure was observed when the number of applicable resources was between 170,000 and 225,000.)
Captured Message from OperationOutcome:
Operation was unsuccessful, and returned status InternalServerError. OperationOutcome: Overall result: FAILURE (2 errors and 0 warnings)
[ERROR] (no details)(further diagnostics: BsonSerializationException: An error occurred while serializing the Keys property of class Spark.Engine.Core.Snapshot: Size 17094593 is larger than MaxDocumentSize 16777216.)
[ERROR] (no details)(further diagnostics: FormatException: Size 17094593 is larger than MaxDocumentSize 16777216.)
To Reproduce
Steps to reproduce the behavior:
- Seed a Spark server using MongoDB with 225,000 `Patient` resources. (The exact count needed is unknown, but the minimum is > 150,000.)
- Submit a FHIR `GET` request to the server's `Patient` endpoint to obtain the `Patient` count (i.e., `<fhirRootUri>/Patient?_summary=count`).
- Observe the failure.
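The failing request can be scripted for repeated testing. A minimal sketch of building the count query URL (the base URL is a placeholder; substitute your server's root URI):

```python
from urllib.parse import urlencode, urljoin

def count_url(fhir_root_uri: str, resource_type: str) -> str:
    """Build a FHIR count query URL (?_summary=count) for a resource type."""
    base = fhir_root_uri.rstrip("/") + "/"
    return urljoin(base, resource_type) + "?" + urlencode({"_summary": "count"})

# Hypothetical server root; replace with your <fhirRootUri>.
url = count_url("https://fhir.example.org/fhir", "Patient")
print(url)  # https://fhir.example.org/fhir/Patient?_summary=count
```

Issuing a `GET` against the printed URL reproduces the error once enough resources are seeded.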
Expected behavior
Expected to receive a Bundle response containing only the count of Patient resources.
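For reference, a successful `_summary=count` search returns a `searchset` Bundle whose `total` carries the match count and which contains no entries. A sketch of the expected response shape (the total shown is illustrative):

```python
import json

# Minimal sketch of the expected response body; only the count is returned.
expected_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "total": 225000,  # number of matching Patient resources, no entry elements
}
print(json.dumps(expected_bundle, indent=2))
```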
Spark version
- Version: 2.3.4
Operating system + Database
- OS: Linux
- Database: MongoDB
Container service / Cloud infrastructure:
- Container service: Elastic Container Service
- Cloud provider: AWS
- Cloud infrastructure: Docker container hosted on AWS EC2 in ECS
- Database as a service: AWS DocumentDB w/ MongoDB API
Additional context
This error appears directly related to search snapshot persistence (the Spark.Engine.Core.Snapshot named in the exception). It does not prevent inserting resources, nor does it break searches that return small result sets.
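The reported sizes are consistent with the snapshot storing every matched resource key in a single MongoDB document. A back-of-the-envelope check, assuming the reported 17,094,593 bytes corresponds to roughly the full 225,000 matches (the per-key cost is inferred from the error message, not measured):

```python
MAX_DOCUMENT_SIZE = 16 * 1024 * 1024  # MongoDB's 16 MiB BSON document limit
observed_size = 17_094_593            # size reported by BsonSerializationException
observed_matches = 225_000            # approximate matched resource count (assumed)

bytes_per_key = observed_size / observed_matches
max_matches = MAX_DOCUMENT_SIZE // bytes_per_key
print(f"~{bytes_per_key:.0f} bytes per key; limit reached near {max_matches:,.0f} matches")
```

Under these assumptions each key costs roughly 76 bytes, putting the breaking point in the low 200-thousands of matches, which fits the observed failure range.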