Synchronous MongoDB Storage

Open · hsicsa opened this issue · 3 comments

While async MongoDB storage is nice for scalability, some environments (e.g. Apache Beam) don't support async I/O. Synchronous MongoDB storage would cover that use case and would also make testing on smaller data sets easier.

hsicsa, Mar 20 '23 20:03

Good point. Would love to get some help on this one.

ekzhu, Mar 21 '23 08:03

In the near term, would it be possible to wrap the async functionality in a synchronous wrapper? How would one do this in Python?

hsicsa, Mar 23 '23 16:03
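
For the near-term question about wrapping async code in Python, a common generic pattern is to run a private event loop on a background thread and block on each coroutine with `asyncio.run_coroutine_threadsafe`. This is a minimal sketch, not something datasketch ships: the class name `SyncWrapper` is hypothetical, and it assumes the wrapped object's async methods are defined with `async def`. If the async client binds connections to the loop it was created on, it would need to be created on the wrapper's loop, which this sketch does not handle.

```python
import asyncio
import inspect
import threading
from typing import Any, Coroutine


class SyncWrapper:
    """Hypothetical helper: expose an async object's coroutine methods as blocking calls.

    A private event loop runs on a daemon thread, so calls also work from code
    that is itself running inside another event loop.
    """

    def __init__(self, async_obj: Any) -> None:
        self._obj = async_obj
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def _run(self, coro: Coroutine[Any, Any, Any]) -> Any:
        # Schedule the coroutine on the private loop and block until it completes.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the wrapper itself.
        attr = getattr(self._obj, name)
        if not inspect.iscoroutinefunction(attr):
            return attr  # plain attributes and sync methods pass through unchanged

        def blocking(*args: Any, **kwargs: Any) -> Any:
            return self._run(attr(*args, **kwargs))

        return blocking

    def close(self) -> None:
        # Stop the loop from its own thread, wait for the thread, then free the loop.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

A simpler alternative is calling `asyncio.run(...)` for every operation, but that creates and closes a fresh event loop on each call, which breaks async clients holding connections bound to one loop, and it raises `RuntimeError` when called from a thread that already has a running loop. Keeping one long-lived loop avoids both problems at the cost of a background thread.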

Perhaps the easiest thing to do is to implement a separate MongoDB storage layer. The code should be similar to, but simpler than, the async MongoDB storage.

ekzhu, Mar 24 '23 19:03
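
A rough sketch of what a separate synchronous layer could look like, assuming pymongo as the driver. The class name `SyncMongoListStorage` and its method set are hypothetical: they mimic a key-to-list-of-values interface, and a real contribution would subclass datasketch's own storage base class and match its abstract methods, which this sketch does not verify. It accepts any object that behaves like a pymongo `Collection`, so the logic can be exercised without a running server.

```python
from typing import Any, Hashable, List


class SyncMongoListStorage:
    """Hypothetical synchronous key -> list-of-values storage on one collection.

    Each (key, value) pair is its own document {"key": k, "value": v}, so an
    append is a single insert and a lookup is a plain query. `collection` is
    assumed to behave like a pymongo Collection.
    """

    def __init__(self, collection: Any) -> None:
        self._coll = collection

    def insert(self, key: Hashable, *values: Any) -> None:
        # pymongo's insert_many rejects an empty list, so skip when nothing to add.
        if values:
            self._coll.insert_many([{"key": key, "value": v} for v in values])

    def get(self, key: Hashable) -> List[Any]:
        return [doc["value"] for doc in self._coll.find({"key": key})]

    def getmany(self, *keys: Hashable) -> List[List[Any]]:
        return [self.get(k) for k in keys]

    def keys(self) -> List[Any]:
        return list(self._coll.distinct("key"))

    def has_key(self, key: Hashable) -> bool:
        # limit=1 lets the server stop counting after the first match.
        return self._coll.count_documents({"key": key}, limit=1) > 0

    def remove(self, *keys: Hashable) -> None:
        self._coll.delete_many({"key": {"$in": list(keys)}})

    def remove_val(self, key: Hashable, val: Any) -> None:
        self._coll.delete_many({"key": key, "value": val})

    def size(self) -> int:
        # Number of distinct keys, not number of stored values.
        return len(self.keys())
```

With a real server it might be constructed as `SyncMongoListStorage(pymongo.MongoClient(uri)["lsh"]["buckets"])`, where the database and collection names are placeholders. Creating an index with `collection.create_index("key")` keeps `get` and `has_key` from scanning the whole collection.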