Feature: implement database subsetting for MongoDB
Implement database subsetting for MongoDB as we did for PostgreSQL.
However, MongoDB is not a relational database, so we need to support "virtual foreign keys". Meaning, as a user I want to indicate that column collection_a.post_id is linked to column collection_b.id, and then keep the references consistent across the collections.
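"Keeping the consistency" can be made concrete as a check that every kept `collection_a.post_id` still matches some `collection_b.id`. The following is a minimal sketch under assumptions: `Value` is a hypothetical std-only stand-in for a BSON value (restricted to ObjectIds here), a document is modeled as a field-to-value map, and `dangling_refs` is a hypothetical helper name, not an existing API.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for a BSON value; the real bson crate's `Bson` enum is much richer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Value {
    ObjectId([u8; 12]),
}

// A MongoDB document modeled as a field-name -> value map.
type Document = HashMap<String, Value>;

/// Returns the `from_field` values in `from` that match no `to_field` value
/// in `to`, i.e. dangling virtual foreign keys.
fn dangling_refs(from: &[Document], from_field: &str, to: &[Document], to_field: &str) -> Vec<Value> {
    // Index every target key once so each lookup is O(1).
    let targets: HashSet<&Value> = to.iter().filter_map(|d| d.get(to_field)).collect();
    from.iter()
        .filter_map(|d| d.get(from_field))
        .filter(|v| !targets.contains(v))
        .cloned()
        .collect()
}

fn main() {
    let oid = |b: u8| Value::ObjectId([b; 12]);
    let posts = vec![Document::from([("id".to_string(), oid(1))])];
    let comments = vec![
        Document::from([("post_id".to_string(), oid(1))]),
        Document::from([("post_id".to_string(), oid(2))]), // points at a missing post
    ];
    let missing = dangling_refs(&comments, "post_id", &posts, "id");
    println!("dangling references: {}", missing.len()); // dangling references: 1
}
```

A consistent subset is one for which this check returns an empty list for every declared relation.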
> Meaning, as a user I want to indicate that column collection_a.post_id is linked to column collection_b.id
How would you indicate that as a user? If there isn't a known standard in MongoDB for such behavior, I'm not really sure how we can add support for this concept.
In my experience, developers using MongoDB (or any NoSQL database) end up managing relations between collections (MongoDB's equivalent of tables) in their application code. We could add a way to declare virtual relations between collections in the YAML file, e.g.:
source:
  connection_uri: postgres://root:password@localhost:5432/root
  database_subset:
    database: public
    table: orders
    strategy_name: random
    strategy_options:
      percent: 50
    passthrough_tables:
      - us_states
    virtual_relations:
      - from_table: collection_a
        from_column: post_id
        to_table: collection_b
        to_column: id
WDYT?
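On the Rust side, each `virtual_relations` entry could map onto a small struct, validated against the collections actually present in the dump so that a typo in the config fails fast. This is a sketch under assumptions: `VirtualRelation` and `validate` are hypothetical names, and deserialization from YAML (e.g. via serde) is left out to keep the example std-only.

```rust
use std::collections::HashSet;

/// Hypothetical Rust mirror of one `virtual_relations` entry from the proposed YAML.
#[derive(Debug, Clone, PartialEq)]
struct VirtualRelation {
    from_table: String,  // collection holding the reference, e.g. collection_a
    from_column: String, // field holding the reference, e.g. post_id
    to_table: String,    // referenced collection, e.g. collection_b
    to_column: String,   // referenced field, e.g. id
}

/// Rejects relations naming a collection absent from the dump, instead of
/// silently producing an inconsistent subset.
fn validate(relations: &[VirtualRelation], collections: &HashSet<&str>) -> Result<(), String> {
    for r in relations {
        for name in [&r.from_table, &r.to_table] {
            if !collections.contains(name.as_str()) {
                return Err(format!("unknown collection `{}` in virtual relation", name));
            }
        }
    }
    Ok(())
}

fn main() {
    let rel = VirtualRelation {
        from_table: "collection_a".into(),
        from_column: "post_id".into(),
        to_table: "collection_b".into(),
        to_column: "id".into(),
    };
    let known: HashSet<&str> = HashSet::from(["collection_a", "collection_b"]);
    println!("{:?}", validate(&[rel], &known)); // Ok(())
}
```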
This could work, but I think it'd be best to limit this to IDs only, i.e., you can only reference a field of type bson::oid::ObjectId from another field of type bson::oid::ObjectId.
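The ObjectId-only restriction could be enforced when a relation is loaded by checking both endpoint fields. A minimal sketch, assuming a hypothetical std-only `Value` enum standing in for `bson::Bson` (only two variants are modeled) and hypothetical helper names `field_is_object_id` and `relation_allowed`:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the BSON value kinds relevant here; the real
// `bson::Bson` enum has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    ObjectId([u8; 12]),
    Int(i64),
}

type Document = HashMap<String, Value>;

/// True when `field` holds an ObjectId in every document that has the field
/// (documents lacking the field are skipped).
fn field_is_object_id(docs: &[Document], field: &str) -> bool {
    docs.iter()
        .filter_map(|d| d.get(field))
        .all(|v| matches!(v, Value::ObjectId(_)))
}

/// A relation is accepted only when both endpoints are ObjectId fields.
fn relation_allowed(from: &[Document], from_field: &str, to: &[Document], to_field: &str) -> bool {
    field_is_object_id(from, from_field) && field_is_object_id(to, to_field)
}

fn main() {
    let posts = vec![Document::from([("id".to_string(), Value::ObjectId([1; 12]))])];
    let by_oid = vec![Document::from([("post_id".to_string(), Value::ObjectId([1; 12]))])];
    let by_int = vec![Document::from([("post_id".to_string(), Value::Int(42))])];
    println!("{}", relation_allowed(&by_oid, "post_id", &posts, "id")); // true
    println!("{}", relation_allowed(&by_int, "post_id", &posts, "id")); // false
}
```

Restricting to ObjectIds keeps matching unambiguous: two ObjectIds are equal only if all 12 bytes are equal, so there is no type coercion to decide on.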
Another thing to keep in mind is that the MongoDB dump parser works differently from the PostgreSQL one: PostgreSQL rebuilds its database from the SQL query strings in the dump, while the MongoDB parser builds the whole database in memory from the archive dump. This will probably affect how the subsetting strategy should be implemented for MongoDB.
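Since the MongoDB dump is already fully in memory, subsetting could operate directly on the loaded collections rather than on a query stream. This is a minimal sketch under several assumptions: the dump is modeled as a map from collection name to documents, `Value`, `Document`, `VirtualRelation` and `subset` are hypothetical names, and keeping the first `percent`% of the root collection is a deterministic stand-in for the `random` strategy.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for a BSON value, restricted to ObjectIds.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Value {
    ObjectId([u8; 12]),
}

type Document = HashMap<String, Value>;

// Hypothetical in-memory form of one `virtual_relations` entry.
struct VirtualRelation<'a> {
    from_table: &'a str,
    from_column: &'a str,
    to_table: &'a str,
    to_column: &'a str,
}

/// Keeps `percent`% of `root` (its first documents), then keeps only the
/// documents of each directly related collection that the kept root documents reference.
fn subset(
    mut db: HashMap<String, Vec<Document>>,
    root: &str,
    percent: usize,
    relations: &[VirtualRelation],
) -> HashMap<String, Vec<Document>> {
    // Trim the root collection.
    if let Some(docs) = db.get_mut(root) {
        let keep = docs.len() * percent / 100;
        docs.truncate(keep);
    }
    for rel in relations.iter().filter(|r| r.from_table == root) {
        // Collect every value the kept root documents point at.
        let referenced: HashSet<Value> = db
            .get(rel.from_table)
            .into_iter()
            .flatten()
            .filter_map(|d| d.get(rel.from_column).cloned())
            .collect();
        // Drop target documents that are no longer referenced.
        if let Some(targets) = db.get_mut(rel.to_table) {
            targets.retain(|d| d.get(rel.to_column).map_or(false, |v| referenced.contains(v)));
        }
    }
    db
}

fn main() {
    let oid = |b: u8| Value::ObjectId([b; 12]);
    let doc = |k: &str, v: Value| Document::from([(k.to_string(), v)]);
    let mut db = HashMap::new();
    db.insert("collection_a".to_string(), vec![doc("post_id", oid(1)), doc("post_id", oid(2))]);
    db.insert("collection_b".to_string(), vec![doc("id", oid(1)), doc("id", oid(2)), doc("id", oid(3))]);
    let rel = VirtualRelation { from_table: "collection_a", from_column: "post_id", to_table: "collection_b", to_column: "id" };
    let out = subset(db, "collection_a", 50, &[rel]);
    println!("{} {}", out["collection_a"].len(), out["collection_b"].len()); // 1 1
}
```

This follows relations only one hop out from the root collection; chains of relations across several collections would need the propagation repeated until no collection shrinks further.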