Why must every ObjectId be found in the sample data?
Hello,
I have some issues customizing collections with importField. I have two environments, staging and production. The models are the same in both environments; however, when the Node.js server starts, the agent does not detect the same database schema in each. The production version is missing some relations between collections. I dug into the code and found that the introspection tries to match every ObjectId found in the sample data.
However, from my understanding, the sample data is far too thin to contain every relation, as our database has close to 5k documents per collection.
https://github.com/ForestAdmin/agent-nodejs/blob/68e5a00d7b96e082171e9e32a04e3f7ca97a4ae6/packages/datasource-mongo/src/introspection/reference-candidates-verifier.ts#L40
I replaced every() with some(), as we cannot have ObjectId collisions across collections, and it works much better. Do you think this could be included in a future release?
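To illustrate the difference, here is a minimal sketch using in-memory stand-ins. The names `usersIds` and `sampledRefs` are hypothetical and are not taken from the agent's code; the point is only that a single dead link makes `every()` fail while `some()` still succeeds.

```typescript
// Hypothetical stand-in for the _id values of a target collection.
const usersIds = new Set<string>(["u1", "u2", "u3"]);

// Hypothetical sampled values of a reference field.
// "u9" points to a document that has since been deleted.
const sampledRefs: string[] = ["u1", "u2", "u9"];

// Strict check: one dead link is enough to reject the relation.
const allFound = sampledRefs.every((id) => usersIds.has(id));

// Lenient check: one live link is enough to accept the relation.
const anyFound = sampledRefs.some((id) => usersIds.has(id));

console.log(allFound); // false
console.log(anyFound); // true
```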
Thank you
Hello,
The code you quoted is used to find which collection is related to another. We have some ObjectIds as input and need to find which collection they belong to. To do that, we take a sample of records from collection A and check that every related object comes from the same collection. This avoids linking the wrong collection if some ObjectIds are shared between collections.
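As a rough sketch of that reasoning (with hypothetical collection names and a hypothetical `candidates` helper, not the agent's actual implementation): when an id exists in two collections, a lenient match returns both as candidates, while the strict match narrows it down to one.

```typescript
// Hypothetical collections; the id "x1" exists in both of them.
const collections: Record<string, Set<string>> = {
  users: new Set(["x1", "u2", "u3"]),
  admins: new Set(["x1", "a2"]),
};

// Hypothetical sampled values of a reference field in collection A.
const sampledRefs: string[] = ["x1", "u2", "u3"];

// Returns the names of the collections that match the sampled references.
function candidates(refs: string[], mode: "every" | "some"): string[] {
  return Object.entries(collections)
    .filter(([, ids]) =>
      mode === "every"
        ? refs.every((id) => ids.has(id))
        : refs.some((id) => ids.has(id)),
    )
    .map(([name]) => name);
}

console.log(candidates(sampledRefs, "every")); // [ 'users' ]
console.log(candidates(sampledRefs, "some")); // [ 'users', 'admins' ]
```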
The cleanest solution for you would be to clean your database by removing the relations to deleted records.
If the dirty records are rare, you could simply reduce the sample size so that only a few records are needed to identify related collections. I suggest doing that in a new environment with the same data as production, to ensure all relations are correctly defined before deploying.
Thank you for your feedback! Indeed, some records may have dead links to deleted documents in other collections. Unfortunately, reducing the sample size doesn't help, because the first documents (I noticed you use limit()) are not always complete documents, which Mongo allows.
How can an ObjectId be the same in multiple collections? I don't see that as good practice.
Thanks for your help.
Have you tried with referenceSampleSize=0? It disables the feature and skips the check.
With referenceSampleSize=0 we miss a lot more relations, unfortunately.
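Here is a small sketch of what I mean, using a hypothetical `orders` array in place of a real collection. Taking the first N documents can return no reference values at all when the early documents lack the field, whereas filtering first (roughly what a `$exists` filter would do in a Mongo query) yields usable values. I am not claiming the agent can be configured this way.

```typescript
// Hypothetical document shape: the `customer` reference is optional.
type Order = { _id: string; customer?: string };

// The first documents are incomplete, which Mongo allows.
const orders: Order[] = [
  { _id: "o1" },
  { _id: "o2" },
  { _id: "o3", customer: "u1" },
  { _id: "o4", customer: "u2" },
];

// Collects the reference values that are actually present in the documents.
function refsIn(docs: Order[]): string[] {
  return docs
    .map((doc) => doc.customer)
    .filter((value): value is string => value !== undefined);
}

// Sample the first 2 documents as-is: no reference values are seen.
const naiveSample = refsIn(orders.slice(0, 2));

// Keep only documents that have the field, then sample 2 of them.
const filteredSample = refsIn(
  orders.filter((doc) => doc.customer !== undefined).slice(0, 2),
);

console.log(naiveSample); // []
console.log(filteredSample); // [ 'u1', 'u2' ]
```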