Efficient Bulk Create/Insert With One Database Call
Suggestion
The createAll() method provides the ability to bulk create records, however, it is translated into multiple insert statements since the dao layer inserts records one by one. I suggested that, in MySQL for example, LB4 uses the multiple-row INSERT syntax which aims at reducing the communication overhead between the client and the server.
Use Cases
Less server overhead and better performance. In case of inserting around 10k records, the performance impact due to the current implementation is huge. A query might take something like 10 seconds or more and raise the server utilization extensively. Whereas with the multiple-row INSERT syntax the same query could take something like 1 second.
Examples
Optimally, instead of:
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(1,2,3,4);
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(10,20,30,40);
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(100,200,300,400);
We would have:
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(1,2,3,4),(10,20,30,40),(100,200,300,400);
Acceptance criteria
TBD - will be filled by the team.
I am agree with the point of @shadyanwar . I am also looking for the same .
any update on the same ?
Talked to @jannyHou, this will be at the connector level. @shadyanwar @pankajcheema, which connector are you using?
Talked to @jannyHou, this will be at the connector level. @shadyanwar @pankajcheema, which connector are you using?
@dhmlau MySQL
@dhmlau I’m happy to help
I'm using Mongodb facing the same issue.... have any one found the solution?
@dhmlau @achrinza This is a very significant issue. We fixed it by overriding the createAll method in repository. But that's not right, because I have to do this at multiple places. I can create a mixin and do this too, but since it might be a performance bug for most of the databases, should we not have it in LB4 itself ?
Having said that, I tried to look for a solution to this in connector level. I checked postgres connector, for instance, but there was no reference of createAll here. Then I checked its base class in loopback-connector. There also I can only see create method. I checked juggler-bridge then. There I saw the createAll method making an internal call to create method. The create method invokes modelclass.create, which eventually invokes the connector create method. See below.
async createAll(entities: DataObject<T>[], options?: Options): Promise<T[]> {
// perform persist hook
const data = await Promise.all(
entities.map(e => this.entityToData(e, options)),
);
const models = await ensurePromise(this.modelClass.create(data, options));
return this.toEntities(models);
}
I am basically trying to raise a PR to have this capability of either using create method multiple times or using insert query for multiple records. This definitely should go to connectors actually. Can you confirm, I have to modify the juggler bridge function first and then move to connectors, or am I getting it wrong ?
A summary of the code flow is:
- LB4 Repository calls Juggler Model Class
.create - Juggler Model Class gets the above method (and others) from being mixin-ed Juggler DataAccessObject by default
- Juggler DataAccessObject
.createcalls the Juggler SQL Connectors'.create - The Juggler SQL Connectors inherit from
loopback-connector'sSQLConnector
So:
- "crux" of the issue is that the connector needs to implement
.createAll. - Since the SQL Connectors inherit from a common
SQLConnector, updating that should be sufficient (assuming the SQL dialect isn't too different). - The Juggler DataAccessObject also needs to be updated to implement
.createAll, which would be mixin-ed into the Juggler Model Class during runtime - Finally, LB4 DefaultCrudRepository's
.createAllneeds to be updated in a backwards-compatible way.
Thanks @achrinza . However, I think mongodb connector also uses the same sql connector. Mongo DB has different query for insert multiple. I'll check and confirm.
@samarpan-b The MongoDB connector inherits from Connector instead, hence it should not be affected by updates to SQLConnector:
https://github.com/loopbackio/loopback-connector-mongodb/blob/51d132a48c0b559978a3d379a9695ff6bc1d888a/lib/mongodb.js#L211
Though it does implement the necessary CRUD functions to be compatible with Juggler DataAccessObject.
@achrinza I checked if the multiple insert in one query dialect is available in different databases or not and also if its available, is it same or not ? I observed that only postgresql, mysql, sql server have a common dialect. Oracle supports multiple insert in one query but dialect is different and cassandra doesn't even support it.
So here is what I am thinking
- Add
createAllmethod to connector - Within
createAll, I will invokebuildInsertAllmethod but I'll keep its implementation abstract (or empty, likebuildReturningmethod), leaving the implementation to child connectors. - Put a check if
buildInsertAllis not implemented, then invoke create in loop - Implement
buildInsertAllin postgres, mysql, sql server, oracle connectors - Update Juggler DataAccessObject implement
createAll, which would be mixin-ed into the Juggler Model Class during runtime. - Update LB4 DefaultCrudRepository's
createAll
Point 1 to 5 are done and PRs are raised. @achrinza please do take a look.
Point 6 is also done. But its in draft. We need connector and juggler to release first. Then only I can update dependency in @loopback/repository.
This is now released for postgresql in version 6.0.0 of loopback-connector-postgresql. See here - https://github.com/loopbackio/loopback-connector-postgresql/blob/master/CHANGES.md