loopback-next icon indicating copy to clipboard operation
loopback-next copied to clipboard

Efficient Bulk Create/Insert With One Database Call

Open shadyanwar opened this issue 6 years ago • 12 comments

Suggestion

The createAll() method provides the ability to bulk create records, however, it is translated into multiple insert statements since the dao layer inserts records one by one. I suggested that, in MySQL for example, LB4 uses the multiple-row INSERT syntax which aims at reducing the communication overhead between the client and the server.

Use Cases

Less server overhead and better performance. In case of inserting around 10k records, the performance impact due to the current implementation is huge. A query might take something like 10 seconds or more and raise the server utilization extensively. Whereas with the multiple-row INSERT syntax the same query could take something like 1 second.

Examples

Optimally, instead of:

INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(1,2,3,4);
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(10,20,30,40);
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(100,200,300,400);

We would have: INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(1,2,3,4),(10,20,30,40),(100,200,300,400);

Acceptance criteria

TBD - will be filled by the team.

shadyanwar avatar Jul 15 '19 06:07 shadyanwar

I am agree with the point of @shadyanwar . I am also looking for the same .

pankajcheema avatar Feb 10 '20 06:02 pankajcheema

any update on the same ?

pankajcheema avatar Feb 10 '20 06:02 pankajcheema

Talked to @jannyHou, this will be at the connector level. @shadyanwar @pankajcheema, which connector are you using?

dhmlau avatar Jun 10 '20 14:06 dhmlau

Talked to @jannyHou, this will be at the connector level. @shadyanwar @pankajcheema, which connector are you using?

@dhmlau MySQL

shadyanwar avatar Jun 10 '20 15:06 shadyanwar

@dhmlau I’m happy to help

shadyanwar avatar Dec 25 '20 13:12 shadyanwar

I'm using Mongodb facing the same issue.... have any one found the solution?

dishantr16 avatar Feb 23 '21 07:02 dishantr16

@dhmlau @achrinza This is a very significant issue. We fixed it by overriding the createAll method in repository. But that's not right, because I have to do this at multiple places. I can create a mixin and do this too, but since it might be a performance bug for most of the databases, should we not have it in LB4 itself ?

Having said that, I tried to look for a solution to this in connector level. I checked postgres connector, for instance, but there was no reference of createAll here. Then I checked its base class in loopback-connector. There also I can only see create method. I checked juggler-bridge then. There I saw the createAll method making an internal call to create method. The create method invokes modelclass.create, which eventually invokes the connector create method. See below.

async createAll(entities: DataObject<T>[], options?: Options): Promise<T[]> {
    // perform persist hook
    const data = await Promise.all(
      entities.map(e => this.entityToData(e, options)),
    );
    const models = await ensurePromise(this.modelClass.create(data, options));
    return this.toEntities(models);
  }

I am basically trying to raise a PR to have this capability of either using create method multiple times or using insert query for multiple records. This definitely should go to connectors actually. Can you confirm, I have to modify the juggler bridge function first and then move to connectors, or am I getting it wrong ?

samarpanB avatar Jun 24 '22 07:06 samarpanB

A summary of the code flow is:

  1. LB4 Repository calls Juggler Model Class .create
  2. Juggler Model Class gets the above method (and others) from being mixin-ed Juggler DataAccessObject by default
  3. Juggler DataAccessObject .create calls the Juggler SQL Connectors' .create
  4. The Juggler SQL Connectors inherit from loopback-connector's SQLConnector

So:

  1. "crux" of the issue is that the connector needs to implement .createAll.
  2. Since the SQL Connectors inherit from a common SQLConnector, updating that should be sufficient (assuming the SQL dialect isn't too different).
  3. The Juggler DataAccessObject also needs to be updated to implement .createAll, which would be mixin-ed into the Juggler Model Class during runtime
  4. Finally, LB4 DefaultCrudRepository's .createAll needs to be updated in a backwards-compatible way.

achrinza avatar Jun 26 '22 17:06 achrinza

Thanks @achrinza . However, I think mongodb connector also uses the same sql connector. Mongo DB has different query for insert multiple. I'll check and confirm.

samarpan-b avatar Jun 27 '22 04:06 samarpan-b

@samarpan-b The MongoDB connector inherits from Connector instead, hence it should not be affected by updates to SQLConnector:

https://github.com/loopbackio/loopback-connector-mongodb/blob/51d132a48c0b559978a3d379a9695ff6bc1d888a/lib/mongodb.js#L211

Though it does implement the necessary CRUD functions to be compatible with Juggler DataAccessObject.

achrinza avatar Jun 27 '22 05:06 achrinza

@achrinza I checked if the multiple insert in one query dialect is available in different databases or not and also if its available, is it same or not ? I observed that only postgresql, mysql, sql server have a common dialect. Oracle supports multiple insert in one query but dialect is different and cassandra doesn't even support it.

So here is what I am thinking

  1. Add createAll method to connector
  2. Within createAll, I will invoke buildInsertAll method but I'll keep its implementation abstract (or empty, like buildReturning method), leaving the implementation to child connectors.
  3. Put a check if buildInsertAll is not implemented, then invoke create in loop
  4. Implement buildInsertAll in postgres, mysql, sql server, oracle connectors
  5. Update Juggler DataAccessObject implement createAll, which would be mixin-ed into the Juggler Model Class during runtime.
  6. Update LB4 DefaultCrudRepository's createAll

samarpanB avatar Jul 03 '22 08:07 samarpanB

Point 1 to 5 are done and PRs are raised. @achrinza please do take a look.

samarpanB avatar Sep 15 '22 11:09 samarpanB

Point 6 is also done. But its in draft. We need connector and juggler to release first. Then only I can update dependency in @loopback/repository.

samarpanB avatar Sep 25 '22 19:09 samarpanB

This is now released for postgresql in version 6.0.0 of loopback-connector-postgresql. See here - https://github.com/loopbackio/loopback-connector-postgresql/blob/master/CHANGES.md

samarpanB avatar Dec 01 '22 07:12 samarpanB