tortoise-orm

Bulk Create without returning objects

Open Abdeldjalil-H opened this issue 1 year ago • 9 comments

The main problem here is that bulk_create converts objects to a list before insertion, which loads all the models into memory at the same time. The only reason for this is to return the inserted objects after the query is done. However, those objects are not updated (by adding the pk, for example), i.e. the method returns exactly what you sent. I think we can improve this and let bulk_create benefit from #1613.

Describe the solution you'd like

There are several approaches for this:

  1. Do not return anything from bulk_create, or return the number of inserted rows. This may be a breaking change.
  2. Add a keyword argument called return_objects or something like that to control the behaviour.
  3. Do not return anything when batch_size is set.

If none of the above seems logical, I think we should at least mention in the docs what is going on behind the scenes, so users can handle chunking on their own. And maybe use tuple(objects) instead of list(objects).

Describe alternatives you've considered

The most natural alternative is to chunk the data on your side and leave batch_size as None. But this makes batch_size useless.
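The client-side chunking alternative could look roughly like the sketch below. It is only an illustration: `chunked` and `insert_in_chunks` are hypothetical helper names, and `insert_batch` is a plain synchronous callback standing in for one bulk insert call (with Tortoise, each chunk would instead be awaited through bulk_create).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items without materializing `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(
    rows: Iterable[T], size: int, insert_batch: Callable[[List[T]], None]
) -> int:
    """Hand each chunk to `insert_batch` and return the total row count."""
    total = 0
    for chunk in chunked(rows, size):
        insert_batch(chunk)  # stand-in for one bulk insert call
        total += len(chunk)
    return total
```

Only one chunk is held in memory at a time, so the input can be a generator of any length.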

Abdeldjalil-H avatar May 14 '24 10:05 Abdeldjalil-H

Hi!

I think the ideal would be to support id population for backends that can do it in the same query, as with RETURNING in Postgres. But I'm not sure I will be able to do it any time soon.

Right now, returning the objects doesn't bring any benefit, since we return what we got in. I don't see much value in this return, and we can change it.

I think the most explicit way to do that, for now, would be to just remove the returned objects, which allows iterating through the models without loading them all into memory.

The next step could be adding a return_objects param, allowed only for supported databases, where we would return updated objects from the db in the same query.
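To make the RETURNING idea concrete, here is a minimal sketch of how a multi-row INSERT with an optional RETURNING clause could be assembled. This is not Tortoise's implementation: `build_bulk_insert` is a hypothetical helper, the `$n` placeholders assume a PostgreSQL-style driver, and the table and column names are made up and assumed to be trusted identifiers.

```python
from typing import List, Optional, Sequence


def build_bulk_insert(
    table: str,
    columns: Sequence[str],
    row_count: int,
    returning: Optional[Sequence[str]] = None,
) -> str:
    """Build a multi-row INSERT statement with PostgreSQL-style $n placeholders."""
    width = len(columns)
    groups: List[str] = []
    for row in range(row_count):
        start = row * width
        placeholders = ", ".join(f"${start + i + 1}" for i in range(width))
        groups.append(f"({placeholders})")
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(groups)}"
    if returning:
        # Ask the database to send back generated values (e.g. the pk)
        # in the same round trip as the insert.
        sql += f" RETURNING {', '.join(returning)}"
    return sql


statement = build_bulk_insert("movie", ["title", "year"], 2, returning=["id"])
```

On a backend without RETURNING support, the clause would simply be left out, which is why such a parameter would have to be limited to supported databases.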

abondar avatar May 14 '24 10:05 abondar

Hi @abondar . I can make a PR on that, removing the returned object.

Abdeldjalil-H avatar May 14 '24 10:05 Abdeldjalil-H

If you could, I would gladly help with reviewing and merging it.

abondar avatar May 14 '24 11:05 abondar

I think returning objects may be useful in situations like the following, or are there better ways to do it?

from datetime import date

from tortoise import fields
from tortoise.models import Model


class Movie(Model):
    mv_id = fields.CharField(max_length=20, pk=True)
    search_logs: fields.ManyToManyRelation["SearchLog"]


class SearchLog(Model):
    search_date = fields.DateField()
    movies: fields.ManyToManyRelation["Movie"] = fields.ManyToManyField(
        "app.Movie", related_name="search_logs"
    )


async def run():
    await init_db("test")  # init_db / close_db are the poster's own helpers (not shown)
    # if we have many movies and want to insert them at once
    movies = [Movie(mv_id=f"{i}") for i in range(100)]
    _movies = await Movie.bulk_create(movies, ignore_conflicts=True)
    # pass the lookup as a keyword; a positional dict is taken as `defaults`
    search_log, _ = await SearchLog.get_or_create(search_date=date.today())
    await search_log.movies.add(*_movies)
    await close_db()

rgbygv avatar Nov 26 '24 09:11 rgbygv

Hi @rgbygv. bulk_create was returning the same objects you passed to it, which means that if you don't provide an id for your model, the returned one will not have the id populated. That's why the return value was removed. In your example, movies and _movies would contain the same objects.
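The point can be illustrated without a database. In the sketch below, `Row` and `old_style_bulk_create` are hypothetical stand-ins that only mimic the behaviour described in this thread; they are not Tortoise code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Row:
    """Hypothetical stand-in for a model instance with a database-generated key."""

    name: str
    pk: Optional[int] = None


def old_style_bulk_create(objects: Iterable[Row]) -> List[Row]:
    """Mimic the described behaviour: materialize, 'insert', return the input."""
    materialized = list(objects)  # every object is held in memory here
    # ... the INSERT would run here; nothing writes generated keys back ...
    return materialized


rows = [Row("a"), Row("b")]
returned = old_style_bulk_create(rows)
```

Here `returned` holds the very same instances as `rows`, and their `pk` is still `None`, so the return value carries no information the caller did not already have.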

Abdeldjalil-H avatar Nov 26 '24 09:11 Abdeldjalil-H

Thanks, I got it. I may need something like bulk_save(). I think inserting movies one by one would be slow, so I chose to insert them all at once. My current solution is to query again based on movies after the batch insert, although it seems a bit silly.

async def run():
    await init_db("test")

    movies = [Movie(mv_id=f"{i}") for i in range(10)]

    await Movie.bulk_create(movies, ignore_conflicts=True)

    movies = await Movie.filter(mv_id__in=[movie.mv_id for movie in movies]).all()
    search_log, _ = await SearchLog.get_or_create(search_date=date.today())
    await search_log.movies.add(*movies)

    await close_db()

rgbygv avatar Nov 26 '24 11:11 rgbygv

@rgbygv why do you need to fetch the movies if you have the IDs?

Abdeldjalil-H avatar Nov 26 '24 11:11 Abdeldjalil-H

Because I want to do this: "await search_log.movies.add(*movies)", and the movies need to be Movie instances, which can be obtained via .create() or .save(). But I don't want to use either of those methods. The other way to get Movie instances is to fetch them from the database.

rgbygv avatar Nov 26 '24 11:11 rgbygv

I see now. You can bypass this by adding the following before you call the add method:

for movie in movies:
    movie._saved_in_db = True

However, I think we should have a way to access the through table, like Django does.

This class can be used to query associated records for a given model instance like a normal model:

Model.m2mfield.through.objects.all()

What do you think about this @abondar ?
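As a rough illustration of why through-table access would help when the IDs are already known, the sketch below links rows by writing to the join table directly, using only the standard-library sqlite3 module. The table and column names (movie, searchlog, searchlog_movie) are assumptions made up for this example, not the schema Tortoise generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movie (mv_id TEXT PRIMARY KEY);
    CREATE TABLE searchlog (id INTEGER PRIMARY KEY, search_date TEXT);
    CREATE TABLE searchlog_movie (
        searchlog_id INTEGER NOT NULL REFERENCES searchlog (id),
        movie_id TEXT NOT NULL REFERENCES movie (mv_id),
        UNIQUE (searchlog_id, movie_id)
    );
    """
)

movie_ids = [str(i) for i in range(10)]

# Bulk insert the movies; OR IGNORE skips keys that already exist.
conn.executemany(
    "INSERT OR IGNORE INTO movie (mv_id) VALUES (?)",
    [(mv_id,) for mv_id in movie_ids],
)

cursor = conn.execute(
    "INSERT INTO searchlog (search_date) VALUES (?)", ("2024-11-26",)
)
log_id = cursor.lastrowid

# Link rows through the join table using only the known IDs,
# without fetching the movie rows back first.
conn.executemany(
    "INSERT OR IGNORE INTO searchlog_movie (searchlog_id, movie_id) VALUES (?, ?)",
    [(log_id, mv_id) for mv_id in movie_ids],
)
conn.commit()

linked = conn.execute(
    "SELECT COUNT(*) FROM searchlog_movie WHERE searchlog_id = ?", (log_id,)
).fetchone()[0]
```

With the primary keys supplied by the caller, no second query for the movies is needed before creating the links.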

Abdeldjalil-H avatar Nov 26 '24 13:11 Abdeldjalil-H