MongoFramework

Support for multithreading

Open AlizerUncaged opened this issue 2 years ago • 8 comments

I have the following code to seed a MongoDB instance with 500K+ objects:

        List<Member> members = new();

        for (int i = 0; i < 500000; i++)
        {
            var data = $"{_random.Next()}";
            members.Add(new Member()
            {
                Email = $"{data}@random.com",
            });
        }

        _applicationMongoDbContext.Members.AddRange(members);

        await _applicationMongoDbContext.SaveChangesAsync();

Creating the objects takes a few milliseconds, but the await _applicationMongoDbContext.SaveChangesAsync() call takes around 20 minutes. I checked my VPS and found that it seems to be utilizing only the first core.

[Screenshot: CPU core utilization]

AlizerUncaged avatar Jul 29 '23 10:07 AlizerUncaged

Is that screenshot of core utilization system wide or just the dotnet process?

Turnerj avatar Jul 29 '23 11:07 Turnerj

Pretty much just the dotnet process, this server is a fresh install

AlizerUncaged avatar Jul 29 '23 13:07 AlizerUncaged

Any update on this?

solo812 avatar Sep 28 '23 01:09 solo812

I'm looking into it right now, but I don't think the problem is SaveChangesAsync; it is the call to AddRange (at least in my own tests). A call to AddRange effectively calls Add internally, which then checks the state of the entity in the change tracker.

As part of the change tracking, it needs to check the ID value of each entry to ensure the entity doesn't already exist. While MongoFramework knows what the ID property is, it only knows it via the PropertyInfo type and uses GetValue. This is not a slow method per se, but it isn't something you want to call a lot.

With 500,000 entities, adding the 1st one only needs to get its own ID, the 2nd one needs to get its own and the 1st, and the 3rd one needs to get all 3 IDs. That means the 10,000th entity alone triggers 10,000 calls, and adding all 10,000 entities takes roughly 50 million calls in total (1 + 2 + ... + 10,000).
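To make the scaling concrete, here is a minimal sketch of the pattern described above. It is not MongoFramework's actual code: `Entity`, `NaiveTracker`, and the `IdReads` counter are hypothetical names used only for illustration. The tracker reads the new entity's ID and the ID of every already-tracked entry through PropertyInfo.GetValue on each Add, so the k-th Add costs k reads.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical entity type standing in for a real model class.
public class Entity
{
    public string Id { get; set; }
}

// Hypothetical tracker mimicking the behaviour described in the thread:
// every Add reads the new entity's ID plus the ID of each tracked entry.
public class NaiveTracker
{
    private static readonly PropertyInfo IdProperty =
        typeof(Entity).GetProperty(nameof(Entity.Id));

    private readonly List<Entity> _entries = new List<Entity>();

    // Number of reflection-based ID reads performed so far.
    public long IdReads { get; private set; }

    private object ReadId(Entity entity)
    {
        IdReads++;
        return IdProperty.GetValue(entity);
    }

    public void Add(Entity entity)
    {
        var id = ReadId(entity);
        foreach (var existing in _entries)
        {
            var existingId = ReadId(existing);
            // Only a non-null ID can match an existing entry.
            if (id != null && Equals(existingId, id)) return;
        }
        _entries.Add(entity);
    }
}

public static class Program
{
    // Adds entityCount entities without IDs and returns the total ID reads.
    public static long CountIdReads(int entityCount)
    {
        var tracker = new NaiveTracker();
        for (var i = 0; i < entityCount; i++)
        {
            tracker.Add(new Entity());
        }
        return tracker.IdReads;
    }

    public static void Main()
    {
        Console.WriteLine(CountIdReads(100));  // prints 5050
        Console.WriteLine(CountIdReads(1000)); // prints 500500
    }
}
```

The totals follow n(n+1)/2, so 10x the entities means roughly 100x the ID reads in this sketch.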

I likely have two options:

  • Attempt to cache the ID (this is problematic for reasons)
  • Find another way to access the real ID (this is likely the best bet for now)

As a temporary workaround, work in smaller batches of, say, 5,000 items: add them, save changes, then clear the change tracker. Repeat this until you've worked through all your data.
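The batching workaround could look roughly like the sketch below. To avoid guessing at MongoFramework's API, the three context operations are passed in as callbacks; `BatchSeeder`, `SeedInBatchesAsync`, and the callback parameter names are hypothetical. The Main method uses in-memory stand-ins rather than a real context.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BatchSeeder
{
    // Adds, saves, and clears in fixed-size batches so the tracker never
    // holds more than batchSize entries. Returns the number of batches.
    // The callbacks are assumptions: wire them to whatever your context exposes.
    public static async Task<int> SeedInBatchesAsync<T>(
        IReadOnlyList<T> items,
        int batchSize,
        Action<IReadOnlyList<T>> addRange,
        Func<Task> saveChangesAsync,
        Action clearChangeTracker)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = 0;
        for (var offset = 0; offset < items.Count; offset += batchSize)
        {
            var count = Math.Min(batchSize, items.Count - offset);
            var batch = new List<T>(count);
            for (var i = 0; i < count; i++)
            {
                batch.Add(items[offset + i]);
            }

            addRange(batch);
            await saveChangesAsync();
            clearChangeTracker();
            batches++;
        }
        return batches;
    }
}

public static class Program
{
    public static async Task Main()
    {
        // In-memory stand-ins for the tracker and the database.
        var tracked = new List<int>();
        var saved = 0;

        var items = new List<int>();
        for (var i = 0; i < 12; i++) items.Add(i);

        var batches = await BatchSeeder.SeedInBatchesAsync(
            items,
            5,
            batch => tracked.AddRange(batch),
            () => { saved += tracked.Count; return Task.CompletedTask; },
            () => tracked.Clear());

        Console.WriteLine($"{batches} batches, {saved} items saved"); // prints "3 batches, 12 items saved"
    }
}
```

With the context from the original post, the first two callbacks would wrap _applicationMongoDbContext.Members.AddRange and _applicationMongoDbContext.SaveChangesAsync; how the change tracker is cleared depends on what the context exposes and is not verified here.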

Turnerj avatar Sep 30 '23 13:09 Turnerj

Here's the result of a quick benchmark I put together. At 100 entities it takes about 0.36 ms; at 1,000 it takes about 33 ms. So for 10x the entities, it gets roughly 100x slower.

| Method         | EntryCount | Mean        | Error     | StdDev      | Allocated |
|----------------|------------|-------------|-----------|-------------|-----------|
| SetEntityState | 100        | 357.0 us    | 7.02 us   | 12.66 us    | -         |
| SetEntityState | 1000       | 32,937.1 us | 657.91 us | 1,043.52 us | 30 B      |

Turnerj avatar Sep 30 '23 13:09 Turnerj

Added one more iteration: at 10,000 entities it takes 3,402 ms. This is definitely the problem area.

| Method         | EntryCount | Mean           | Error        | StdDev       | Allocated |
|----------------|------------|----------------|--------------|--------------|-----------|
| SetEntityState | 100        | 352.7 us       | 6.66 us      | 6.84 us      | 2 B       |
| SetEntityState | 1000       | 32,571.1 us    | 161.22 us    | 134.62 us    | 30 B      |
| SetEntityState | 10000      | 3,402,924.9 us | 45,554.20 us | 40,382.61 us | 3656 B    |

Turnerj avatar Sep 30 '23 14:09 Turnerj

One thing I'm experimenting with is, instead of using reflection, creating a delegate dynamically via expressions. That does improve performance quite a bit, though the scaling is still pretty bad.

| Method         | EntryCount | Mean          | Error        | StdDev       | Allocated |
|----------------|------------|---------------|--------------|--------------|-----------|
| SetEntityState | 100        | 54.62 us      | 1.080 us     | 1.326 us     | -         |
| SetEntityState | 1000       | 4,291.31 us   | 84.723 us    | 104.047 us   | 4 B       |
| SetEntityState | 10000      | 392,415.23 us | 7,712.945 us | 8,252.765 us | 1280 B    |

The worst case here, at 10,000 entities, now takes 392 ms instead of 3,402 ms, which is about 88% faster. I'll see, though, if there is a nicer way to improve the algorithm and avoid the calls in the first place.
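For readers unfamiliar with the technique, the sketch below shows one common way to replace PropertyInfo.GetValue with a delegate compiled from an expression tree. It is an illustration under assumptions, not the code used in MongoFramework: `IdAccessor` and `CompileGetter` are hypothetical names, and the `Id` property on `Member` is assumed for the example.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Stand-in for the Member model from the original post; Id is assumed.
public class Member
{
    public string Id { get; set; }
    public string Email { get; set; }
}

public static class IdAccessor
{
    // Builds and compiles: (object instance) => (object)((TDeclaring)instance).Property
    // Compile once and cache the delegate; compiling is far slower than a single GetValue.
    public static Func<object, object> CompileGetter(PropertyInfo property)
    {
        var instance = Expression.Parameter(typeof(object), "instance");
        var typedInstance = Expression.Convert(instance, property.DeclaringType);
        var propertyAccess = Expression.Property(typedInstance, property);
        var boxedResult = Expression.Convert(propertyAccess, typeof(object));
        return Expression.Lambda<Func<object, object>>(boxedResult, instance).Compile();
    }
}

public static class Program
{
    public static void Main()
    {
        var idProperty = typeof(Member).GetProperty(nameof(Member.Id));
        var getId = IdAccessor.CompileGetter(idProperty);

        var member = new Member { Id = "abc123", Email = "abc123@random.com" };

        // Both paths return the same value; the delegate avoids per-call reflection.
        Console.WriteLine(idProperty.GetValue(member)); // prints abc123
        Console.WriteLine(getId(member));               // prints abc123
    }
}
```

The delegate only speeds up each individual read. The number of reads is unchanged, which is consistent with the scaling in the table above still being poor.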

Turnerj avatar Sep 30 '23 14:09 Turnerj

Managed to find an algorithmic improvement so I don't need to check the ID of every entry if the entry we're setting doesn't have an ID defined:

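| Method         | EntryCount | Mean         | Error       | StdDev      | Allocated |
|----------------|------------|--------------|-------------|-------------|-----------|
| SetEntityState | 100        | 30.62 us     | 0.588 us    | 0.764 us    | -         |
| SetEntityState | 1000       | 1,061.44 us  | 20.910 us   | 19.559 us   | 2 B       |
| SetEntityState | 10000      | 87,913.98 us | 1,750.216 us | 2,395.715 us | 80 B      |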

And if I stack the algorithmic improvement with the created delegate:

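| Method         | EntryCount | Mean         | Error       | StdDev      | Allocated |
|----------------|------------|--------------|-------------|-------------|-----------|
| SetEntityState | 100        | 22.62 us     | 0.450 us    | 0.645 us    | -         |
| SetEntityState | 1000       | 911.66 us    | 17.737 us   | 18.979 us   | 1 B       |
| SetEntityState | 10000      | 82,002.73 us | 1,611.556 us | 3,104.921 us | 69 B      |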
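The idea behind the algorithmic improvement, skipping the scan of existing entries when the incoming entity has no ID, can be sketched as follows. This is a simplified illustration with hypothetical names (`Entity`, `ShortCircuitTracker`, `IdReads`), not the change made in MongoFramework, and it ignores other checks a real tracker would need, such as the same instance being added twice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity type; a null Id means "no ID defined yet".
public class Entity
{
    public string Id { get; set; }
}

// Hypothetical tracker that only compares IDs when the new entity has one.
public class ShortCircuitTracker
{
    private readonly List<Entity> _entries = new List<Entity>();

    // Number of ID reads performed so far.
    public long IdReads { get; private set; }

    public int Count => _entries.Count;

    private string ReadId(Entity entity)
    {
        IdReads++;
        return entity.Id;
    }

    public void Add(Entity entity)
    {
        var id = ReadId(entity);
        if (id != null)
        {
            // Only an entity with an ID can collide with a tracked entry.
            foreach (var existing in _entries)
            {
                if (ReadId(existing) == id) return;
            }
        }
        _entries.Add(entity);
    }
}

public static class Program
{
    public static void Main()
    {
        var tracker = new ShortCircuitTracker();
        for (var i = 0; i < 1000; i++)
        {
            tracker.Add(new Entity()); // no ID, so no scan of existing entries
        }
        Console.WriteLine(tracker.IdReads); // prints 1000
    }
}
```

In this sketch, seeding entities without IDs costs one ID read per entity. The benchmark above still grows faster than linearly, so other work in the real code path presumably remains.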

Turnerj avatar Oct 01 '23 07:10 Turnerj