csharp-driver icon indicating copy to clipboard operation
csharp-driver copied to clipboard

CSHARP-977 Reduce memory allocations by renting arrays from ArrayPool

Open marcin-krystianc opened this issue 3 years ago • 4 comments

This change reduces GC pressure by avoiding temporary arrays allocation for ToDouble and ToSingle conversions.

We've discovered this problem because we use wide tables, i.e. ~90 columns of mostly doubles. Our benchmarks show that driver's performance is greatly improved by this change. For the "workstation GC", we see a stunning 2x-2.5x speed-up.

marcin-krystianc avatar Aug 16 '22 08:08 marcin-krystianc

Hi, thanks for the PR. Unfortunately we aren't planning on releasing a new version of the driver in the near future so just a heads up that we might take a while until we take a look at this PR and merge it.

joao-r-reis avatar Aug 16 '22 09:08 joao-r-reis

@joao-r-reis Thank you for the heads up. Can you say when approximately we can expect the next release? Is there a chance for a release happening by the end of this year?

marcin-krystianc avatar Aug 16 '22 09:08 marcin-krystianc

Yes, we are planning to have at least one release before the end of this year.

joao-r-reis avatar Aug 16 '22 14:08 joao-r-reis

Actually, there are a few more options to run the conversion which are even more efficient than the ArrayPool approach:

  • swap bytes in the source array (option 1): image
  • swap bytes in the source array, convert, and swap it again so the buffer is untouched (if the above is not an option) (option 2): image
  • Let the driver use unsafe code (AllowUnsafeBlocks=true) so it is possible to convert to double in place without any byte swapping at all (option 3)

All of these are significantly faster than the ArrayPool approach. Please let me know which one you would be willing to accept. I think option 2 is the least controversial but is still significantly faster than the current ArrayPool approach.

marcin-krystianc avatar Aug 24 '22 11:08 marcin-krystianc

Any update about this?

M0n7y5 avatar Sep 22 '22 23:09 M0n7y5

No updates, as I mentioned earlier we plan to take a look at open PRs and get a release out before the end of the year so there's unlikely to be any update on this until late October at the very least.

joao-r-reis avatar Sep 23 '22 15:09 joao-r-reis

Option 2 is definitely the one that is the safest so if you don't have any objections you can modify this PR with that approach instead and I'll review it.

Also, please sign our CLA if you haven't already: https://cla.datastax.com

joao-r-reis avatar Nov 03 '22 13:11 joao-r-reis

Option 2 is definitely the one that is the safest so if you don't have any objections you can modify this PR with that approach instead and I'll review it.

I've found another possibility (https://learn.microsoft.com/en-us/dotnet/api/system.bitconverter.int64bitstodouble?view=net-6.0) that doesn't require unsafe code but it is more performant than option 2.

marcin-krystianc avatar Nov 04 '22 12:11 marcin-krystianc

I've found another possibility (https://learn.microsoft.com/en-us/dotnet/api/system.bitconverter.int64bitstodouble?view=net-6.0) that doesn't require unsafe code but it is more performant than option 2.

So your suggestion is to use the ToInt64 method of the same class first and then calling BitConverter.Int64BitsToDouble(Int64) on that long? And for float/single you would use ToInt32 and BitConverter.Int32BitsToSingle?

This is a very interesting idea, I'm also ok with this one but I'd ask you to implement some unit tests in the Cassandra.Tests.SerializerTests class that use double and float values to get some test coverage on these 2 methods that are going to be changed in this PR.

joao-r-reis avatar Nov 04 '22 13:11 joao-r-reis

So your suggestion is to use the ToInt64 method of the same class first and then calling BitConverter.Int64BitsToDouble(Int64) on that long?

I didn't thought about using ToInt64, I coded a manual conversion instead - I'll try compare both approaches and if there is no performance difference I'll use to ToInt64 approach as it is simpler.

And for float/single you would use ToInt32 and BitConverter.Int32BitsToSingle?

Unfortunately BitConverter.Int32BitsToSingle is not available for .NET Framework / .NET Standard so we need to use byte swapping approach.

This is a very interesting idea, I'm also ok with this one but I'd ask you to implement some unit tests in the Cassandra.Tests.SerializerTests class that use double and float values to get some test coverage on these 2 methods that are going to be changed in this PR.

Good idea, I'll add tests.

marcin-krystianc avatar Nov 04 '22 14:11 marcin-krystianc

Would BinaryPrimitives from System.Buffers.Binary work? That is available in .Net Framework and .Net Core.

Telavian avatar Nov 04 '22 14:11 Telavian

Would BinaryPrimitives from System.Buffers.Binary work? That is available in .Net Framework and .Net Core.

I believe that would require adding the System.Memory package dependency since this driver is still targeting .NET Standard 2.0 and .NET Framework 4.5.2 and we should try to avoid adding extra dependencies that can cause version conflicts in the applications that use this driver if possible.

joao-r-reis avatar Nov 04 '22 16:11 joao-r-reis

So your suggestion is to use the ToInt64 method of the same class first and then calling BitConverter.Int64BitsToDouble(Int64) on that long?

I didn't thought about using ToInt64, I coded a manual conversion instead - I'll try compare both approaches and if there is no performance difference I'll use to ToInt64 approach as it is simpler.

I've updated my code to use ToInt64 as it is simpler and the perf is the same.

Regarding the tests, the EncodeDecodeSingleValuesTest seems to be already covering methods which I'm updating.

marcin-krystianc avatar Nov 07 '22 11:11 marcin-krystianc

Do we have an ETA for the 3.19.0 release?

xsoheilalizadeh avatar Nov 22 '22 09:11 xsoheilalizadeh

Do we have an ETA for the 3.19.0 release?

No specific date but we're trying to get the release out before the end of the year so mid December is the current target.

joao-r-reis avatar Nov 22 '22 11:11 joao-r-reis

I just realized that all our serializers are implemented in a thread safe way and this change effectively makes this particular serializer not thread safe as concurrent calls to Deserialize with the same buffer will cause issues. I'd prefer if we went with the first approach (ArrayPool) even though it is not as performant as just avoiding any new buffers completely. Sorry for not realizing this before.

We finally have things sorted out and we are ready to get a new release out so as soon as this is handled I will merge this PR.

joao-r-reis avatar Dec 19 '22 17:12 joao-r-reis

I cherry picked your initial commit on this PR to this branch https://github.com/datastax/csharp-driver/pull/575 so I can start testing it, let me know if you agree with this change and whether you want me to merge the new PR or if you want to revert your new commits on this PR and merge this one instead.

joao-r-reis avatar Dec 19 '22 19:12 joao-r-reis

I just realized that all our serializers are implemented in a thread safe way and this change effectively makes this particular serializer not thread safe as concurrent calls to Deserialize with the same buffer will cause issues. I'd prefer if we went with the first approach (ArrayPool) even though it is not as performant as just avoiding any new buffers completely. Sorry for not realizing this before. Ah I see, fair enough.

I cherry picked your initial commit on this PR to this branch #575 so I can start testing it, let me know if you agree with this change and whether you want me to merge the new PR or if you want to revert your new commits on this PR and merge this one instead.

I'll revert changes on this branch.

marcin-krystianc avatar Dec 20 '22 08:12 marcin-krystianc

@joao-r-reis I reverted changes on this branch so we are back to the ArrayPool approach. Another quick idea I've is to use stackalloc byte[8] instead of renting the array from ArrayPool. I'll measure whether it makes any difference.

marcin-krystianc avatar Dec 20 '22 09:12 marcin-krystianc

@joao-r-reis I reverted changes on this branch so we are back to the ArrayPool approach. Another quick idea I've is to use stackalloc byte[8] instead of renting the array from ArrayPool. I'll measure whether it makes any difference.

After some trying I thing that it is not actually possible to use stackalloc as it requires Span<T> based APIs and these are not available until netstandard 2.1 / netcore

marcin-krystianc avatar Dec 20 '22 09:12 marcin-krystianc

We can keep the BitConverter.Int64BitsToDouble(ToInt64(value, offset)); approach for ToDouble though, can you make that quick change?

joao-r-reis avatar Dec 20 '22 10:12 joao-r-reis

We can keep the BitConverter.Int64BitsToDouble(ToInt64(value, offset)); approach for ToDouble though, can you make that quick change?

👍

marcin-krystianc avatar Dec 20 '22 11:12 marcin-krystianc

justfyi: C# Driver 3.19.0 is now GA.

joao-r-reis avatar Dec 22 '22 18:12 joao-r-reis

@joao-r-reis FYI I've compared performance of versions 3.18.0 and 3.19.0 with a benchmark that mostly does double value reading. I've noticed about 50% improvement of the performance.

marcin-krystianc avatar Dec 28 '22 09:12 marcin-krystianc

Awesome! In our benchmark that doesn't focus on a particular type or request we also noticed an improvement of ~10% so I'm fairly confident that this release has performance benefits for everyone regardless of workload type. Great stuff!

joao-r-reis avatar Jan 09 '23 15:01 joao-r-reis