
[Optimization] Proposal: Span-based mapping for reduced allocations

Open AlexGreatDev opened this issue 8 months ago • 5 comments

Problem

Current object mapping in Dapper uses reflection, which causes memory overhead in bulk operations.

Proposed Solution

Implement a Span<T>-based mapper as an alternative for scenarios like:

  • High-volume data processing
  • AOT compatibility

Benchmark Expectations

Expected 20-30% reduction in memory allocations (based on prototype).

AlexGreatDev • May 13 '25 12:05

For the reflection discussion: https://aot.dapperlib.dev/gettingstarted

The underlying API that DapperAOT uses already supports span sources; however, at the current time the normal usage piggy-backs off the regular Dapper API. I doubt that span vs. object usage is going to be relevant when compared to a DB fetch, but if you have a particular scenario in mind, let me know.

mgravell • May 13 '25 19:05

In particular: span usage and reflection are both salient points, but are orthogonal. It would be good to understand which your actual concern is.

mgravell • May 13 '25 19:05

> For the reflection discussion: https://aot.dapperlib.dev/gettingstarted
>
> The underlying API that DapperAOT uses already supports span sources; however, at the current time the normal usage piggy-backs off the regular Dapper API. I doubt that span vs. object usage is going to be relevant when compared to a DB fetch, but if you have a particular scenario in mind, let me know.

Thanks for the clarification.

To clarify our scenario: we're working with a high-throughput ETL system where we need to map large volumes of data (millions of records per batch) from the DB. In this context, even small reductions in memory allocations and GC pressure can have significant impact on overall performance. So while DB fetch is indeed the primary cost in many scenarios, in our case the in-process mapping cost — particularly due to allocations from reflection-based object creation — becomes noticeable at scale. Our actual concern is mostly the memory allocation from object materialization, and we're exploring Span<T>-based alternatives to reduce that. We are already using DapperAOT for AOT compatibility, and would love to see more granular control or support for Span<T> mapping in the future. Let me know if you'd like a concrete test case or benchmark from our side.

AlexGreatDev • May 16 '25 09:05

Maybe we should take a step back, then, and be very clear about the scenario you're trying to describe.

> Our actual concern is mostly the memory allocation from object materialization,

Right; let's discuss that. There are five main sources of allocations here:

  1. metadata discovery and ref-emit; elided in AOT, and mostly amortized in "vanilla" via the strategy cache, so let's ignore that for now
  2. raw ADO.NET overheads - largely unavoidable, but AOT has some features to allow optimization ([CacheCommand])
  3. DTOs - might be avoidable by using struct DTOs in some cases
  4. non-trivial fields, in particular string and byte[]; not simple to avoid
  5. collection overheads, in particular List<T> array backers

So: which are you interested in? I assume we can skip 1 ("switch to AOT") and 2 ("enable command caching, then move on with life"); I assume for 3 you're already using struct DTOs, and there aren't easy options for 4.
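To illustrate item 3, here is a minimal sketch (not code from this thread) comparing the bytes allocated when a buffer is filled with class DTOs versus struct DTOs. `RowClass`, `RowStruct`, and `DtoAllocationDemo` are hypothetical names, and the row shape is an assumption; real savings depend on the DTO's fields, since string and byte[] members (item 4) still allocate either way.

```csharp
using System;

// Hypothetical row shapes; neither type comes from Dapper.
public sealed class RowClass { public int Id; public double Amount; }
public struct RowStruct { public int Id; public double Amount; }

public static class DtoAllocationDemo
{
    // Bytes allocated on this thread while filling an array with class instances:
    // one array of references plus one heap object per row.
    public static long MeasureClassRows(int rows)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var buffer = new RowClass[rows];
        for (int i = 0; i < rows; i++)
        {
            buffer[i] = new RowClass { Id = i, Amount = i };
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(buffer);
        return after - before;
    }

    // Same measurement for struct rows: the rows live inline in the single array.
    public static long MeasureStructRows(int rows)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var buffer = new RowStruct[rows];
        for (int i = 0; i < rows; i++)
        {
            buffer[i] = new RowStruct { Id = i, Amount = i };
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(buffer);
        return after - before;
    }
}
```

Comparing `DtoAllocationDemo.MeasureClassRows(n)` with `DtoAllocationDemo.MeasureStructRows(n)` for a large `n` shows the per-row object cost that struct DTOs avoid.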

Since you mention spans, can I assume your concern is actually 5? Is that what you're wanting to improve here? Note that if you use non-buffered mode, you can control how the buffering works there, but yes: there are definitely things we could do relatively easily to improve 5, so that we at least don't pay the resizing overhead - however, it is hard to remove the final array without having some API that exchanges a buffer lifetime.
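As a rough sketch of what "an API that exchanges a buffer lifetime" could look like, the helper below drains any `IEnumerable<T>` (for example, the result of a non-buffered query) into an array rented from `ArrayPool<T>` and returns it to the pool on dispose. `PooledRows<T>` and its members are hypothetical and are not part of Dapper or DapperAOT; this is one possible shape under those assumptions, not the approach the maintainers have in mind.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical helper (not a Dapper/DapperAOT type): owns a pooled array of rows
// and hands it back to the shared pool when disposed.
public sealed class PooledRows<T> : IDisposable
{
    private T[] _buffer;
    private int _count;

    private PooledRows(T[] buffer, int count)
    {
        _buffer = buffer;
        _count = count;
    }

    // View over the rows collected so far; only valid until Dispose is called.
    public Span<T> Span => _buffer.AsSpan(0, _count);

    // Drains the source into a rented buffer, growing by renting a larger
    // array and returning the old one instead of allocating new arrays.
    public static PooledRows<T> Collect(IEnumerable<T> source, int sizeHint = 16)
    {
        ArrayPool<T> pool = ArrayPool<T>.Shared;
        T[] buffer = pool.Rent(Math.Max(sizeHint, 1));
        int count = 0;
        try
        {
            foreach (T row in source)
            {
                if (count == buffer.Length)
                {
                    T[] bigger = pool.Rent(buffer.Length * 2);
                    Array.Copy(buffer, bigger, count);
                    pool.Return(buffer, clearArray: true);
                    buffer = bigger;
                }
                buffer[count++] = row;
            }
        }
        catch
        {
            // Do not leak the rented array if enumeration fails.
            pool.Return(buffer, clearArray: true);
            throw;
        }
        return new PooledRows<T>(buffer, count);
    }

    public void Dispose()
    {
        T[] buffer = _buffer;
        if (buffer.Length == 0) return; // already disposed
        _buffer = Array.Empty<T>();
        _count = 0;
        ArrayPool<T>.Shared.Return(buffer, clearArray: true);
    }
}
```

With regular Dapper, the source could be something like `connection.Query<Order>(sql, buffered: false)`, where `Order` and `sql` are placeholders; the caller wraps the result in a `using` block and must not touch `Span` after disposal, which is exactly the lifetime contract that makes this harder to offer than a plain `List<T>`.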

There are also options in AOT to supply a row-count hint ([RowCountHint(...)]), which can make things more efficient today for large sets by starting the buffer at that size.
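The effect of starting the buffer at the expected size can be seen with plain `List<T>`, independent of Dapper. The sketch below counts how often the list replaces its backing array while rows are added; `BufferGrowthDemo` and `CountResizes` are hypothetical names used only for illustration, and the code does not show how DapperAOT applies the hint internally.

```csharp
using System;
using System.Collections.Generic;

public static class BufferGrowthDemo
{
    // Adds `rows` items to a List<int> created with `initialCapacity` and
    // returns how many times the list swapped in a larger backing array.
    public static int CountResizes(int rows, int initialCapacity)
    {
        var list = new List<int>(initialCapacity);
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < rows; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Capacity changed: a new array was allocated and the old rows copied.
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }
}
```

When the initial capacity matches the row count, `CountResizes` reports no resizes; starting from an empty list, every growth step allocates a new array and copies the existing rows, and the discarded arrays become garbage.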

But: can you confirm you're talking about 5? How many rows are you returning in your problem scenario? Have you tried [RowCountHint(...)]? Would you be interested in me trying some tweaks to improve the collection of data? (I have an idea for something we could do.)

mgravell • May 16 '25 10:05

See ^^^ for my idea there; we would hope to make DapperAOT also use this same approach, but respecting the size hint

mgravell • May 16 '25 14:05