kit icon indicating copy to clipboard operation
kit copied to clipboard

Mutexmap

Open elena-kolevska opened this issue 1 year ago • 10 comments

Description

This PR adds a map of mutexes that provide an inner and outer locking mechanism to support keeping separate locks on dynamic objects, as is the case with namespaced stream pools, for example.

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • [X] Code compiles correctly
  • [X] Created/updated tests

elena-kolevska avatar Apr 28 '24 19:04 elena-kolevska

In general, the code seems good... But what is this going to be used for? We already have other libraries used in the codebase that support thread-safe maps and do that in a more performant way too. For example, we already use github.com/alphadose/haxmap in other parts of Dapr (go.mod here). Although it has some caveat, it's a lot faster, thanks to being lock-free. Could that be used here too?

ItalyPaleAle avatar May 14 '24 00:05 ItalyPaleAle

Codecov Report

Attention: Patch coverage is 74.57627% with 30 lines in your changes are missing coverage. Please review.

Project coverage is 79.83%. Comparing base (0c7cfce) to head (d43a33c). Report is 2 commits behind head on main.

Files Patch % Lines
concurrency/atomicmap.go 66.66% 22 Missing :warning:
concurrency/mutexmap.go 84.61% 7 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #95      +/-   ##
==========================================
- Coverage   79.94%   79.83%   -0.11%     
==========================================
  Files          56       62       +6     
  Lines        4382     4760     +378     
==========================================
+ Hits         3503     3800     +297     
- Misses        733      805      +72     
- Partials      146      155       +9     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar May 14 '24 14:05 codecov-commenter

Thanks @ItalyPaleAle , haxmap looks great, I wasn't aware of it. I updated the PR in runtime to use it instead and removed the atomic map from kit. I'd like to keep the mutexmap though, because it has some more specialised methods that keep the code in runtime cleaner.

elena-kolevska avatar May 19 '24 22:05 elena-kolevska

Thanks @ItalyPaleAle , haxmap looks great, I wasn't aware of it. I updated the PR in runtime to use it instead and removed the atomic map from kit

Just keep in mind one important thing about haxmap. If an item is added to the map while it's being resized internally, the item shows up only after the resizing is done. See here

Would this be a problem?

ItalyPaleAle avatar May 20 '24 15:05 ItalyPaleAle

If an item is added to the map while it's being resized internally, the item shows up only after the resizing is done.

Thanks for pointing that out. Does this also relate to GetOrSet? Will it block until the resize is complete, or? Also... the issue doesn't seem to have reached a consensus on the cause of this error. Someone else suggested it's related to deletes. Do you remember more details? I know it's been a while, but I think this is important to get right 🙏

elena-kolevska avatar May 21 '24 01:05 elena-kolevska

It won't block. How haxmap works is that it allocates memory to store the items. If more items are needed than the memory that's available, it will need to allocate a new block of memory, copy the items, and then move the pointer to the new memory.

Haxmap doesn't block. As the docs say:

If a resizing operation is happening concurrently while calling Set() then the item might show up in the map only after the resize operation is finished

If you know in advance how many items you'll have, you can pre-allocate the haxmap with enough capacity for them.

ItalyPaleAle avatar May 21 '24 03:05 ItalyPaleAle

Unfortunately, we don't know the number of elements, it's going to depend on how users use the system (specifically, for now, it will be the number of namespaces). I suppose we could set the initial map size to a higher starting number of ~50, or make the size a placement service start up parameter, but I don't really like either of these options.

I'm concerned that a non-replicated data structure can't give me a read-your-write consistency. I didn't dig deep into the paper and the inner structure, and I'm sure if it were easy, it would have been done, but this doesn't look like an unsolvable problem. For example, the resize could be triggered in the background when the size reaches X% of full capacity, and even if we still get to a point where a write has to happen before the resize completes, the write operation shouldn't be acknowledged to the user before the resize has completed, for correctness.

Since we can have multiple sidecars from the same namespace trying to send their updates at the same time, we might indeed have a situation where this issue affects us.

Let me know if I'm understanding this correctly, or missing something.

elena-kolevska avatar May 21 '24 12:05 elena-kolevska

Looking at this issue now, there also seems to be a race on the GetOrSet operation, where a delete wouldn't be observed if it's concurrent to a GetOrSet. It's still eventually consistent, but we need strong consistency here. If everyone's ok with it, I would like to move back to the sync.Mutex based map we had before, even though it's sad to give up on the performance benefits of haxmap. The library is really interesting though, and maybe we could contribute to it in the future and address these issues.

elena-kolevska avatar May 21 '24 14:05 elena-kolevska

+1 to use sync.Mutex map.

artursouza avatar May 21 '24 14:05 artursouza

Thanks @artursouza , I reverted it.

elena-kolevska avatar May 21 '24 14:05 elena-kolevska

@artursouza you merged this PR when there was only 1 vote. The comment should not have been resolved as the conversation was not over.

If the problem was the need to support both int64 and uint32, you don't need to duplicate the code. You can still use the atomic pattern, internally using only atomic.Int64. Since uint32 can safely be contained within int64, you can still use a generic class and then cast the response appropriatley.

ItalyPaleAle avatar May 29 '24 17:05 ItalyPaleAle

@artursouza you merged this PR when there was only 1 vote. The comment should not have been resolved as the conversation was not over.

If the problem was the need to support both int64 and uint32, you don't need to duplicate the code. You can still use the atomic pattern, internally using only atomic.Int64. Since uint32 can safely be contained within int64, you can still use a generic class and then cast the response appropriatley.

Can you provide the code for that? I did hit limitations in the casting capabilities of Go - maybe there is a viable path. I am happy to take another PR on this but I don't think this possible improvement justifies blocking progress on 1.14, given the deadline.

artursouza avatar May 29 '24 17:05 artursouza