# Mutexmap

## Description
This PR adds a map of mutexes that provide an inner and outer locking mechanism to support keeping separate locks on dynamic objects, as is the case with namespaced stream pools, for example.
## Checklist
Please make sure you've completed the relevant tasks for this PR, out of the following list:
- [X] Code compiles correctly
- [X] Created/updated tests
In general, the code seems good... But what is this going to be used for? We already have other libraries in the codebase that support thread-safe maps, and do so in a more performant way too. For example, we already use github.com/alphadose/haxmap in other parts of Dapr (go.mod here). Although it has some caveats, it's a lot faster, thanks to being lock-free. Could that be used here too?
Codecov Report
Attention: Patch coverage is 74.57627%, with 30 lines in your changes missing coverage. Please review.
Project coverage is 79.83%. Comparing base (0c7cfce) to head (d43a33c). Report is 2 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| concurrency/atomicmap.go | 66.66% | 22 Missing :warning: |
| concurrency/mutexmap.go | 84.61% | 7 Missing and 1 partial :warning: |
Additional details and impacted files
```diff
@@ Coverage Diff @@
##             main      #95      +/-   ##
==========================================
- Coverage   79.94%   79.83%   -0.11%
==========================================
  Files          56       62       +6
  Lines        4382     4760     +378
==========================================
+ Hits         3503     3800     +297
- Misses        733      805      +72
- Partials      146      155       +9
```
Thanks @ItalyPaleAle, haxmap looks great, I wasn't aware of it. I updated the PR in runtime to use it instead and removed the atomic map from kit. I'd like to keep the mutexmap though, because it has some more specialised methods that keep the code in runtime cleaner.
> Thanks @ItalyPaleAle, haxmap looks great, I wasn't aware of it. I updated the PR in runtime to use it instead and removed the atomic map from kit
Just keep in mind one important thing about haxmap. If an item is added to the map while it's being resized internally, the item shows up only after the resizing is done. See here
Would this be a problem?
> If an item is added to the map while it's being resized internally, the item shows up only after the resizing is done.
Thanks for pointing that out. Does this also apply to GetOrSet? Will it block until the resize is complete, or not? Also... the issue doesn't seem to have reached a consensus on the cause of this error. Someone else suggested it's related to deletes. Do you remember more details? I know it's been a while, but I think this is important to get right 🙏
It won't block. How haxmap works is that it allocates memory to store the items. If more items are needed than the memory that's available, it will need to allocate a new block of memory, copy the items, and then move the pointer to the new memory.
Haxmap doesn't block. As the docs say:
> If a resizing operation is happening concurrently while calling Set() then the item might show up in the map only after the resize operation is finished
If you know in advance how many items you'll have, you can pre-allocate the haxmap with enough capacity for them.
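The grow-copy-swap mechanics behind that caveat can be illustrated with a deliberately simplified, sequential sketch (this is not haxmap's actual code): a resize allocates new storage, copies the existing items, then swaps over to the new storage. A writer that captured a reference to the old storage before the swap writes into storage that is about to be discarded, so its item only "shows up" if the copy happened after the write.

```go
package main

import "fmt"

func main() {
	// Old storage, holding the items before the resize.
	old := map[string]int{"a": 1}

	// A writer captures a reference to the current storage.
	snapshot := old

	// Resizer: copy the existing items into larger storage...
	grown := make(map[string]int, 8)
	for k, v := range old {
		grown[k] = v
	}

	// ...meanwhile the writer sets an item into the old storage.
	snapshot["b"] = 2

	// After the resizer swaps to the new storage, "b" is missing:
	// it was written after the copy, into storage being discarded.
	_, inGrown := grown["b"]
	fmt.Println(inGrown) // false
}
```

In the real lock-free map the copy and the concurrent write race on separate goroutines; the sketch just makes the "write lands in the old block" interleaving explicit, which is why pre-allocating capacity (avoiding resizes entirely) sidesteps the issue.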
Unfortunately, we don't know the number of elements; it's going to depend on how users use the system (specifically, for now, it will be the number of namespaces). I suppose we could set the initial map size to a higher starting value (~50), or make the size a placement service start-up parameter, but I don't really like either of these options.
I'm concerned that a non-replicated data structure can't give me read-your-writes consistency. I didn't dig deep into the paper or the inner structure, and I'm sure if it were easy it would have been done already, but this doesn't look like an unsolvable problem. For example, the resize could be triggered in the background when the size reaches X% of full capacity, and even if we still get to a point where a write has to happen before the resize completes, the write operation shouldn't be acknowledged to the caller until the resize has finished, for correctness.
Since we can have multiple sidecars from the same namespace trying to send their updates at the same time, we might indeed have a situation where this issue affects us.
Let me know if I'm understanding this correctly, or missing something.
Looking at this issue now, there also seems to be a race in the GetOrSet operation, where a delete wouldn't be observed if it's concurrent with a GetOrSet. It's still eventually consistent, but we need strong consistency here.
If everyone's ok with it, I would like to move back to the sync.Mutex based map we had before, even though it's sad to give up on the performance benefits of haxmap.
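For context, the strong-consistency property being traded for is easy to see in a mutex-guarded map: because every operation takes the same lock, a Delete is always observed by a later GetOrSet, with no eventual-consistency window. A minimal sketch (illustrative names, not the actual kit API):

```go
package main

import (
	"fmt"
	"sync"
)

// SyncMap is a mutex-guarded generic map. All operations serialize on
// one lock, so reads always observe the latest completed write.
type SyncMap[K comparable, V any] struct {
	mu    sync.Mutex
	items map[K]V
}

func NewSyncMap[K comparable, V any]() *SyncMap[K, V] {
	return &SyncMap[K, V]{items: make(map[K]V)}
}

// GetOrSet returns the existing value for key if present (loaded=true),
// otherwise stores and returns value (loaded=false).
func (m *SyncMap[K, V]) GetOrSet(key K, value V) (V, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if existing, ok := m.items[key]; ok {
		return existing, true
	}
	m.items[key] = value
	return value, false
}

// Delete removes key; any later GetOrSet sees the removal.
func (m *SyncMap[K, V]) Delete(key K) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.items, key)
}

func main() {
	m := NewSyncMap[string, int]()
	v, loaded := m.GetOrSet("ns1", 1)
	fmt.Println(v, loaded) // 1 false
	m.Delete("ns1")
	v, loaded = m.GetOrSet("ns1", 2)
	fmt.Println(v, loaded) // 2 false: the delete was observed
}
```

The cost is that all keys contend on the one mutex, which is exactly the performance the lock-free haxmap recovers, at the price of the visibility caveats discussed above.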
The library is really interesting though, and maybe we could contribute to it in the future and address these issues.
+1 to use sync.Mutex map.
Thanks @artursouza , I reverted it.
@artursouza you merged this PR when there was only 1 vote. The comment should not have been resolved as the conversation was not over.
If the problem was the need to support both int64 and uint32, you don't need to duplicate the code. You can still use the atomic pattern, internally using only `atomic.Int64`. Since uint32 can safely be contained within int64, you can still use a generic class and then cast the response appropriately.
> @artursouza you merged this PR when there was only 1 vote. The comment should not have been resolved as the conversation was not over.
> If the problem was the need to support both int64 and uint32, you don't need to duplicate the code. You can still use the atomic pattern, internally using only `atomic.Int64`. Since uint32 can safely be contained within int64, you can still use a generic class and then cast the response appropriately.
Can you provide the code for that? I did hit limitations in the casting capabilities of Go, but maybe there is a viable path. I'm happy to take another PR on this, but I don't think this possible improvement justifies blocking progress on 1.14, given the deadline.
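For reference, one possible shape of the generic wrapper suggested above (a sketch assuming Go 1.19+ for `atomic.Int64`; the type and method names are illustrative, not the kit's actual API). The key detail is that conversion from `int64` to a type parameter constrained to `int64 | uint32` is legal in Go, because `int64` converts to every member of the constraint's type set:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// AtomicValue is a single generic counter backed by atomic.Int64.
// Since uint32 fits safely within int64, one implementation serves
// both types; results are cast back to the requested type. Values are
// assumed to stay within uint32 range when T is uint32.
type AtomicValue[T int64 | uint32] struct {
	v atomic.Int64
}

// Add atomically adds delta and returns the new value.
func (a *AtomicValue[T]) Add(delta T) T {
	return T(a.v.Add(int64(delta)))
}

// Load atomically returns the current value.
func (a *AtomicValue[T]) Load() T {
	return T(a.v.Load())
}

// Store atomically replaces the current value.
func (a *AtomicValue[T]) Store(val T) {
	a.v.Store(int64(val))
}

func main() {
	var c32 AtomicValue[uint32]
	c32.Add(5)
	fmt.Println(c32.Load()) // 5

	var c64 AtomicValue[int64]
	c64.Store(10)
	fmt.Println(c64.Add(-3)) // 7
}
```

One caveat this sketch doesn't resolve: negative deltas only make sense for the `int64` instantiation, so whether this removes the duplication cleanly depends on which operations the two counters actually need.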