Add 'between' to 'Data.Map' and 'Data.Set'
Patch Description
This PR adds 2 new functions to containers: Data.Map.between, and Data.Set.between. These functions will consume a map/set (respectively), and will filter the map/set so that the keys/values lie between an upper and lower bound (inclusive). This operation comes up pretty often in certain applications, and feels generally useful.
Notes
I'm by no means convinced that between is a good name, so if anyone feels that they have something more evocative please chime in!
Interesting. But this is only one of four kinds of "between". What about the others?
Perhaps a more generalized version could be added, along with something like:
data Bound a
= Inclusive a
| Exclusive a
If this was added, I still think the simpler inclusive version ought to be added for ergonomics sake, but I might just be biased because that's the one I need 🙂
Yeah, I think you're biased. I imagine just as many people will need the others.
For reference: GHC contains a related restrictMap utility:
https://gitlab.haskell.org/ghc/ghc/-/blob/18b9ba5602121c75f184f29e5b3e70bd7d4779c4/compiler/GHC/Cmm/Switch.hs#L467-470
Note that it's implemented using split, which I think might be more efficient.
why would implementation via split be more efficient? It's something like
between lo hi = snd . M.split lo . fst . M.split hi
and the first split it might waste some work to build (re-balance) a tree that will be cut down by the second split. Consider the extreme case lo = hi = some mid-range key. The suggested implementation will do no balancing, but just return the singleton (or empty) map.
I'm all for keeping interfaces small, but this could be a case where the current interface does not allow an efficient implementation.
Thanks for pointing this out, @jwaldmann! I think I misunderstood the current implementation and also missed the bit where split is strict in both the trees it returns.
In light of this, I think the current implementation is actually pretty good.
Strictness - I did not check, but yes, I thought that all balanced maps are strict in the structure. Note, my argument (too much work) also holds if only one of the components returned by split is evaluated.
Perhaps the suggested function (and implementation) can be generalized to take a "bitonic filter" predicate? (a function f :: Key -> Ordering that must be monotonic - which we don't check - , and the result is the submap of all keys k with f k == EQ) But that's speculative generality ... It would solve the inclusive/exclusive variants, we could write filterBitonic (\ k -> if k <= lo then LT else if k > hi then GT else EQ), etc. Well, looks ugly, and looks like more operations - unless it's inlined everywhere. Altough I guess the cost is not in doing these comparisons, but in the construction (the balancing) of the result map.
[EDIT] the between function could detect and short-cut the case lo > hi (giving empty result). My suggested filterBitonic function could not do this (it cannot know beforehand that there is no k with f k == EQ).
I made a note of this at https://gitlab.haskell.org/ghc/ghc/-/issues/21217
See also https://github.com/haskell/containers/issues/778 . It says that implementation via split is "at most a constant factor off", which I believe for time, but not for space.
I wonder if we can generalize this idea some more, taking some inspiration from mergeA. Suppose we have two predicates, p, q :: k -> Bool, with the following properties:
- If
x <= ythenp x <= p yandq x <= q y. - For any
x,q x <= p x.
Now p and q will divide any set/map into three portions:
- An initial part where
p x = q x = False. - A middle part where
p x = Truebutq x = False. - A final part where
p x = q x = True.
We might want to do something different with each part: preserve it, drop it, or transform it.