cats-collections icon indicating copy to clipboard operation
cats-collections copied to clipboard

A purely functional Set implementation using `Eq`

Open nasadorian opened this issue 7 years ago • 6 comments

I noticed that the distinct implementations in cats data types are using on Scala's TreeSet and the Order typeclass. But what about a purely functional set based on Eq using only boolean logic?

Upsides:

  • Intuitively more performant than TreeSet, with O(1) lookup and insert
  • Small memory footprint per object, so far just a case class
  • Uses Eq instead of Order or Hash, which is closest to the mathematical definition of Set semantics
  • Could give rise to a Predicate data type for cats as well, similar to the one seen in Haskell docs (http://hackage.haskell.org/package/contravariant-1.5/docs/Data-Functor-Contravariant.html)

Downsides:

  • Ironically, could not derive an Eq instance for the set itself since it's based on a function?
  • GC pressure due to copying of case classes
  • No way to inspect elements, since it is only optimized for checking containment
  • No way to check size

I wrote up a basic implementation (https://gist.github.com/nasadorian/8f10cc1f89225a932238026cf46e57a7) using just a case class and Eq.

Would like to improve it and use more lawful constructs - I am thinking a better impl could depend on some Contravariant Functor like Predicate since it is "containing" a function, but I'm not totally sure how to make that work.

The Set itself could have Monoid and Semigroup instances easily. I would also like to figure out if it's possible to make it a Functor.

Cheers!

nasadorian avatar Nov 04 '18 21:11 nasadorian

That should basically be ISet. Making it a functor wouldn't work, however.

larsrh avatar Nov 04 '18 21:11 larsrh

Thanks @larsrh - looks like I'm a few years late to the party 😅. I do have some improvements I think I could make in a PR.

ISet seems to be based on a generic predicate. Would the special case based on Eq (which I'd assume is most common anyway) belong in the same library or is that too trivial?

nasadorian avatar Nov 04 '18 22:11 nasadorian

Also interestingly, it seems neither ISet nor my implementation are stack safe after a certain size. Found that ISet stack overflows when looking up in a set of around 20k elements.

nasadorian avatar Nov 05 '18 01:11 nasadorian

It depends on how you construct such a set. One possibility to make it stack-safe would be to use a function A => cats.Eval[Boolean] instead of A => Boolean.

larsrh avatar Nov 05 '18 07:11 larsrh

I think Eval[A => Boolean] would be even better (similar to what we did for State) as function composition isn't stack safe in general either

LukaJCB avatar Nov 05 '18 08:11 LukaJCB

I did try it with Eval, but ended up with a deferred version of the same SO.

What I'm starting to realize is... maybe this structure actually has O(n) lookup time instead of constant as one would expect. Correct me if I'm wrong here...

Every time I wrap a function in another function like in Option((_:Int) == 1).map(f => (n: Int) => f(n) || n == 2), we create a new Function1 which references the first Function1 in its apply. So there are N objects being created for an N-element functional set, and when we call contains, Scala is pushing up to N frames onto the stack where we really just want to call a single, flat function.

How would we resolve this with a trampoline if the inevitable act of combining the functions is creating this nesting?

nasadorian avatar Nov 05 '18 18:11 nasadorian