IntervalSets.jl icon indicating copy to clipboard operation
IntervalSets.jl copied to clipboard

Rethink unions and intersections

Open dlfivefifty opened this issue 7 years ago • 7 comments

The following inconsistency feels "wrong":

julia> union(1,2)
2-element Array{Int64,1}:
 1
 2

julia> union(1:1 , 2:2)
2-element Array{Int64,1}:
 1
 2

julia> union(1..1, 2..2)
ERROR: ArgumentError: Cannot construct union of disjoint sets.
Stacktrace:
 [1] _leq_union at /Users/solver/Projects/IntervalSets.jl/src/interval.jl:148 [inlined]
 [2] union(::Interval{:closed,:closed,Int64}, ::Interval{:closed,:closed,Int64}) at /Users/solver/Projects/IntervalSets.jl/src/interval.jl:125
 [3] top-level scope at none:0

Following the success of the lazy Broadcasted approach, I'd propose that

union(a::AbstractInterval, b::AbstractInterval) = UnionDomain(a,b)

where UnionDomain (and IntersectDomain) are lazy:

struct UnionDomain{T, D} <: Domain{T}
   domains::D
end
UnionDomain(a...) = UnionDomain{mapreduce(eltype,promote_type,a),typeof(a)}(a)
in(x, d::UnionDomain) = any(in.(x, d.domains))

as in https://github.com/JuliaApproximation/Domains.jl. This works as follows:

julia> UnionDomain(1..1, 2..2)
a union of 2 domains:
	1.	: 1..1
	2.	: 2..2

julia> 1 ∈ UnionDomain(1..1, 2..2)
true

julia> 2 ∈ UnionDomain(1..1, 2..2)
true

julia> 1.5 ∈ UnionDomain(1..1, 2..2)
false

The current behaviour would be recovered via

julia> Interval(union(1..2, 1.5..3))
1.0..3.0

julia> Interval(union(1..2, 1.5..3))
1.0..3.0

julia> Interval(union(1..1, 2..2))
ERROR: ArgumentError: Cannot construct interval from disjoint union of intervals.

dlfivefifty avatar Sep 12 '18 19:09 dlfivefifty

@ararslan @timholy Please let me know if you think this is a good idea and I can prepare a PR.

dlfivefifty avatar Sep 20 '18 08:09 dlfivefifty

It's an interesting proposal. I guess the main alternative is like sqrt(-3) returning a DomainError, and asking users who want that operation to succeed to call it with arguments under which union is a closed operation:

union(IntervalUnion.((1..2, 3.5..5))...)

One potential advantage of this approach is that I'm not sure you need a separate IntersectDomain---the mathematical concept here is a union of intervals, an "object," rather than an "operation."

timholy avatar Sep 20 '18 09:09 timholy

Hi Folks. The solution I chose for ClosedIntervals is not to define union and intersect but rather have the operations join and meet. In the context of lattices/partially ordered sets, the join of two object is the smallest object that is above both. For intervals, the smallest interval containing both [1,2] and [3,4] is [1,4]. On the other hand, the meet of two intervals is the largest interval contained in both, and that agrees with set intersection. My two cents.

scheinerman avatar Sep 20 '18 11:09 scheinerman

I think the best behaviour is to mimic how union works on collections as much as possible.

For collections, it auto-promotes: union(1,2) works and returns a [1,2], and we do not need to convert 1 and 2 to AbstractVector{Int}s first. Also, union(1..2, [3, 7, 10]) would be supported. In the proposal it would return UnionDomain((1..2, [3,7,10])).

A more general design might be able to include all of the following operations:

  1. union
  2. setdiff
  3. intersection (though I agree this not needed for intervals and the current behaviour is good.)
struct CombinationDomain{T, FF<:Function, DD<:Tuple} <: Domain{T}
   op::FF
   domains::DD
end

in(x, d::CombinationDomain) = d.op(in.(x, d.domains))

const UnionDomain{T,DD} = CombinationDomain{T,typeof(any),DD}
const IntersectDomain{T,DD} = CombinationDomain{T,typeof(all),DD}

setdiff_f(a,b) = a && !b
const SetDiffDomain{T,DD} where DD<:Tuple{A,B} where {A,B} = CombinationDomain{T,typeof(setdiff_f),DD}

dlfivefifty avatar Sep 20 '18 12:09 dlfivefifty

I guess it's a mix. The auto-promotion of numbers to arrays is necessitated because there isn't a way to express the union of two distinct numbers as a number. However, for a discrete set (aka, a collection), the output type tends to be preserved: we don't, for example, return a Set when passed Vectors even though internally we perform that exact conversion temporarily. Here we have a continuous set, so in some sense one can argue we should also try to preserve the output type. In this case, for Intervals there is a useful notion of union that allows one to return another Interval, but whether it succeeds depends on values rather than types.

I can go either way here; I guess the advantage of your approach is that union will always work, whereas mine is perhaps more motivated by backwards compatibility and keeping types contained by default. I do not feel that the points I've made obviously have more merit than yours; indeed, having raised them as potential concerns, I should say I think I slightly lean in the direction you're proposing. EDIT: probably my biggest concern is limiting "type explosion"; I'm not sure I'm happy with UnionDomain, IntersectDomain, CombinationDomain, and SetDiffDomain; if you can figure out how to do everything with IntervalUnion I will be much happier.

Either way, I will happily support you if you feel this is the right direction (and assuming, as I do, that Interval(union(1..2, 1.5..3)) will have the same performance it currently has thanks to all those lovely compiler optimizations).

@scheinerman, I also would not be opposed to join and meet. At one point I thought I had defined boundingbox, but I now suspect I did that in one of my lab's internal packages, and indeed join is a briefer name for the same operation.

timholy avatar Sep 20 '18 14:09 timholy

Looking at Base's union code it should actually be possible to jack into the following line:

union(s, sets...) = union!(emptymutable(s, promote_eltype(s, sets...)), s, sets...)

I'm thinking something like this:

eltype(x::AbstractInterval{T}) = Infinitesimal{T}
struct DomainStyle end
emptymutable(s, ::Infinitesimal) = DomainStyle()
union!(::DomainStyle, s...) = UnionDomain(s)

=============

This has the obvious drawback that it is abusing union! to mean something completely different. Perhaps a PR into Base would be more appropriate that replicates Broadcasted, something like this:

abstract type UnionStyle end
struct SetStyle <: UnionStyle end
struct VectorStyle <: UnionStyle end

UnionStyle(::Type{<:AbstractSet}) = SetStyle()
UnionStyle(_) = VectorStyle()

struct Unioned{Style,SV}
   args::SV
end

Unioned(args) = Unioned{combine_union_styles(args), typeof(args)}(args)

union(s...) = materialize(Unioned(s))
union!(x, s...) = materialize!(x, Unioned(s))

dlfivefifty avatar Sep 20 '18 21:09 dlfivefifty

eltype(x::AbstractInterval{T}) = Infinitesimal{T}

As I'm sure you see, that's problematic because eltype(x::AbstractInterval{T}) = T (and should stay that way).

The BroadcastStyle-like solution seems pretty reasonable to me. Is union! easily supportable in general, though? When you're doing this inferrably with two sets I think the result is going to have to be like your UnionDomain above, and that will be immutable.

timholy avatar Sep 22 '18 13:09 timholy