Rethink unions and intersections
The following inconsistency feels "wrong":
julia> union(1,2)
2-element Array{Int64,1}:
1
2
julia> union(1:1 , 2:2)
2-element Array{Int64,1}:
1
2
julia> union(1..1, 2..2)
ERROR: ArgumentError: Cannot construct union of disjoint sets.
Stacktrace:
[1] _leq_union at /Users/solver/Projects/IntervalSets.jl/src/interval.jl:148 [inlined]
[2] union(::Interval{:closed,:closed,Int64}, ::Interval{:closed,:closed,Int64}) at /Users/solver/Projects/IntervalSets.jl/src/interval.jl:125
[3] top-level scope at none:0
Following the success of the lazy Broadcasted approach, I'd propose that
union(a::AbstractInterval, b::AbstractInterval) = UnionDomain(a,b)
where UnionDomain (and IntersectDomain) are lazy:
struct UnionDomain{T, D} <: Domain{T}
domains::D
end
UnionDomain(a...) = UnionDomain{mapreduce(eltype,promote_type,a),typeof(a)}(a)
in(x, d::UnionDomain) = any(in.(x, d.domains))
as in https://github.com/JuliaApproximation/Domains.jl. This works as follows:
julia> UnionDomain(1..1, 2..2)
a union of 2 domains:
1. : 1..1
2. : 2..2
julia> 1 ∈ UnionDomain(1..1, 2..2)
true
julia> 2 ∈ UnionDomain(1..1, 2..2)
true
julia> 1.5 ∈ UnionDomain(1..1, 2..2)
false
The current behaviour would be recovered via
julia> Interval(union(1..2, 1.5..3))
1.0..3.0
julia> Interval(union(1..2, 1.5..3))
1.0..3.0
julia> Interval(union(1..1, 2..2))
ERROR: ArgumentError: Cannot construct interval from disjoint union of intervals.
@ararslan @timholy Please let me know if you think this is a good idea and I can prepare a PR.
It's an interesting proposal. I guess the main alternative is like sqrt(-3) returning a DomainError, and asking users who want that operation to succeed to call it with arguments under which union is a closed operation:
union(IntervalUnion.((1..2, 3.5..5))...)
One potential advantage of this approach is that I'm not sure you need a separate IntersectDomain---the mathematical concept here is a union of intervals, an "object," rather than an "operation."
Hi Folks. The solution I chose for ClosedIntervals is not to define union and intersect but rather have the operations join and meet. In the context of lattices/partially ordered sets, the join of two object is the smallest object that is above both. For intervals, the smallest interval containing both [1,2] and [3,4] is [1,4]. On the other hand, the meet of two intervals is the largest interval contained in both, and that agrees with set intersection. My two cents.
I think the best behaviour is to mimic how union works on collections as much as possible.
For collections, it auto-promotes: union(1,2) works and returns a [1,2], and we do not need to convert 1 and 2 to AbstractVector{Int}s first. Also, union(1..2, [3, 7, 10]) would be supported. In the proposal it would return UnionDomain((1..2, [3,7,10])).
A more general design might be able to include all of the following operations:
- union
- setdiff
- intersection (though I agree this not needed for intervals and the current behaviour is good.)
struct CombinationDomain{T, FF<:Function, DD<:Tuple} <: Domain{T}
op::FF
domains::DD
end
in(x, d::CombinationDomain) = d.op(in.(x, d.domains))
const UnionDomain{T,DD} = CombinationDomain{T,typeof(any),DD}
const IntersectDomain{T,DD} = CombinationDomain{T,typeof(all),DD}
setdiff_f(a,b) = a && !b
const SetDiffDomain{T,DD} where DD<:Tuple{A,B} where {A,B} = CombinationDomain{T,typeof(setdiff_f),DD}
I guess it's a mix. The auto-promotion of numbers to arrays is necessitated because there isn't a way to express the union of two distinct numbers as a number. However, for a discrete set (aka, a collection), the output type tends to be preserved: we don't, for example, return a Set when passed Vectors even though internally we perform that exact conversion temporarily. Here we have a continuous set, so in some sense one can argue we should also try to preserve the output type. In this case, for Intervals there is a useful notion of union that allows one to return another Interval, but whether it succeeds depends on values rather than types.
I can go either way here; I guess the advantage of your approach is that union will always work, whereas mine is perhaps more motivated by backwards compatibility and keeping types contained by default. I do not feel that the points I've made obviously have more merit than yours; indeed, having raised them as potential concerns, I should say I think I slightly lean in the direction you're proposing. EDIT: probably my biggest concern is limiting "type explosion"; I'm not sure I'm happy with UnionDomain, IntersectDomain, CombinationDomain, and SetDiffDomain; if you can figure out how to do everything with IntervalUnion I will be much happier.
Either way, I will happily support you if you feel this is the right direction (and assuming, as I do, that Interval(union(1..2, 1.5..3)) will have the same performance it currently has thanks to all those lovely compiler optimizations).
@scheinerman, I also would not be opposed to join and meet. At one point I thought I had defined boundingbox, but I now suspect I did that in one of my lab's internal packages, and indeed join is a briefer name for the same operation.
Looking at Base's union code it should actually be possible to jack into the following line:
union(s, sets...) = union!(emptymutable(s, promote_eltype(s, sets...)), s, sets...)
I'm thinking something like this:
eltype(x::AbstractInterval{T}) = Infinitesimal{T}
struct DomainStyle end
emptymutable(s, ::Infinitesimal) = DomainStyle()
union!(::DomainStyle, s...) = UnionDomain(s)
=============
This has the obvious drawback that it is abusing union! to mean something completely different. Perhaps a PR into Base would be more appropriate that replicates Broadcasted, something like this:
abstract type UnionStyle end
struct SetStyle <: UnionStyle end
struct VectorStyle <: UnionStyle end
UnionStyle(::Type{<:AbstractSet}) = SetStyle()
UnionStyle(_) = VectorStyle()
struct Unioned{Style,SV}
args::SV
end
Unioned(args) = Unioned{combine_union_styles(args), typeof(args)}(args)
union(s...) = materialize(Unioned(s))
union!(x, s...) = materialize!(x, Unioned(s))
eltype(x::AbstractInterval{T}) = Infinitesimal{T}
As I'm sure you see, that's problematic because eltype(x::AbstractInterval{T}) = T (and should stay that way).
The BroadcastStyle-like solution seems pretty reasonable to me. Is union! easily supportable in general, though? When you're doing this inferrably with two sets I think the result is going to have to be like your UnionDomain above, and that will be immutable.