dex-lang icon indicating copy to clipboard operation
dex-lang copied to clipboard

Rethink overloading

Open apaszke opened this issue 4 years ago • 12 comments

As Julia demonstrates, ad hoc overloading is a great fit for numerical computing! In Dex we have some support for it, in the form of interfaces (type classes), but I would claim that it's still a little less ad'hoc then we would like (in the spirit opposite to How to make ad-hoc polymorphism less ad hoc that introduced type classes). In particular, when defining a type class one has to pick a single type signature that would then be enforced on every single implementation. The big benefit of this approach is that it plays very well with parametric polymorphism and type inference, but it does limit expressivity. For example, when defining interface Mul a it is tempting to just say

(*) : a -> a -> a

but that means that no one will ever be able to overload * to represent the common Python idiom of "a" * 4.

I'm not sure what the options are, but we should do something about it. Julia's standard library features over 20 types for matrices and it would be a shame to miss out on this. One potential solution would be to add support for type-directed name resolution (which is already supported in Idris). It doesn't play quite as well with polymorphic functions, but it does work well in monomorphic environments like at the top level.

(cc @oxinabox for any insights)

apaszke avatar Oct 18 '21 11:10 apaszke

@apaszke does Dex currently support multi-parameter type classes?

I'd argue that the strict typing discipline presented by type classes has some benefits, especially when (in Julia) users can import Base methods and implement them on types with semantics that don't necessarily match what is to be expected. In the context of Julia, this sort of implementation might be seen as an anti-pattern in normal code (but is sometimes used to great effect in metaprogramming -- e.g. Metatheory.jl uses + in a distinct way as part of a DSL to construct theories for equality saturation usage).

With multi-parameter type classes, it's no longer an anti-pattern -- it's a type error, and for common algebraic structures, the implementations likely follow various "theorems for free" principles.

In any case, I think there are ups and downs -- and it would be nice to develop a few more examples in depth.

(Also, I'm ignoring the type piracy / orphan discussion which comes with ad hoc polymorphism -- I'm assuming that's resolved in an orthogonal way, even a compile time enforced way).

femtomc avatar Oct 18 '21 21:10 femtomc

Re -- the reason I think more examples might be necessary: for that sort of Python-like idiom, you could just use the fact that String is a monoid, then iterate (assuming expression in a Haskell-like way).

But I really do understand the sentiment -- that a user of this language may not be accustom or want to reason in this way (with type classes).

femtomc avatar Oct 18 '21 21:10 femtomc

@femtomc Yes, multi-parameter type classes work.

Regarding your point about arbitrary overloading being dangerous: type classes do restrict the set of valid behaviors of different implementations, but they still don't enforce all the invariants you might care about. For example, you're free to implement a Semigroup instance with a binary operation that's not associative, and you might get weird results when you use it to perform a reduction. Theorems for free are a neat tool, but they often require very expressive types to be able to express all invariants you might care about so in day to day programming it's usually not used that much.

All in all, I think the killer feature of type classes is not the fact that they restrict the types of implementations (I see that as a downside!), but the way the upfront restrictions make it possible to perform type inference in polymorphic contexts.

apaszke avatar Oct 19 '21 10:10 apaszke

By the way I'd be curious to learn about the ways ad-hoc overloading is commonly used in Julia and the best practices people have formulated. Please let me know if you have any more insights!

apaszke avatar Oct 19 '21 10:10 apaszke

Regarding your point about arbitrary overloading being dangerous: type classes do restrict the set of valid behaviors of different implementations, but they still don't enforce all the invariants you might care about. For example, you're free to implement a Semigroup instance with a binary operation that's not associative, and you might get weird results when you use it to perform a reduction.

That's fair -- but enforcing those invariants comes with its own set of usability problems (especially for programmers who are new to strongly typed paradigms).

By the way I'd be curious to learn about the ways ad-hoc overloading is commonly used in Julia and the best practices people have formulated. Please let me know if you have any more insights!

It's so deeply ingrained in the language, I doubt there's a single popular package in the ecosystem which doesn't overload Base in some form. Have you watched the talk by Karpinsky?

Edit: note that multiple dispatch (as used in Julia) has a different flavor than dispatch in type systems without abstract inheritance. E.g. in Julia, you organize hierarchies of types by declaring an abstract supertype abstract type Something end and then concrete implementors struct Foo <: Something end implement the interfaces you intended for Something. This stems from the assumption that

being able to inherit behavior is much more important than being able to inherit structure

So one common idiom in packages is to develop a hierarchy of types based around a small set of abstract types.

Now, Julia does not have multiple inheritance on abstract types -- and this would be plenty nice, but there are technical and social reasons for why this sort of extension has not been implemented in the type system.

femtomc avatar Oct 19 '21 14:10 femtomc

Re — the talk above is really a nice entry point. But then it might be useful to study the practical ways that multiple dispatch is used for composition: a good (numeric) example here might be https://github.com/JuliaSymbolics/Symbolics.jl.

Just to quickly elaborate on this example -- this library uses ad hoc polymorphism for methods which are declared for abstract supertype Number to implement symbolic tracing.

femtomc avatar Oct 19 '21 15:10 femtomc

(Hi, allow me to jumping in. I just found this in Julia slack.)

but that means that no one will ever be able to overload * to represent the common Python idiom of "a" * 4.

This is also somewhat non-idiomatic in Julia, since + is typically used for commutative monoid (or maybe semigroup). See: Why does Julia use * for string concatenation? Why not + or something else? · Frequently Asked Questions · The Julia Language. That said, something like the "typeclass law" in Julia is not explicitly documented and people tend to have different mental models.

But I think there are various idiomatic examples in Julia that support

(*) : a₁ -> a₂ -> a

where a₁, a₂, and a are "semantically similar" for humans but not necessarily related in the type system. For example, (a₁, a₂) = (Int, ComplexF32) and (a₁, a₂) = (Float64, Vector).

Maybe it's also worth mentioning that well-designed Julia interfaces are not automatic from multiple dispatch. They often define what I call "dispatch pipeline" to tame the flexibility of multiple dispatch to avoid method ambiguity. Good examples are the type promotion pipeline (e.g., how it makes promote_type commutative), array indexing pipeline, and broadcast style. The pattern used often in a complex dispatch pipeline (e.g., broadcast style) is called Holy-trait in Julia community. A bad example is how matrices are dispatched in *; it uses multiple dispatch naively and, as a result, there are a lot of manual method ambiguity resolutions. I'm no type-system expert, but I wonder if it makes sense to try distilling this kind of good practice in Julia into a system designed from the first principle.

tkf avatar Oct 21 '21 18:10 tkf

Thanks for all the great comments. I finally had some time to watch the talk linked by @femtomc, but haven't had the time to appropriately process the docs yet.

It seems to me that to a large degree the secret sauce is multiple dispatch together with subtyping. Multiple dispatch itself is not really that unique to Julia (as the talk seems to suggest), because all languages with multi-parameter typeclasses or traits can support it too (with the added benefit of the required implementations being clear from the method type). What I think is more interesting, is that Julia heavily embraces overlapping instances, and uses the subtyping relation to try to resolve ambiguity. To some extent this is available in Haskell through the OVERLAPPING pragmas, but it's much more rudimentary. It's an interesting concept and I'll have to think a little more if it would be possible to integrate that into a language such as Dex.

Subtyping might also be useful to allow for => to represent nd collections of data without necessarily assuming that they're backed by dense arrays, but that's for later.

apaszke avatar Nov 09 '21 11:11 apaszke

It seems to me that to a large degree the secret sauce is multiple dispatch together with subtyping.

That sounds right to me. And together with "open classes" (technically I think you can argue that there is a world with subtyping that is not open)

Multiple dispatch itself is not really that unique to Julia (as the talk seems to suggest), because all languages with multi-parameter typeclasses or traits can support it too

I am sure everyone involved in that talk knows it is not a julia specific feature (Common Lisp, D, and several other languages also have it) There were a few versions of that talk made and I don't think I have seen that one, but I guess it was unclear due to trying to hit time constraints.

(with the added benefit of the required implementations being clear from the method type).

Yes, mostly a benefit, though I am coming around to the idea that the trade-off surfaces is more interesting than I initially though. Since not having to fully implement things you don't use cuts the number of things to implement down from every combination to just the ones that occur. To play fast and loose with ideas, O(n^k) for k combinations of n types, to O(l) where l is the length of your source code). But that is probably not an interesting space for Dex to explore.

An interesting thing though is that the thing which is called interfaces/contracts/protocols/typeclasses/traits/specs is always one of two ideas, and often both. but they are reasonably distinct. It is both dispatch which purely gives expressiveness. And checking which gives correctness promises. (@awf and I had a discussion ages ago that you can delete the type-checker from code which passes it and it has no effect on the behavior. Because its not being used for dispatch.) You can do the checking any way you like really, it could be done at compile-time (like most static languages), it could be done at test time implictly through tests that define the interface (like most dynamic languages), or at test time though tests that explictly check the interface (how we tend to use julia in practice a lot, with test suites).

That was a digression. Short version is there is no need for dispatch and checking to use same method. But there is not nesc a good reason not to either once you are already being a static language.

What I think is more interesting, is that Julia heavily embraces overlapping instances, and uses the subtyping relation to try to resolve ambiguity.

Yes, this is the really important bit.

The other important bit is doing this in a way that doesn't incur polynmial amounts of compilation. (in contrast to C++ templates) Which julia does via being purely JITed, but I think can be done without that.

oxinabox avatar Nov 09 '21 14:11 oxinabox

I think there's even more nuance to the dispatch vs checking point. In Julia, not being able to resolve a method statically (e.g. due to type instability) is not an issue, because you have run-time support for method selection. In a language like Dex, you reaaaally don't want to do dispatch at runtime, because that just doesn't work that well on accelerators. So you need to have more checking capabilities (I treat checking as equivalent with static reasoning), which in turn lets you prune the set of methods that can be chosen at dispatch time and defunctionalize the program to eliminate the run-time dynamism.

apaszke avatar Nov 10 '21 12:11 apaszke

I’ve often wondered about this myself — and indeed there’s some recent work on detecting dynamism requiring call sites (as an after inference analysis): https://aviatesk.github.io/JETTest.jl/dev/toolset/dispatch/#Dispatch-Analysis which could become a type checking error if dynamism be disallowed.

@apaszke do the type and effect systems intermingle? One thing I’ve been concerned about (and is a very real problem if one tries to think about constructing an effect system for Julia) is that dispatch on different levels of subtypes might imply the call site has different effects. In fully static (AOT) dispatch, this isn’t a problem (it seems) because you can run effect inference after inference (after the checker).

It would be interesting to think about a system with multiple dispatch which must be fully AOT resolved — it’s a much more restrictive mode than Julia, but “performant” Julia code tries to do this anyways (as much of the call graph exposed statically to the optimizer as possible).

femtomc avatar Nov 10 '21 12:11 femtomc

I think there's even more nuance to the dispatch vs checking point. In Julia, not being able to resolve a method statically (e.g. due to type instability) is not an issue, because you have run-time support for method selection. In a language like Dex, you reaaaally don't want to do dispatch at runtime, because that just doesn't work that well on accelerators.

Julia programmers also need to write statically type-inferable code (often called type-stable code [^1]) when the performance matters. This is strictly necessary if you do GPU programming when using CUDA.jl, AMDGPU.jl, KernelAbstractions.jl, etc. Essentially, they need to program in a (implicitly defined) subset of Julia language that can be statically inferable. So, I don't think run-time vs compile-time distinction is important here. Perhaps more direct evidence that the issue is not run-time vs compile-time is Featherweight Go which made Philip Wadler said structural subtyping (that is used in Julia as well) is a solution to the Expression Problem.

Rather, I think the important bit is:

Julia heavily embraces overlapping instances, and uses the subtyping relation to try to resolve ambiguity

I'd also want to add that the ambiguity resolution is not solely restricted to the subtyping relation of the original type. You can use aforementioned Holy-trait technique to define a type-level relation that has different from the subtyping relation. This is typically done by a type-level (or at least statically inferable) mapping to a completely different set of types and defining a custom ordering on such types. It can be done in the "user space" by defining simple functions.

[^1] I've only skimmed it a bit, but maybe Type Stability in Julia: Avoiding Performance Pathologies in JIT Compilation (SPLASH 2021 - OOPSLA) - SPLASH 2021 is a nice introduction for non-Julia programmer.

tkf avatar Nov 10 '21 19:11 tkf