ceps icon indicating copy to clipboard operation
ceps copied to clipboard

Multiple Default Typeclass Databases

Open Janno opened this issue 7 months ago • 8 comments

Rendered: https://github.com/janno/ceps/blob/hint-db/text/XXX-hint-db.md

Janno avatar Jul 02 '25 12:07 Janno

Thanks for this proposal! I think this would be very useful for the proof automation I am working on. I've often encountered the problem that I would like to use multiple different TC databases, but it is just very annoying to do this currently.

How would this CEPS affect the Instance command? Would the instances be added all default databases? Or would it be prohibited to use Instance without specifying a database? (The text currently reads like it might be the second, but it is not entirely clear.)

MackieLoeffel avatar Jul 08 '25 09:07 MackieLoeffel

How would this CEPS affect the Instance command? Would the instances be added all default databases? Or would it be prohibited to use Instance without specifying a database? (The text currently reads like it might be the second, but it is not entirely clear.)

The proposal contains a small remark at the end about a way to select the default TC database for commands such as Instance (https://github.com/janno/ceps/blob/hint-db/text/XXX-hint-db.md#possibly-select-default-typeclass-database). I do not think any command that mutates TC database state should mutate the entire set of registered databases. I can't think of a situation where that would be the desired outcome. Given that @rlepigre-skylabs-ai also asked about this in his comment above I think I will amend the proposal and make the command/setting to select which database these commands act on a required part of the proposal instead of just a possibility.

Janno avatar Jul 08 '25 10:07 Janno

I am not a fan, this seems like a feature to work around a "bug".

In the long run, it seems like a better idea to make sure to have more principled options for type classes, rather than a zoo of options where everyone can make their own database to configure these options differently.

What is the competition doing? AFAIK Haskell just has a single database for type classes. But what about Lean?

This solution also seems to make the diamond problem even worse. Now you have a development X that depends on two libraries L1 and L2, that both have their own database. To which database is X supposed to add its instance? The database of L1 or L2, or both?

robbertkrebbers avatar Jul 08 '25 10:07 robbertkrebbers

In the long run, it seems like a better idea to make sure to have more principle options for type classes, rather than a zoo of options where everyone can make their own database to configure these options differently.

I am not sure that this RFC addresses the problems you are thinking of. Are you thinking of a more fundamental redesign of the typeclass system? With a single database used by everyone it seems to me that there is no way to reconcile the needs of all clients. One development might treat List.map as an opaque object about which it registers instances (e.g. Proper) while another one relies on the same List.map function to be transparent because it needs to be unfolded during TC unification.

What is the competition doing? AFAIK Haskell just has a single database for type classes. But what about Lean?

I am not familiar enough with Haskell to comment on that in particular. I assume they do not have unification with delta in their typeclass searches but I could be wrong. Lean is much closer to Rocq and they do have some defaults that are better than ours. For example, class projections are treated as opaque by default. Apart from that, Lean's support for tabling probably helps to avoid penalties from lax opacity settings but they might still end up suffering down the road. It is hard to tell what will happen given that Lean simply has not been around that long. I suspect they also have better metaprogramming and tactic support than Rocq. That could lead people away from relying on typeclass search for automation purposes (which is where most of our problems show up).

This solution also seems to make the diamond problem even worse. Now you have a development X that depends on two libraries L1 and L2, that both have their own database. To which database is X supposed to add its instance? The database of L1 or L2, or both?

This is a good point that is not yet addressed in the proposal. Given that deduplication of hints across databases is not generally sound I don't think adding the hint to both databases is generally a good choice. So this leaves either the L1's or L2's database. If the new instances are of an existing class then it is likely only present in one database and that's the one to pick in almost all cases. If the class is new but its instances rely on classes from both databases then the user is free to pick whichever database has the most fitting transparency settings for their own instances. Or make a new database with exactly the right settings.

Making a new database will possibly involve copying a bunch of settings from either L1, L2, or both. There is a whole can of worms to open here about an algebra of databases that could allow people to define a database in terms of one or more existing databases by copying and possibly unifying existing ones (with careful choices about conflicting settings). They could then remove the existing DBs from the set of default TC databases and register only the new one. I think this would be my ideal solution to the diamond problem. But it is probably outside the scope of this RFC.

Janno avatar Jul 08 '25 13:07 Janno

Do I understand that the mode of use you generally have in mind is the following:

When you develop automation based on type classes, you typically define a bunch of new class. Now for those classes you use a new database with your custom settings (e.g., for unfolding).

Isn't it then more the case that for each class you want to be able to have a custom database with opacity settings (and other settings)?

robbertkrebbers avatar Jul 08 '25 13:07 robbertkrebbers

When you develop automation based on type classes, you typically define a bunch of new class. Now for those classes you use a new database with your custom settings (e.g., for unfolding).

Yes, that is what I had in mind.

Isn't it then more the case that for each class you want to be able to have a custom database with opacity settings (and other settings)?

I suppose the RFC does not preclude that way of using databases. So this could indeed be one way to go. I personally think one db per class seems too much but I could certainly see different subsystems within a development using different databases. It's really not so much about any particular unit such as a class or instance but rather a coherent package of classes and instances that are tuned to work together within one database. But there is no reason it couldn't be just one class.

Janno avatar Jul 08 '25 13:07 Janno

I didn't mean that you make a new database for each class, but rather that you associate to each class a database. So multiple classes can be associated to one database, but each class only has a single database.

Another advantage is that you do not need to specify which database to use each time you declare an instance. You only specify that for a class.

robbertkrebbers avatar Jul 08 '25 14:07 robbertkrebbers

Oh, I see. I think this might be a great addition to the RFC. Instead of setting a default TC database for all Instance commands and the like we could (also) associate the database with the class. I am not sure if we want to forbid hints for one class to appear in multiple database if users truly want that but making the default configurable per class sounds great.

Janno avatar Jul 08 '25 14:07 Janno