F# sig data contains Too Much Information
While working on https://github.com/dotnet/fsharp/pull/11521 with @vzarytovskii we've identified that the F# signature data metadata resource blob contains Too Much Information.
In particular it contains
- the metadata for "private" declarations in implementation files that don't have signature files
- the metadata for "internal" declarations when there are no InternalsVisibleTo in the assembly
This means the reference assembly feature will not be quite as effective as we would like , as reference assemblies will get updated even when private implementation details change.
A prototype of the fixes needed is in https://github.com/dotnet/fsharp/pull/12674, however this needs more work. Note that the work needed to reduce the information on the signature bloc is of a different nature to the rest of the work on reference assemblies and is in a sense completely independent from a logical/correctness point of view. It also has some subtle edge cases, and thus we think this fix is best made separately, and perhaps released under preview or else have a command line option to disable it.
The reference assembly feature is still highly useful without this fix as there are plenty of cases where it's still useful without this.
In particular it contains the metadata for "private" declarations in implementation files that don't have signature files the metadata for "internal" declarations when there are no InternalsVisibleTo in the assembly
Keeping this info could really help if we want to use it later for, say, decompiling or debug related features. If one day we want to use signature data for a decompiler-like feature, it would be sad to find out that some needed info is missing, and we wouldn't be able to help users to do anything with assemblies coming from NuGet libraries in that case.
If this signature info is not taking too much space, maybe it's better to keep it?
If this signature info is not taking too much space, maybe it's better to keep it?
Problem is MVID is generated taking this info into account, resulting in it to be different in case if private function arity or parameter names are changed.
And MVID will be used to decide whether reference assemblies shall be rebuilt.
That is the main reason we would like to trim it.
Theoretically, MVID calculation logic can be changed to only account for "public" sigdata members, but it will require substantial changes to the ilwriter.
Theoretically, MVID calculation logic can be changed to only account for "public" sigdata members, but it will require substantial changes to the ilwriter.
That sounds like a much better solution, though, indeed it'd involve more work
The major problem here is that local functions are affected, not just top level explicit private/internal functions.
Original:
let f() = 1 + 1
Effects of changes:
let f() = 2 // Ok: no rebuild of dependent projects
let f() = // Error: Rebuild of dependent projects since an inner g function is defined
let g(i: int) = i + i
g 1
My estimates based on typical changes and builds in my work:
Currently reference assemblies in F# are performing at 25% of their potential, with benefits applying for changes that don't involve creating or changing functions. These are often formatting changes, or amendments to simple calculations or numbers or strings. This 25% is already a very noticeable improvement in the experience of programming in F#.
Enabling this for inner functions would get this to 90%, since working with functions is extremely common in F#, with the remaining 10% from removing explicitly defined top-level private/internal functions.
@vzarytovskii says on slack:
I guess we shall first ignore those functions when calculating mvid, and later on consider not generating them in signature blob
The previous conv has gone down a bad path, with a suggestion of intentionally including information in a signature blob which is not part of the public signature, i.e. intentional incorrectness! That should never even be considered. The spurious info should just be deleted.
Fixing the incorrectness (deleting sig info), rather than doing a workaround (preserving it and deleting mvids) is not only correct, but also easier (according to the above conv) and will also improve the responsiveness of the compiler because there is less reading to do, which may make working in IDEs on F# projects which depend on other F# projects more responsive.
@KevinRansom this issue is not Impact-Low but will be one of the top 10 improvements in F# in the last 10 years! It could save F# devs about 15mins per day of unnecessary builds.
The previous conv has gone down a bad path, with a suggestion of intentionally including information in a signature blob which is not part of the public signature, i.e. intentional incorrectness!
It is not intentional incorrectness, it is part of initial compiler design.
That should never even be considered. The spurious info should just be deleted.
It is a potential breaking change, so it cannot be "just deleted", it needs to be aproached very carefully.
Fixing the incorrectness (deleting sig info), rather than doing a workaround (preserving it and deleting mvids) is not only correct, but also easier (according to the above conv) and will also improve the responsiveness of the compiler because there is less reading to do, which may make working in IDEs on F# projects which depend on other F# projects more responsive.
It won't affect majority of the compiler flow, since reading is very fast, as is calculating mvid.
@KevinRansom Kevin Ransom FTE this issue is not Impact-Low but will be one of the top 10 improvements in F# in the last 10 years! It could save F# devs about 15mins per day of unnecessary builds.
Issue is Impact-Low, because the whole team agreed on it on triage, it is internal, it does not signal what the priority is, of where it is in planning. We also shouldn't be adding impact on non-bugs, not entirely sure why is it here.
Improvements to the type-checker or incremental building can (and probably will) result in much, much more significant improvements, rather than reference assemblies, since in majority of the cases, typechecking will still be involved even when reference assemblies are involved.
As of now, the sig data remains as it was (= "too much"), however the MVID calculation does not take into account which eliminates the main drawback for ref assembly effectiveness.
I will close this issue for now, shall be reopened if there is another reason (besides ref assemblies) to do so.