csharpstandard Criteria for putting new unsafe-related text in chapters other than "Unsafe code"

Note: Since this issue was created, chapter (and maybe section) numbers have changed!

[Although this is an issue for V9, Jon suggested we discuss the general principle soon. That is, how hard should we try to keep all unsafe-related text in the unsafe chapter? Are there reasonable exceptions for not doing so?]

Support for unsafe mode is optional, and 1-2 years ago, we made the decision to push almost all the unsafe-related stuff into the unsafe chapter, §22, except for bits in the grammar, which we’ve flagged as “unsafe-mode only.”

I’ve nearly finished getting the MS v9 proposal for function pointers into shape for use by TG2. However, I’ve hit a situation on which I’d like guidance.

Function pointers require unsafe support, so most of the spec for that proposal will necessarily go in §22. However, this feature also affects type inference, touching six subsections of §11.6.3, “Type inference.”

I see two alternative approaches:

1. Following the current approach of putting as much unsafe-related material as possible into §22, I can add text to the corresponding sections in §11.6.3.* saying “This subclause is extended in unsafe code (§forward-pointer to §22.x.y).” and then describe those extensions in new sections §22.x.y.
2. I can put the unsafe-related material inline in §11.6.3.*, and somehow mark it as unsafe-related.

Approach 1 is pure, but has the problem that some text in §22.x.y needs to be “merged” into specific places in lists in §11.6.3.*. For example, this would result in the following:

§22.6.(x) Output type inferences [new section]

In §11.6.3.7, the following bullet is added between the second and third bullets:

If E is an address-of method group and T is a function pointer type with parameter types T1..Tk and return type Tb, and overload resolution of E with the types T1..Tk yields a single method with return type U, then a lower-bound inference is made from U to Tb.

The reader of this new section, §22.6.x, will have to flip between this text and that in §11.6.3.7 to make sense of it. We are also at the mercy of the list's ordering: a positional reference such as "between the second and third bullets" could easily get out of sync as §11.6.3.7 evolves.

Here’s a similar case:

§22.6.(y) Lower-bound inferences [new section]

In §11.6.3.10, the following case is added to the third bullet:

V is a function pointer type delegate*<V2..Vk, V1> and there is a function pointer type delegate*<U2..Uk, U1> such that U is identical to delegate*<U2..Uk, U1>, the calling convention of V is identical to that of U, and the refness of each Vi is identical to that of Ui.

The first bullet of inference from Ui to Vi is modified to:

If U is not a function pointer type and Ui is not known to be a reference type, or if U is a function pointer type and Ui is not known to be a function pointer type or a reference type, then an exact inference is made.

Then, added after the third bullet of inference from Ui to Vi:

Otherwise, if V is delegate*<V2..Vk, V1> then inference depends on the i-th parameter of delegate*<V2..Vk, V1>:

- If V1:
  - If the return is by value, then a lower-bound inference is made.
  - If the return is by reference, then an exact inference is made.
- If V2..Vk:
  - If the parameter is by value, then an upper-bound inference is made.
  - If the parameter is by reference, then an exact inference is made.
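To sanity-check the intent of the function pointer inference rules above, here is a minimal sketch in Python. It is not spec text, and it is not how any compiler implements the rules. It shows which kind of inference each position of a function pointer type receives.

The following parts are hypothetical simplifications:

- the `Slot`/`FnPtr` model;
- the calling-convention strings;
- the `reference_like` flag, which stands in for "known to be a reference type or a function pointer type".

Overload resolution and the fixing of bounds are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Slot:
    """Hypothetical model of one position of a function pointer type (a parameter or the return)."""
    type_name: str
    by_ref: bool = False          # ref/in/out parameter, or ref return
    reference_like: bool = False  # known to be a reference type or a function pointer type


@dataclass(frozen=True)
class FnPtr:
    """Hypothetical model of delegate*<V2..Vk, V1>: params are V2..Vk, ret is V1."""
    calling_convention: str       # illustrative strings, e.g. "managed" or "unmanaged"
    params: Tuple[Slot, ...]
    ret: Slot


Inference = Tuple[str, str, str]  # (kind, from type Ui, to type Vi)


def lower_bound_inferences(u: FnPtr, v: FnPtr) -> Optional[List[Inference]]:
    """Inferences made by a lower-bound inference from U to V, or None when
    the function pointer case does not apply."""
    # The case applies only if the shapes and calling conventions match.
    if u.calling_convention != v.calling_convention:
        return None
    if len(u.params) != len(v.params):
        return None
    # Pair the return (V1) first, then the parameters (V2..Vk).
    pairs = [(u.ret, v.ret, True)] + [(ui, vi, False) for ui, vi in zip(u.params, v.params)]
    if any(ui.by_ref != vi.by_ref for ui, vi, _ in pairs):
        return None  # the refness of each Vi must be identical to that of Ui

    result: List[Inference] = []
    for ui, vi, is_return in pairs:
        if not ui.reference_like:
            kind = "exact"   # U is a function pointer and Ui is not known to be a reference/function pointer type
        elif vi.by_ref:
            kind = "exact"   # by-reference parameter or return
        elif is_return:
            kind = "lower"   # by-value return: lower-bound inference
        else:
            kind = "upper"   # by-value parameter: upper-bound inference
        result.append((kind, ui.type_name, vi.type_name))
    return result
```

For example, inferring from an argument of type delegate*<string, object> to a parameter of type delegate*<T1, T2> yields two inferences:

- an upper-bound inference from string to T1, because the parameter is passed by value;
- a lower-bound inference from object to T2, because the return is by value.

An int parameter would instead get an exact inference. This mirrors the contravariant and covariant handling of type parameters in constructed types.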

This enhancement changes existing wording rather than just adding new wording, which complicates things further. That said, if we move this material back into §11.6.3.*, we can likely find a way to have two branches for it: with and without unsafe support.

The longer I study the problem, the more I lean towards putting this material in §11.6.3.* with suitable unsafe-conditional text. Putting it in §22 makes it stand out, but not in a good way. It also resembles the situation we previously had, where grammar in earlier chapters was augmented by unsafe extensions in §22. We dropped that approach and moved the unsafe grammar back into the main spec.

Mar 13 '23 13:03 RexJaeschke

Jon's input in private mail:

Can we discuss this at next week's meeting? Aside from anything else, I want to document the pros and cons clearly, and how we weigh them up, so that in the next similar situation we have a record of how we handled this. I do like having all the unsafe aspects in 22, but I can see it being an issue here.

It would be worth asking folks with more knowledge of later (and even unreleased) features how much similar stuff we're looking at over the next few years.

If we do include it in 11.6.3, we should also back-reference it from 22 so that anyone wanting to know "What extra features are available in unsafe code?" can just look through 22 and not miss this.

Mar 13 '23 13:03 RexJaeschke

We're leaning towards putting this in 11.6.3 as Rex suggests, mostly because type inference is so complex and easy to get wrong (when both reading and writing...). But this shouldn't be seen as precedent beyond "we can discuss it if we think it's worth violating our normal approach".

We need @MadsTorgersen to sign off on that approach though.

Nov 01 '23 21:11 jskeet

We have agreed to:

- Put the relevant text "inline" (e.g. in 11.6.3), but with a consistent (human- and machine-readable) label to indicate "unsafe only".
- Reference back from section 22 as appropriate (and validate these references before each release, to account for changes in locations, etc.).
- Have Rex create a "demo" of what this might look like before this issue is closed.

Feb 07 '24 21:02 jskeet

Here's my demo; actual spec text changes are shown underlined.

I've used the notation **UnsafeMode**: ... **end UnsafeMode** to delimit unsafe-specific text, so it can be found programmatically, but I'm open to changing its spelling. This form closely mirrors the one used for examples and notes (which are informative). However, this new delimiter is for normative text, so I've made it slightly different.
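As a sketch of what "found programmatically" could mean, the following Python scans spec Markdown for the delimited spans. It makes two assumptions:

- the delimiters appear literally as **UnsafeMode**: and **end UnsafeMode**, although that spelling was explicitly left open to change;
- spans never nest.

The function names are hypothetical and are not part of any existing csharpstandard tooling.

```python
import re
from typing import List

# Assumed literal spelling of the delimiters; the non-greedy body keeps adjacent spans separate.
UNSAFE_SPAN = re.compile(r"\*\*UnsafeMode\*\*:(.*?)\*\*end UnsafeMode\*\*", re.DOTALL)


def unsafe_spans(markdown: str) -> List[str]:
    """Return the body of each UnsafeMode-delimited span, trimmed of surrounding whitespace."""
    return [m.group(1).strip() for m in UNSAFE_SPAN.finditer(markdown)]


def delimiters_balanced(markdown: str) -> bool:
    """Rough check that every opening delimiter has a closing one (counts only, not order)."""
    return markdown.count("**UnsafeMode**:") == markdown.count("**end UnsafeMode**")
```

A check like `delimiters_balanced` could run before each release, alongside validating the back-references from the unsafe chapter.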

Example 1 (simple)

Here is the unsafe-specific text added to the core chapter:

12.6.3.4 Expressions|Function members|Type inference|Input types

If E is a method group or implicitly typed anonymous function and T is a delegate type or expression tree type then all the parameter types of T are input types of E with type T.

UnsafeMode: If E is an address-of method group and T is a function pointer type, then all the parameter types of T are input types of E with type T. end UnsafeMode:

12.6.3.5 Output types

…

Here is a pointer from the unsafe chapter back to the unsafe-specific text added to the core chapter for this topic:

23.6 Pointers in expressions

23.6.x Type inference

23.6.x.1 Input types

See §12.6.3.4 for the unsafe-context impact on this topic.

Example 2 (non-trivial)

Here is the unsafe-specific text added to the core chapter:

12.6.3.10 Lower-bound inferences

A lower-bound inference from a type U to a type V is made as follows:

- If V is one of the unfixed Xᵢ then U is added to the set of lower bounds for Xᵢ.
- Otherwise, if V is the type V₁? and U is the type U₁? then a lower-bound inference is made from U₁ to V₁.
- Otherwise, sets U₁...Uₑ and V₁...Vₑ are determined by checking if any of the following cases apply:
  - V is an array type V₁[...] and U is an array type U₁[...] of the same rank
  - V is one of IEnumerable<V₁>, ICollection<V₁>, IReadOnlyList<V₁>, IReadOnlyCollection<V₁> or IList<V₁> and U is a single-dimensional array type U₁[]
  - V is a constructed class, struct, interface or delegate type C<V₁...Vₑ> and there is a unique type C<U₁...Uₑ> such that U (or, if U is a type parameter, its effective base class or any member of its effective interface set) is identical to, inherits from (directly or indirectly), or implements (directly or indirectly) C<U₁...Uₑ>.
  - UnsafeMode: V is a function pointer type delegate*<V2..Vk, V1> and there is a function pointer type delegate*<U2..Uk, U1> such that U is identical to delegate*<U2..Uk, U1>, the calling convention of V is identical to that of U, and the refness of each Vi is identical to that of Ui. end UnsafeMode:

  (The “uniqueness” restriction means that in the case interface C<T>{} class U: C<X>, C<Y>{}, no inference is made when inferring from U to C<T> because U₁ could be X or Y.)

  If any of these cases apply then an inference is made from each Uᵢ to the corresponding Vᵢ as follows:
  - If Uᵢ is not known to be a reference type then an exact inference is made; or alternatively, UnsafeMode: If U is not a function pointer type and Ui is not known to be a reference type, or if U is a function pointer type and Ui is not known to be a function pointer type or a reference type, then an exact inference is made. end UnsafeMode:
  - Otherwise, if U is an array type then a lower-bound inference is made.
  - Otherwise, if V is C<V₁...Vₑ> then inference depends on the i-th type parameter of C:
    - If it is covariant then a lower-bound inference is made.
    - If it is contravariant then an upper-bound inference is made.
    - If it is invariant then an exact inference is made.
  - UnsafeMode: Otherwise, if V is delegate*<V2..Vk, V1> then inference depends on the i-th parameter of delegate*<V2..Vk, V1>:
    - If V1:
      - If the return is by value, then a lower-bound inference is made.
      - If the return is by reference, then an exact inference is made.
    - If V2..Vk:
      - If the parameter is by value, then an upper-bound inference is made.
      - If the parameter is by reference, then an exact inference is made. end UnsafeMode:
- Otherwise, no inferences are made.

Here is a pointer from the unsafe chapter back to the unsafe-specific text added to the core chapter for this topic:

23.6 Pointers in expressions

23.6.x Type inference

23.6.x.4 Lower-bound inferences

See §12.6.3.10 for the unsafe-context impact on this topic.

Mar 05 '24 22:03 RexJaeschke

Looks good to me, although I'm not keen on "UnsafeMode" as the label. Let's spitball it and see if we can come up with anything better. (UnsafeSupport?)

Mar 06 '24 11:03 jskeet

Decision on 2024-05-15:

- Keep UnsafeMode as the name for now, but revisit it later (possibly renaming it to UnsafeContext, which will be easier to judge once the work has been done).
- We still want @MadsTorgersen's sign-off before doing significantly more work on this.

May 15 '24 20:05 jskeet

After a short discussion, we agreed not to introduce a new bracketing label. Instead, we will add a note at the end of the unsafe-code-specific text saying something like “Note: This is only applicable in unsafe code. end note”. We’ll also have a pointer from the unsafe code clause to the corresponding text changes in the core clauses, as previously proposed.

Rex will revise PR #984 accordingly.

Jun 12 '24 20:06 RexJaeschke

PR #984 (https://github.com/dotnet/csharpstandard/pull/984) has been revised to incorporate changes modeled on the resolution of this issue.

Jun 16 '24 22:06 RexJaeschke