fsharp icon indicating copy to clipboard operation
fsharp copied to clipboard

List/Array/Seq.max/min/maxBy/minBy may return nan if the sequence starts with nan

Open Happypig375 opened this issue 3 years ago • 16 comments

The position where NaN appears changes what Seq.max returns.

Repro steps

printfn $"{Seq.max [nan; 2.; 1.; 4.]}"
printfn $"{Seq.max [3.; nan; 1.; 4.]}"
printfn $"{Seq.max [3.; 2.; nan; 4.]}"
printfn $"{Seq.max [3.; 2.; 1.; nan]}"

Expected behavior

4
4
4
3

Actual behavior

NaN
4
4
3

Known workarounds

Filter away all NaNs first.

Related information

SharpLab https://sharplab.io/#v2:DYLgZgzgNALiBOBXAdgHwA7wJbJmZABACQBEA3gMoCmAjgHQC2AhgB4EDayTyA3AQEx0+ARiEEALHQC6AXxIBYAFCYceQqUq1GrDgGYxXXgVF9Jshcuy58xctXrM27fX0F9Dp6XKUrr6u1qOemJuxgbc5kA=

Happypig375 avatar May 30 '22 12:05 Happypig375

FYI, Seq.min behaves the same, which, technically, makes sense, since:

> 4. > nan;;
val it: bool = false

> 4. < nan;;
val it: bool = false

So, current will remain nan forever, since it was initially bound as first element.

vzarytovskii avatar May 30 '22 12:05 vzarytovskii

Tricky problem, cf. min/max:

> max nan 1.0;;
val it: float = nan

> max 1.0 nan;;
val it: float = nan

> min 1.0 nan;;
val it: float = nan

> min nan 1.0;;
val it: float = nan

Code here:

        let mutable acc = e.Current

        while e.MoveNext() do
            let curr = e.Current
            if curr > acc then acc <- curr

        acc

could be

        let mutable acc = e.Current

        while e.MoveNext() do
            acc <- max acc e.Current

        acc

Fixing it does seem to make sense. Might be slower

dsyme avatar Jun 01 '22 20:06 dsyme

Noting List, Array, maxBy, minBy also have this problem

@vzarytovskii I do think this should be fixed. Labelling as good-first-issue

dsyme avatar Jun 01 '22 20:06 dsyme

Just to clarify, for the example in the original post:

Repro steps

printfn $"{Seq.max [nan; 2.; 1.; 4.]}"
printfn $"{Seq.max [3.; nan; 1.; 4.]}"
printfn $"{Seq.max [3.; 2.; nan; 4.]}"
printfn $"{Seq.max [3.; 2.; 1.; nan]}"

Expected behavior

4
4
4
3

based on how min/max works currently, the actual expected behavior is

NaN
NaN
NaN
NaN

correct?

edit: maybe I shouldn't have used the phrase "the actual expected behavior" when trying to clarify things 😄

abonie avatar Jun 06 '22 21:06 abonie

Expected behavior

4
4
4
3

This is how @Happypig375 expected Seq.max to behave.

based on how min/max works currently, the actual expected behavior is

NaN
NaN
NaN
NaN

This is how it behaves currently.

vzarytovskii avatar Jun 07 '22 05:06 vzarytovskii

based on how min/max works currently, the actual expected behavior is

NaN
NaN
NaN
NaN

This is how it behaves currently.

The above is not quite correct and that's why I am trying to clarify. The actual observed behavior, both by @Happypig375 based on their original post and by me is in fact:

Actual behavior

NaN
4
4
3

edit: ...and fix suggested by @dsyme:

could be

        let mutable acc = e.Current

        while e.MoveNext() do
            acc <- max acc e.Current

        acc

as far as I can tell would result in 4x NaN, as opposed to the behavior expected by @Happypig375. So it would be best to explicitly agree on the expected behavior.

abonie avatar Jun 07 '22 06:06 abonie

based on how min/max works currently, the actual expected behavior is

NaN
NaN
NaN
NaN

This is how it behaves currently.

The above is not quite correct and that's why I am trying to clarify. The actual observed behavior, both by @Happypig375 based on their original post and by me is in fact:

Actual behavior

NaN
4
4
3

edit: ...and fix suggested by @dsyme:

could be

        let mutable acc = e.Current

        while e.MoveNext() do
            acc <- max acc e.Current

        acc

as far as I can tell would result in 4x NaN, as opposed to the behavior expected by @Happypig375. So it would be best to explicitly agree on the expected behavior.

Sorry, I misread your comment. I personally would expect that Seq.max and Seq.min would never return nan unless it is the only element in the array. The proposed change to replace inner implementation to use built-in function will result in the opposite behaviour.

Not sure which default behaviour @dsyme has in mind.

vzarytovskii avatar Jun 07 '22 07:06 vzarytovskii

I personally think returning NaN upon encountering is not helpful when nan is viewed as "missing data". Meanwhile, there are alternative expectations as seen here image By sort, nan would be the minimum. By sum and average, nan would be returned upon encountering.

Happypig375 avatar Jun 07 '22 07:06 Happypig375

I mean, technically all that follow IEEE 754, since NaN <> NaN, and both x > NaN and x < NaN are false.

vzarytovskii avatar Jun 07 '22 08:06 vzarytovskii

My understanding is min and max should return NaN in all four cases.

We should compare with LINQ

dsyme avatar Jun 07 '22 19:06 dsyme

LINQ is an odd mix, seemingly making NaN the lowest number, so it only affects Min, not Max.

See https://sharplab.io/#v2:C4LgTgrgdgNAJiA1AHwAICYCMBYAUKgBgAJVMA6AGQEsoBHAbjz1QGYT0iBhIgbzyIEk2pAGwkALEQCyAQxoAKUgQDaAXSIywAcwBuASl79Bx0gE558qAFMA7kTgB7CACMANlbW97Tt1bIA5GX8YInQyAhCWcKIAXz0yWQAPeT09RlxjE0xzS1tvF3dPHiIoiPzfAKCQsOI4hJlk1PTMwTMLaztHAo91YtKQrorA4NDouqkFJqMWttzOn0LekvCBhb9h6rH4iagUtOnYvBigA===

Systematic alignment with LINQ Max() and Min() seems wise. Or else making no change at all.

dsyme avatar Jun 07 '22 19:06 dsyme

My understanding is min and max should return NaN in all four cases.

We should compare with LINQ

Yeah, I did some comparisons: https://sharplab.io/#v2:C4LgTgrgdgNAJiA1AHwAICYCMBYAUKgBgAJVMA6AGQEsoBHAbjz1QGYT0iBhIgbzyIEk2pAGwkALEQCyAQxoAKUgQDaAXSIB9AJS9+g/aQCc8gKJQIAWwCmYGQCMANlbKyAHvKhWA7kTgB7CEcrNV5fAKCyADkZSJgicgI49DJiAF8tLUZcfQNMYzNLG3snFxl3Tx9/QKcQniJkxLDq52jY+JSidMy9HJI803NrWwipBQqmoNqJktaklLiWDq6s3oEjAcLhktGoD29p4PU6hriqiNn2tIyVwVS8VKA==

Min is weird one, seems NaN is always lowest.

Update: oh, @dsyme beat me to it.

vzarytovskii avatar Jun 07 '22 19:06 vzarytovskii

Would the min and max operator need to be also updated for consistency . See https://github.com/dotnet/fsharp/issues/5647

edgarfgp avatar Jul 22 '22 19:07 edgarfgp

@edgarfgp, I've taken that on, see #13558. So far I've only added tests, but that by and of itself exhibited some behavior that may need addressing.

Min is weird one, seems NaN is always lowest.

@dsyme/@vzarytovskii, considering that .NET as a whole is moving to more conformance towards IEEE 754:2019, I think it's probably best not to mimic the bug from LINQ. Besides, if that one in particular relies on Math.Min, there's https://github.com/dotnet/runtime/pull/70865 by @tannergooding, which fixes that behavior.

My understanding is min and max should return NaN in all four cases.

That's also my understanding. It would otherwise be quite surprising w.r.t. the basic rule with NaN: if either operand is NaN, return NaN.

abelbraaksma avatar Jul 25 '22 10:07 abelbraaksma

I think it's probably best not to mimic the bug from LINQ.

Noting that LINQ isn't a "bug" here, just a method with an inconvenient name.

The IEEE 754:2008 spec defines:

  • maxNum
  • maxNumMag
  • minNum
  • minNumMag

These generally correspond to the following IEEE 754:2019 functions:

  • maximumNumber
  • maximumNumberMagnitude
  • minimumNumber
  • minimumNumberMagnitude

Both 2008 and 2019 agree with how NaN vs number are handled and require:

  • minimumNumber and maximumNumber given f(nan, number) must return number

The magnitude functions do the comparison using abs(x) and abs(y), but return the original x or y, respectively (so maxNumMag(2, -3) returns -3, because abs(2) < abs(-3)).

However, there was an issue in that the 2008 definition differed from how many runtimes implementation max/min and didn't specify the behavior of -0 vs +0 which was also much contended.

The IEEE 754:2019 spec then explicitly defined:

  • maximum
  • maximumMagnitude
  • minimum
  • minimumMagnitude

These functions are "inverted" from their *Number counterparts and given f(nan, number) must return nan.

This, along with the APIs now being "recommended" rather than "required" operations allows existing runtimes to conformingly specify how their APIs map to the IEEE 754 spec. The one deviation is that 2019 requires a well-defined behavior for -0 vs +0 (and given that -0 == +0 this is generally "not an issue" for APIs to change):

  • maximum, maximumNumber, and their Magnitude counterparts given f(-0, +0) must return +0
  • minimum, minimumNumber, , and their Magnitude counterparts given f(-0, +0) must return -0

F# is "free" to make its operations conform to "either" behavior, it is likewise free to expose both. In a different world LINQ might've made Enumable.Max/Min match Math.Max/Min and then we could've exposed Enumerable.MaxNumber/MinNumber (matching the names that now exist for Max, MaxMagnitude, MaxNumber, and MaxNumberMagnitude which are exposed on Double, Half, and Single), but hindsight always comes after the fact 😄

tannergooding avatar Jul 25 '22 17:07 tannergooding

seemingly making NaN the lowest number, so it only affects Min, not Max.

Missed this bit. This does seem like a bug and an undesirable one at that. If someone is willing to log an issue, then we should probably look at this and make a determination.

tannergooding avatar Jul 25 '22 17:07 tannergooding