List/Array/Seq.max/min/maxBy/minBy may return nan if the sequence starts with nan
The position where NaN appears changes what Seq.max returns.
Repro steps
printfn $"{Seq.max [nan; 2.; 1.; 4.]}"
printfn $"{Seq.max [3.; nan; 1.; 4.]}"
printfn $"{Seq.max [3.; 2.; nan; 4.]}"
printfn $"{Seq.max [3.; 2.; 1.; nan]}"
Expected behavior
4
4
4
3
Actual behavior
NaN
4
4
3
Known workarounds
Filter away all NaNs first.
Related information
SharpLab https://sharplab.io/#v2:DYLgZgzgNALiBOBXAdgHwA7wJbJmZABACQBEA3gMoCmAjgHQC2AhgB4EDayTyA3AQEx0+ARiEEALHQC6AXxIBYAFCYceQqUq1GrDgGYxXXgVF9Jshcuy58xctXrM27fX0F9Dp6XKUrr6u1qOemJuxgbc5kA=
FYI, Seq.min behaves the same, which, technically, makes sense, since:
> 4. > nan;;
val it: bool = false
> 4. < nan;;
val it: bool = false
So, current will remain nan forever, since it was initially bound as first element.
Tricky problem, cf. min/max:
> max nan 1.0;;
val it: float = nan
> max 1.0 nan;;
val it: float = nan
> min 1.0 nan;;
val it: float = nan
> min nan 1.0;;
val it: float = nan
Code here:
let mutable acc = e.Current
while e.MoveNext() do
let curr = e.Current
if curr > acc then acc <- curr
acc
could be
let mutable acc = e.Current
while e.MoveNext() do
acc <- max acc e.Current
acc
Fixing it does seem to make sense. Might be slower
Noting List, Array, maxBy, minBy also have this problem
@vzarytovskii I do think this should be fixed. Labelling as good-first-issue
Just to clarify, for the example in the original post:
Repro steps
printfn $"{Seq.max [nan; 2.; 1.; 4.]}" printfn $"{Seq.max [3.; nan; 1.; 4.]}" printfn $"{Seq.max [3.; 2.; nan; 4.]}" printfn $"{Seq.max [3.; 2.; 1.; nan]}"Expected behavior
4 4 4 3
based on how min/max works currently, the actual expected behavior is
NaN
NaN
NaN
NaN
correct?
edit: maybe I shouldn't have used the phrase "the actual expected behavior" when trying to clarify things 😄
Expected behavior
4 4 4 3
This is how @Happypig375 expected Seq.max to behave.
based on how min/max works currently, the actual expected behavior is
NaN NaN NaN NaN
This is how it behaves currently.
based on how min/max works currently, the actual expected behavior is
NaN NaN NaN NaNThis is how it behaves currently.
The above is not quite correct and that's why I am trying to clarify. The actual observed behavior, both by @Happypig375 based on their original post and by me is in fact:
Actual behavior
NaN 4 4 3
edit: ...and fix suggested by @dsyme:
could be
let mutable acc = e.Current while e.MoveNext() do acc <- max acc e.Current acc
as far as I can tell would result in 4x NaN, as opposed to the behavior expected by @Happypig375. So it would be best to explicitly agree on the expected behavior.
based on how min/max works currently, the actual expected behavior is
NaN NaN NaN NaNThis is how it behaves currently.
The above is not quite correct and that's why I am trying to clarify. The actual observed behavior, both by @Happypig375 based on their original post and by me is in fact:
Actual behavior
NaN 4 4 3edit: ...and fix suggested by @dsyme:
could be
let mutable acc = e.Current while e.MoveNext() do acc <- max acc e.Current accas far as I can tell would result in 4x NaN, as opposed to the behavior expected by @Happypig375. So it would be best to explicitly agree on the expected behavior.
Sorry, I misread your comment. I personally would expect that Seq.max and Seq.min would never return nan unless it is the only element in the array.
The proposed change to replace inner implementation to use built-in function will result in the opposite behaviour.
Not sure which default behaviour @dsyme has in mind.
I personally think returning NaN upon encountering is not helpful when nan is viewed as "missing data".
Meanwhile, there are alternative expectations as seen here
By sort, nan would be the minimum.
By sum and average, nan would be returned upon encountering.
I mean, technically all that follow IEEE 754, since NaN <> NaN, and both x > NaN and x < NaN are false.
My understanding is min and max should return NaN in all four cases.
We should compare with LINQ
LINQ is an odd mix, seemingly making NaN the lowest number, so it only affects Min, not Max.
See https://sharplab.io/#v2:C4LgTgrgdgNAJiA1AHwAICYCMBYAUKgBgAJVMA6AGQEsoBHAbjz1QGYT0iBhIgbzyIEk2pAGwkALEQCyAQxoAKUgQDaAXSIywAcwBuASl79Bx0gE558qAFMA7kTgB7CACMANlbW97Tt1bIA5GX8YInQyAhCWcKIAXz0yWQAPeT09RlxjE0xzS1tvF3dPHiIoiPzfAKCQsOI4hJlk1PTMwTMLaztHAo91YtKQrorA4NDouqkFJqMWttzOn0LekvCBhb9h6rH4iagUtOnYvBigA===
Systematic alignment with LINQ Max() and Min() seems wise. Or else making no change at all.
My understanding is min and max should return NaN in all four cases.
We should compare with LINQ
Yeah, I did some comparisons: https://sharplab.io/#v2:C4LgTgrgdgNAJiA1AHwAICYCMBYAUKgBgAJVMA6AGQEsoBHAbjz1QGYT0iBhIgbzyIEk2pAGwkALEQCyAQxoAKUgQDaAXSIB9AJS9+g/aQCc8gKJQIAWwCmYGQCMANlbKyAHvKhWA7kTgB7CEcrNV5fAKCyADkZSJgicgI49DJiAF8tLUZcfQNMYzNLG3snFxl3Tx9/QKcQniJkxLDq52jY+JSidMy9HJI803NrWwipBQqmoNqJktaklLiWDq6s3oEjAcLhktGoD29p4PU6hriqiNn2tIyVwVS8VKA==
Min is weird one, seems NaN is always lowest.
Update: oh, @dsyme beat me to it.
Would the min and max operator need to be also updated for consistency . See https://github.com/dotnet/fsharp/issues/5647
@edgarfgp, I've taken that on, see #13558. So far I've only added tests, but that by and of itself exhibited some behavior that may need addressing.
Min is weird one, seems NaN is always lowest.
@dsyme/@vzarytovskii, considering that .NET as a whole is moving to more conformance towards IEEE 754:2019, I think it's probably best not to mimic the bug from LINQ. Besides, if that one in particular relies on Math.Min, there's https://github.com/dotnet/runtime/pull/70865 by @tannergooding, which fixes that behavior.
My understanding is min and max should return NaN in all four cases.
That's also my understanding. It would otherwise be quite surprising w.r.t. the basic rule with NaN: if either operand is NaN, return NaN.
I think it's probably best not to mimic the bug from LINQ.
Noting that LINQ isn't a "bug" here, just a method with an inconvenient name.
The IEEE 754:2008 spec defines:
-
maxNum -
maxNumMag -
minNum -
minNumMag
These generally correspond to the following IEEE 754:2019 functions:
-
maximumNumber -
maximumNumberMagnitude -
minimumNumber -
minimumNumberMagnitude
Both 2008 and 2019 agree with how NaN vs number are handled and require:
-
minimumNumberandmaximumNumbergivenf(nan, number)must returnnumber
The magnitude functions do the comparison using abs(x) and abs(y), but return the original x or y, respectively (so maxNumMag(2, -3) returns -3, because abs(2) < abs(-3)).
However, there was an issue in that the 2008 definition differed from how many runtimes implementation max/min and didn't specify the behavior of -0 vs +0 which was also much contended.
The IEEE 754:2019 spec then explicitly defined:
-
maximum -
maximumMagnitude -
minimum -
minimumMagnitude
These functions are "inverted" from their *Number counterparts and given f(nan, number) must return nan.
This, along with the APIs now being "recommended" rather than "required" operations allows existing runtimes to conformingly specify how their APIs map to the IEEE 754 spec. The one deviation is that 2019 requires a well-defined behavior for -0 vs +0 (and given that -0 == +0 this is generally "not an issue" for APIs to change):
-
maximum,maximumNumber, and theirMagnitudecounterparts givenf(-0, +0)must return+0 -
minimum,minimumNumber, , and theirMagnitudecounterparts givenf(-0, +0)must return-0
F# is "free" to make its operations conform to "either" behavior, it is likewise free to expose both. In a different world LINQ might've made Enumable.Max/Min match Math.Max/Min and then we could've exposed Enumerable.MaxNumber/MinNumber (matching the names that now exist for Max, MaxMagnitude, MaxNumber, and MaxNumberMagnitude which are exposed on Double, Half, and Single), but hindsight always comes after the fact 😄
seemingly making NaN the lowest number, so it only affects Min, not Max.
Missed this bit. This does seem like a bug and an undesirable one at that. If someone is willing to log an issue, then we should probably look at this and make a determination.