Question about RunTime Errors

Open wwells opened this issue 9 years ago • 1 comments

In the function library I see this in the description of "+" and some other functions:

Details:

Float and double overflows do not produce runtime errors but result in positive or negative infinity, which would be carried through any subsequent calculations (see IEEE 754). Use impute.ensureFinite to produce errors from infinite or NaN values.

Runtime Errors:

#18000: Integer results above or below -2147483648 and 2147483647 (inclusive) produce an "int overflow" runtime error.
#18001: Long-integer results above or below -9223372036854775808 and 9223372036854775807 (inclusive) produce a "long overflow" runtime error.

So first it says that there are no runtime errors for overflow, then it says that there are. It would be good to decide on one of those. Or maybe errors are raised only if impute.ensureFinite is used? How can one use it?

Apr 04 '16 18:04 wwells

The documentation is correct, but maybe unclear. Integer addition produces runtime exceptions because integers don't have an infinite value. Floating point addition produces IEEE infinity. (I just tested it.) This was a policy choice and there could be others, which the DMG could decide to add as new functions (+' or something).
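As a rough illustration of the split described above, here is a minimal Java sketch, not the PFA implementation. The class name OverflowDemo and its methods are hypothetical, and ensureFinite is only a stand-in for the idea behind impute.ensureFinite. It relies on Math.addExact, which throws on int overflow, and on ordinary double arithmetic, which follows IEEE 754.

```java
public class OverflowDemo {
    // Integer addition that raises an error on overflow, analogous to the
    // "int overflow" runtime error: Math.addExact throws ArithmeticException.
    static int addInt(int a, int b) {
        return Math.addExact(a, b);
    }

    // Floating-point addition: overflow does not raise an error, it yields
    // positive or negative infinity (IEEE 754).
    static double addDouble(double a, double b) {
        return a + b;
    }

    // Hypothetical helper: turn an infinite or NaN value into an error.
    static double ensureFinite(double x) {
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            throw new ArithmeticException("non-finite value: " + x);
        }
        return x;
    }

    public static void main(String[] args) {
        // The double sum overflows silently to infinity.
        System.out.println(addDouble(Double.MAX_VALUE, Double.MAX_VALUE));

        // The int sum cannot represent the result, so an exception is thrown.
        try {
            addInt(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("int overflow: " + e.getMessage());
        }

        // Wrapping the double sum in the check makes the overflow an error too.
        try {
            ensureFinite(addDouble(Double.MAX_VALUE, Double.MAX_VALUE));
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In this sketch the finiteness check is opt-in: the caller decides where an infinity should become an error, which mirrors the way the documentation describes impute.ensureFinite.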

The choices for policy are:

Let integers wrap around (big number plus big number equals negative number), which is what C/C++ and Java do by default. It's a terrible choice because it's completely not what a data analyst expects (it's done for hardware efficiency).
Use arbitrary-precision integers, like Python. Unfortunately, the Avro type system doesn't support this: it only allows 32-bit and 64-bit integers.
Have integer addition always return a floating-point result so that infinity is a possibility. This is bad because analysts expect adding integers to produce integers.
Have floating-point overflows raise runtime exceptions instead of returning infinity, for consistency. Not a bad choice, but there are other floating-point functions, like m.exp, that produce infinity on large values. So what would become consistent between integral and floating-point types would become inconsistent among floating-point functions.
Others?
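
The alternative policies listed above can be compared side by side in a short Java sketch. This is only an illustration under assumed names: the class OverflowPolicies and its methods are hypothetical and are not part of PFA or any PFA implementation.

```java
import java.math.BigInteger;

public class OverflowPolicies {
    // Policy 1: wrap around, the default for Java's int "+".
    static int wrapAdd(int a, int b) {
        return a + b;
    }

    // Policy 2: arbitrary-precision integers, so the sum never overflows.
    static BigInteger bigAdd(int a, int b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    // Policy 3: integer addition returns a floating-point result.
    static double floatAdd(int a, int b) {
        return (double) a + (double) b;
    }

    // Policy 4: floating-point overflow raises an error instead of
    // returning infinity. Infinite inputs are passed through unchanged.
    static double checkedDoubleAdd(double a, double b) {
        double r = a + b;
        if (Double.isInfinite(r) && !Double.isInfinite(a) && !Double.isInfinite(b)) {
            throw new ArithmeticException("double overflow");
        }
        return r;
    }

    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;
        System.out.println(wrapAdd(big, 1));   // wraps to a negative number
        System.out.println(bigAdd(big, 1));    // exact, but not an Avro int or long
        System.out.println(floatAdd(big, 1));  // a double, not an integer
        try {
            checkedDoubleAdd(Double.MAX_VALUE, Double.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("error: " + e.getMessage());
        }
        // Other floating-point functions would still return infinity,
        // which is the inconsistency noted for the fourth policy.
        System.out.println(Math.exp(1000.0));
    }
}
```

The last line of main shows why the fourth policy is awkward: an exponential of a large argument still yields infinity unless every floating-point function is given the same check.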

The documentation that exists is correct but could be clarified (update the third digit of the version number). The DMG might decide to add additional policies as new functions (update the second digit). The existing functions can't be changed without a deprecation phase, and this change can't involve a deprecation phase because it doesn't involve a change to method signatures: users wouldn't be able to choose between the deprecated form and the new form unless there's some different syntax for each.

Apr 04 '16 19:04 jpivarski