Eclector icon indicating copy to clipboard operation
Eclector copied to clipboard

Handle reading of floating point numbers better

Open scymtym opened this issue 5 years ago • 13 comments

There are at least two things to improve here:

  • Allow the client to make the floating point number itself by calling a new generic function with the individual pieces (desired type, sign, exponent, …).

  • Provide a default implementation that uses a more efficient and/or more numerically stable algorithm.

    Potentially relevant: https://dl.acm.org/doi/pdf/10.1145/93548.93557

scymtym avatar May 15 '20 13:05 scymtym

Aubrey Jaffer wrote a brief article on this: https://arxiv.org/abs/1310.8121v1

The code in "Reading" is in Scheme, but should be easily adaptable. It's only a few more lines than what Eclector has now. The round-quotient function described is probably doable as cl:round plus some extra adaptation.

If I understand correctly, this algorithm is less efficient than Clinger's, as it uses more and bigger intermediate bignums. But it might be an improvement over the current procedure.

Bike avatar May 15 '20 13:05 Bike

Thank you.

scymtym avatar May 16 '20 12:05 scymtym

upon further examination, no adaptation of round is actually needed. So you can pretty much use the code directly, substituting round for round-quotient, and float for exact->inexact and such. Plus you can use ash directly. Overall, it's

(defconstant +double-mantissa-digits+ 53)

(defun mantexp->dbl (mantissa exponent &key (base 10))
  (declare (type integer mantissa exponent))
  (if (minusp exponent)
      (let* ((scale (expt base (- exponent)))
             (bex (- (integer-length mantissa)
                     (integer-length scale)
                     +double-mantissa-digits+))
             (numerator (ash mantissa (- bex)))
             (quotient (round numerator scale)))
        (when (> (integer-length quotient) +double-mantissa-digits+)
          ;; too many bits of quotient
          (setf bex (1+ bex)
                quotient (round numerator (* scale 2))))
        (scale-float (float quotient 1d0) bex))
      (float (* mantissa (expt base exponent)) 1d0)))

Based on a quick test where I generate a million random pairs of mantissa and exponent, this results in exactly the same numbers that SBCL reads, so I think it works?

Bike avatar Mar 05 '21 21:03 Bike

Looking into the Jaffer article. Quite the version history there. Conversion from HTML to PDF, then rewrite in Java.

scymtym avatar Mar 07 '21 15:03 scymtym

With extensions for

  • both float types
  • negative zeros
  • "exploded" mantissa representation provided by the token lexer That is, -12.34 is represented as sign = -1 mantissa = 1234 decimal-exponent = 2

I ended up with the following:

;;; Based on "v1" version of Aubrey Jaffer (2018): Easy Accurate
;;; Reading and Writing of Floating-Point Numbers. Initial conversion
;;; to Common Lisp by Alex Wood.
(defun mantissa-and-exponent-to-float
    (sign mantissa decimal-exponent exponent-sign exponent float-type)
  (declare (type (member -1 1) sign)
           (type integer mantissa)
           (type (integer 0) decimal-exponent)
           (type (or null integer) exponent)
           (type (member -1 1) exponent-sign)
           (type (member single-float double-float float-type)))
  (macrolet
      ((convert (type)
         (let* ((prototype (ecase type
                             (single-float 0f0)
                             (double-float 0d0)))
                (precision (float-precision (ecase type
                                              (single-float 1f0)
                                              (double-float 1d0)))))
           `(labels ((negative-exponent (mantissa exponent)
                       (let* ((scale (expt 10 exponent))
                              (bex (- (integer-length mantissa)
                                      (integer-length scale)
                                      ,precision))
                              (numerator (ash mantissa (- bex)))
                              (quotient (round numerator scale)))
                         (when (> (integer-length quotient) ,precision)
                           (setf bex (1+ bex)
                                 quotient (round numerator (* scale 2))))
                         (scale-float (float quotient ,prototype) bex)))
                     (positive-exponent (mantissa exponent)
                       (float (* mantissa (expt 10 exponent)) ,prototype))
                     (uncertain-exponent (mantissa exponent)
                       (if (minusp exponent)
                           (negative-exponent mantissa (- exponent))
                           (positive-exponent mantissa exponent))))
              (cond ((eql mantissa 0) ; (negative) zero
                     (if (eql sign -1)
                         ,(- prototype)
                         ,prototype))
                    ((and (null exponent) (zerop decimal-exponent))
                     (break "cannot happen"))
                    ((null exponent)
                     (negative-exponent (* sign mantissa) decimal-exponent))
                    ((zerop decimal-exponent)
                     (if (eql exponent-sign 1)
                         (positive-exponent (* sign mantissa) exponent)
                         (negative-exponent (* sign mantissa) exponent)))
                    (t ; both, "decimal exponent" and "scientific exponent"
                     (uncertain-exponent
                      (* sign mantissa) (- (* exponent-sign exponent)
                                           decimal-exponent))))))))
    (ecase float-type
      (single-float (convert single-float))
      (double-float (convert double-float)))))

scymtym avatar Mar 07 '21 18:03 scymtym

I think this is where left off:

;;; Taken from SBCL code
;;; Truncate EXPONENT if it's too large for a float.
(defun truncate-exponent (exponent mantissa decimal-exponent)
  ;; Work with base-2 logarithms to avoid conversions to floats, and
  ;; convert to base-10 conservatively at the end.  Use the least
  ;; positive float, because denormalized exponent can be larger than
  ;; normalized.
  (let* ((max-exponent          (+ sb-vm:double-float-digits
                                   sb-vm:double-float-bias))
         (mantissa-bits         (integer-length mantissa))
         (decimal-exponent-bits (1- (integer-length decimal-exponent)))
         (magnitude             (- mantissa-bits decimal-exponent-bits)))
    (if (minusp exponent)
        (max exponent (ceiling (- (+ max-exponent magnitude))
                               #.(cl:floor (cl:log 10 2))))
        (min exponent (floor (- max-exponent magnitude)
                             #.(cl:floor (cl:log 10 2)))))))

;;; Based on "v1" version of Aubrey Jaffer (2018): Easy Accurate
;;; Reading and Writing of Floating-Point Numbers. Initial conversion
;;; to Common Lisp by Alex Wood.
(defun mantissa-and-exponent-to-float
    (sign mantissa decimal-exponent exponent-sign exponent float-type)
  (declare (type (member -1 1) sign)
           (type integer mantissa)
           (type (integer 0) decimal-exponent)
           (type (or null integer) exponent)
           (type (member -1 1) exponent-sign)
           (type (member single-float double-float) float-type)
           (optimize speed))
  (macrolet
      ((convert (type)
         (let* ((prototype (ecase type
                             (single-float 0f0)
                             (double-float 0d0)))
                (precision (float-precision (ecase type
                                              (single-float 1f0)
                                              (double-float 1d0)))))
           `(labels ((negative-exponent (mantissa exponent)
                       (let* ((exponent (truncate-exponent
                                         exponent mantissa decimal-exponent))
                              (scale (expt 10 exponent))
                              (bex (- (integer-length mantissa)
                                      (integer-length scale)
                                      ,precision))
                              (numerator (ash mantissa (- bex)))
                              (quotient (round numerator scale)))
                         (when (> (integer-length quotient) ,precision)
                           (setf bex (1+ bex)
                                 quotient (round numerator (* scale 2))))
                         (scale-float (float quotient ,prototype) bex)))
                     (positive-exponent (mantissa exponent)
                       (let ((exponent (truncate-exponent
                                        exponent mantissa decimal-exponent)))
                         (float (* mantissa (expt 10 exponent)) ,prototype)))
                     (uncertain-exponent (mantissa exponent)
                       (if (minusp exponent)
                           (negative-exponent mantissa (- exponent))
                           (positive-exponent mantissa exponent))))
              (cond ((eql mantissa 0)   ; (negative) zero
                     (if (eql sign -1)
                         ,(- prototype)
                         ,prototype))
                    ((null exponent)
                     (negative-exponent (* sign mantissa) decimal-exponent))
                    ((zerop decimal-exponent)
                     (if (eql exponent-sign 1)
                         (positive-exponent (* sign mantissa) exponent)
                         (negative-exponent (* sign mantissa) exponent)))
                    (t ; both, "decimal exponent" and "scientific exponent"
                     (uncertain-exponent
                      (* sign mantissa) (- (* exponent-sign exponent)
                                           decimal-exponent))))))))
    (ecase float-type
      (single-float (convert single-float))
      (double-float (convert double-float)))))

scymtym avatar Jun 16 '24 15:06 scymtym

Is the decimal-exponent referenced from the start of the mantissa or the end? Its ambiguous in your example for 12.34. For 12.345 is it 2 or 3?

From a conversion perspective 3 would probably be better because that just represents 10^(-3), whereas if you reference it from the start as 2, then the converter will probably need to know how many digits were in the string representation.

yitzchak avatar Jun 24 '24 19:06 yitzchak

Provided that the decimal-exponent is referenced from the end of the mantissa, then mantissa-and-exponent-to-float could be implemented via Quaviver:

(defun mantissa-and-exponent-to-float
    (sign mantissa decimal-exponent exponent-sign exponent float-type)
  (quaviver:integer-float (make-instance 'quaviver/liebler:client)
                          float-type
                          10
                          mantissa
                          (- (* exponent-sign (or exponent 0))
                             decimal-exponent)
                          sign))

Right now we have only one base 10 to base 2 client, quaviver/liebler. There is no state in the client instance so you can just reuse the instance across multiple mantissa-and-exponent-to-float calls.

yitzchak avatar Jun 25 '24 12:06 yitzchak

It is from the end like you suspected so everything should work out.

Reusing a single client instance would be a prerequisite since otherwise the consing and object initialization would possibly dominate.

I will probably compare the quaviver version against the Jaffer/SBCL code above at some point. Compared to the completely naive implementation that Eclector currently uses, the Jaffer-based code achieved around 20 - 30 % improvements in both run time and consing.

scymtym avatar Jun 25 '24 13:06 scymtym

You could also use the Eclector client and add the Quaviver as a mixin.

yitzchak avatar Jun 25 '24 13:06 yitzchak

You could also use the Eclector client and add the Quaviver as a mixin.

That's a good idea in principle but Eclector does not require client classes to have anything at all in their superclass list. So existing clients would have to be adjusted to add the appropriate mixin (or newly exposed superclass).

One idea could be to have something like (floating-point-behavior eclector-client) in order to retrieve an appropriate quaviver client object. This style was used in the Lisp Interface Library: interfaces could contain "sub-interfaces" for parametrizing individual aspects. I will have to think about possible solutions some more.

scymtym avatar Jun 25 '24 13:06 scymtym

Makes sense.

yitzchak avatar Jun 25 '24 13:06 yitzchak

I've added an implementation of Jaffer v7 to Quaviver. It hasn't been optimized or tested yet. It is doing bignum expt-5 right now which could be replaced with some table lookups for an easy speedup.

yitzchak avatar Jun 26 '24 01:06 yitzchak