Unboxed float arguments
This PR allows reals to be passed to functions unboxed in xmm registers. A function can take an arbitrary number of unboxed float arguments: the first 8 are passed in xmm registers (xmm0-xmm7) and the remaining ones are passed on the stack.
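As a rough illustration of this placement rule, the following Python sketch assigns a location to each unboxed float argument. The function name and the stack-slot numbering are hypothetical and only model the rule stated above; they do not reflect the actual frame layout used by the compiler.

```python
# The first 8 unboxed float arguments are passed in xmm0..xmm7.
XMM_ARG_REGS = [f"xmm{i}" for i in range(8)]


def place_float_args(n):
    """Return a location name for each of n unboxed float arguments."""
    locations = []
    for i in range(n):
        if i < len(XMM_ARG_REGS):
            locations.append(XMM_ARG_REGS[i])
        else:
            # Illustrative slot index only; the real stack layout is not
            # specified in this description.
            locations.append(f"stack[{i - len(XMM_ARG_REGS)}]")
    return locations


print(place_float_args(10))
```

With 10 float arguments, the last two fall outside the register set and are assigned stack slots in this model.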
The internal language LambdaExp supports both boxed reals (of type real) and unboxed floats (of type f64). Values of type f64 are not allowed to be stored inside data structures. The optimiser (found in OptLambda) performs a number of unboxing transformations:
- Inside a function, expressions and variables of type real are converted to expressions of type f64 at consumption sites and operations on values of type real are translated into operations on values of type f64. Notice that values that are not only consumed (perhaps also stored in a data structure) are not represented unboxed.
- When a function takes a tuple (records are translated into tuples) as argument and the tuple is consumed (see below), it is passed to the function unboxed.
  - An element of type real in the tuple is passed unboxed if it is consumed by the function; see below.
- An optimisation seeks to uncurry curried functions when possible.
Notice that a tuple (or a real) is consumed by a function if it is not used as a value in its own right (e.g., stored in a reference cell, passed boxed to another function, or used in a constructed value).
To determine if a tuple element of type real is consumed by a function, mutually recursive functions (bound in a FIX construct) are analysed simultaneously.
    FIX b1...bn
    b ::= f xs = e
The i'th argument x:real of a function f is consumed if all occurrences of x in the body of f are consumed. More formally,
- x is consumed in let y = x in e if both y is consumed in e and x is consumed in e
- x is consumed in e if x not in fv(e)
- x is consumed in __real_to_f64 x
- x is consumed in f(..x..) if f consumes the arguments for which x is passed (assuming this property holds for the i'th position)
- x is consumed in g(..x..) if g consumes its i'th argument
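The rules above can be read as a fixed-point computation over the functions bound in one FIX. The Python sketch below is one possible reading, not the code in OptLambda: it uses a hypothetical tuple-based expression form, starts by assuming that every argument position is consumed, and withdraws assumptions until nothing changes. Calls to functions outside the group are treated as non-consuming, which is a simplification made for this sketch.

```python
# Toy expression forms (hypothetical, not the LambdaExp datatype):
#   ("var", x)              a bare use of x as a boxed value
#   ("real_to_f64", e)      __real_to_f64 e
#   ("let", y, e1, e2)      let y = e1 in e2
#   ("call", f, [e, ...])   f(e, ...)
#   ("op", name, [e, ...])  any other construct (arithmetic, ref, tuple, ...)


def consumed(x, e, assume):
    """True if every occurrence of variable x in e is consumed.

    assume is the set of (function, position) pairs currently believed
    to be consumed argument positions.
    """
    tag = e[0]
    if tag == "var":
        # A bare occurrence uses x as a value in its own right.
        return e[1] != x
    if tag == "real_to_f64":
        return e[1] == ("var", x) or consumed(x, e[1], assume)
    if tag == "let":
        _, y, rhs, body = e
        if rhs == ("var", x):
            # let y = x in e: both the alias y and x must be consumed in e.
            return consumed(y, body, assume) and consumed(x, body, assume)
        # If y rebinds x, occurrences in body no longer refer to x.
        return consumed(x, rhs, assume) and (y == x or consumed(x, body, assume))
    if tag == "call":
        _, f, args = e
        for i, arg in enumerate(args):
            if arg == ("var", x):
                if (f, i) not in assume:
                    return False
            elif not consumed(x, arg, assume):
                return False
        return True
    if tag == "op":
        return all(consumed(x, sub, assume) for sub in e[2])
    raise ValueError(f"unknown expression form: {tag!r}")


def analyse_fix(bindings):
    """bindings maps f to (params, body); returns consumed (f, i) pairs."""
    assume = {(f, i) for f, (params, _) in bindings.items()
              for i in range(len(params))}
    changed = True
    while changed:
        changed = False
        for f, (params, body) in bindings.items():
            for i, x in enumerate(params):
                if (f, i) in assume and not consumed(x, body, assume):
                    assume.discard((f, i))
                    changed = True
    return assume


# f(x, y) = g(x, y); g(a, b) unboxes a but stores b in a reference cell.
example = {
    "f": (["x", "y"], ("call", "g", [("var", "x"), ("var", "y")])),
    "g": (["a", "b"], ("op", "pair", [("real_to_f64", ("var", "a")),
                                      ("op", "ref", [("var", "b")])])),
}
print(sorted(analyse_fix(example)))  # [('f', 0), ('g', 0)]
```

In the example, b is stored in a reference cell, so position 1 of g is not consumed, and the next iteration withdraws position 1 of f as well. This is why the functions of a FIX group have to be analysed together.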
Once it is determined that an argument x is consumed by a function f, appearances of x inside the body of f are replaced with the expression __f64_to_real x, which makes the contexts in which x appears type correct. Moreover, calls to f inside the body of f are adjusted with __real_to_f64 wrappers around the appropriate argument.
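The rewriting step can be sketched in the same spirit. The Python code below works on a hypothetical tuple-based expression form and assumes that all bound names are unique (no shadowing). The function name unbox_params is made up for illustration; only the two wrappers __f64_to_real and __real_to_f64 come from the description above.

```python
# Toy expression forms (hypothetical, not the LambdaExp datatype):
#   ("var", x), ("real_to_f64", e), ("f64_to_real", e),
#   ("let", y, e1, e2), ("call", f, [e, ...]), ("op", name, [e, ...])


def unbox_params(f, params, consumed_positions, body):
    """Rewrite the body of f after its consumed parameters become f64."""
    unboxed = {params[i] for i in consumed_positions}

    def go(e):
        tag = e[0]
        if tag == "var":
            # x now has type f64, so each use is wrapped in __f64_to_real
            # to keep the surrounding context type correct.
            return ("f64_to_real", e) if e[1] in unboxed else e
        if tag in ("real_to_f64", "f64_to_real"):
            return (tag, go(e[1]))
        if tag == "let":
            _, y, rhs, rest = e
            return ("let", y, go(rhs), go(rest))
        if tag == "call":
            _, g, args = e
            new_args = [go(a) for a in args]
            if g == f:
                # Calls to f inside its own body must pass f64 values at
                # the consumed positions.
                new_args = [("real_to_f64", a) if i in consumed_positions else a
                            for i, a in enumerate(new_args)]
            return ("call", g, new_args)
        if tag == "op":
            return ("op", e[1], [go(sub) for sub in e[2]])
        raise ValueError(f"unknown expression form: {tag!r}")

    return go(body)


# f(x, n) = f(x, n), with position 0 (x) found to be consumed.
body = ("call", "f", [("var", "x"), ("var", "n")])
print(unbox_params("f", ["x", "n"], {0}, body))
```

In the rewritten recursive call, the first argument becomes __real_to_f64 (__f64_to_real x). A later simplification pass could presumably cancel such adjacent conversions, but that step is an assumption and is not modelled here.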
This PR also takes care of extending the register allocation algorithm to support both float registers and general purpose registers [1].
Support for (multiple) unboxed function results will be treated in another PR.
[1] Lal George and Andrew W. Appel. 1996. Iterated register coalescing. ACM Trans. Program. Lang. Syst. 18, 3 (May 1996), 300–324. DOI:https://doi.org/10.1145/229542.229546