vortex
Introduce a BF16 DType
bf16 (aka bfloat16) is a floating-point format introduced by Google to improve storage utilization and computation speed for machine learning models. It keeps the 8-bit exponent of IEEE 754 float32, so it has roughly the same range, but with much reduced precision (8 bits of significand precision instead of 24, counting the implicit leading bit; only 7 mantissa bits are actually stored). Due to its popularity, more and more hardware now supports specialized instructions for it, including recent x86 AVX-512 extensions (AVX512_BF16) and GPUs from the major vendors.
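Because bf16 is literally the top 16 bits of an f32, conversions are cheap bit manipulations. A minimal sketch in plain Rust (no `half` crate), assuming a hypothetical `Bf16` newtype over the raw bits: widening is an exact shift, and narrowing rounds to nearest, ties to even.

```rust
/// Hypothetical newtype for illustration: a bf16 stored as its raw 16 bits.
/// Layout: 1 sign bit, 8 exponent bits (same as f32), 7 stored mantissa bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bf16(pub u16);

impl Bf16 {
    /// Widening to f32 is exact: the bf16 bits become the high half of the f32.
    pub fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }

    /// Narrowing keeps the high 16 bits of the f32, rounding to nearest-even.
    pub fn from_f32(x: f32) -> Bf16 {
        let bits = x.to_bits();
        if x.is_nan() {
            // Keep sign/exponent and force a quiet-NaN mantissa bit so that a
            // NaN whose payload lives only in the low 16 bits does not become inf.
            return Bf16(((bits >> 16) as u16) | 0x0040);
        }
        // Adding 0x7FFF carries into the kept bits only when the dropped half
        // is above the midpoint; adding the kept LSB breaks exact ties toward even.
        // Finite values that round past the largest bf16 correctly become inf.
        let lsb = (bits >> 16) & 1;
        let rounded = bits.wrapping_add(0x7FFF + lsb);
        Bf16((rounded >> 16) as u16)
    }
}
```

For example, `Bf16::from_f32(1.0)` is `Bf16(0x3F80)`, `1.0 + 2^-8` (a tie) rounds down to `1.0`, and `1.0 + 3 * 2^-8` rounds up to `1.015625`, while a value like `1e38`, far outside f16 range, stays finite.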
Open questions in my mind are:
- [ ] Is it an extension dtype?
- [ ] How does it canonicalize into Arrow? It seems there were efforts to introduce it as an "official" (canonical) extension type, but nothing has materialized as far as I can tell.
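One possible fallback for canonicalization, absent an Arrow extension type: widen to Float32. Note that Arrow's `Float16` is IEEE binary16 (5 exponent bits, 10 mantissa bits), so bf16 bits cannot simply be reinterpreted as it. Since every bf16 is exactly representable as an f32, widening is lossless at the cost of 2x memory, and converting back is exact whenever the low 16 bits are zero. A minimal sketch over raw bits, with hypothetical function names (`widen_to_f32`, `narrow_exact`) and no Arrow APIs involved:

```rust
/// Widen raw bf16 bit patterns to f32. Every bf16 value (including
/// infinities and NaNs) maps to exactly one f32 with zeroed low bits.
pub fn widen_to_f32(bits: &[u16]) -> Vec<f32> {
    bits.iter().map(|&b| f32::from_bits((b as u32) << 16)).collect()
}

/// Narrow back to bf16 bits only if no information would be lost, i.e. the
/// low 16 bits of every f32 are zero. Returns None if any value would round.
pub fn narrow_exact(values: &[f32]) -> Option<Vec<u16>> {
    values
        .iter()
        .map(|v| {
            let bits = v.to_bits();
            if bits & 0xFFFF == 0 {
                Some((bits >> 16) as u16)
            } else {
                None
            }
        })
        // Collecting Option<u16> items short-circuits to None on the first miss.
        .collect()
}
```

The trade-off is that this loses the bf16 type identity (and its storage savings) on the Arrow side, which is exactly what an extension type carrying the raw 16-bit storage would preserve.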