Float8s.jl
A number format that you can count with your fingers.
The new H100 from NVIDIA has 8-bit floats in two flavors: one with 4 exponent bits, like Float8s.jl's `Float8_4`, and one with 5 exponent bits. Scroll down to "NVIDIA Hopper FP8 data format"...
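To make the difference between the two flavors concrete, here is a rough sketch that splits one byte into sign, exponent and significand fields for a given exponent width. `split_fields` is a hypothetical helper, not part of Float8s.jl, and it only shows the bit layout; it assumes nothing about bias or special values in either the NVIDIA formats or `Float8_4`.

```julia
# Split one byte into (sign, exponent, significand) fields.
# `split_fields` is a hypothetical helper for illustration only.
function split_fields(b::UInt8, exponent_bits::Int)
    significand_bits = 7 - exponent_bits          # 1 bit is always the sign
    sign = b >> 7
    exponent = (b >> significand_bits) & ((0x01 << exponent_bits) - 0x01)
    significand = b & ((0x01 << significand_bits) - 0x01)
    return (sign = sign, exponent = exponent, significand = significand)
end

# The same byte read with 4 and with 5 exponent bits
four_bit = split_fields(0b01010110, 4)
five_bit = split_fields(0b01010110, 5)
```

With 4 exponent bits the remaining 3 bits go to the significand; with 5 exponent bits only 2 are left, trading precision for range.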
Just found this:

```julia
julia> Float8_4(0xff)
NaN8_4

julia> Float32(Float8_4(0xff))
0.0f0

julia> Float8(0xff)
NaN8

julia> Float32(Float8(0xff))
0.0f0
```

This obviously shouldn't happen: converting a NaN to `Float32` should give `NaN32`, not `0.0f0`.
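For comparison, a minimal reference decoder that does not use Float8s.jl might look like the sketch below. It assumes an IEEE-754-style layout for `Float8` (1 sign bit, 3 exponent bits, 4 significand bits, bias 3, all-ones exponent reserved for Inf/NaN); that layout is an assumption here, and `decode_float8_reference` is a hypothetical name, not the package's implementation.

```julia
# Reference decoder for one 8-bit float, written without Float8s.jl.
# ASSUMPTION: 1 sign bit, 3 exponent bits, 4 significand bits, bias 3,
# with the all-ones exponent reserved for Inf and NaN as in IEEE 754.
function decode_float8_reference(b::UInt8)
    sign = (b & 0x80) == 0x00 ? 1.0f0 : -1.0f0
    exponent = Int((b >> 4) & 0x07)
    significand = Int(b & 0x0f)
    if exponent == 7
        # All exponent bits set: Inf if the significand is zero, NaN otherwise
        return significand == 0 ? sign * Inf32 : NaN32
    elseif exponent == 0
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias
        return sign * significand / 16f0 * 2f0^(1 - 3)
    else
        # Normal: implicit leading 1
        return sign * (1f0 + significand / 16f0) * 2f0^(exponent - 3)
    end
end

decode_float8_reference(0xff)   # a NaN under the assumed layout, never 0.0f0
```

Under these assumptions `0xff` has every exponent bit set and a nonzero significand, so a correct conversion should preserve NaN rather than return zero.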
conversion
Just found this:

```julia
julia> Float8(0.015)
Float8(0.0625)

julia> Float8(0.02)
Float8(0.015625)
```

although `0.015 < 0.02`:

```julia
julia> bitstring(Float8(0.015))
"00000100"

julia> bitstring(Float8(0.02))
"00000001"
```
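As a sanity check on what the rounding should give, here is a small sketch of round-to-nearest encoding for values in the subnormal range. It assumes the same layout the bit patterns above suggest (4 significand bits, bias 3, so subnormals are multiples of 1/64); `encode_subnormal_reference` is a hypothetical helper, not the package's conversion routine.

```julia
# Round-to-nearest encoding of small non-negative values, without Float8s.jl.
# ASSUMPTION: 3 exponent bits, 4 significand bits, bias 3, so the subnormal
# spacing is 2^(1 - 3) / 16 = 1/64 and the smallest normal number is 0.25.
function encode_subnormal_reference(x::Real)
    0 <= x < 0.25 || throw(DomainError(x, "outside the assumed subnormal range"))
    # Scale to units of 1/64 and round to the nearest integer (ties to even)
    return UInt8(round(Int, x * 64))
end

bitstring(encode_subnormal_reference(0.015))
bitstring(encode_subnormal_reference(0.02))
```

Under this assumption both inputs land on the same bit pattern, since `0.015 * 64 = 0.96` and `0.02 * 64 = 1.28` both round to 1. That agrees with the reported result for `0.02` and suggests the result for `0.015` is the outlier.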
```
┌ Float8s
│  WARNING: Constructor for type "Bool" was extended in `Float8s` without explicit qualification or import.
│  NOTE: Assumed "Bool" refers to `Base.Bool`. This behavior is deprecated and...
```