MVE arm_scale_q15 does not operate as specified
The MVE implementation of arm_scale_q15 does not use a 2.30 intermediate, as per the documentation and other implementations, but truncates the multiplication immediately to 2.14. This reduces the output range and produces irregular steps when scaling up significantly, for example, in extremis:
Given scaleFract = 0x6666, shift = 13
| input | 2.30 intermediate | non-MVE result | 2.14 intermediate | MVE result |
|---|---|---|---|---|
| 0x0000 | 0x00000000 | 0x0000 | 0x0000 | 0x0000 |
| 0x0001 | 0x00006666 | 0x1999 | 0x0000 | 0z0000 |
| 0x0002 | 0x0000CCCC | 0x3333 | 0x0000 | 0x0000 |
| 0x0003 | 0x00013332 | 0x4CCC | 0x0001 | 0x4000 |
| 0x0004 | 0x00019998 | 0x6666 | 0x0001 | 0x4000 |
| 0x0005 | 0x0001FFFE | 0x7FFF | 0x0001 | 0x4000 |
I've not checked how far this extends to other MVE variants - this was the function I was using and noticed.
@kjbracey Thank for reporting this. It is a bug.
MVE versions are not strictly identical to the scalar versions and may give a slightly different result. It was required (sometimes) for vectorizing more easily.
But the difference should be minimal. Here, it is clearly not a small difference.
@kjbracey It looks like the scalar versions are also not consistent. q31 is first doing a >> 32 before applying the shift.
So it is not just a consistency issue with MVE versions.
So it does. Yet the high-precision intermediate is very clearly specified, and would seem to be a key feature of this specific function.
Although other characteristics might be suitable for most users - eg I replaced my call to q15 with one using a float16 intermediate because always getting a consistent 11 bits of precision was fine.
Maybe there could be an API fork between the "high precision, no loss near unity gain" expensive version as currently specified and a "pretty good precision" performance version.
I have changed the MVE implementations for arm_scale_q15 and arm_scale_q7. Now they behave like the scalar version but they take more cycles.
For the q31 (both scalar and MVE) : nothing has changed but I modified the documentation to make to make it clear it is doing a shift by 32 before applying the shift argument so there is a loss of accuracy.
Seems reasonable to me. I haven't tested it yet though. Thanks!