CMSIS-DSP icon indicating copy to clipboard operation
CMSIS-DSP copied to clipboard

MVE arm_scale_q15 does not operate as specified

Open kjbracey opened this issue 3 years ago • 3 comments

The MVE implementation of arm_scale_q15 does not use a 2.30 intermediate, as per the documentation and other implementations, but truncates the multiplication immediately to 2.14. This reduces the output range and produces irregular steps when scaling up significantly, for example, in extremis:

Given scaleFract = 0x6666, shift = 13

input 2.30 intermediate non-MVE result 2.14 intermediate MVE result
0x0000 0x00000000 0x0000 0x0000 0x0000
0x0001 0x00006666 0x1999 0x0000 0z0000
0x0002 0x0000CCCC 0x3333 0x0000 0x0000
0x0003 0x00013332 0x4CCC 0x0001 0x4000
0x0004 0x00019998 0x6666 0x0001 0x4000
0x0005 0x0001FFFE 0x7FFF 0x0001 0x4000

I've not checked how far this extends to other MVE variants - this was the function I was using and noticed.

kjbracey avatar Oct 19 '22 05:10 kjbracey

@kjbracey Thank for reporting this. It is a bug.

MVE versions are not strictly identical to the scalar versions and may give a slightly different result. It was required (sometimes) for vectorizing more easily.

But the difference should be minimal. Here, it is clearly not a small difference.

christophe0606 avatar Oct 20 '22 06:10 christophe0606

@kjbracey It looks like the scalar versions are also not consistent. q31 is first doing a >> 32 before applying the shift.

So it is not just a consistency issue with MVE versions.

christophe0606 avatar Oct 20 '22 07:10 christophe0606

So it does. Yet the high-precision intermediate is very clearly specified, and would seem to be a key feature of this specific function.

Although other characteristics might be suitable for most users - eg I replaced my call to q15 with one using a float16 intermediate because always getting a consistent 11 bits of precision was fine.

Maybe there could be an API fork between the "high precision, no loss near unity gain" expensive version as currently specified and a "pretty good precision" performance version.

kjbracey avatar Oct 20 '22 07:10 kjbracey

I have changed the MVE implementations for arm_scale_q15 and arm_scale_q7. Now they behave like the scalar version but they take more cycles.

For the q31 (both scalar and MVE) : nothing has changed but I modified the documentation to make to make it clear it is doing a shift by 32 before applying the shift argument so there is a loss of accuracy.

christophe0606 avatar Oct 24 '22 05:10 christophe0606

Seems reasonable to me. I haven't tested it yet though. Thanks!

kjbracey avatar Oct 25 '22 11:10 kjbracey