Use separate PQs in each cluster
Currently the same product quantizer is used for every cluster in IVF. Since a PQ takes very little space (it's just 16 center points), we might as well train a separate one on the data in each cluster.
The main disadvantage is that a query would then have to compute a distance table for each PQ it touches. It's unclear how much of a bottleneck that currently is compared to the actual pass 1 and pass 2 filtering.
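A minimal sketch of the idea, to make the query-time cost concrete. It assumes a PQ is stored as an array of shape `(n_sub, 16, d // n_sub)` (16 centers per subspace), that `d` is divisible by `n_sub`, and that every cluster has at least 16 points. `kmeans`, `train_pq`, `train_pq_per_cluster` and `distance_tables` are hypothetical helpers, not functions from this repo, and the per-cluster PQs are trained on residuals `data[mask] - center` as suggested below.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (k, d) centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every point to its nearest center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(0)
    return centers


def train_pq(x, n_sub, k=16):
    """Train one PQ: k centers in each of n_sub subspaces -> (n_sub, k, d // n_sub)."""
    sub = x.reshape(len(x), n_sub, -1)
    return np.stack([kmeans(sub[:, s], k, seed=s) for s in range(n_sub)])


def train_pq_per_cluster(data, labels, ivf_centers, n_sub):
    """Hypothetical: one PQ per IVF cluster, trained on data[mask] - center."""
    pqs = []
    for c, center in enumerate(ivf_centers):
        mask = labels == c
        pqs.append(train_pq(data[mask] - center, n_sub))
    return pqs


def distance_tables(query, pqs, ivf_centers, probe):
    """One (n_sub, k) table of squared distances per probed cluster."""
    tables = []
    for c in probe:
        pq = pqs[c]
        # Residual of the query w.r.t. this cluster, split into subspaces.
        q = (query - ivf_centers[c]).reshape(pq.shape[0], 1, -1)
        tables.append(((pq - q) ** 2).sum(-1))
    return tables
```

The point to notice is that `distance_tables` does one table computation per entry in `probe`, whereas a single shared PQ needs only one per query (when quantizing raw `data[mask]`), so the table cost grows linearly with the number of probed clusters.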
An advantage is that we can quantize data[mask] - center instead of data[mask] as we do now.
I believe this is what QuickADC actually does.
By subtracting the "main component" of the points (the cluster center), the remaining values span a much smaller range, so we can scale the scalars up further before we map to [-128, 127], which gives higher precision.
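A small illustration of the precision argument. The issue doesn't say exactly how values are mapped to [-128, 127]; this sketch assumes a symmetric linear scaling where the largest absolute value maps to 127. `to_int8` and `roundtrip_error` are hypothetical helpers, and the data is synthetic: one cluster whose center sits far from the origin.

```python
import numpy as np


def to_int8(x):
    """Assumed mapping: scale so that max |x| lands on 127, then round to int8."""
    scale = 127.0 / np.abs(x).max()
    q = np.clip(np.round(x * scale), -128, 127).astype(np.int8)
    return q, scale


def roundtrip_error(x):
    """Mean absolute error after mapping x to int8 and back to float."""
    q, scale = to_int8(x)
    return np.abs(q.astype(np.float64) / scale - x).mean()


rng = np.random.default_rng(0)
center = np.full(8, 100.0)                       # cluster center far from the origin
data_mask = center + rng.normal(size=(1000, 8))  # stands in for data[mask]
print(roundtrip_error(data_mask))                # quantizing data[mask]
print(roundtrip_error(data_mask - center))       # quantizing data[mask] - center
```

With the center removed, the largest magnitude drops from roughly 100 to a few units, so the scale factor (and with it the resolution of each int8 step) grows by more than an order of magnitude.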
Currently the distance table computation takes far too long to consider computing more than one table per query, as is evident from this profiling screenshot.

I did some work trying to speed up distance_table(...) in 2c16ac4471974b30e54904da3070faadf58e352d, but unfortunately I wasn't able to improve it much. Maybe the whole thing needs to be moved to Cython, as disappointing as that would be.
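Before going to Cython, it might be worth checking whether a fully vectorized NumPy version helps. The real signature of `distance_table(...)` isn't shown here, so this is only a sketch assuming the PQ is an array of shape `(n_sub, 16, sub_d)` and the distance is squared L2. `distance_table_loop`, `distance_table_vec` and `distance_tables_batched` are hypothetical names, and whether any of this beats the current code would need profiling.

```python
import numpy as np


def distance_table_loop(query, pq):
    """Reference version: explicit Python loops over subspaces and centers."""
    n_sub, k, sub_d = pq.shape
    q = query.reshape(n_sub, sub_d)
    table = np.empty((n_sub, k))
    for s in range(n_sub):
        for j in range(k):
            diff = q[s] - pq[s, j]
            table[s, j] = diff @ diff
    return table


def distance_table_vec(query, pq, pq_sq_norms=None):
    """Same table via ||q - c||^2 = ||q||^2 - 2 q.c + ||c||^2, no Python loops.

    pq_sq_norms (n_sub, k) can be computed once per PQ and reused for every query.
    """
    n_sub, k, sub_d = pq.shape
    q = query.reshape(n_sub, sub_d)
    if pq_sq_norms is None:
        pq_sq_norms = np.einsum("skd,skd->sk", pq, pq)
    cross = np.einsum("sd,skd->sk", q, pq)       # q_s . c_{s,j} for every (s, j)
    q_sq = np.einsum("sd,sd->s", q, q)[:, None]  # ||q_s||^2, broadcast over j
    return q_sq - 2.0 * cross + pq_sq_norms


def distance_tables_batched(queries, pqs):
    """Tables for several (query, PQ) pairs in one NumPy call.

    queries: (n_probe, d), e.g. query - center for each probed cluster
    pqs:     (n_probe, n_sub, k, sub_d), one PQ per probed cluster
    returns: (n_probe, n_sub, k)
    """
    n_probe, n_sub, k, sub_d = pqs.shape
    q = queries.reshape(n_probe, n_sub, 1, sub_d)
    return ((pqs - q) ** 2).sum(-1)
```

The batched variant matters for the per-cluster idea: if each probed cluster has its own PQ, computing all tables in one broadcasted operation avoids paying Python overhead once per cluster.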