
Multi-precision and dtypes


Discussion around #3544 and #3500 has raised some interesting points about floating point precision and dtypes.

The current strategy in MDA is to squash boxes and coordinates down to np.float32 on creation, with no flexibility allowed; however, some routines still do their accumulation in double precision. Additionally, the datatype is not always respected: some operations promote back to a larger dtype internally, which means higher-precision input could in principle have been passed through without data loss.

For example, when a box is created with:

    import numpy as np
    from MDAnalysis.lib.util import check_box  # MDA's box validation helper

    boxtype, checked_box = check_box(np.array([1., 1., 1., 90., 90., 90.]))  # input an np.float64 box
    checked_box.dtype  # narrowed to np.float32

finding the box volume re-promotes it to np.float64:

    dim = np.asarray(dimensions, dtype=np.float64)
    lx, ly, lz, alpha, beta, gamma = dim
    ...
    return volume
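
A minimal sketch of that round trip, assuming box_volume is importable from MDAnalysis.lib.mdamath and casts to np.float64 internally as the snippet above shows:

    import numpy as np
    from MDAnalysis.lib.mdamath import box_volume

    # a box already narrowed to float32, as MDA stores it
    box32 = np.array([1., 1., 1., 90., 90., 90.], dtype=np.float32)

    # the implementation quoted above casts to np.float64 internally, so the
    # result carries a wider dtype than the box it was computed from
    vol = box_volume(box32)
    print(box32.dtype, np.asarray(vol).dtype)  # expected: float32 float64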

then, if we use our float32 box again in the C-level routine _ortho_pbc, the following snippet widens it back to double precision:

    const double inverse_box[3] = {1.0 / (double) box[0], \
                                   1.0 / (double) box[1], \
                                   1.0 / (double) box[2]};
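
The widening itself is harmless, but it cannot recover what the earlier narrowing threw away. A quick NumPy sketch of the round trip (the box edge lengths here are arbitrary, chosen to carry more digits than float32 can hold):

    import numpy as np

    box64 = np.array([12.3456789012345, 23.4567890123456, 34.5678901234567,
                      90., 90., 90.])      # the "true" double-precision box
    box32 = box64.astype(np.float32)       # the narrowing step at creation
    widened = box32.astype(np.float64)     # widened again at the C level
    print(np.abs(widened - box64).max())   # on the order of 1e-6: the loss is already locked in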

#3544 introduces some flexibility in datatypes for box manipulation while retaining all the same default behaviour as before.
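
For illustration only, the kind of opt-in flexibility being discussed might look something like this (a hypothetical sketch, not the actual API from #3544; check_box_flexible and its dtype keyword are invented here):

    import numpy as np

    def check_box_flexible(box, dtype=np.float32):
        """Hypothetical: validate a [lx, ly, lz, alpha, beta, gamma] box.

        Defaults to the current float32 behaviour, but lets callers opt in
        to keeping a wider dtype instead of silently narrowing.
        """
        box = np.asarray(box, dtype=dtype)
        if box.shape != (6,):
            raise ValueError("box must be [lx, ly, lz, alpha, beta, gamma]")
        boxtype = "ortho" if np.allclose(box[3:], 90.0) else "tri_vecs"
        return boxtype, box

    # default behaviour unchanged: still narrows to float32
    _, b32 = check_box_flexible([1., 1., 1., 90., 90., 90.])
    # precision-sensitive callers could keep double precision end to end
    _, b64 = check_box_flexible([1., 1., 1., 90., 90., 90.], dtype=np.float64)
    print(b32.dtype, b64.dtype)  # float32 float64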

How do people feel about the possibility of adding more support for different datatypes and/or enabling more flexibility for data to be passed through? This would be a very gradual process and wouldn't have to be exhaustive; it could be limited to the things we think might benefit.

As I see it there are benefits and drawbacks either way.

Benefits

  • We move away from the current, somewhat haphazard system of doing some things in one precision and others in another on a case-by-case basis, shuffling datatypes back and forth.
  • More accuracy can be passed through where it is actually needed.
  • If other packages we integrate with offer or require higher precision, we currently introduce a guaranteed data-loss step (see the round-trip sketch above).
  • Related to the above: if people run their simulations at higher precision, we throw that precision away (I imagine this is rare).
  • It may aid closer integration with other packages written in compiled languages.
  • Supporting a fundamental datatype choice, while preserving the current defaults, seems like a decent idea.

Drawbacks

  • If it ain't broke, don't fix it: the current system is obviously satisfactory for most use cases.
  • Large changes to core mean a lot of effort, with plenty that can go wrong, for an uncertain amount of benefit.
  • Is input data even of high enough precision to warrant respecting its dtype? I'm not sure how strong this argument is, as I can't foresee all uses.
  • Double precision is slower.
  • How much do users actually gain in accuracy in a typical calculation? Is a ~1e-? difference in a distance worth the trouble? (A rough feel for the magnitude is sketched after this list.)
  • Would we have to add a large chunk of tests? All the regression tests with floating-point values would differ under a different precision model.
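
To put a rough number on the accuracy question, here is a quick NumPy comparison of pairwise distances computed in single versus double precision; the coordinates and their scale are arbitrary and only meant to mimic typical simulation magnitudes:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.random((100_000, 3)) * 100.0   # arbitrary "coordinates", box ~100 units across
    b = rng.random((100_000, 3)) * 100.0

    d64 = np.linalg.norm(a - b, axis=1)    # distances in double precision
    d32 = np.linalg.norm(a.astype(np.float32) - b.astype(np.float32), axis=1)  # single precision

    # at this coordinate scale the worst-case discrepancy is typically of order 1e-5
    print(np.abs(d64 - d32.astype(np.float64)).max())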

I would welcome a robust discussion.

Any proposed change would be gradual, as with #3544, and would preserve current defaults. :)

hmacdope · Feb 24 '22 11:02

Discussion here is best continued in #3707

hmacdope · Jun 07 '22 00:06

Closing in favour of #3707

hmacdope · Oct 07 '23 05:10