[Feat] Meas class for Gaussian error propagation.
I sometimes want to do Gaussian error propagation and find the syntax for creating a scalar cov_Obs a bit inconvenient. For this reason I propose a Meas class that simplifies this.
import pyerrors as pe

value = 1.1
standard_error = 0.23

# Proposed shorthand: takes the standard error directly and generates a UUID name
meas = pe.Meas(value, standard_error)

# Equivalent construction via the existing interface (note the squared error)
covo = pe.cov_Obs(value, standard_error ** 2, meas.names[0])
assert covo == meas
Standard errors do not need to be squared, and if no name is provided, a UUID is generated instead. I also propose automatically evaluating the gamma method when an Obs consists exclusively of cov_Obs / Meas, as the error estimation should be unambiguous and the performance overhead should be minimal.
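A minimal sketch of the intended usage (the explicit-name argument and the automatic gamma evaluation are part of this proposal, not the current API):

import pyerrors as pe

a = pe.Meas(1.1, 0.23)                 # no name given -> a UUID is generated
b = pe.Meas(0.7, 0.05, "my_constant")  # an explicit name can still be passed
c = a + b                              # consists exclusively of Meas objects, so the
print(c.dvalue)                        # gamma method would be evaluated automatically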
In the process I also found a small bug in the cov_Obs implementation.
Hi,
thanks for the suggestion! What was the bug in the existing implementation? The missing 'float()'?
In general, I think that it is handy to have a wrapper that makes one's life easier. However, I ask myself whether assigning automatically chosen random names to observables is something that one should do. I know that we already do it for priors in fits, and there it would be very hard to circumvent. Here, there is the possibility that people introduce their external observable with the new class and destroy their error propagation because they add the corresponding uncertainty multiple times. What exactly is the use case you were aiming at?
It is certainly a very good idea to perform the error propagation automatically if no MC data is present!
Hi @jkuhl-uni & @s-kuberski,
What was the bug in the existing implementation? The missing 'float()'?
Yes, we currently compute the gradients for __pow__ via autograd, which cannot cope with int, so I had to add the float cast to make it work. We could actually provide the gradients for pow directly, which should be faster. I can prepare a PR for that.
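For reference, a sketch of what explicit gradients could look like (illustrative only, not the actual pyerrors internals): for f(x, y) = x ** y the partial derivatives are df/dx = y * x ** (y - 1) and df/dy = x ** y * log(x).

import numpy as np

def pow_with_gradients(x, y):
    # Value and analytic partial derivatives of x ** y (x > 0 assumed for the log)
    value = x ** y
    grad_x = y * x ** (y - 1)
    grad_y = value * np.log(x)
    return value, grad_x, grad_y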
However, I ask myself whether assigning automatically chosen random names to observables is something that one should do.
My underlying assumption for the random UUID is that, unlike Monte Carlo data, constants with errors are usually statistically independent of all other sources of error. If that is the case, it might be easier to automatically assign a random string instead of having the user come up with unique identifiers. The option to assign a dedicated name of course still exists.
By adding the Meas class, we could slightly increase the scope of pyerrors as a tool for Gaussian error propagation, which currently feels a bit cumbersome with the existing UI. I was considering adding another example to detail this feature, but if you feel it is too dangerous to blindly assign a random name, I'm happy to remove it or change the default.
We could actually provide the gradients for pow directly, which should be faster. I can prepare a PR for that.
That sounds good!
By adding the Meas class, we could slightly increase the scope of pyerrors as a tool for Gaussian error propagation, which currently feels a bit cumbersome with the existing UI.
I agree that the current UI is not perfectly suited for quick error propagation, because I chose to ask for the covariance and because we require a name. If one would like to allow users to easily perform tasks similar to those they would do with the 'uncertainties' package, the proposed change would be the perfect solution, I think.
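To make the comparison concrete, a hypothetical side by side (assuming the proposed Meas constructor):

import pyerrors as pe
from uncertainties import ufloat

u = ufloat(1.1, 0.23)   # 'uncertainties': value with standard deviation
m = pe.Meas(1.1, 0.23)  # proposed pyerrors equivalent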
My underlying assumption for the random UUID is that, unlike Monte Carlo data, constants with errors are usually statistically independent of all other sources of error. If that is the case, it might be easier to automatically assign a random string instead of having the user come up with unique identifiers. The option to assign a dedicated name of course still exists.
As soon as someone runs a script twice and saves the output somewhere to process it afterwards, I would think that a dedicated naming scheme is very important to ensure proper error estimation. If someone is not aware of this, it could easily happen that they input, say, the physical pion mass using the new class and end up with uncorrelated contributions from the same observable. I'd think that names really matter as soon as one does a proper analysis.
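To spell out the failure mode with hypothetical numbers: the same physical constant entered twice would receive two different UUIDs and be treated as statistically independent.

import pyerrors as pe

m_pi_a = pe.Meas(0.13957, 0.00018)  # pion mass in GeV, entered in one script
m_pi_b = pe.Meas(0.13957, 0.00018)  # the same constant entered again elsewhere
diff = m_pi_a - m_pi_b
# With distinct UUIDs the errors add in quadrature: diff would carry an
# uncertainty of ~0.00025 instead of being exactly zero.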
So it probably boils down to anticipating what users will do with a package that would still provide all the means to do everything properly, and to the question of whether not asking for a name by default might make it easier to make mistakes. I'm not really sure how to proceed.
As soon as someone runs a script twice and saves the output somewhere to process it afterwards, I would think that a dedicated naming scheme is very important to ensure proper error estimation.
If your main concern is about saving and reusing objects with automatically generated names, we could simply prevent such Obs from being exported. I added an explicit check to the json module which throws a ValueError in these cases. Do you feel this is safe enough, or are there still other silent failure modes?
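Roughly sketched (simplified, not the verbatim diff; assuming the covobs names are accessible via obs.cov_names), the check could look like this:

import uuid

def _contains_auto_uuid(obs):
    # Heuristic: any covariance name that parses as a UUID is treated as auto-generated
    for name in obs.cov_names:
        try:
            uuid.UUID(str(name))
            return True
        except ValueError:
            pass
    return False

# In the json export path:
# if _contains_auto_uuid(obs):
#     raise ValueError("Obs with automatically generated names cannot be exported.")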
I'm still not fully convinced, but I don't want to block it if you'd like to have the feature in. I still fear that people will create the same object twice with different UUIDs in their analysis when they are not as well organized as you ;). Basically, I think it is somewhat bad style if the final object that comes out of the analysis contains a contribution that is not clearly identifiable. It goes a bit against one of the basic ideas... So I would think that the feature is only good for some back-of-the-envelope error propagation, not for real-world analyses.
In case we add the class, I think it would be good to expand the docstring, clearly stating the restrictions of the class and pointing to the covobs class for "proper" error propagation. Also, I was thinking about the name. Why did you choose "scalar measurement"? Somehow, I would relate a measurement to something that comes out of an MC chain. I don't have a good suggestion for an alternative, though.
Maybe we could make the name non-optional and still use a UUID if None or an empty string is passed? This would at least prompt users to think about names. Of course, it again adds a bit more baggage to the easy creation of these objects. At least one would not have to come up with names if one just wants to do a bit of error propagation where everything can be assumed to be uncorrelated.
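A sketch of this compromise (illustrative constructor, not a final API):

import uuid

class Meas:
    def __init__(self, value, dvalue, name):
        # name is a required argument, but passing None or "" explicitly
        # opts into an auto-generated UUID
        self.name = name if name else str(uuid.uuid4())
        self.value = value
        self.dvalue = dvalue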
I also thought about runtime warnings like
Warning: No name given, using UUID.
However, I think those would be pretty ugly and not helpful, since a) a warning would be issued even when the class is used exactly as intended and b) it could vanish among too many other notifications.