Add NMR data model, integration with EnzymeML, and peak assignment features
- Add md-models-based research data model for NMR
  - Collect NMR parameters from instrument metadata
  - Record processing steps performed in NMRpy
  - Save both the raw data and the latest version of the processed data
  - Assign species from a list or from an EnzymeML document to the peaks
- Add EnzymeML integration using the `pyenzyme` library (see the usage sketch below)
  - Load EnzymeML document into a FidArray
  - Have EnzymeML species available within Fid object
  - Save calculated concentrations to and return the EnzymeML document
- Add widgets to assign peaks either on the entire FidArray or single Fid objects
  - Use EnzymeML species if document available
  - Pass list of EnzymeML species if no document has been loaded yet
  - Use list of simple strings instead of working with EnzymeML standard
  - Use only a slice of Fids when assigning on the entire FidArray by passing an index list
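
To give a quick sense of how these pieces are intended to fit together, here is a minimal usage sketch. `FidArray.from_path` is the existing NMRPy import entry point; the other method names (`load_enzymeml`, `assign_peaks`, `to_enzymeml`) are placeholders for the new functionality and may not match the final API:

```python
# Rough usage sketch only: `load_enzymeml`, `assign_peaks` and `to_enzymeml`
# are placeholder names for the new functionality, not the final API.
from nmrpy.data_objects import FidArray

# Import raw FIDs from an instrument data directory (existing NMRPy import).
fid_array = FidArray.from_path(fid_path='path/to/fid_directory')

# Attach an EnzymeML document so its species are available for peak
# assignment within each Fid object (placeholder method name).
fid_array.load_enzymeml('experiment_enzymeml.json')

# Open the peak-assignment widget on the whole array, optionally
# restricted to a slice of Fids via an index list (placeholder name).
fid_array.assign_peaks(index_list=[0, 1, 2])

# Without an EnzymeML document, a plain list of species names can be
# passed instead (placeholder name).
fid_array.assign_peaks(species=['glucose', 'glucose-6-phosphate'])

# Write calculated concentrations back and return the updated EnzymeML
# document (placeholder name).
enzymeml_doc = fid_array.to_enzymeml()
```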
@torogi94 Thanks for this! I need some time to look through this. But from a brief inspection the following come to mind:
- Dependencies: `utils.py` imports sympy as well as pyenzyme, yet none of them are declared in `requirements.txt`. We should discuss dependencies and which of these are needed for core functionality. One option is to have optional dependencies for certain functionalities. Pyenzyme itself has a multitude of dependencies. I am not keen for the core NMRPy library to have a huge list of dependencies, and the associated version-requirement dependency hell.
- Related: how stable are pyenzyme, md-models and the other dependencies from your group? E.g. Pyenzyme vs. Pyenzyme2? My concern is long-term maintenance.
- The merge conflict needs to be resolved.
- Looking at `data_objects.py` as well as `plotting.py`, there are a very large number of changes relating to linting and formatting (e.g. single vs. double quotes, linebreaks etc.). Could these linting changes be done in a separate commit? As it is, it's very tedious to separate the real code changes from "cosmetic" linting.
Thank you for looking over it! Here are a few comments:
> Dependencies: `utils.py` imports sympy as well as pyenzyme, yet none of them are declared in `requirements.txt`. We should discuss dependencies and which of these are needed for core functionality. One option is to have optional dependencies for certain functionalities. Pyenzyme itself has a multitude of dependencies. I am not keen for the core NMRPy library to have a huge list of dependencies, and the associated version-requirement dependency hell.

Good catch! I really did forget to declare the new dependencies in the requirements.txt file. I think you are right that we should not necessarily introduce them to the core of NMRpy. How about we declare them as an optional extra, `nmrpy[enzymeml]`? For the NMRpy data model, however, it would be easier if we could make pydantic a core dependency; if I am not mistaken, it is the only one required for the data model to work. If not, I can restructure my code so that the data model part becomes entirely optional as well.
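
For illustration, the split could be declared roughly like this in the packaging metadata (a minimal sketch assuming a `setup.py`-based build; the core dependency list and version pins are placeholders):

```python
# setup.py (sketch): pydantic as a core dependency, EnzymeML support as an
# optional extra installable via `pip install nmrpy[enzymeml]`.
from setuptools import setup, find_packages

setup(
    name='nmrpy',
    packages=find_packages(),
    install_requires=[
        # existing core dependencies go here, plus:
        'pydantic',          # needed for the NMRpy data model
    ],
    extras_require={
        'enzymeml': [
            'pyenzyme',      # to be pinned to the stable v2 release once available
            'sympy',         # assumed to be needed only for the EnzymeML workflow
        ],
    },
)
```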
> Related: how stable are pyenzyme, md-models and the other dependencies from your group? E.g. Pyenzyme vs. Pyenzyme2? My concern is long-term maintenance.

We are preparing a new stable pyenzyme (v2) release, which we can then pin in the requirements. In my experience it already runs better and will be more maintainable than the previous v1 version. With regard to md-models, the NMRpy data model does not depend on the md-models version at all. Actually, md-models is not even needed as a dependency, since all functionality is handled via pydantic, which is itself a very well-maintained library!
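
To illustrate why only pydantic is needed at runtime, here is a minimal sketch of the kind of class an md-models specification generates (assuming pydantic v2; the field names are invented for illustration and are not the actual NMRpy data model):

```python
# Minimal sketch of an md-models-style, pydantic-based data model.
# Field names are illustrative only.
from typing import List, Optional
from pydantic import BaseModel, Field


class ProcessingStep(BaseModel):
    """A single processing operation applied to an FID."""
    name: str = Field(..., description="Name of the step, e.g. 'fourier_transform'")
    parameters: dict = Field(default_factory=dict, description="Parameters used")


class NMRExperiment(BaseModel):
    """Acquisition parameters and processing history for one experiment."""
    spectrometer_frequency: Optional[float] = Field(None, description="In MHz")
    nucleus: Optional[str] = Field(None, description="Observed nucleus, e.g. '31P'")
    processing_steps: List[ProcessingStep] = Field(default_factory=list)


# Validation and (de)serialisation come from pydantic itself,
# so no md-models runtime dependency is required.
experiment = NMRExperiment(spectrometer_frequency=600.0, nucleus="31P")
experiment.processing_steps.append(ProcessingStep(name="fourier_transform"))
print(experiment.model_dump_json(indent=2))
```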
> The merge conflict needs to be resolved.

Absolutely. I’ll merge main into my branch and resolve the conflict manually so you get a clean merge!
> Looking at `data_objects.py` as well as `plotting.py`, there are a very large number of changes relating to linting and formatting (e.g. single vs. double quotes, linebreaks etc.). Could these linting changes be done in a separate commit? As it is, it's very tedious to separate the real code changes from "cosmetic" linting.

Sorry for the noisy diff; it seems I accidentally let my linter loose on the two files and did not even notice. I’ll revert the linting changes from this PR and submit them in a separate PR, as discussed.
> Good catch! I really did forget to declare the new dependencies in the requirements.txt file. I think you are right that we should not necessarily introduce them to the core of NMRpy. How about we declare them as an optional extra, `nmrpy[enzymeml]`? For the NMRpy data model, however, it would be easier if we could make pydantic a core dependency; if I am not mistaken, it is the only one required for the data model to work. If not, I can restructure my code so that the data model part becomes entirely optional as well.

I agree with making pydantic a core dependency and making the others optional in `nmrpy[enzymeml]`.
On another note: I also have a conda build recipe as well as a CI job that builds and distributes a conda package. This has been very easy to date because NMRPy is a pure Python package and there are/were conda packages available for all of the dependencies (on the conda-forge channel). Is PyEnzyme released as a conda package? I would like to continue distributing conda packages, just as we do for PySCeS.
> We are preparing a new stable pyenzyme (v2) release, which we can then pin in the requirements. In my experience it already runs better and will be more maintainable than the previous v1 version. With regard to md-models, the NMRpy data model does not depend on the md-models version at all. Actually, md-models is not even needed as a dependency, since all functionality is handled via pydantic, which is itself a very well-maintained library!

Okay great.
> Sorry for the noisy diff; it seems I accidentally let my linter loose on the two files and did not even notice. I’ll revert the linting changes from this PR and submit them in a separate PR, as discussed.

Give me a ping once you are ready with all of this, then I'll proceed with the detailed code review :smile:
I think I have addressed all points mentioned so far, so you may continue with the detailed code review now!
> On another note: I also have a conda build recipe as well as a CI job that builds and distributes a conda package. This has been very easy to date because NMRPy is a pure Python package and there are/were conda packages available for all of the dependencies (on the conda-forge channel). Is PyEnzyme released as a conda package? I would like to continue distributing conda packages, just as we do for PySCeS.

I am in contact with @JR-1991 about the conda-forge release. While it was not planned initially, he says we can make it happen! I will come back to you about this when we are preparing the PyPI release of v2. For now, the GitHub version is still used in the new optional requirements for `nmrpy[enzymeml]`. Of course, I will change that to the stable PyPI version as soon as it is available!
I have addressed a few more issues we discussed in pull requests #14 and #15 and merged them into this pull request:
- With #14, handling of multiple EnzymeML measurements is now possible.
- With #15, data handling within NMRpy has been optimized to keep memory requirements and computation times low.
@jmrohwer: In https://github.com/NMRPy/nmrpy/pull/18, I have added handling of t0 values in the `MeasurementData` of EnzymeML documents to the library. Initial values can now be set using `FidArray.add_t0_to_enzymeml()`. Both interactive assignment via the new `T0Adder` widget and script-like handling (by setting `gui=False`) are available. The user can either reuse the existing t1 values as t0, if no initial values have been measured, or provide measured t0 values instead. Furthermore, an offset value can be applied to the time array to account for delays in measurements.
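
For reference, here is a short usage sketch of the new t0 handling. Apart from `gui`, the keyword argument names below are illustrative and may differ from the actual signature in #18:

```python
# Sketch of the t0 handling added in PR #18; keyword names other than
# `gui` are illustrative and may not match the actual signature.
from nmrpy.data_objects import FidArray

fid_array = FidArray.from_path(fid_path='path/to/fid_directory')

# Interactive assignment: opens the new T0Adder widget.
fid_array.add_t0_to_enzymeml()

# Script-like handling without the widget, reusing the existing t1 values
# as t0 when no initial values were measured.
fid_array.add_t0_to_enzymeml(gui=False)

# Or provide measured t0 values plus an offset on the time array to
# account for measurement delays (keyword names are illustrative).
fid_array.add_t0_to_enzymeml(
    gui=False,
    t0_values={'glucose': 10.0, 'glucose-6-phosphate': 0.0},
    offset=120.0,
)
```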