Feature web (a.k.a. BioSimSpace 2023.0.0)
This is the update of BioSimSpace that is compatible with Sire 2023.0.0.
The changes are minimal, i.e. updating to using the new (PEP8) module names of sire (but not the PEP8 function names - hence why sire.use_mixed_api() is used).
The other changes are to the build system. This now mirrors sire's updated build system. This supports building Windows packages, plus gives more control over the dependency relationship between BioSimSpace and Sire.
This won't pass its tests until Sire 2023.0.0 is available.
We now have a 2023.3.0 Sire package, so I'll re-run the build against that. We can also update the Sire build to use the KCOMBU package, when ready.
Ooops, realised that you'll need to uncomment sire in the requirements.txt file.
@msuruzhon: I've also added you as a reviewer for this PR since it touches many files within the Exscientia Sandpit. These shouldn't change functionality, only updating the import structure to deal with the new Sire API. (To use the old names, we now need to using sire.legacy.)
The PR builds on top of the Sire 2023.0.0 release, that was recently merged from this PR. This modernises the Sire API to a PEP8 consistent naming convention in a non-backwards compatible way, hence the need to use sire.legacy for the old API. While the tests still pass, a few have been marked as XFAIL since they used pickled objects that are no-longer valid. I'll update these when I get a chance.
Let me know if you have any questions.
Cheers.
Just posting a TODO list for myself for next week:
- [x] Re-pickle objects for SOMD pert file regression test. (Could also use
save/read-PerturbableSystemwith files instead. - [x] Update source installation instructions based on Sire 2023.0.0 installation process.
@lohedges - thanks for all of your work to merge this in.
@msuruzhon - thanks for reviewing and for your understanding for this merge. I've tried my best to create as little disruption as possible, but this is quite a big change under the hood. I apologise in advance if we've missed anything and this causes any issues. The testing all looks good, so I am relatively confident, assuming we have good coverage.
As Lester said, one of the bonuses of this merge is that we will make Sire (and eventually BioSimSpace) easier for people to learn, as the API will be more pythonic. We have also really cleaned up the search functionality in Sire, while will make things easier too. And I've fully updated all of the build infrastructure so that everything is modernised (latest CMake), it is more pythonic in style (uses setup.py), merges the conda and hand-installed installation pathways (which should make dependency management and testing easier), and (finally!) properly brings in Windows support (we have Windows conda packages!).
Okay, having re-worked the tests for the SOMD pert file setup to use files rather than pickled objects I have discovered an issue with the new Sire search functionality. When searching with the keyword pertubable, you get back a Molecule in Sire, which is converted to individual atoms in BioSimSpace. Previously you would get back a molecule, which would be converted to a residue if it only contained one residue, which allowed you to obtain the residue number of the perturbable molecule:
Currently we have:
# Sire land.
system._sire_object.search("perturbable")
SelectResult( size=1,
0 : Molecule( :2 num_atoms=8 num_residues=1 )
)
# BioSimSpace
system.search("perturbable")
<BioSimSpace.SearchResult: nResults=8>
for result in system.search("perturbable"):
print(result)
<BioSimSpace.Atom: name='C', molecule=2 index=0>
<BioSimSpace.Atom: name='C', molecule=2 index=1>
<BioSimSpace.Atom: name='H', molecule=2 index=2>
<BioSimSpace.Atom: name='H', molecule=2 index=3>
<BioSimSpace.Atom: name='H', molecule=2 index=4>
<BioSimSpace.Atom: name='H', molecule=2 index=5>
<BioSimSpace.Atom: name='H', molecule=2 index=6>
<BioSimSpace.Atom: name='H', molecule=2 index=7>
Previously we had:
# Sire land.
system._sire_object.search("perturbable")
Molecule( 8 version 501 : nAtoms() = 8, nResidues() = 1 )
# BioSimSpce
system.search("perturbable")
<BioSimSpace.SearchResult: nResults=1>
print(system.search("perturbable"))[0]
<BioSimSpace.Residue: name='LIG', molecule=8, index=0, nAtoms=8>
The same thing happens when searching for water molecules using the keyword water. This is used during solvation to check for existing water molecules, e.g. crystal waters, so I assume that this functionality would be broken too. (Previously both Sire and BioSimSpace would return molecules when searching for water, now what is returned by Sire/BioSimSpace is inconsistent. Sometimes I get residues, sometimes atoms, ...)
Also, searches such as resname ALA return a bunch of atoms, rather than the residues with that name.
I'll need to think about how to fix this, since it would be good if the behaviour was consistent with the existing search implementation. I'm not sure if this needs to be fixed in Sire, or whether we can workaround it in BioSimSpace. Anyway, I'm glad that I caught this before merging.
I see the issue. We are now passing the result of a Select operation to BioSimSpace._SireWrappers.SearchResult, not a SelectResult object. This means that using __getitem__ on the result extracts items from the object, rather than the SelectResult container. For example, if the result returned a residue, then indexing it will extract atoms. (Previously you would extract the residue from the SelectResult container.) I'll see if I can fix this.
You can fix things by wrapping the output of Select in a Sire.Mol.SelectResult object before passing to BioSimSpace._SireWrappers.SearchResult. This reproduces the old behaviour and things now pass. However, I don't know whether this would restrict the use of the new search functionality within Sire, e.g. bonds, etc. I guess we could keep consistency for now (we've only exposed limited searches, which can be cast to the objects wrapper by BioSimSpace) and could expose more complex functionality in future.
Actually, the above doesn't work in all cases, e.g. fixes the perturbable issue, but fails for water. We'll need a different solution.
Okay, we can do the following:
try:
# Query the Sire system.
search_result = _SireMol.Select(query)(self._sire_object, property_map)
except Exception as e:
msg = "'Invalid search query: %r : %s" % (query, e)
if _isVerbose():
raise ValueError(msg) from e
else:
raise ValueError(msg) from None
# Try to convert to a select result object.
try:
select_result = search_result.to_select_result()
except:
try:
select_result = _SireMol.SelectResult(search_result)
except:
msg = "'Unable to generate SelectResult: %r : %s" % (query, e)
if _isVerbose():
raise ValueError(msg) from e
else:
raise ValueError(msg) from None
return _SearchResult(select_result)
(Where _SearchResult has been reverted to handle objects of type _SireMol.SelectResult.)
The same thing can be applied for the search functionality in BioSimSpace._SireWrappers.Molecule, too.
Yes, the BioSimSpace integration is a little janky because BioSimSpace uses the Select object directly. This returns a SelectResult that isn't processed down to the smallest molecule view type.
In Sire, the Select object is used behind the scenes by the [] operator. The SelectResult is automatically converted to the smallest molecule view type by the __from_select_result function.
https://github.com/michellab/Sire/blob/2d430cdb9e8caf1a57ae3960b83ddd779ec2754b/src/sire/mol/init.py#L123
It may be better to use sire.mol.__from_select_result(search_result) in the second try block to get what is needed?
(I'll try to find some time tomorrow to have a play with this)
I've taken a look and it isn't a problem with the new search functionality. The change is because now Sire always iterates over atoms in a molecule. I had changed the code to make iteration more predictable. The new Selector_X_ classes and the new logic means that some older behaviour has changed. Iterating over a molecule, segment, or residue will always iterate over their contained atoms. Only iterating over a chain will iterate over the contained residues.
The other change was that you can now specify what you want to iterate over using .atoms(), .residues(), .chains() etc, and these even work if the search returned multiple molecules. Following the "explicit is better than implicit", the search code should be something like
for result in system.search("perturbable").residues():
print(result)
if you want the residues. I will find some time this evening to make the changes so that things work :-)
I've added the equivalent .atoms(), .residues() etc functions to BioSimSpace's SearchResult object. The test passes in the main branch locally. I will look to do the same fix in the sandpit...
Just to note that the following test is hanging locally for me test/IO/test_openff_gromacs.py::test_molecule_combine. This passes on devel, using BioSImSpace built against Sire 2022.3.0.
Quitting out of the test gives the following warning:
../../.conda/envs/sire-dev/lib/python3.9/site-packages/openff/utilities/exceptions.py:52
/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/openff/utilities/exceptions.py:52: DeprecationWarning: MissingOptionalDependency is deprecated. Use MissingOptionalDependencyError instead.
warnings.warn(
test/IO/test_openff_gromacs.py::test_molecule_combine
/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/_pytest/threadexception.py:73: PytestUnhandledThreadExceptionWarning: Exception in thread Thread-1
Traceback (most recent call last):
File "/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/openff/utilities/utilities.py", line 69, in wrapper
importlib.import_module(package_name)
File "/home/lester/.conda/envs/sire-dev/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'openff.interchange'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/lester/.conda/envs/sire-dev/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/lester/.conda/envs/sire-dev/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/lester/Code/BioSimSpace/python/BioSimSpace/Parameters/_process.py", line 79, in _wrap_protocol
protocol_function(process._molecule,
File "/home/lester/Code/BioSimSpace/python/BioSimSpace/Parameters/Protocol/_openforcefield.py", line 283, in run
omm_system = forcefield.create_openmm_system(off_topology)
File "/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/openff/utilities/utilities.py", line 75, in wrapper
return function(*args, **kwargs)
File "/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/openff/toolkit/typing/engines/smirnoff/forcefield.py", line 1119, in create_openmm_system
interchange = self.create_interchange(
File "/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/openff/utilities/utilities.py", line 71, in wrapper
raise MissingOptionalDependencyError(library_name=package_name)
openff.utilities.exceptions.MissingOptionalDependencyError: The required openff.interchange module could not be imported. Try installing the package by running `conda install -c conda-forge openff-interchange`
warnings.warn(pytest.PytestUnhandledThreadExceptionWarning(msg))
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Comparing OpenFF dependencies in both environments gives the same package versions.
devel:
~/sire.app/bin/conda list openff devel ✭ ◼
# packages in environment at /home/lester/.conda/envs/sire-dev:
#
# Name Version Build Channel
openff-amber-ff-ports 0.0.3 pyh6c4a22f_0 conda-forge
openff-forcefields 2.0.0 pyh6c4a22f_0 conda-forge
openff-toolkit-base 0.11.0 pyhd8ed1ab_1 conda-forge
openff-units 0.1.7 pyh6c4a22f_1 conda-forge
openff-utilities 0.1.5 pyh6c4a22f_0 conda-forge
feature-web:
(sire-dev) conda list openff
# packages in environment at /home/lester/.conda/envs/sire-dev:
#
# Name Version Build Channel
openff-amber-ff-ports 0.0.3 pyh6c4a22f_0 conda-forge
openff-forcefields 2.0.0 pyh6c4a22f_0 conda-forge
openff-toolkit-base 0.11.0 pyhd8ed1ab_1 conda-forge
openff-units 0.1.7 pyh6c4a22f_1 conda-forge
openff-utilities 0.1.5 pyh6c4a22f_0 conda-forge
That's strange - I believe that the test passes when run in GitHub Actions? Could it be an issue with your local environment?
Yes, most likely. I forgot to add that it did previously work locally, the hang occurred after a full rebuild and the latest edits. Will try again when I get a chance. Will also re-trigger the CI to see if it's still passing.
Just to mention that, as reported here this test hang is reproducible for Python 3.8 and Python 3.9 on both Linux and macOS (Python 3.7 passes). The issue isn't Sire 2023.0, or any of the edits here. The test wouldn't have failed in the original CI run since it requires GROMACS, which isn't installed on the runner, hence the test is skipped.
I'll try to figure out the reason for the hang. In all cases, CTRL-C'ing out of the hang gives the traceback reported above, i.e.:
Traceback (most recent call last):
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/site-packages/openff/utilities/utilities.py", line 69, in wrapper
importlib.import_module(package_name)
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'openff.interchange'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/Parameters/_process.py", line 79, in _wrap_protocol
protocol_function(process._molecule,
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/site-packages/BioSimSpace/Parameters/Protocol/_openforcefield.py", line 283, in run
omm_system = forcefield.create_openmm_system(off_topology)
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/site-packages/openff/utilities/utilities.py", line 75, in wrapper
return function(*args, **kwargs)
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/site-packages/openff/toolkit/typing/engines/smirnoff/forcefield.py", line 1119, in create_openmm_system
interchange = self.create_interchange(
File "/home/lester/.conda/envs/biosimspace-dev/lib/python3.9/site-packages/openff/utilities/utilities.py", line 71, in wrapper
raise MissingOptionalDependencyError(library_name=package_name)
openff.utilities.exceptions.MissingOptionalDependencyError: The required openff.interchange module could not be imported. Try installing the package by running `conda install -c conda-forge openff-interchange`
warnings.warn(pytest.PytestUnhandledThreadExceptionWarning(msg))
It turns out that we now need to add openff-interchange as a dependency on Python 3.8+. I presume that they must have split out some functionality. I'll try to see if any other OpenFF functionality no longer works.
Okay, I've made a few edits to remove some redundant imports and tidy some comments. One thing that I've noticed is that a search with no results now raises a ValueError, rather than giving an empty SelectResult, i.e. one with zero results. There are a few places in BioSimSpace where we perform a search then check that search.nResults() == 0. I'll need to go through and replace these with appropriate try/except blocks.
(The new behaviour makes more sense to me, since it is good to know that a search is empty/invalid immediately.)
Bah, it turns out that the behaviour is inconsistent, i.e. it sometimes returns an empty search if the string is valid but there is no result, but sometimes returns a ValueError. For example:
# Search for an unknown element.
system.search("element Z")
ValueError: 'Invalid search query: 'element Z' : SireMol::parse_error: Failed to parse any of the selection 'element Z'. (call sire.error.get_last_error_details() for more info)
# Search for a valid element, using an invalid property map.
system.search("element H", property_map={"element" : "cheese"})
ValueError: 'Invalid search query: 'element H' : 'Nothing matched the search.'
# Now the same thing on the first molecule in the system.
# Search for an invalid element.
molecule.search("element Z")
# Same error as before.
ValueError: 'Invalid search query: 'element Z' (SireMol::parse_error: Failed to parse any of the selection 'element Z'. (call sire.error.get_last_error_details() for more info))
# Search for a valid element, using an invalid property map.
molecule.search("element H", property_map={"element" : "cheese"})
# No error, rather an empty result is returned.
<BioSimSpace.SearchResult: nResults=0>
I'll try to figure out why this is, since we need the behaviour to be consistent.
Ah, this is because BioSimSpace._SireWrappers.Molecule.search() returns an empty result when the search fails, whereas the other wrappers raise the exception:
try:
# Query the Sire molecule.
search_result = query(self._sire_object, property_map)
except KeyError:
# Nothing matched - return an empty result
search_result = _SireMol.SelectorMol()
I'll change this for consistency.
Okay, I've needed to make quite a few edits to handle the new search functionality. However, it looks like the existing searches used to find atoms when applying different types of restraint (using system.getRestraintAtoms()) are no longer valid. I'll need to work out why this is, then adapt the search strings to whatever the equivalent is for Sire 2023.0.0. (I'll also add some tests so that validate the various restraint search terms that we allow.)
(Definitely highlighting a lack of unit tests in certain areas where functionality was blasted out for a workshop.)
This is now fixed. There were some issues with formatting for combining searches via a logical and. We also no longer need to do searches like atoms in ... since atoms are returned by default.