Read in arbitrary coordinate set (x, y, z)

Open b2jia opened this issue 4 years ago • 9 comments

Is it possible to read in an array of [x, y, z] coordinates for ProDy analysis? I have tried parsePDB(xyz_array_), but to no avail: it throws TypeError: can only concatenate list (not "str") to list.

Along the same line, is it possible for a particular coordinate to be [nan, nan, nan], where the residue wasn't observed? I'm unable to find in ProDy documentation how to indicate sparsity. My overarching goal is to build an Ensemble from a list of arrays, in which each array is a series of coordinates [x, y, z] corresponding to one structure from a pool of heterogeneous structures.

b2jia avatar Nov 08 '21 18:11 b2jia

Probably there’s a bug that needs fixing there, possibly to do with Python 2 vs 3.

NaN is probably something that needs handling specifically.
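
One generic workaround for unobserved positions, independent of ProDy, is to keep only the positions that are observed in every structure before building anything from them. The sketch below assumes each structure is a list of [x, y, z] triples of equal length with [nan, nan, nan] marking an unobserved residue; the helper names observed_in_all and keep_positions are hypothetical and are not part of ProDy.

```python
import math


def observed_in_all(structures):
    """Return indices of positions whose x, y, z are finite in every structure.

    `structures` is assumed to be a list of structures, each a list of
    [x, y, z] triples, with NaN marking an unobserved position.
    """
    n = len(structures[0])
    if any(len(s) != n for s in structures):
        raise ValueError("all structures must list the same number of positions")
    return [
        i
        for i in range(n)
        if all(math.isfinite(v) for s in structures for v in s[i])
    ]


def keep_positions(structure, indices):
    """Return only the positions of one structure listed in `indices`."""
    return [structure[i] for i in indices]
```

This discards every position that is missing in at least one structure, so it only makes sense when most positions are shared across the pool.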

jamesmkrieger avatar Nov 08 '21 21:11 jamesmkrieger

I see! Thanks for this response!

I don't know if it is aspirational, but it would be incredibly cool to infer the main modes of vibration from a heterogeneous sampling of the same molecule (be it a list of [x, y, z] coordinates, instead of a .pdb file) under different configurations!

b2jia avatar Nov 08 '21 23:11 b2jia

Hi Bojing,

if you prepare the coordinates in a file with XYZ format, parsePDB should work just fine. The error you ran into is because parsePDB expects to take a file name (a str) as input.
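
As a rough illustration of getting an array into a file that can be passed by name, here is a minimal sketch that writes [x, y, z] triples as fixed-width PDB ATOM records using only the standard library. The helper names coords_to_pdb_lines and write_pdb are hypothetical, and the atom name, residue name, chain, occupancy and B-factor are placeholder assumptions (every point is written as a CA atom of a GLY residue), not information from the original data.

```python
def coords_to_pdb_lines(coords, atom_name=" CA ", resname="GLY", chain="A"):
    """Format (x, y, z) triples as fixed-width PDB ATOM records.

    Placeholder metadata (assumed): one atom per residue, occupancy 1.00,
    B-factor 0.00, element C. Serial numbers above 99999 or residue
    numbers above 9999 would overflow the fixed-width columns.
    """
    lines = []
    for i, (x, y, z) in enumerate(coords, start=1):
        lines.append(
            "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}".format(
                i, atom_name, resname, chain, i, x, y, z, 1.0, 0.0, "C"
            )
        )
    lines.append("END")
    return lines


def write_pdb(path, coords):
    """Write the ATOM records for `coords` to `path` and return the path."""
    with open(path, "w") as handle:
        handle.write("\n".join(coords_to_pdb_lines(coords)) + "\n")
    return path
```

The file name returned by write_pdb (a str) is the kind of input parsePDB expects, for example write_pdb("my_points.pdb", xyz_array_.tolist()) for a NumPy array; a position holding NaN would be written as the text "nan", so unobserved positions should be dropped first.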

SHZ66 avatar Nov 09 '21 02:11 SHZ66

Thank you both for your feedback! I revisited this problem after a few days. Based on the documentation, parsePDB either searches for a PDB identifier or looks for a filename. I tried the latter, saving an array as a file (i.e., test_pts.npy) and loading it in, but this doesn't do the trick. I have also tried setting the header off, but this doesn't work either. What is the recommended syntax for reading in an arbitrary list of xyz coordinates?

b2jia avatar Nov 14 '21 06:11 b2jia

You could maybe try creating a new AtomGroup object and setting the coordinates manually from your array or even just using your array directly. ANM.buildHessian and PCA.buildCovariance can take coordinate arrays directly for sure

jamesmkrieger avatar Nov 14 '21 09:11 jamesmkrieger

Thank you for this feedback! Is it possible to convert an AtomGroup into an Atomic instance, for ensemble analysis?

For my own edification, what's under the hood of the Ensemble Analysis? Say I were to create multiple instances of Hessian matrices, how does buildPDBEnsemble integrate all the structural information across the ensemble? Does it use an algorithm like GeoStaS to estimate domains and domain trajectories?

https://pubs.acs.org/doi/abs/10.1021/ct300206j

b2jia avatar Nov 16 '21 07:11 b2jia

Firstly, Atomic is the parent class of AtomGroup (as well as Selection, Atom, and many others). You can find some of that kind of information in the recent ProDy 2.0 paper including figure S1.

buildPDBEnsemble doesn’t use Hessian matrices, just coordinates. It simply builds an ensemble of structures that can be analysed further. Usually the first way to do this is via PCA.

We also have ensemble normal mode analysis for signature dynamics analysis, which starts from calcEnsembleENMs and can calculate the Hessian matrices for each member of the ensemble if you choose ANM as the model.

Please read over our website and tutorials and papers for explanations of all these things.

jamesmkrieger avatar Nov 16 '21 07:11 jamesmkrieger

Thanks for mentioning GeoStaS. I'll need to look into it more carefully. We do have something that is perhaps similar called dynamical domain decomposition based on a spectral clustering using GNM modes (http://prody.csb.pitt.edu/tutorials/cryoem_tutorial/domain_decomposition.html).

We also have interfaces for parsing domains from PFam and CATH

jamesmkrieger avatar Apr 19 '22 16:04 jamesmkrieger

Did you ever figure this out? If not, I can maybe try it and give you some example code in a few days or weeks.

jamesmkrieger avatar Apr 19 '22 16:04 jamesmkrieger

I think what we have and GeoStaS should be fairly similar

jamesmkrieger avatar Jan 31 '23 15:01 jamesmkrieger

You could perhaps do something like this to get coordinates into an atom group:

In [1]: from prody import *

In [2]: ag = parsePDB("mdm2.pdb")
@> 1449 atoms and 1 coordinate set(s) were parsed in 0.01s.

In [3]: writeArray("mdm2_coords.txt", ag.getCoords())
Out[3]: 'mdm2_coords.txt'

In [4]: ag2 = AtomGroup()

In [5]: coords2 = parseArray("mdm2_coords.txt")

In [6]: ag2.setCoords(coords2)

jamesmkrieger avatar Feb 01 '23 15:02 jamesmkrieger