Can't seem to make the Bingo NoSQL documentation work
Situation

I have been trying to use Bingo NoSQL in Python to run search operations on a few molecules. I start with a simple .csv file containing a set of molecules (SMILES), each paired with an ID.
Package versions

- Python version: 3.10.12
- epam.indigo version: 1.19.0
Bingo.searchSim()

I was following the Bingo NoSQL documentation and realised that searchSim() does not actually take a query molecule, but a regular molecule created with Indigo.loadMolecule() instead.
1. Problem Description:
The Bingo NoSQL API documentation (https://lifescience.opensource.epam.com/bingo/user-manual-nosql.html) contains misleading or outdated instructions regarding the searchSim() method.
The documentation states that the method performs similarity searches using a "query molecule" (e.g., a SMILES string). However, the actual implementation requires a molecule object created through Indigo.loadMolecule().
2. Steps to Reproduce:
- Install Python 3.10.12 and version 1.19.0 of the epam.indigo library:
  ```
  pip install epam.indigo==1.19.0
  ```
- Prepare a .csv file with molecules in SMILES format:
  ```
  id,smiles
  1,CCO
  2,CCC
  3,CCN
  ```
- Use the Bingo NoSQL API to create a database and insert the molecules:
  ```python
  from bingo import Bingo
  from indigo import Indigo

  indigo = Indigo()
  bingo_db = Bingo.createDatabaseFile(indigo, "test.bingo", "molecule")

  with open("molecules.csv", "r") as file:
      next(file)  # Skip header
      for line in file:
          mol_id, smiles = line.strip().split(",")
          mol = indigo.loadMolecule(smiles)
          bingo_db.insert(mol, int(mol_id))  # Bingo record IDs are integers
  ```
- Attempt to call the searchSim() method as described in the documentation, passing a SMILES string directly instead of a molecule object:
  ```python
  # Incorrect usage (per documentation)
  query = "CCO"
  matcher = bingo_db.searchSim(query, minSim=0.7, maxSim=1.0)
  ```
- Observe the resulting error:
  ```
  TypeError: searchSim() argument should be of type 'IndigoObject', not 'str'
  ```
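Step 3 splits each line by hand, which fails on a trailing blank line or a row with a missing field. A slightly more robust way to read the same id,smiles file is Python's standard csv module. The sketch below (read_molecule_csv is a hypothetical helper name, not part of Indigo) returns (integer ID, SMILES) pairs, converting the IDs to integers because Bingo identifies records by integer IDs.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_molecule_csv(stream: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse an id,smiles CSV (with a header row) into (integer id, SMILES) pairs."""
    reader = csv.DictReader(stream)  # blank lines are skipped by DictReader
    records = []
    for row in reader:
        smiles = (row.get("smiles") or "").strip()
        if not smiles:
            continue  # skip rows with an empty or missing SMILES field
        records.append((int(row["id"]), smiles))
    return records


if __name__ == "__main__":
    sample = io.StringIO("id,smiles\n1,CCO\n2,CCC\n3,CCN\n")
    print(read_molecule_csv(sample))  # [(1, 'CCO'), (2, 'CCC'), (3, 'CCN')]
```

With a real file, open it with `open("molecules.csv", newline="")` as the csv module recommends, then pass each pair to indigo.loadMolecule(smiles) and bingo_db.insert(mol, mol_id) as in step 3.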
3. Expected Behavior:
As stated in the documentation:
- The searchSim() method should accept a "query molecule" given as a string (e.g., SMILES) and perform the similarity search correctly.
- The API should either convert the string into a molecule object automatically (as Indigo.loadMolecule() does) or accept the string directly, without extra steps from the user.
- The method should return the IDs of similar molecules (e.g., [1, 2]).
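For comparison with the expected behaviour above: in the installed version, searchSim() returns a matcher object that is advanced with next(), not a list, as the documentation example in section 6 also shows. A small helper can collect the matches into a list. This is a sketch that assumes only the matcher methods used in that example (next(), getCurrentId(), getCurrentSimilarityValue(), close()); the name collect_matches is hypothetical.

```python
from typing import List, Tuple


def collect_matches(matcher) -> List[Tuple[int, float]]:
    """Drain a Bingo search matcher into a list of (record id, similarity) pairs.

    Assumes the matcher exposes next(), getCurrentId(),
    getCurrentSimilarityValue() and close(), as in the searchSim() example.
    """
    results = []
    try:
        while matcher.next():
            results.append((matcher.getCurrentId(), matcher.getCurrentSimilarityValue()))
    finally:
        matcher.close()  # release the matcher even if iteration raises
    return results
```

Given a molecule object query_mol, the list of IDs is then `[mol_id for mol_id, _ in collect_matches(bingo_db.searchSim(query_mol, 0.7, 1.0))]`.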
4. Actual Behavior:
- The searchSim() method requires a molecule object created with Indigo.loadMolecule().
- Passing a string results in the following error:
  ```
  TypeError: searchSim() argument should be of type 'IndigoObject', not 'str'
  ```
- The documentation's usage examples are incorrect, which confuses users.
- The error occurs every time the method is used as documented, i.e. without first creating a molecule object.
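Until the documentation or the API is updated, a user-side workaround is to convert SMILES strings yourself before calling searchSim(). The sketch below wraps that step; search_similar is a hypothetical helper name, and it relies only on indigo.loadMolecule() and bingo_db.searchSim(query, minSim, maxSim) as they are used elsewhere in this report.

```python
def search_similar(indigo, bingo_db, query, min_sim=0.7, max_sim=1.0):
    """Run a Bingo similarity search given either a SMILES string or a molecule.

    Hypothetical convenience wrapper: strings are converted with
    indigo.loadMolecule(); anything else is passed to searchSim() unchanged.
    Returns the matcher produced by searchSim(); the caller must close() it.
    """
    if isinstance(query, str):
        query = indigo.loadMolecule(query)  # SMILES -> molecule object
    return bingo_db.searchSim(query, min_sim, max_sim)
```

For example, `matcher = search_similar(indigo, bingo_db, "CCO")` behaves like the corrected documentation example, without the explicit loadMolecule() call.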
5. Analysis of the Problem:
- Root Cause:
  - The documentation is inaccurate or outdated: the method requires a molecule object rather than a string, and the user manual does not state this clearly.
  - The API does not convert string data (e.g., SMILES) into a molecule object automatically, so users have to find the workaround themselves.
- Affected Modules:
  - Documentation: contains outdated or misleading information about the input parameters of the searchSim() method.
  - Bingo NoSQL API: lacks mechanisms for input type validation or for converting strings into molecule objects.
- Lifescience Context:
  - The bug can limit users' ability to use the library for molecular data analysis and undermines trust in it, especially in chemistry and biology, where accuracy is critical.
6. Suggested Solutions:
High-Level Solution:
- Update Documentation:
  - Rewrite the section describing the searchSim() method in the user manual, stating clearly that the query must be a molecule object created with Indigo.loadMolecule().
  - Provide accurate code examples of the method's usage with the correct parameters.
Technical Solution:
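- API Modification:
  Enable automatic conversion of SMILES strings into molecule objects within the searchSim() method. Illustrative sketch only: self.indigo and _searchSimInternal stand in for the internals of the real Bingo class.
  ```python
  def searchSim(self, query, minSim, maxSim, sim_type="tanimoto"):
      if isinstance(query, str):
          # Automatic conversion; assumes the Bingo object keeps a reference to its Indigo instance
          query = self.indigo.loadMolecule(query)
      elif not isinstance(query, IndigoObject):
          raise TypeError("searchSim() argument must be an IndigoObject or a SMILES string")
      # _searchSimInternal is a placeholder for the existing search implementation
      return self._searchSimInternal(query, minSim, maxSim, sim_type)
  ```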
- Documentation Enhancement:
  Update the user manual with examples like:
  ```python
  from bingo import Bingo
  from indigo import Indigo

  indigo = Indigo()
  bingo_db = Bingo.loadDatabaseFile(indigo, "test.bingo")

  query_smiles = "CCO"  # SMILES string
  query_mol = indigo.loadMolecule(query_smiles)  # Create a molecule object

  matcher = bingo_db.searchSim(query_mol, minSim=0.7, maxSim=1.0)
  while matcher.next():
      print(f"ID: {matcher.getCurrentId()}, Similarity: {matcher.getCurrentSimilarityValue()}")
  matcher.close()
  bingo_db.close()
  ```
- Error Handling Enhancements:
  Add clear error messages for invalid input, for example:
  ```
  searchSim() expected an Indigo Molecule object. Use Indigo.loadMolecule() to create one from SMILES.
  ```
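If automatic conversion is not wanted, the same type check can instead fail fast with the message proposed above. The sketch below is hypothetical: validate_sim_query is not part of the Bingo API, and the molecule class is passed in as molecule_type so the check does not depend on how IndigoObject is imported in a given Indigo version.

```python
def validate_sim_query(query, molecule_type):
    """Return query unchanged if it is a molecule_type instance, else raise a descriptive TypeError."""
    if isinstance(query, molecule_type):
        return query
    hint = ""
    if isinstance(query, str):
        # Point string users at the conversion they most likely need
        hint = f" Got the string {query!r}; did you mean indigo.loadMolecule({query!r})?"
    raise TypeError(
        "searchSim() expected an Indigo Molecule object. "
        "Use Indigo.loadMolecule() to create one from SMILES." + hint
    )
```

Raising from the API, rather than converting silently, keeps searchSim() strict about its input while still telling the user exactly which call fixes the problem.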