cblaster remote error with large clusters

Open • HackenDirker opened this issue 2 years ago • 1 comment

Hi, I've been trying to run cblaster remote searches with full-sized BGCs (>100 kb). Every time I do, after only a few minutes of searching, it raises a ValueError saying no hits were found:

[12:01:29] INFO - Starting cblaster in remote mode
[12:01:29] INFO - Launching new search
[12:01:30] INFO - Request Identifier (RID): AZTWTGEW013
[12:01:30] INFO - Request Time Of Execution (RTOE): 12s
[12:01:42] INFO - Polling NCBI for completion status
[12:01:42] INFO - Checking search status...
[12:02:42] INFO - Checking search status...
[12:03:42] INFO - Checking search status...
[12:04:42] INFO - Checking search status...
[12:05:42] INFO - Checking search status...
Traceback (most recent call last):
  File "/home/hackenbd/.local/bin/cblaster", line 8, in <module>
    sys.exit(main())
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 368, in search
    poll(rid)
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 244, in poll
    if check(rid):
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 174, in check
    raise ValueError("Search completed, but found no hits")

A few reasons this is strange:

  1. It often doesn't seem to run long enough to complete an entire search. This makes me think that something is happening with the NCBI API that's cutting the search short and causing it to throw an error.

  2. The original organism that contains some of the clusters that have errored out is in NCBI, so we should always get hits against at least that organism, but we don't.

  3. I've used cblaster to look for the core biosynthetic proteins in some of these larger clusters, extracted +/-100kb from the hit locations, run antismash, and rerun cblaster locally with that region in the db. When I do it this way, cblaster has no trouble identifying the cluster as homologous. Again, this makes me think it's something to do with the NCBI API.
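One way to tell whether NCBI itself reports the search as finished and empty, or whether something goes wrong earlier, is to query the RID directly. The sketch below assumes the documented NCBI BLAST URL API parameters (`CMD=Get`, `FORMAT_OBJECT=SearchInfo`, `RID`) and a response containing `Status=...` and `ThereAreHits=yes` fields. The function names `parse_search_info` and `fetch_search_info` are hypothetical helpers for illustration, not part of cblaster, and this is not a copy of what cblaster's remote.py does.

```python
import re
import urllib.parse
import urllib.request

# NCBI BLAST URL API endpoint.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def parse_search_info(text):
    """Extract the status and hit flag from a SearchInfo response body.

    Assumes the body contains a 'Status=<WORD>' field and, when hits
    exist, the literal string 'ThereAreHits=yes'.
    """
    status = re.search(r"Status=(\w+)", text)
    return {
        "status": status.group(1) if status else None,
        "has_hits": "ThereAreHits=yes" in text,
    }


def fetch_search_info(rid, timeout=30):
    """Request the SearchInfo page for a given RID and return its text."""
    query = urllib.parse.urlencode(
        {"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid}
    )
    with urllib.request.urlopen(f"{BLAST_URL}?{query}", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `parse_search_info(fetch_search_info(rid))` with the RID from the log, while it is still valid (RIDs expire after a limited time), would show whether NCBI reports `READY` without hits or some other status. A status of `READY` with `has_hits` false would point at the search on the NCBI side rather than at cblaster's polling.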

Would you happen to have any suggestions on how to make cblaster more reliable for remote searches of large clusters?

Thanks for your time!

HackenDirker, Jul 13 '23 12:07

I've had this issue too, e.g. with cpk. I found a couple of homologs that should have been picked up by searching for the PKS alone. I think I used vanilla BLASTP to find them, but the homologous clusters should have met the cblaster settings.

Tim-Kirkwood, Jun 20 '24 20:06