Include BOLD API extra data for corrected list of top hits?
Hello!
I've been using BOLDigger for a while and I'm so grateful to have this tool at my fingertips! I've recently been exploring the option to generate a top hit list using the BOLDigger settings with the associated flags, and to correct identifications using the BOLD API. I really like that the program has built-in functions to look at conflicting taxonomy and missing species names among the closely related (>98% similarity) top hits.
However, I'm finding that the loss of additional data for these lists (like the process ID, BIN, and location of the top hit) is unfortunate, as those pieces can really help guide the exploration of taxonomic assignments for my data.
So, that said, this isn't really an "issue" or a "bug", but rather a request to add this feature. Is this possible at all?
Thanks again for creating such an awesome program! Monica
Dear Monica,
When selecting a top hit with the BOLDigger method, I'd say it is a collection of hits with a common taxonomy that is selected, rather than any individual hit. I wonder how you'd imagine such a feature? Just look up the first hit in the top 20 hits that matches the taxonomy? On the other hand, that's what the flags are for...
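To make the "first hit that matches the taxonomy" idea concrete, here is a minimal Python sketch. It is not BOLDigger's actual code: the rank names, the `process_id` key, and the `first_matching_hit` helper are all hypothetical, and each hit is assumed to be a plain dictionary in the order returned by the search.

```python
from typing import Optional

# Hypothetical rank names; the real BOLDigger column names may differ.
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def first_matching_hit(hits: list[dict], selected: dict) -> Optional[dict]:
    """Return the first hit whose taxonomy equals the selected taxonomy.

    `hits` is assumed to be ordered as in the top 20 list, and `selected`
    is the taxonomy chosen for the collection of hits. A rank missing on
    both sides counts as a match.
    """
    for hit in hits:
        if all(hit.get(rank) == selected.get(rank) for rank in RANKS):
            return hit
    return None


# Made-up example data, only to show the call.
example_hits = [
    {"process_id": "A", "genus": "Baetis", "species": "Baetis rhodani"},
    {"process_id": "B", "genus": "Baetis", "species": None},
    {"process_id": "C", "genus": "Baetis", "species": None},
]
example_selected = {"genus": "Baetis", "species": None}
match = first_matching_hit(example_hits, example_selected)
```

In this made-up example the first hit carries a species name that the selected taxonomy does not, so the lookup skips it and returns the hit with process ID "B". The extra fields of that hit (process ID, BIN, location) could then be reported next to the selected taxonomy.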
Maybe if you could give me a clear example of the feature you imagine I'll consider adding it as an option.
best Dominik
Hi Dominik,
Thanks for your very prompt reply!
Now that I read your explanation of the top hit, I think this makes sense, that it's the best ID in a collection of hits, and so it doesn't necessarily reflect one specific identification.
Presumably, if the top hit is a collection of hits, they would belong to the same BIN? I wonder if including the BIN of the top hits might be possible then, rather than a specific Process ID. One could always figure this out from the list of top 20 hits, but it might be nice to automate it. So anyway, not a requirement, but something to ponder!
Monica
Hi Monica,
Interesting thought about the BIN. I'm actually not sure if this holds true for all of the hits. I will look into this as soon as I have sufficient time, which might be a while, but I will add it as a feature request to my list. Also, a collection of BINs would probably be helpful :)
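A rough sketch of what "a collection of BINs" could look like, again in Python and under assumptions: the `bin_uri` key, the rank names, and the `matching_bins` helper are hypothetical and are not taken from BOLDigger. It also lets one check whether all hits in the collection really share a single BIN.

```python
# Hypothetical rank names; the real BOLDigger column names may differ.
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def matching_bins(hits: list[dict], selected: dict) -> list[str]:
    """Collect the distinct BINs of all hits that match the selected taxonomy.

    The order of first appearance in `hits` is kept, and hits without a
    BIN are ignored.
    """
    bins: list[str] = []
    for hit in hits:
        if all(hit.get(rank) == selected.get(rank) for rank in RANKS):
            bin_uri = hit.get("bin_uri")  # assumed key name
            if bin_uri and bin_uri not in bins:
                bins.append(bin_uri)
    return bins


# Made-up example data; the BIN identifiers are invented.
example_hits = [
    {"genus": "Baetis", "species": "Baetis rhodani", "bin_uri": "BOLD:AAA0001"},
    {"genus": "Baetis", "species": "Baetis rhodani", "bin_uri": "BOLD:AAA0002"},
    {"genus": "Baetis", "species": "Baetis rhodani", "bin_uri": "BOLD:AAA0001"},
    {"genus": "Cloeon", "species": "Cloeon dipterum", "bin_uri": "BOLD:AAA0003"},
]
example_selected = {"genus": "Baetis", "species": "Baetis rhodani"}
bins = matching_bins(example_hits, example_selected)
shares_single_bin = len(bins) == 1
```

Here the three matching hits span two BINs, so `shares_single_bin` is `False`. That is the case in which reporting a list of BINs would be more informative than a single value.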
best Dominik
Of course we all have more pressing needs! I completely understand this isn't a priority :)
Just one more note before we rest the topic for now: in the top hit list, there is a column for public vs. private data. How is this determined if it's not directly linked to a Process ID? Just curious!
Monica
Hi Monica, it is taken from the first hit of the collection. Following my reasoning from before, this also makes little sense :D
Fixed with boldigger2.