Suggestion for ktClassifyBLAST: Change bit-score threshold to RELATIVE value
Currently you can specify an absolute bit-score threshold for determining "best hits" in ktClassifyBLAST. Hits whose bit-score is lower than the highest bit-score by more than this value are not considered.
This works fine when, e.g., classifying sequencing reads, which should have more or less the same length across the dataset. It does NOT work well, however, when classifying a set of proteins (of variable length) extracted from an assembly. Since the bit-score changes with the length of the query sequence, BLAST hits of a large protein (>1000 aa) may show bit-score differences of several hundred and still have sufficient identity to be considered "significant". For short proteins (e.g. 50 aa), even small changes in the absolute bit-score are much more drastic relative to the top score.
Therefore it would seem better to set the threshold as a fraction of the highest score (e.g. as a float between 0.1 and 0.99), so that only hits scoring at least that fraction of the top bit-score are kept. Would it be feasible to change the script in this way?
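
To illustrate what I mean, here is a minimal Python sketch comparing the two filtering rules. This is not the actual ktClassifyBLAST code; the function names and the bit-scores are made up for illustration, and I am assuming that "relative" means keeping hits with at least the given fraction of the top score.

```python
def best_hits_absolute(bit_scores, threshold):
    """Keep hits whose bit-score is within `threshold` of the top score."""
    top = max(bit_scores)
    return [s for s in bit_scores if top - s <= threshold]


def best_hits_relative(bit_scores, fraction):
    """Keep hits scoring at least `fraction` of the top bit-score."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    top = max(bit_scores)
    return [s for s in bit_scores if s >= fraction * top]


# Hypothetical bit-scores, invented for this example.
long_protein = [2000.0, 1900.0, 1500.0]   # query longer than 1000 aa
short_protein = [100.0, 95.0, 60.0]       # query of about 50 aa

# Absolute cutoff of 50: too strict for the long query, too lax for the short one.
print(best_hits_absolute(long_protein, 50))    # [2000.0]
print(best_hits_absolute(short_protein, 50))   # [100.0, 95.0, 60.0]

# Relative cutoff of 0.9: behaves the same regardless of query length.
print(best_hits_relative(long_protein, 0.9))   # [2000.0, 1900.0]
print(best_hits_relative(short_protein, 0.9))  # [100.0, 95.0]
```

With the absolute cutoff, the hit at 95% of the top score is dropped for the long protein, while a hit at only 60% is kept for the short one. The relative cutoff treats both queries consistently.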