HLA typing quality score evaluation
Dear SpecHLA Team,
I have successfully run SpecHLA on hundreds of WXS samples. Now I'm struggling with the evaluation of the results. The attached picture shows the quality scores imported for one sample from hla_detailed_results.txt. Could you please help me interpret the numbers, since I haven't found a clear guide for that? I was expecting the numbers to range from 0 to 100, representing accuracy, but the high numbers at the end of the data make me question my assumptions.
I have manually checked the original .txt file, and the same number occurs there: DRB1*03:01;1371.000;0.077;0.064;0.047
Thank you very much for the help,
Benjamin
Dear Benjamin,
Thank you for your question.
For SpecHLA results:
- For the first few genes, the score is capped at 100 and represents sequence similarity.
- For the latter genes (typically class II like DRB1), the score is calculated as sequence length × sequence similarity, so the values can exceed 100.
This is expected behavior and not an error. Let us know if you have further questions!
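As a rough illustration of the two score scales, here is a minimal Python sketch that splits one semicolon-separated entry and flags whether its score is above 100. The names `AlleleRecord`, `parse_record`, and `exceeds_similarity_cap` are hypothetical helpers, not part of SpecHLA, and the meaning of the three trailing numbers is not confirmed in this thread, so they are kept as unlabeled values.

```python
from typing import NamedTuple, Tuple


class AlleleRecord(NamedTuple):
    allele: str                   # e.g. "DRB1*03:01"
    score: float                  # second field of the entry
    trailing: Tuple[float, ...]   # remaining numeric fields (meaning assumed unknown here)


def parse_record(text: str) -> AlleleRecord:
    """Split one 'allele;score;x;y;z' entry (hypothetical helper)."""
    fields = text.strip().strip('"').split(";")
    return AlleleRecord(
        allele=fields[0],
        score=float(fields[1]),
        trailing=tuple(float(value) for value in fields[2:]),
    )


def exceeds_similarity_cap(record: AlleleRecord) -> bool:
    """True when the score is above 100, i.e. it cannot be a capped similarity."""
    return record.score > 100.0


record = parse_record("DRB1*03:01;1371.000;0.077;0.064;0.047")
print(record.allele, record.score, exceeds_similarity_cap(record))
```

A score above 100 can only come from the length-scaled calculation; a score of 100 or less does not by itself tell you which scale was used, so the gene name should be checked as well.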
Dear Mr Wang,
Thank you for the fast response. I would like to ask what cutoffs I should use for the different genes to decide whether the HLA genotyping was accurate and whether I can use the result for downstream analyses. I was also wondering whether this is the right score to use for such a decision.
In another sample I was left with ambiguous results for DPB1: "DPB1*03:01;99.868;0.111;0.042;0.029" "DPB1*29:01;99.868;0.000;0.001;0.001"
How should I interpret such a result, where the score is very similar for two different alleles?
Thank you, Benjamin
For the first question, it is hard to say what cutoff to use. Some manual inspection might help. Also, a higher depth (i.e. more supporting reads) will lead to a more accurate result. For the second question, if the scores are similar, then the population-specific allele frequency would be useful. In your case, as DPB1*03:01 has a higher frequency than DPB1*29:01, it is better to select DPB1*03:01.
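The frequency-based tie-break described above could be sketched as follows. This assumes that the three numbers after the score are allele frequencies in three populations, which the reply implies but does not state, and that the relevant population is selected with a column index. The function names, `population_index`, and `score_tolerance` are all hypothetical and not SpecHLA options.

```python
from typing import List, Tuple


def parse_entry(text: str) -> Tuple[str, float, List[float]]:
    """Split 'allele;score;f1;f2;f3' into (allele, score, frequencies)."""
    fields = text.strip().strip('"').split(";")
    return fields[0], float(fields[1]), [float(value) for value in fields[2:]]


def pick_by_frequency(entries: List[str],
                      population_index: int = 0,
                      score_tolerance: float = 0.01) -> str:
    """Among entries whose score is within score_tolerance of the best score,
    return the allele with the highest assumed frequency in the chosen column."""
    parsed = [parse_entry(entry) for entry in entries]
    best_score = max(score for _, score, _ in parsed)
    # Keep only the candidates that are effectively tied on score.
    tied = [item for item in parsed if best_score - item[1] <= score_tolerance]
    # Break the tie using the assumed population frequency column.
    allele, _, _ = max(tied, key=lambda item: item[2][population_index])
    return allele


candidates = [
    "DPB1*03:01;99.868;0.111;0.042;0.029",
    "DPB1*29:01;99.868;0.000;0.001;0.001",
]
print(pick_by_frequency(candidates))
```

For the two DPB1 entries in this thread, every trailing column is higher for DPB1*03:01, so the choice does not depend on which population column is used.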