poretools

Interpretation of the qualpos

Open athulmenon opened this issue 9 years ago • 3 comments

How should we interpret the qualpos output? We are seeing many + symbols in the plot.

athulmenon, May 17 '16 12:05

This is a box plot from matplotlib.pyplot. You can read about it here: http://matplotlib.org/api/pyplot_api.html

The line in the box is the median, and the ends of the box are the 25th and 75th percentiles. By default, the whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range beyond the box. Data points beyond the whiskers are treated as outliers and show up as +.
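To make the rule concrete, here is a minimal sketch of how those box plot statistics can be computed by hand. It uses only the Python standard library; the helper `box_stats` and the quality values are made up for illustration and are not part of poretools or matplotlib.

```python
import statistics

def box_stats(values, whis=1.5):
    """Summarise values the way a default (Tukey-style) box plot does."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between neighbouring data points.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    # Everything outside the fences is drawn as an individual outlier marker.
    outliers = [v for v in data if v < lo_fence or v > hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Made-up per-base qualities at one read position (not real poretools output).
quals = [5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 32]
stats = box_stats(quals)
print(stats)
```

In this toy data set the single Q32 base falls outside the upper fence, so it is the only value that would be plotted as a + marker.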

The whiskers should probably just be set to show the range (min, max) of the data.
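As a rough sketch of that idea (this is not the actual poretools plotting code, and the data below are invented), matplotlib's `boxplot` accepts a pair of percentiles for `whis`, so `whis=(0, 100)` puts the whiskers at the minimum and maximum:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up quality values for three position bins (not real poretools output).
quals_by_position = [
    [5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 32],
    [4, 6, 7, 7, 8, 8, 9, 9, 10, 11, 35],
    [5, 5, 6, 7, 8, 8, 8, 9, 10, 10, 30],
]

fig, ax = plt.subplots()
# whis=(0, 100) places the whiskers at the 0th and 100th percentiles,
# i.e. the minimum and maximum, so no points are left over as outliers.
result = ax.boxplot(quals_by_position, whis=(0, 100))
ax.set_xlabel("position bin")
ax.set_ylabel("quality score")

# Count the outlier markers that matplotlib would draw across all boxes.
n_fliers = sum(len(line.get_ydata()) for line in result["fliers"])
print(n_fliers)
```

With the whiskers spanning the full range, the high-quality bases are still visible as the top of each whisker, but they no longer appear as separate + symbols. Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk.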

I wrote that code a long time ago to show that the quality score distribution stays pretty much the same across the length of the reads. Be careful: if you use it with all possible reads, there is sometimes a subset of template reads that are really long (longer than the longest 2D and complement reads) and low quality, which makes it look like the quality drops off at a certain length. However, if you plot the qual_v_pos for only those reads, you can see that the (low) quality stays the same across the length of the reads.

JohnUrban, May 17 '16 13:05

Dear JohnUrban, thanks, that's helpful. One more query: do some reads (the outlier reads) really have quality above Q30? In the plot, the outliers have very high quality. How is that possible?

athulmenon, May 18 '16 04:05

I am not sure I fully understand the question. Are you asking: How is it possible that some reads have Quality Scores >30?

qualpos looks at the individual quality score assigned to each base of every read (quality vs. position in read). Some bases get very high quality scores as determined by ONT. However, the mean quality score of a read (which is used for filtering 2D reads into pass and fail bins, for example) usually falls within pretty predictable bounds: 1D mean quality scores are usually somewhere around 1-6, and 2D mean quality scores are usually around 6-12. Maybe you find the individual outlier quality scores surprising because you are used to seeing the mean quality scores?
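A small illustration of that difference, using an invented read and assuming the mean read quality is simply the arithmetic mean of the per-base scores (an assumption for this sketch, not a statement about how ONT or poretools computes it):

```python
import statistics

# Hypothetical per-base quality scores for one read: mostly low, two very high.
base_quals = [7, 8, 6, 9, 8, 7, 35, 8, 9, 7, 6, 8, 33, 7, 8, 9, 7, 8, 6, 8]

# Assumption: read-level quality is the plain average of the per-base scores.
mean_q = statistics.mean(base_quals)
max_q = max(base_quals)
n_high = sum(q > 30 for q in base_quals)

print(f"mean quality: {mean_q:.1f}")
print(f"highest per-base quality: {max_q}")
print(f"bases with Q>30: {n_high}")
```

Here the read averages 10.2 even though two of its bases score above Q30, so the per-base plot can show high outliers while the read-level mean stays in the usual range.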

I'd be happy to clarify further.

JohnUrban, May 19 '16 12:05