
seqdiff, unexpected results

Open andrem01 opened this issue 9 years ago • 1 comments

Hi Brian,

I am writing with regard to the seqdiff command, which is not producing the results I expect.

For example, I duplicated a FASTQ file with cp, ran seqdiff -1 file1.fq -2 file1b.fq -s, and received the following summary output:

first_file_total = 4255201
first_file_uniq = 0
second_file_total = 4255201
second_file_uniq = 0
common = 2250574

I then created two test FASTQ files with 7 reads each; the only difference is a deletion of the first 4 bases in the first read of the duplicate file. I received the following summary output (expected values in parentheses):

first_file_total = 7
first_file_uniq = 7 (1)
second_file_total = 0 (7)
second_file_uniq = 0 (1)
common = 0 (6)
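To make the expected values concrete, here is a minimal Python sketch of the comparison I have in mind. It is not seqdiff's implementation; the helper names (read_fastq, summarize) and the toy reads are made up for illustration. It assumes 4-line FASTQ records and compares the sets of distinct sequences in each file.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()       # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual

def summarize(handle1, handle2):
    """Hypothetical summary: totals count records, the rest count distinct sequences."""
    seqs1 = [seq for _, seq, _ in read_fastq(handle1)]
    seqs2 = [seq for _, seq, _ in read_fastq(handle2)]
    set1, set2 = set(seqs1), set(seqs2)
    return {
        "first_file_total": len(seqs1),
        "first_file_uniq": len(set1 - set2),
        "second_file_total": len(seqs2),
        "second_file_uniq": len(set2 - set1),
        "common": len(set1 & set2),
    }

def to_fastq(seqs):
    """Build FASTQ text for toy reads named read1, read2, ... with dummy qualities."""
    return "".join(
        f"@read{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(seqs, start=1)
    )

# Seven made-up reads with distinct sequences.
original = ["ACGTACGTACGT" + tail for tail in ("AA", "AC", "AG", "AT", "CA", "CC", "CG")]
# Duplicate set, with the first 4 bases of the first read deleted.
modified = [original[0][4:]] + original[1:]

summary = summarize(io.StringIO(to_fastq(original)), io.StringIO(to_fastq(modified)))
for key, value in summary.items():
    print(f"{key} = {value}")
```

With these toy reads the sketch reports one unique sequence per file and six in common, which matches the values in parentheses above.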

Any thoughts or help would be greatly appreciated.

andrem01 • Jun 29 '16 04:06

Hi Brian,

I tested the script a little further and discovered that the sequences need to be the same length in both files (rather than deleting bases, I replaced them with Ns). Could seqdiff be modified to allow sequence lengths to differ between files? For example, this would help when testing whether two workflows for generating FASTQ files produce the same output, since the sequences may be trimmed differently.
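As an illustration of the kind of length-tolerant comparison meant here, the following Python sketch pairs reads by name and then checks whether the two sequences are identical, whether one is a trimmed version of the other, or whether they differ. This is only an assumption about how such a mode could behave, not an existing seqdiff option; classify_pair, compare_by_name, and the read names are hypothetical.

```python
def classify_pair(seq_a, seq_b):
    """Classify two sequences that share a read name."""
    if seq_a == seq_b:
        return "identical"
    shorter, longer = sorted((seq_a, seq_b), key=len)
    # Treat the pair as "trimmed" when the shorter read occurs inside the longer one.
    if shorter and shorter in longer:
        return "trimmed"
    return "different"

def compare_by_name(reads_a, reads_b):
    """Compare two dicts mapping read name -> sequence and count each outcome."""
    counts = {
        "identical": 0,
        "trimmed": 0,
        "different": 0,
        "only_in_first": 0,
        "only_in_second": 0,
    }
    for name, seq in reads_a.items():
        if name not in reads_b:
            counts["only_in_first"] += 1
        else:
            counts[classify_pair(seq, reads_b[name])] += 1
    counts["only_in_second"] = sum(1 for name in reads_b if name not in reads_a)
    return counts

# Made-up reads: r1 is trimmed by 4 bases, r2 is unchanged, r3 has a substitution,
# and r4 exists only in the second set.
workflow_a = {"r1": "ACGTACGTAA", "r2": "TTTTCCCC", "r3": "GGGGAAAA"}
workflow_b = {"r1": "ACGTAA", "r2": "TTTTCCCC", "r3": "GGGGTAAA", "r4": "CCCC"}

result = compare_by_name(workflow_a, workflow_b)
print(result)
```

Matching on read names rather than on the sequences themselves is what lets differently trimmed reads still be paired up; a substring test is just one simple way to recognize trimming.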

Many thanks, Andre

andrem01 • Jun 29 '16 23:06