seqdiff, unexpected results
Hi Brian,
I am writing about the seqdiff command, which is not producing the results I expect.
For example, I duplicated (i.e., cp) a fastq file and ran the following command:
seqdiff -1 file1.fq -2 file1b.fq -s
and received the following summary output:
first_file_total = 4255201
first_file_uniq = 0
second_file_total = 4255201
second_file_uniq = 0
common = 2250574
I then created test fastq files with 7 reads; the only difference being a deletion of the first 4 bases in the first read of the duplicate fastq file. I received the following output summary (expected values in parentheses):
first_file_total = 7
first_file_uniq = 7 (1)
second_file_total = 0 (7)
second_file_uniq = 0 (1)
common = 0 (6)
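To show where the expected values come from, here is a minimal Python sketch of an exact-match comparison. It reflects only my assumption about how the counts should be defined, not how seqdiff computes them. The seven reads are made-up placeholders rather than my real test data, and the name `summarize` is hypothetical.

```python
from collections import Counter

def summarize(first, second):
    """Count total, file-specific, and shared sequences by exact match."""
    a, b = Counter(first), Counter(second)
    # Multiset intersection: a sequence is shared as often as it occurs in both.
    common = sum((a & b).values())
    return {
        "first_file_total": len(first),
        "first_file_uniq": len(first) - common,
        "second_file_total": len(second),
        "second_file_uniq": len(second) - common,
        "common": common,
    }

# Seven hypothetical reads; the copy has the first 4 bases of read 1 deleted.
reads = ["ACGTACGTAA", "TTGCAGGCTA", "GGATCCATGC", "CATGCATGAA",
         "TTTTGGGGCC", "AACCGGTTAC", "GATTACAGAT"]
modified = [reads[0][4:]] + reads[1:]

summary = summarize(reads, modified)
print(summary)
```

Under this definition, the edited read is the only one specific to each file and the other six are shared, which is what I expected to see.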
Any thoughts or help would be greatly appreciated.
Hi Brian,
I tested the script a little further and discovered that the sequences need to be the same length in both files (I found this by replacing the bases with Ns rather than deleting them). Could seqdiff be modified to tolerate different sequence lengths between files? One use case is testing whether two workflows for generating fastq files produce the same output, since the sequences may be trimmed differently.
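A rough Python sketch of the kind of length-tolerant comparison I have in mind is below. It is not based on seqdiff's code. It assumes strict 4-line fastq records, read IDs that are unique within each file, and that a pair of reads counts as "the same" when one sequence is contained in the other (i.e., differs only by trimming). The names `parse_fastq` and `compare_trimmed` are hypothetical, and the two-read inputs are invented for illustration.

```python
from itertools import islice

def parse_fastq(lines):
    """Yield (read_id, sequence) pairs, assuming strict 4-line records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break
        header, seq = record[0].strip(), record[1].strip()
        # Read ID = header text after '@', up to the first whitespace.
        yield header[1:].split()[0], seq

def compare_trimmed(first, second):
    """Pair reads by ID; a pair matches if one sequence contains the other."""
    a, b = dict(first), dict(second)
    common = sum(1 for rid in a.keys() & b.keys()
                 if a[rid] in b[rid] or b[rid] in a[rid])
    return {
        "first_file_total": len(a),
        "first_file_uniq": len(a) - common,
        "second_file_total": len(b),
        "second_file_uniq": len(b) - common,
        "common": common,
    }

# Hypothetical input: r1 is trimmed by 4 bases, r2 has a real base change.
fq1 = ["@r1 sample", "ACGTTTGA", "+", "IIIIIIII",
       "@r2", "GGGGCCCC", "+", "IIIIIIII"]
fq2 = ["@r1", "TTGA", "+", "IIII",
       "@r2", "GGGGCCCA", "+", "IIIIIIII"]

result = compare_trimmed(parse_fastq(fq1), parse_fastq(fq2))
print(result)
```

With real files, open file handles could be passed to `parse_fastq` in place of the lists. Here the trimmed read r1 is counted as common, while r2 is reported as specific to each file.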
Many thanks, Andre