Demultiplexing single-end reads with combinatorial dual indexes
cutadapt 4.2 with Python 3.10.9
I tried to demultiplex Nanopore single-end reads with combinatorial dual indexes, but it didn't work: instead of one output file per barcode pair, Cutadapt created a single file literally named {name1}-{name2}.fastq.gz
cutadapt -g file:barcodes_i5.fasta -a file:barcodes_i7.fasta -o {name1}-{name2}.fastq.gz partial_tcr.fastq
# barcodes_i5.fasta
>S502
AATGATACGGCGACCACCGAGATCTACACCTCTCTAT
# barcodes_i7.fasta
>N701
TCGCCTTAATCTCGTATGCCGTCTTCTGCTTG
On the other hand, this seems to work:
cutadapt -g file:barcodes_pairs.fasta -o {name}.fastq.gz partial_tcr.fastq
It generated the file A1.fastq.gz
# barcodes_pairs.fasta
>A1
AATGATACGGCGACCACCGAGATCTACACCTCTCTAT...TCGCCTTAATCTCGTATGCCGTCTTCTGCTTG
Excuse my ignorance, but are the i5 and i7 indices also used in Nanopore sequencing? I thought these were Illumina indices.
The {name1} and {name2} template variables only work for paired-end data: {name1} refers to the adapter found on R1 and {name2} refers to the adapter found on R2.
For demultiplexing single-end reads based on both a 5' and a 3' adapter, Cutadapt currently offers linked adapters. The drawback at the moment is that you need to list all possible adapter combinations "by hand"; see https://github.com/marcelm/cutadapt/issues/625#issuecomment-1145196377. Your barcodes.fasta file would start like this:
>S502_N701
AATGATACGGCGACCACCGAGATCTACACCTCTCTAT...TCGCCTTAATCTCGTATGCCGTCTTCTGCTTG
Then you can use -a file:barcodes.fasta -o {name}.fastq.gz input.fastq.gz.
Because having to list out all combinations is annoying and also inefficient, I plan to improve this (see #633), but I haven't had the time to do so yet.
Thanks for your reply! Yes, we performed Nanopore sequencing on a library originally prepared for Illumina sequencing.
After some thought, I think it might be a good thing to specify all the adapter combinations that we expect. I ended up writing a script to generate all possible adapter combinations. However, this approach does seem inefficient if Cutadapt performs an alignment for each linked adapter.
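A minimal sketch of what such a script could look like is below. It is not the script actually used here. It assumes two plain FASTA files, one with the 5' barcodes and one with the 3' barcodes, and writes one linked-adapter record per combination in the `NAME5_NAME3` / `SEQ5...SEQ3` layout shown above. The function names (`read_fasta`, `linked_adapters`, `write_linked_fasta`) are made up for this illustration.

```python
from itertools import product


def read_fasta(path):
    """Return {name: sequence} from a simple FASTA file."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the record name.
                name = line[1:].split()[0]
                records[name] = ""
            elif name is not None:
                # Sequence lines before the first header are ignored.
                records[name] += line
    return records


def linked_adapters(five_prime, three_prime):
    """Yield (name, linked sequence) for every 5'/3' barcode combination."""
    for (n5, s5), (n3, s3) in product(five_prime.items(), three_prime.items()):
        yield f"{n5}_{n3}", f"{s5}...{s3}"


def write_linked_fasta(i5_path, i7_path, out_path):
    """Write all combinations as a FASTA file of linked adapters."""
    i5 = read_fasta(i5_path)
    i7 = read_fasta(i7_path)
    with open(out_path, "w") as out:
        for name, seq in linked_adapters(i5, i7):
            out.write(f">{name}\n{seq}\n")
```

For example, `write_linked_fasta("barcodes_i5.fasta", "barcodes_i7.fasta", "barcodes.fasta")` would write a file that can be passed to Cutadapt as `file:barcodes.fasta`. The number of records grows as the product of the two barcode counts, which is the inefficiency discussed in this thread.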
It would be more efficient to look for all the unique 3' and 5' adapters separately, as you plan to do, and then demultiplex based on the combinations found. When you implement the --link-adapters feature, it would be great if it could take a table of expected barcode pairs, in case the best-matching pair does not actually exist in the sample.
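Until something like that exists, one possible workaround is to build the linked-adapter file from the expected pairs only, rather than from every combination. The sketch below assumes a hypothetical two-column, tab-separated table of barcode names (5' name, then 3' name). This format is an assumption made for illustration, not something Cutadapt defines, and the function names are made up as well.

```python
import csv
import io


def read_pairs(handle):
    """Parse a two-column, tab-separated table of (5' name, 3' name)."""
    reader = csv.reader(handle, delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def expected_linked_adapters(five_prime, three_prime, pairs):
    """Return (name, linked sequence) records for the expected pairs only.

    five_prime and three_prime map barcode names to sequences.
    """
    records = []
    for n5, n3 in pairs:
        if n5 not in five_prime or n3 not in three_prime:
            raise KeyError(f"unknown barcode in pair: {n5}, {n3}")
        records.append((f"{n5}_{n3}", f"{five_prime[n5]}...{three_prime[n3]}"))
    return records


# Example with shortened, made-up sequences.
table = io.StringIO("S502\tN701\n")
pairs = read_pairs(table)
records = expected_linked_adapters(
    {"S502": "AAT", "S503": "CCC"}, {"N701": "TCG"}, pairs
)
for name, seq in records:
    print(f">{name}\n{seq}")
```

This keeps the adapter file as small as the sample sheet and ensures a read can only be assigned to a pair that was actually expected. Each listed pair is still searched for as a separate linked adapter, though.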