Massive drop in read length (over 70%)

Open AlexFryd opened this issue 8 months ago • 0 comments

Hello and thank you for developing TranscriptClean!!!

I have a small issue/observation after running the basic command line for my dataset (ONT cDNA, aligned with minimap2).

Here is my bash script:

```bash
sam_dir="/scratch/prj/bcn_pd_pesticides/Long-Reads-ALS/Alex_Nanopore_ALS/minimap2/"
genome="/users/k2476200/minimap2/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa"
outdir="/scratch/prj/bcn_pd_pesticides/Long-Reads-ALS/Alex_Nanopore_ALS/minimap2/sam_QC"

mkdir -p "$outdir"

# Loop through SAM files in barcode01 and barcode02
for sam_file in "$sam_dir"/barcode*/*.sam; do
    # Extract barcode ID (e.g., barcode01 or barcode02)
    barcode=$(basename "$(dirname "$sam_file")")

    echo "Processing $barcode..."

    # Run Minimap2 with your FASTQ files and genome reference
    python3.12 TranscriptClean.py --sam "$sam_file" --threads 8 --genome "$genome" --outprefix "$outdir/${barcode}_qc"
done
```

After sorting and indexing with samtools, I ran NanoPlot on the sorted BAM files before and after cleaning. Here is the quality overview:

Before cleaning:

Image

After cleaning:

Image

As you can see, although the mean and median read quality improved significantly, the mean and median read length and the read count both dropped sharply.
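To cross-check the NanoPlot numbers directly from the SAM files, something like the following could be used. This is only a rough sketch: `sam_length_stats` is a hypothetical helper (not part of TranscriptClean or samtools), and it assumes that only primary, mapped records should be counted, so unmapped, secondary, and supplementary alignments are skipped based on the SAM FLAG field.

```bash
# Hypothetical helper: summarise primary alignments in a SAM file.
# Prints "<read count> <mean read length>".
sam_length_stats() {
    awk -F'\t' '
        /^@/ { next }                     # skip header lines
        {
            flag = $2
            # Assumption: ignore unmapped (4), secondary (256)
            # and supplementary (2048) records
            if (int(flag / 4) % 2 || int(flag / 256) % 2 || int(flag / 2048) % 2) next
            if ($10 == "*") next          # record stores no sequence
            n++
            total += length($10)
        }
        END {
            if (n) printf "%d %.1f\n", n, total / n
            else print "0 0.0"
        }
    ' "$1"
}
```

Running it on the input SAM and on the corresponding cleaned SAM (the exact output file name depends on the `--outprefix` used) shows whether the drop is also present before the samtools and NanoPlot steps.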

Is this a data issue or a command issue?

Thanks in advance!

Best, Alex

AlexFryd avatar May 12 '25 09:05 AlexFryd