
Most files are empty after split

Open JihedC opened this issue 3 years ago • 0 comments

Hi! Thank you for developing this tool. I would like to use the split function to generate one BAM file per single cell.

The structure of my bam file (obtained from Cell Ranger) is:

samtools view $BAM/possorted_genome_bam.bam | head
A00379:517:HWLKKDSX2:1:1542:2483:6668   16      chr1    3018437 0       150M1S  *       0       0       TCTTTATTCCTTCCTTGACCAAGGTATCATTGAACAGAGTGTTGTTCAGTCTCCACGTAAATGTTGGCTTTCTATTATTTATGTTGTTATTGAAGATCAGCCTTAGTCCATGGTGATCTGATAGGATGCATGGGACAATTTCGAAATTTTC       FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF       NH:i:7  HI:i:1  AS:i:136        nM:i:6  RG:Z:WT1_GEX_PC_mm10_introns:0:1:HWLKKDSX2:1  RE:A:I  xf:i:0  CR:Z:CTAGCCTAGGAATTAC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:CTAGCCTAGGAATTAC-1 UR:Z:ACCCAACACG UY:Z:FFFFFFFFFF UB:Z:ACCCAACACG

So I don't have a BX tag. Instead, I would like to use the corrected barcode tag "CB".

So I used the command:

bxtools split $BAM/possorted_genome_bam.bam -a test --tag CB > $OUTPUT/count.txt

This is where it didn't work properly. The command generated many BAM files, of which 30 contained reads and more than 7000 were empty.
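For reference, a tally like the one above can be reproduced with a short shell sketch. It assumes that "empty" means zero bytes on disk and that the outputs are named `<prefix>.<barcode>.bam` in a single directory; the function name `count_bams` is only illustrative and is not part of bxtools.

```shell
# count_bams DIR PREFIX
# Print "<empty> <nonempty>": how many files named PREFIX.*.bam in DIR are
# zero bytes long and how many are not.
# Assumption: "empty" means a zero-byte file, not a valid BAM with no reads.
count_bams() {
    dir=$1
    prefix=$2
    # -size 0 matches only files that occupy zero blocks, i.e. zero-byte files
    empty=$(find "$dir" -maxdepth 1 -type f -name "$prefix.*.bam" -size 0 | wc -l | tr -d '[:space:]')
    total=$(find "$dir" -maxdepth 1 -type f -name "$prefix.*.bam" | wc -l | tr -d '[:space:]')
    echo "$empty $((total - empty))"
}

# Example (for the command above, run in the output directory):
#   count_bams . test
```

A file that has a header but no reads would be counted as non-empty here, so the numbers may differ from a count based on `samtools view -c`.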

The files that contain reads produce this warning:

samtools view test.GAATAAGTCTGAGGGA-1.bam | head -n 5
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
A00379:517:HWLKKDSX2:2:1224:19768:20102 1024    chr1    6214342 255     151M    *       0       0       ATTTCGGGGCAGCAGATGAGGGCCCCAGATCTGTGCTGGTGCTCACTCGTCAGCCTCCGGTTCCCCTGTTGGGGCTGCCCCAGGTTTGGCGAGGTCGGTCTGCCGCGGCCAGAAGGTCACGCTCACCTTGGGGCCGTCCAAGGCAAGCACC       FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF       NH:i:1  HI:i:1  AS:i:149        nM:i:0  RG:Z:WT1_GEX_PC_mm10_introns:0:1:HWLKKDSX2:2  TX:Z:ENSMUST00000159618,+98,151M;ENSMUST00000191825,+801,151M   GX:Z:ENSMUSG00000090031 GN:Z:4732440D04Rik      fx:Z:ENSMUSG00000090031       RE:A:E  xf:i:17 CR:Z:GAATAAGTCTGAGGGA   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:GAATAAGTCTGAGGGA-1 UR:Z:GCTCATCGCT UY:Z:FFFFFFFFFF UB:Z:GCTCATCGCT
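As far as I understand, samtools prints this warning when a file does not end with the 28-byte BGZF end-of-file block defined in the SAM/BAM specification, which may mean the output was not closed cleanly. The check can be sketched in plain shell as follows. The function name `has_bgzf_eof` is hypothetical, and this only inspects the last 28 bytes; it does not validate the rest of the file.

```shell
# Expected BGZF EOF block (28 bytes) as lowercase hex, per the SAM/BAM specification.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# has_bgzf_eof FILE
# Exit status 0 if FILE ends with the BGZF EOF block, non-zero otherwise.
has_bgzf_eof() {
    # Dump the last 28 bytes as hex and strip the whitespace that od inserts.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d '[:space:]')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Example: list the split outputs that lack the EOF block.
#   for f in test.*.bam; do has_bgzf_eof "$f" || echo "no EOF block: $f"; done
```

If every output fails this check, that would point to the files being left unfinished by the writer rather than to a problem with individual barcodes.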

Do you have any idea what the problem could be?

Thanks in advance for your help!

JihedC, Apr 04 '22 08:04