Rsamtools icon indicating copy to clipboard operation
Rsamtools copied to clipboard

Unrecognized nucleotide 'B' error when sequence field in BAM file omitted

Open ChangqingW opened this issue 1 year ago • 1 comments

Describe the bug When using Rsamtools::pileup and the bam file provided contains records where the sequence field is omitted (as *), this error message is shown:

Error in value[[3L]](cond) : Unrecognized nucleotide 'B'

To Reproduce Please provide minimal R code to reproduce the example.

A bam file with a record without sequence:

@HD     VN:1.6  SO:coordinate
@SQ     SN:chr1 LN:248956422
read1   0       chr1    100     60      5M      *       0       0       ACTGT   FFFFF   NM:i:0  AS:i:5
read1   256     chr1    5000    30      5M      *       0       0       *       *       NM:i:0  AS:i:4

and

Rsamtools::pileup('example.bam')

Expected behavior Suggesting the user this is due to sequence field unavailable, or skipping the record.

sessionInfo

> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.12.0
LAPACK: /usr/lib/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Australia/Melbourne
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Rsamtools_2.20.0     Biostrings_2.72.1    XVector_0.44.0
[4] GenomicRanges_1.56.1 GenomeInfoDb_1.40.1  IRanges_2.38.1
[7] S4Vectors_0.42.1     BiocGenerics_0.50.0

loaded via a namespace (and not attached):
 [1] R6_2.5.1                codetools_0.2-20        zlibbioc_1.50.0
 [4] GenomeInfoDbData_1.2.12 parallel_4.4.2          UCSC.utils_1.0.0
 [7] bitops_1.0-8            compiler_4.4.2          httr_1.4.7
[10] tools_4.4.2             jsonlite_1.8.8          crayon_1.5.3
[13] BiocParallel_1.38.0

ChangqingW avatar Jan 29 '25 09:01 ChangqingW

In case anyone else encountered the same error, it is probably secondary alignments missing sequences, skipping them in pileup might help:

Rsamtools::pileup(bam,
  scanBamParam = Rsamtools::ScanBamParam(flag = Rsamtools::scanBamFlag(isSecondaryAlignment = FALSE)))

ChangqingW avatar Jan 29 '25 09:01 ChangqingW