
invalid literal for int() with base 10: '1307=1' and UserWarning: Problem parsing transcript with ID 'transcript/10670'

Open · philge opened this issue 3 years ago · 3 comments

Hi,

I am getting errors like the following when I run TranscriptClean:

```
Correcting transcripts...
invalid literal for int() with base 10: '1307=1'
invalid literal for int() with base 10: '15=1'
invalid literal for int() with base 10: '1094=1'
invalid literal for int() with base 10: '1094=1'
invalid literal for int() with base 10: '1093=1'
invalid literal for int() with base 10: '1509=1'
invalid literal for int() with base 10: '511=1'
invalid literal for int() with base 10: '91=15588'
invalid literal for int() with base 10: '77=19737'
invalid literal for int() with base 10: '77=19737'
..
```

I also see these warnings:

```
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/10670'
  warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/10345'
  warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/11633'
  warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/11869'
  warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/23980'
  warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/224'
  warnings.warn("Problem parsing transcript with ID '" +
```

Could you please help me fix this issue?

Thanks,
Philge

philge, Mar 10 '22 13:03

Hi, I'm pretty sure the "Problem parsing transcript..." warnings are caused by the "invalid literal..." errors, but earlier versions of TranscriptClean buried the stack trace of those errors inside a try / except block. If you install the latest version, I am certain it will still throw an error, but the message will be more informative. Could you run it with the latest commits and paste the output here? That should make it easier to debug.
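To illustrate how a try / except block can hide the root cause, here is a minimal, hypothetical sketch of the pattern. The function names `parse_count`, `parse_transcript` and `parse_transcript_verbose` are invented for illustration and are not TranscriptClean's actual code. Catching the exception and emitting only a generic `UserWarning` discards the traceback, while appending `traceback.format_exc()` keeps it.

```python
import traceback
import warnings


def parse_count(token):
    # Hypothetical stand-in for a parsing step deep inside the transcript
    # constructor; int() raises ValueError on a token such as '1307=1'.
    return int(token)


def parse_transcript(transcript_id, token):
    """Swallow the error: only a generic UserWarning reaches the user."""
    try:
        return parse_count(token)
    except Exception:
        # The original exception and its traceback are discarded here.
        warnings.warn("Problem parsing transcript with ID '" + transcript_id + "'")
        return None


def parse_transcript_verbose(transcript_id, token):
    """Same fallback, but the warning also carries the full traceback."""
    try:
        return parse_count(token)
    except Exception:
        # format_exc() renders the exception currently being handled.
        warnings.warn("Problem parsing transcript with ID '" + transcript_id
                      + "'\n" + traceback.format_exc())
        return None
```

Which behavior is preferable depends on whether one bad read should abort the whole run; the verbose variant keeps going but still reports which line actually failed.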

fairliereese, Mar 16 '23 19:03

@fairliereese Hi, I get the same error and ran it with the latest commits. Here's the output:

```
Traceback (most recent call last):
  File "xxx/miniconda3/envs/isoseq/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "xxx/miniconda3/envs/isoseq/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "xxx/TranscriptClean/newest_version/TranscriptClean/TranscriptClean.py", line 576, in run_chunk
    buffer_size=options.buffer_size)
  File "xxx/TranscriptClean/newest_version/TranscriptClean/TranscriptClean.py", line 364, in batch_correct
    options, refs)
  File "xxx/TranscriptClean/newest_version/TranscriptClean/TranscriptClean.py", line 409, in correct_transcript
    refs.sjAnnot)
  File "xxx/TranscriptClean/newest_vestion/TranscriptClean/TranscriptClean.py", line 336, in transcript_init
    transcript = Transcript(sam_fields, genome, sjAnnot)
  File "xxx/newest_verstion/TranscriptClean/transcript.py", line 48, in __init__
    self.NM, self.MD = self.getNMandMDFlags(genome)
  File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 289, in getNMandMDFlags
    operations, counts = self.splitCIGAR()
  File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 126, in splitCIGAR
    return splitCIGARstr(self.CIGAR) # alignTypes, counts
  File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 586, in splitCIGARstr
    counts = [int(i) for i in counts]
  File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 586, in <listcomp>
    counts = [int(i) for i in counts]
ValueError: invalid literal for int() with base 10: '130=20054'
```

Could you please help fix this issue? Thanks.
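The last frames of the traceback point at `splitCIGARstr`, where each count is passed to `int()`. One plausible explanation, which is an assumption and not confirmed in this thread, is that the split treats only letters as CIGAR operations, while the SAM specification also allows `=` (sequence match) and `X` (sequence mismatch). A token such as `'130=20054'` is exactly what a letter-only split would produce for a CIGAR like `130=20054N...`. The sketch below contrasts a hypothetical letter-only split with one that accepts all nine SAM operations (`MIDNSHP=X`); neither function is TranscriptClean's actual code.

```python
import re

# One CIGAR token: a length followed by one of the nine SAM operations.
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def split_cigar_letters_only(cigar):
    """Hypothetical naive split that treats only letters as operations."""
    ops = re.findall(r"[A-Z]", cigar)
    # Drop the empty string after the final operation (assumes the CIGAR ends in a letter).
    counts = re.split(r"[A-Z]", cigar)[:-1]
    # '=' is not a letter, so '130=' stays glued to the next count and int() fails.
    return ops, [int(c) for c in counts]


def split_cigar(cigar):
    """Split a CIGAR string into (operations, counts), accepting '=' and 'X'."""
    tokens = _CIGAR_TOKEN.findall(cigar)
    # Reject anything the token pattern did not consume, e.g. '*' or a stray character.
    if "".join(n + op for n, op in tokens) != cigar:
        raise ValueError("malformed CIGAR string: " + repr(cigar))
    return [op for _, op in tokens], [int(n) for n, _ in tokens]
```

If the failing reads do use `=`/`X`, that would also account for counts like `1307=1` and `91=15588` in the original report. Extended CIGAR operations are typically emitted only on request; minimap2, for example, writes them only when run with `--eqx`.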

FeliciaJiangBio, Mar 27 '23 08:03

Thanks for running with the newest commits. I now know which line is throwing the error, but I am still not entirely sure what's causing it. If you could send me a snippet of your input SAM file (one that still causes the error when you run it), that would be really helpful.

Alternatively, you can send me the CIGAR string of one of your transcripts that you know is causing the error.

fairliereese, Mar 27 '23 16:03
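To pick out a CIGAR string worth sending, a short script along these lines could scan the SAM file and report the first few alignments whose CIGAR (column 6) contains `=` or `X`. This is a hypothetical helper, not part of TranscriptClean; it assumes an uncompressed, tab-delimited SAM file whose header lines start with `@`, and it only checks the `=`/`X` hypothesis rather than proving it.

```python
def extended_cigar_records(lines, limit=5):
    """Yield (read name, CIGAR) for up to `limit` alignments that use '=' or 'X'."""
    found = 0
    for line in lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment line
        qname, cigar = fields[0], fields[5]  # QNAME is column 1, CIGAR is column 6
        if "=" in cigar or "X" in cigar:
            yield qname, cigar
            found += 1
            if found >= limit:
                return


# Example use (the file name is a placeholder):
# with open("input.sam") as sam:
#     for name, cigar in extended_cigar_records(sam):
#         print(name, cigar, sep="\t")
```

If this prints nothing, the `=`/`X` explanation is probably wrong, and a raw snippet of the SAM file, as requested above, is the better thing to share.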