.mem and sam files generated as empty

Open kirasato0211 opened this issue 5 years ago • 45 comments

Hi, I am using the ASGAL tool to find MET14 deletion and EGFR variation events in my samples. I am running the genome-wide analysis for these two genes. The ASGAL run completes successfully, but the .mem and .sam files are empty, so no events are reported in the final files.

The command I used is:

./asgal --multi -g genome.fa -a annotation2.gtf -s sample1.fastq.gz -s2 sample2.fastq.gz -t transcript.fa --allevents -o output

Could you please help me with this as soon as possible?

kirasato0211 avatar Dec 07 '20 11:12 kirasato0211

Hi, can you please send the tree view of the output folder (command tree is good enough)? Moreover, where did you get the gtf from? Can you share it?

ldenti avatar Dec 07 '20 12:12 ldenti

Hi, below is a tree view of the output folder.

.
|-- ASGAL
|   |-- ENSG00000105976.events.csv
|   |-- ENSG00000105976.mem
|   |-- ENSG00000105976.sam
|   |-- ENSG00000146648.events.csv
|   |-- ENSG00000146648.mem
|   `-- ENSG00000146648.sam
|-- ASGAL.csv
|-- annos
|   |-- ENSG00000105976.gtf
|   |-- ENSG00000105976.gtf.db
|   |-- ENSG00000105976.gtf.sg
|   |-- ENSG00000146648.gtf
|   |-- ENSG00000146648.gtf.db
|   `-- ENSG00000146648.gtf.sg
|-- logs
|   |-- ASGAL
|   |   |-- ENSG00000105976
|   |   `-- ENSG00000146648
|   |-- salmon_index.log
|   |-- salmon_quant.log
|   `-- samtools.log
|-- refs
|   `-- chr7.fa
|-- salmon
|   |-- salmon.bam
|   |-- salmon.bam.bai
|   |-- salmon_index
|   |   |-- duplicate_clusters.tsv
|   |   |-- hash.bin
|   |   |-- header.json
|   |   |-- indexing.log
|   |   |-- quasi_index.log
|   |   |-- refInfo.json
|   |   |-- rsd.bin
|   |   |-- sa.bin
|   |   |-- txpInfo.bin
|   |   `-- versionInfo.json
|   `-- salmon_out
|       |-- aux_info
|       |   |-- ambig_info.tsv
|       |   |-- expected_bias.gz
|       |   |-- fld.gz
|       |   |-- meta_info.json
|       |   |-- observed_bias.gz
|       |   |-- observed_bias_3p.gz
|       |   `-- unmapped_names.txt
|       |-- cmd_info.json
|       |-- libParams
|       |   `-- flenDist.txt
|       |-- lib_format_counts.json
|       |-- logs
|       |   `-- salmon_quant.log
|       `-- quant.sf
`-- samples
    |-- ENSG00000105976.fa
    `-- ENSG00000146648.fa

12 directories, 45 files

I fetched the annotations for MET and EGFR from the GENCODE annotation and gave them as input to the ASGAL command.

kirasato0211 avatar Dec 07 '20 13:12 kirasato0211

Can you check if the .fa files in the samples directory and the .gtf files in the annos directory are empty or not? Moreover, can you share the salmon.bam file?
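The emptiness check requested here can be scripted. The following is a minimal sketch, assuming the output layout shown in the tree view above (an output folder with samples and annos subdirectories). The function name find_empty_files is hypothetical and not part of ASGAL.

```python
from pathlib import Path


def find_empty_files(output_dir, subdirs=("samples", "annos")):
    """Return the zero-byte files found directly inside the given subdirectories."""
    empty = []
    for sub in subdirs:
        folder = Path(output_dir) / sub
        if not folder.is_dir():
            continue  # skip subdirectories that this run did not create
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.stat().st_size == 0:
                empty.append(path)
    return empty


if __name__ == "__main__":
    # "output" is the folder passed to asgal with -o in the command above.
    for path in find_empty_files("output"):
        print(f"empty: {path}")
```

An empty per-gene .gtf or .fa file points at the splitting step rather than at the aligner, which narrows down where to look.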

ldenti avatar Dec 07 '20 13:12 ldenti

I can see that the .fa files in the samples directory are non-empty, but one of the files in the annos directory is empty and the other is non-empty. Attached is a zip file containing the input annotation file (annotation2.gtf), the .gtf files from the annos directory, and the salmon.bam file: Attachment.zip

kirasato0211 avatar Dec 08 '20 04:12 kirasato0211

In annotation2.gtf, both transcripts of each gene share the same transcript_id. This produces unpredictable behaviour (the gffutils library merges the two transcripts into a single transcript).
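A duplicated transcript_id can be spotted before running ASGAL. The sketch below is one possible check, assuming the GTF contains explicit "transcript" rows and the usual `transcript_id "..."` attribute syntax. The function name and the demo identifiers (GENE1, TX1, TX2) are hypothetical.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def duplicated_transcript_ids(gtf_lines):
    """Return the transcript_ids carried by more than one 'transcript' row."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only count transcript-level rows
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return sorted(tid for tid, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical two-transcript gene where both rows reuse the id "TX1".
    row = "chr7\tdemo\ttranscript\t{}\t{}\t.\t+\t.\tgene_id \"GENE1\"; transcript_id \"{}\";"
    demo = [row.format(100, 900, "TX1"), row.format(100, 700, "TX1")]
    print(duplicated_transcript_ids(demo))
```

Any identifier this reports needs to be made unique in the annotation before the per-gene files are built.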

I manually edited the annotations (you can find it here) and now asgal output is non empty.

Let me know if this new annotation fixed your problem.

ldenti avatar Dec 08 '20 10:12 ldenti

Thank you for pointing it out. It helped me run ASGAL successfully. However, no events are reported in the output. I expect a MET exon 14 skipping event to be reported. I am sure that the samples I am using have the MET14 deletion event and the EGFR variation.

Please have a look at the modified annotation file I am using: annotation.zip. If I want the MET exon 14 skipping event to be reported in the output, do I need to include the exon 14 annotations in the GTF file?

Could you please help me understand how ASGAL reports events?

kirasato0211 avatar Dec 08 '20 11:12 kirasato0211

mmm that's strange (when I used the salmon.bam you shared with me, I found 2 events). So there must be some other issue.

Are the .sam files empty?

If you haven't done so yet, I suggest removing the .gtf.db files in the annos folder, since they may refer to the old (incomplete) annotations, and then rerunning (or you can just change the output folder).
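Clearing the cached databases can be done by hand or with a few lines of Python. This is a minimal sketch, assuming the annos folder layout shown in the tree view earlier in the thread. The function name remove_stale_gtf_dbs and its dry_run parameter are hypothetical.

```python
from pathlib import Path


def remove_stale_gtf_dbs(output_dir, dry_run=True):
    """List (and, unless dry_run is set, delete) the *.gtf.db files in <output_dir>/annos."""
    annos = Path(output_dir) / "annos"
    stale = sorted(annos.glob("*.gtf.db")) if annos.is_dir() else []
    for db in stale:
        if not dry_run:
            db.unlink()
    return stale


if __name__ == "__main__":
    # Dry run by default: only print what would be removed.
    for db in remove_stale_gtf_dbs("output"):
        print(f"would remove: {db}")
```

Running it once with the default dry_run=True shows what would be deleted before anything is touched.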

ASGAL reports a CSV describing the events: one line per event, with the type of event (e.g. ES for exon skipping), genomic coordinates, and other information (the example you can find here describes the output format in more detail).

ldenti avatar Dec 08 '20 11:12 ldenti

Could you please let me know how you found 2 events using the salmon.bam file? The SAM files are not empty. I tried removing the .gtf.db files in the annos folder and also tried changing the output folder. Unfortunately, it didn't help. There are still no events reported in the output file.

kirasato0211 avatar Dec 08 '20 11:12 kirasato0211

I created the two samples using

samtools fastq -1 sample_1.fq -2 sample_2.fq salmon.bam

and then I ran asgal (as you did) on the edited annotation I sent you.

I checked the annotation you sent me and it contains only 1 transcript per gene (I'm using the initial one with 2 transcripts per gene). I think this is the reason why you are not getting any event.

ldenti avatar Dec 08 '20 12:12 ldenti

I ran ASGAL with the edited annotation file you sent me and it didn't work. Then I used the salmon.bam file to generate the samples with the command you mentioned above and ran ASGAL again. Unfortunately, that didn't help either.

Please send me the commands you used to get the output.

kirasato0211 avatar Dec 08 '20 12:12 kirasato0211

I used these files (annotation, transcripts, and samples). The reference I used is chr7 only, downloaded from ensembl (link). Edit: you have to change the header of the fasta entry from >7 to >chr7 otherwise asgal crashes.
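The header change mentioned in the edit can be applied without opening the file in an editor. This is a minimal sketch, assuming Ensembl-style sequence names without the chr prefix. The function name add_chr_prefix and the script name in the usage comment are hypothetical.

```python
import sys


def add_chr_prefix(lines, prefix="chr"):
    """Yield FASTA lines, prepending `prefix` to sequence names that lack it."""
    for line in lines:
        if line.startswith(">") and not line.startswith(">" + prefix):
            yield ">" + prefix + line[1:]
        else:
            yield line  # sequence lines and already-prefixed headers pass through


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python add_chr_prefix.py in.fa out.fa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(add_chr_prefix(src))
```

The generator streams line by line, so a whole chromosome does not need to fit in memory.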

I then ran:

asgal --multi -g Homo_sapiens.GRCh38.dna.chromosome.7.fa -a annotation2.edit.gtf -s sample_1.fq -s2 sample_2.fq -t transcripts2.fa --allevents -o output

and I obtained these events:

Type,Start,End,Support,Transcripts,file
ES,55019366,55155829,16,ENST00000275493.7,output/ASGAL/ENSG00000146648.events.csv
ES,116771655,116774880,24,ENST00000318493.11,output/ASGAL/ENSG00000105976.events.csv
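A table like this can be filtered with the standard csv module. The sketch below reuses the two rows shown above. Treating Support as an integer count is an assumption based on the values shown, and the function name load_events is hypothetical.

```python
import csv
import io

# Events table copied from the output shown above.
EVENTS_CSV = """Type,Start,End,Support,Transcripts,file
ES,55019366,55155829,16,ENST00000275493.7,output/ASGAL/ENSG00000146648.events.csv
ES,116771655,116774880,24,ENST00000318493.11,output/ASGAL/ENSG00000105976.events.csv
"""


def load_events(handle, event_type=None, min_support=0):
    """Read events from a CSV handle, keeping those matching the filters."""
    events = []
    for row in csv.DictReader(handle):
        # Convert the numeric columns so they can be compared and sorted.
        for key in ("Start", "End", "Support"):
            row[key] = int(row[key])
        if event_type is not None and row["Type"] != event_type:
            continue
        if row["Support"] < min_support:
            continue
        events.append(row)
    return events


if __name__ == "__main__":
    for event in load_events(io.StringIO(EVENTS_CSV), event_type="ES"):
        print(event["Type"], event["Start"], event["End"], event["Support"])
```

Passing an open file instead of the StringIO object works the same way for a real ASGAL.csv.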

Let me know if these files work for you

ldenti avatar Dec 08 '20 18:12 ldenti

Thank you. These files worked for me. The genome file I was using earlier has chr7 in the header. However, it has repeat-masked bases and 50 bases per line. Maybe that was causing the issue.

Thank you for helping with this. I have tested it on positive samples and will soon test it on negative samples too.

kirasato0211 avatar Dec 09 '20 12:12 kirasato0211

Hi, please note that the official asgal release (not the one on the agilent hub) is now at version 1.1.2. Could you please try that version? It should suffice to run

sudo docker run -v "$PWD"/asgalgw_data:/data registry-dev.scs.agilent.com/algolab/galig:v1.1.1

Best

gdv avatar Mar 09 '21 14:03 gdv

Could you please let me know where I can get the latest asgal release (version 1.1.2)?

kirasato0211 avatar Mar 10 '21 05:03 kirasato0211

Sorry, I did not update the command. It should be

sudo docker run -v "$PWD"/asgalgw_data:/data algolab/asgal

The latest docker image is at https://hub.docker.com/r/algolab/asgal

Best regards

gdv avatar Mar 10 '21 05:03 gdv

Thank you. I am able to run ASGAL using this image. However, I came across the error below.

subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000105976.gtf', '-s', '/data/output/samples/ENSG00000105976.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000105976.mem']' died with <Signals.SIGILL: 4>. """

Could you please let me know where it is going wrong?

kirasato0211 avatar Mar 10 '21 08:03 kirasato0211

Could you please let me know why I am getting the error below while running the ASGAL tool in Docker?

subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000105976.gtf', '-s', '/data/output/samples/ENSG00000105976.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000105976.mem']' died with <Signals.SIGILL: 4>.

kirasato0211 avatar Mar 11 '21 08:03 kirasato0211

Are you using the example data?

ldenti avatar Mar 11 '21 08:03 ldenti

No, I am using my own dataset.

kirasato0211 avatar Mar 11 '21 08:03 kirasato0211

Did it work on the example data? Can you also please send the entire log output to stderr?

ldenti avatar Mar 11 '21 09:03 ldenti

I received the same error on the example data too. I used the data from example/input for the test.

I am unable to upload the error file here, so I am copying the console log below.

ubuntu@rnaseq-asgal1:~$ sudo docker run -v "$PWD"/asgalgw_data:/data algolab/asgal:v1.1.2
Starting with UID:GID 0:0
[ Mar 11, 2021 - 9:12:13AM ] args Namespace(allevents=False, annoPath='/data/annotation.gtf', debug=False, e='3', l='15', multiMode=False, outputPath='/data/output', refPath='/data/genome.fa', sample1Path='/data/sample_1.fa', sample2Path='-', split_only=False, threads='2', transPath='-', verbose=False, w='3')
[ Mar 11, 2021 - 9:12:13AM ] Opening input annotation...
[ Mar 11, 2021 - 9:12:13AM ] Indexing...
[ Mar 11, 2021 - 9:12:13AM ] Reading input annotation...
[ Mar 11, 2021 - 9:12:13AM ] number of genes 1
[##################################################] 1/1
[ Mar 11, 2021 - 9:12:13AM ] Done.
[ Mar 11, 2021 - 9:12:13AM ] Running ASGAL on 1 gene...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/galig/asgal", line 390, in asgal_command_one_gene
    command_check_return(asgal_CMDs['align'], log, log, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/genome.fa', '-a', '/data/annotation.gtf', '-s', '/data/sample_1.fa', '-l', '15', '-e', '3', '-o', '/data/output/sample_1-FBgn0040370.mem']' died with <Signals.SIGILL: 4>.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/galig/asgal", line 585, in <module>
    main()
  File "/galig/asgal", line 580, in main
    runASGAL(args, genes, chr_genes_dict)
  File "/galig/asgal", line 442, in runASGAL
    pool.map(asgal_command_one_gene, params_list)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/genome.fa', '-a', '/data/annotation.gtf', '-s', '/data/sample_1.fa', '-l', '15', '-e', '3', '-o', '/data/output/sample_1-FBgn0040370.mem']' died with <Signals.SIGILL: 4>.
ubuntu@rnaseq-asgal1:~$
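For readers unfamiliar with the "died with <Signals.SIGILL: 4>" wording: on POSIX systems, Python's subprocess module reports a child killed by a signal as a negative return code. SIGILL means the process tried to execute an illegal instruction. One possible cause, not confirmed anywhere in this thread, is a binary built for CPU extensions that the host lacks. The sketch below decodes such return codes and lists the CPU flags on Linux. Both function names are hypothetical.

```python
import signal


def describe_returncode(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, death by signal is reported as the negated signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags listed in /proc/cpuinfo (empty set if unavailable)."""
    try:
        with open(cpuinfo_path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or the file is not readable
    return set()


if __name__ == "__main__":
    print(describe_returncode(-int(signal.SIGILL)))
    print("avx2 available:", "avx2" in cpu_flags())
```

Comparing the flag sets of a machine where the image works and one where it crashes is a cheap way to test the instruction-set hypothesis.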

kirasato0211 avatar Mar 11 '21 09:03 kirasato0211

I tried the example data with a clean installation of docker and asgal:v1.1.2 and it worked. Maybe older versions are creating some issue? Can you please remove/delete all old asgal images and try again?

ldenti avatar Mar 11 '21 10:03 ldenti

Moreover, since v1.1.1 worked for you in December: are you using a different machine or OS? Have you updated Docker recently?

ldenti avatar Mar 11 '21 10:03 ldenti

I tried the command-line ASGAL tool on a few samples in December and that worked for me. Now I want to try it with Docker so that I can run it on more samples. The machine I used then and the one I am using now are different, but the OS is the same, i.e. Ubuntu 18.04.

kirasato0211 avatar Mar 11 '21 10:03 kirasato0211

I have installed Docker on the machine where the command-line ASGAL was running successfully. Even on that machine I came across the same error.

[ Mar 12, 2021 - 6:06:14AM ] Running ASGAL on 2 genes...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/galig/asgal", line 390, in asgal_command_one_gene
    command_check_return(asgal_CMDs['align'], log, log, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000146648.gtf', '-s', '/data/output/samples/ENSG00000146648.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000146648.mem']' died with <Signals.SIGSEGV: 11>.
"""

kirasato0211 avatar Mar 12 '21 06:03 kirasato0211

Please let me know if you can help me with this.

kirasato0211 avatar Mar 15 '21 05:03 kirasato0211

How much RAM does the machine have? Can you share with us the files that you have used?

gdv avatar Mar 15 '21 07:03 gdv

RAM : 16 GB

I am unable to upload files here. It would be great if you could use the files attached to previous comments in this thread; I am using the same ones for the current run.

kirasato0211 avatar Mar 15 '21 08:03 kirasato0211

I used the same files I linked in https://github.com/AlgoLab/galig/issues/12#issuecomment-740812653 and it worked. After setting up the inputs I ran:

docker run -v "$PWD"/2genes:/data algolab/asgal:v1.1.2

Can you please send here:

  • the output of docker -v
  • the output of docker info
  • the output of docker images -a | grep asgal
  • the logs folder you can find in the output folder created inside the folder containing the input you are using

ldenti avatar Mar 15 '21 09:03 ldenti

the output of docker -v

Docker version 19.03.6, build 369ce74a3c


sudo docker info

Client:
 Debug Mode: false

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 1
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-136-generic
 Operating System: Ubuntu 18.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.17GiB
 Name: rnaseq-asgal1
 ID: VLPG:C3ZT:P3Z7:JMY2:WJCN:VNM5:ZDRJ:BFIZ:7UNI:3AZA:OUAW:R6T3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support


sudo docker images -a | grep asgal

algolab/asgal v1.1.2 f67ef7b50dd0 3 months ago 1.23GB


the logs folder you can find in the output folder created inside the folder containing the input you are using

~/asgalgw_data/output/logs$ ls -la
total 20
drwxr-xr-x 3 root root 4096 Mar 11 06:33 .
drwxr-xr-x 8 root root 4096 Mar 11 06:33 ..
drwxr-xr-x 2 root root 4096 Mar 11 06:33 ASGAL
-rw-r--r-- 1 root root 1215 Mar 11 06:30 salmon_index.log
-rw-r--r-- 1 root root 3768 Mar 11 06:32 salmon_quant.log
-rw-r--r-- 1 root root    0 Mar 11 06:32 samtools.log
~/asgalgw_data/output/logs$ cd ASGAL
~/asgalgw_data/output/logs/ASGAL$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Mar 11 06:33 .
drwxr-xr-x 3 root root 4096 Mar 11 06:33 ..
-rw-r--r-- 1 root root    0 Mar 11 06:33 ENSG00000105976
-rw-r--r-- 1 root root    0 Mar 11 06:33 ENSG00000146648

kirasato0211 avatar Mar 15 '21 09:03 kirasato0211