.mem and .sam files are generated empty
Hi, I am using the ASGAL tool to find MET14 deletion and EGFR variation events in my samples. I am running a genome-wide analysis for these two genes. The ASGAL run completes successfully, but the .mem and .sam files are empty, so no events are reported in the final files.
The command I used is:
./asgal --multi -g genome.fa -a annotation2.gtf -s sample1.fastq.gz -s2 sample2.fastq.gz -t transcript.fa --allevents -o output
Could you please help me with this as soon as possible?
Hi, can you please send the tree view of the output folder (the output of the tree command is enough)? Moreover, where did you get the GTF from? Can you share it?
Hi, below is the tree view of the output folder.
.
|-- ASGAL
| |-- ENSG00000105976.events.csv
| |-- ENSG00000105976.mem
| |-- ENSG00000105976.sam
| |-- ENSG00000146648.events.csv
| |-- ENSG00000146648.mem
| `-- ENSG00000146648.sam
|-- ASGAL.csv
|-- annos
| |-- ENSG00000105976.gtf
| |-- ENSG00000105976.gtf.db
| |-- ENSG00000105976.gtf.sg
| |-- ENSG00000146648.gtf
| |-- ENSG00000146648.gtf.db
| `-- ENSG00000146648.gtf.sg
|-- logs
| |-- ASGAL
| | |-- ENSG00000105976
| | `-- ENSG00000146648
| |-- salmon_index.log
| |-- salmon_quant.log
| `-- samtools.log
|-- refs
| `-- chr7.fa
|-- salmon
| |-- salmon.bam
| |-- salmon.bam.bai
| |-- salmon_index
| | |-- duplicate_clusters.tsv
| | |-- hash.bin
| | |-- header.json
| | |-- indexing.log
| | |-- quasi_index.log
| | |-- refInfo.json
| | |-- rsd.bin
| | |-- sa.bin
| | |-- txpInfo.bin
| | `-- versionInfo.json
| `-- salmon_out
| |-- aux_info
| | |-- ambig_info.tsv
| | |-- expected_bias.gz
| | |-- fld.gz
| | |-- meta_info.json
| | |-- observed_bias.gz
| | |-- observed_bias_3p.gz
| | `-- unmapped_names.txt
| |-- cmd_info.json
| |-- libParams
| | `-- flenDist.txt
| |-- lib_format_counts.json
| |-- logs
| | `-- salmon_quant.log
| `-- quant.sf
`-- samples
|-- ENSG00000105976.fa
`-- ENSG00000146648.fa
12 directories, 45 files
I extracted the MET and EGFR annotations from the GENCODE annotation and gave them as input to ASGAL.
Can you check if the .fa files in the samples directory and the .gtf files in the annos directory are empty or not? Moreover, can you share the salmon.bam file?
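One way to run this check over the whole output folder is a small script that lists zero-byte files in the samples, annos and ASGAL subfolders (folder names taken from the tree above). This is a minimal sketch: find_empty_files is a hypothetical helper, not part of ASGAL, and the default folder name output assumes the -o output used in the original command.

```python
import os
import sys


def find_empty_files(root, subdirs=("samples", "annos", "ASGAL")):
    """Return paths of zero-byte files under the given subfolders of an
    ASGAL --multi output folder (hypothetical helper, not part of ASGAL)."""
    empty = []
    for sub in subdirs:
        base = os.path.join(root, sub)
        if not os.path.isdir(base):
            continue  # subfolder missing, e.g. the run stopped early
        for dirpath, _dirnames, filenames in os.walk(base):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return empty


if __name__ == "__main__":
    # usage: python find_empty.py [output_folder]
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "output"
    for path in find_empty_files(out_dir):
        print("EMPTY:", path)
```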
I can see that the .fa files in the samples directory are non-empty, but one of the files in the annos directory is empty and the other one is non-empty. Please find attached a zip file containing the input annotation file (annotation2.gtf), the .gtf files from the annos directory and the salmon.bam file. Attachment.zip
In annotation2.gtf, both transcripts of each gene share the same transcript_id. This produces unpredictable behaviour (the gffutils library merges the two transcripts into a single transcript).
I manually edited the annotations (you can find it here) and now the ASGAL output is non-empty.
Let me know if this new annotation fixed your problem.
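The shared transcript_id problem described above can be spotted before running ASGAL. A minimal sketch, assuming a GENCODE-style GTF in which every transcript has its own feature line of type transcript (column 3) and a transcript_id "..." attribute; duplicated_transcript_ids is a hypothetical helper, not part of ASGAL or gffutils. It lists every transcript_id that appears on more than one transcript line.

```python
import re
import sys
from collections import Counter

# Matches an attribute such as: transcript_id "ENST00000318493.11";
# (anchored at the start of column 9 or after a ';' so that other
# attributes ending in "transcript_id" are not picked up)
TID_RE = re.compile(r'(?:^|;)\s*transcript_id "([^"]+)"')


def duplicated_transcript_ids(gtf_lines):
    """Return transcript_ids that occur on more than one 'transcript' line.

    Assumes each transcript has its own 'transcript' feature line, as in
    GENCODE GTFs (hypothetical helper for illustration)."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only count transcript feature lines
        match = TID_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return sorted(tid for tid, n in counts.items() if n > 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python check_gtf.py annotation2.gtf
    with open(sys.argv[1]) as gtf:
        for tid in duplicated_transcript_ids(gtf):
            print("duplicated transcript_id:", tid)
```

An empty result does not prove the annotation is well formed; it only rules out this particular duplication.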
Thank you for pointing it out. It helped me run ASGAL successfully. However, no events are reported in the output. I am expecting the MET exon 14 skipping event to be reported. I am sure that the samples I am using have the MET14 deletion event and the EGFR variation.
Please have a look at the modified annotation file I am using: annotation.zip. If I want the MET exon 14 skipping event to be reported in the output, do I need to include the exon 14 annotations in the GTF file?
Could you please help me understand how ASGAL reports events?
mmm that's strange (when I used the salmon.bam you shared with me, I found 2 events). So there must be some other issue.
Are the .sam files empty?
(if you haven't done it yet) I suggest removing the .gtf.db files in the annos folder since they may refer to the old (incomplete) annotations and rerun (or you can just change output folder).
ASGAL reports a CSV describing the events: one line per event, with the type of event (e.g. ES for exon skipping), genomic coordinates, and other information (the example you can find here describes the output format in more detail).
Could you please let me know how you found 2 events using the salmon.bam file? The SAM files are not empty. I tried removing the .gtf.db files in the annos folder and also tried changing the output folder. Unfortunately, it didn't help. There are still no events reported in the output file.
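As a rough sketch of consuming this output, the snippet below groups events by type. It assumes the column names of the combined --multi output shown later in this thread (Type, Start, End, Support, Transcripts, file) and only uses the first four; events_by_type is a hypothetical helper, not part of ASGAL, and the example rows are the two events reported in this thread.

```python
import csv
import io
from collections import defaultdict


def events_by_type(csv_text):
    """Group ASGAL events by their Type column (e.g. ES = exon skipping).

    Returns {type: [(start, end, support), ...]}. Assumes a header with at
    least Type, Start, End and Support (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Type"]].append(
            (int(row["Start"]), int(row["End"]), int(row["Support"]))
        )
    return dict(grouped)


example = """Type,Start,End,Support,Transcripts,file
ES,55019366,55155829,16,ENST00000275493.7,output/ASGAL/ENSG00000146648.events.csv
ES,116771655,116774880,24,ENST00000318493.11,output/ASGAL/ENSG00000105976.events.csv
"""
print(events_by_type(example))
```

To read a file on disk instead, pass the contents of ASGAL.csv (e.g. open("output/ASGAL.csv").read()) to the same function.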
I created the two samples using
samtools fastq -1 sample_1.fq -2 sample_2.fq salmon.bam
and then I ran asgal (as you did) on the edited annotation I sent you.
I checked the annotation you sent me and it contains only 1 transcript per gene (I'm using the initial one with 2 transcripts per gene). I think this is the reason why you are not getting any events.
I ran ASGAL with the edited annotation file you sent me and it didn't work. Then I used the salmon.bam file to generate the samples with the command above and ran ASGAL again. Unfortunately, it didn't help.
Please send me the commands you have used to get the output.
I used these files (annotation, transcripts, and samples). The reference I used is chr7 only, downloaded from Ensembl (link). Edit: you have to change the header of the FASTA entry from >7 to >chr7, otherwise asgal crashes.
I then ran:
asgal --multi -g Homo_sapiens.GRCh38.dna.chromosome.7.fa -a annotation2.edit.gtf -s sample_1.fq -s2 sample_2.fq -t transcripts2.fa --allevents -o output
and I obtained these events:
Type,Start,End,Support,Transcripts,file
ES,55019366,55155829,16,ENST00000275493.7,output/ASGAL/ENSG00000146648.events.csv
ES,116771655,116774880,24,ENST00000318493.11,output/ASGAL/ENSG00000105976.events.csv
Let me know if these files work for you
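The header change mentioned in the Edit note can be scripted. A minimal sketch, assuming the Ensembl chromosome 7 FASTA whose header begins with >7 and a target name of chr7 to match the chromosome names in the annotation; rename_headers and the script name are hypothetical. Only the first word of each header is changed; the rest of the header and all sequence lines are copied unchanged.

```python
import sys


def rename_headers(lines, mapping):
    """Yield FASTA lines, renaming the first word of each header via mapping.

    {"7": "chr7"} turns ">7 dna:chromosome ..." into ">chr7 dna:chromosome ...";
    headers not in mapping and all sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            name, sep, rest = header.partition(" ")
            yield ">" + mapping.get(name, name) + sep + rest + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python rename_fasta.py Homo_sapiens.GRCh38.dna.chromosome.7.fa chr7.fa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(rename_headers(src, {"7": "chr7"}))
```

Streaming line by line keeps memory use low even for a full chromosome.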
Thank you. These files worked for me. The genome file I was using earlier has chr7 in the header. However, it has repeat-masked bases and 50 bases per line. Maybe that was causing the issue.
Thank you for your help. I have tested it on positive samples and will soon test it on negative samples too.
Hi, please note that the official asgal release (not the one on the Agilent hub) is now at version 1.1.2. Could you please try that version? It should suffice to run
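To check whether a reference FASTA has the properties mentioned above, a sketch like the following reports the sequence line widths, the number of lowercase (soft-masked) bases and the number of N bases (how hard-masked repeats appear). Whether either property actually caused the earlier problem is not confirmed in this thread; fasta_stats and unmask are hypothetical helpers.

```python
import sys


def fasta_stats(lines):
    """Summarise a FASTA: distinct sequence line widths, total bases,
    lowercase (soft-masked) bases and N/n (hard-masked or unknown) bases."""
    widths = set()
    total = lower = n_bases = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(">"):
            continue  # skip headers and blank lines
        widths.add(len(line))
        total += len(line)
        lower += sum(1 for c in line if c.islower())
        n_bases += line.upper().count("N")
    return {"line_widths": sorted(widths), "bases": total,
            "lowercase": lower, "N": n_bases}


def unmask(lines):
    """Yield FASTA lines with soft-masked (lowercase) bases uppercased."""
    for line in lines:
        yield line if line.startswith(">") else line.upper()


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python fasta_stats.py genome.fa
    with open(sys.argv[1]) as fa:
        print(fasta_stats(fa))
```

The last line of each record is usually shorter, so seeing two widths (e.g. 50 and a smaller one) is expected. Hard-masked N bases cannot be restored this way; that needs an unmasked download.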
sudo docker run -v "$PWD"/asgalgw_data:/data registry-dev.scs.agilent.com/algolab/galig:v1.1.1
Best
Could you please let me know where I can get the latest asgal release (version 1.1.2)?
Sorry, I did not update the command. It should be
sudo docker run -v "$PWD"/asgalgw_data:/data algolab/asgal
The latest docker image is at https://hub.docker.com/r/algolab/asgal
Best regards
Thank you. I am able to run ASGAL using this image. However, I came across the error below.
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000105976.gtf', '-s', '/data/output/samples/ENSG00000105976.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000105976.mem']' died with <Signals.SIGILL: 4>.
"""
Could you please let me know what is going wrong?
Could you please let me know why I am getting the error below while running the ASGAL tool in Docker?
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000105976.gtf', '-s', '/data/output/samples/ENSG00000105976.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000105976.mem']' died with <Signals.SIGILL: 4>.
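For reference when reading these errors: on POSIX systems Python's subprocess reports a child process killed by signal N as a negative return code (-N), and CalledProcessError prints that as "died with <Signals.NAME: N>". SIGILL (4) means the process tried to execute an illegal CPU instruction; SIGSEGV (11), which appears later in this thread, is a segmentation fault (invalid memory access). The sketch below, with a hypothetical helper describe_returncode, only decodes the return code; it does not diagnose why SpliceAwareAligner received the signal.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Explain a subprocess return code (hypothetical helper).

    On POSIX, subprocess reports a child killed by signal N as -N."""
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by unknown signal %d" % -returncode
    return "exited with status %d" % returncode


# Rebuild an error like the one in the log: return code -4 means signal 4.
err = subprocess.CalledProcessError(-4, ["/galig/bin/SpliceAwareAligner"])
print(err)
print(describe_returncode(err.returncode))  # killed by SIGILL
print(describe_returncode(-11))             # killed by SIGSEGV
```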
Are you using the example data?
No. I am using my own dataset.
Did it work on the example data? Can you also please send the entire log output to stderr?
I received the same error on the example data too. I used the data from example/input for the test. Please find attached the error message from the console.
I am unable to upload the error file here, so I am pasting the console log instead.
ubuntu@rnaseq-asgal1:~$ sudo docker run -v "$PWD"/asgalgw_data:/data algolab/asgal:v1.1.2
Starting with UID:GID 0:0
[ Mar 11, 2021 - 9:12:13AM ] args Namespace(allevents=False, annoPath='/data/annotation.gtf', debug=False, e='3', l='15', multiMode=False, outputPath='/data/output', refPath='/data/genome.fa', sample1Path='/data/sample_1.fa', sample2Path='-', split_only=False, threads='2', transPath='-', verbose=False, w='3')
[ Mar 11, 2021 - 9:12:13AM ] Opening input annotation...
[ Mar 11, 2021 - 9:12:13AM ] Indexing...
[ Mar 11, 2021 - 9:12:13AM ] Reading input annotation...
[ Mar 11, 2021 - 9:12:13AM ] number of genes 1
[##################################################] 1/1
[ Mar 11, 2021 - 9:12:13AM ] Done.
[ Mar 11, 2021 - 9:12:13AM ] Running ASGAL on 1 gene...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/galig/asgal", line 390, in asgal_command_one_gene
    command_check_return(asgal_CMDs['align'], log, log, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/genome.fa', '-a', '/data/annotation.gtf', '-s', '/data/sample_1.fa', '-l', '15', '-e', '3', '-o', '/data/output/sample_1-FBgn0040370.mem']' died with <Signals.SIGILL: 4>.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/galig/asgal", line 585, in <module>
I tried the example data with a clean installation of Docker and asgal:v1.1.2 and it worked. Maybe older versions are causing the issue? Can you please remove all old asgal images and try again?
Moreover, since v1.1.1 worked for you in December: are you using a different machine or OS? Have you updated Docker recently?
I tried the command-line ASGAL tool on a few samples in December and it worked for me. Now I want to try it with Docker so that I can run it on more samples. The machine I used then and the one I am using now are different, but the OS is the same, i.e. Ubuntu 18.04.
I have installed Docker on the machine where command-line ASGAL was running successfully. I get the same error on that machine too.
[ Mar 12, 2021 - 6:06:14AM ] Running ASGAL on 2 genes...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/galig/asgal", line 390, in asgal_command_one_gene
    command_check_return(asgal_CMDs['align'], log, log, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000146648.gtf', '-s', '/data/output/samples/ENSG00000146648.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000146648.mem']' died with <Signals.SIGSEGV: 11>.
"""
Please let me know if you can help me with this.
How much RAM does the machine have? Can you share with us the files that you have used?
RAM : 16 GB
I am unable to upload files here. It would be great if you could use the files attached to previous comments in this thread; I am using the same ones for the current run.
I used the same files I linked in https://github.com/AlgoLab/galig/issues/12#issuecomment-740812653 and it worked. After setting up the inputs I ran:
docker run -v "$PWD"/2genes:/data algolab/asgal:v1.1.2
Can you please send here:
- the output of docker -v
- the output of docker info
- the output of docker images -a | grep asgal
- the logs folder you can find in the output folder created inside the folder containing the input you are using
the output of docker -v
Docker version 19.03.6, build 369ce74a3c
sudo docker info
Client:
 Debug Mode: false

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 1
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-136-generic
 Operating System: Ubuntu 18.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.17GiB
 Name: rnaseq-asgal1
 ID: VLPG:C3ZT:P3Z7:JMY2:WJCN:VNM5:ZDRJ:BFIZ:7UNI:3AZA:OUAW:R6T3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
WARNING: No swap limit support
sudo docker images -a | grep asgal
algolab/asgal v1.1.2 f67ef7b50dd0 3 months ago 1.23GB
the logs folder you can find in the output folder created inside the folder containing the input you are using
~/asgalgw_data/output/logs$ ls -la
total 20
drwxr-xr-x 3 root root 4096 Mar 11 06:33 .
drwxr-xr-x 8 root root 4096 Mar 11 06:33 ..
drwxr-xr-x 2 root root 4096 Mar 11 06:33 ASGAL
-rw-r--r-- 1 root root 1215 Mar 11 06:30 salmon_index.log
-rw-r--r-- 1 root root 3768 Mar 11 06:32 salmon_quant.log
-rw-r--r-- 1 root root    0 Mar 11 06:32 samtools.log
~/asgalgw_data/output/logs$ cd ASGAL
~/asgalgw_data/output/logs/ASGAL$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Mar 11 06:33 .
drwxr-xr-x 3 root root 4096 Mar 11 06:33 ..
-rw-r--r-- 1 root root    0 Mar 11 06:33 ENSG00000105976
-rw-r--r-- 1 root root    0 Mar 11 06:33 ENSG00000146648