exomiser does not output gnomAD and 1KG frequency annotations

Open seru71 opened this issue 7 years ago • 9 comments

Hi,

After downloading Exomiser 10.0.1 and the 1802_hg19 dataset, I ran the NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38 example. Everything went fine, except that several variant frequency annotations were missing from the output. Here is the header of the _AD.variants.tsv file:

#CHROM POS REF ALT QUAL FILTER GENOTYPE COVERAGE FUNCTIONAL_CLASS HGVS EXOMISER_GENE CADD(>0.483) POLYPHEN(>0.956|>0.446) MUTATIONTASTER(>0.94) SIFT(<0.06) REMM DBSNP_ID MAX_FREQUENCY DBSNP_FREQUENCY EVS_EA_FREQUENCY EVS_AA_FREQUENCY EXAC_AFR_FREQ EXAC_AMR_FREQ EXAC_EAS_FREQ EXAC_FIN_FREQ EXAC_NFE_FREQ EXAC_SAS_FREQ EXAC_OTH_FREQ EXOMISER_VARIANT_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_GENE_COMBINED_SCORE CONTRIBUTING_VARIANT

GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?
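A quick way to confirm which frequency sources actually have columns in the TSV is to scan the header line. The following is a minimal sketch, not part of Exomiser: the header string is copied from this post, while the function names and the source-to-prefix mapping (in particular the `GNOMAD_`, `1KG_` and `UK10K_` prefixes, which do not appear anywhere in the TSV) are assumptions made for illustration.

```python
# Header copied from the _AD.variants.tsv file quoted in this issue.
HEADER = (
    "#CHROM POS REF ALT QUAL FILTER GENOTYPE COVERAGE FUNCTIONAL_CLASS HGVS "
    "EXOMISER_GENE CADD(>0.483) POLYPHEN(>0.956|>0.446) MUTATIONTASTER(>0.94) "
    "SIFT(<0.06) REMM DBSNP_ID MAX_FREQUENCY DBSNP_FREQUENCY EVS_EA_FREQUENCY "
    "EVS_AA_FREQUENCY EXAC_AFR_FREQ EXAC_AMR_FREQ EXAC_EAS_FREQ EXAC_FIN_FREQ "
    "EXAC_NFE_FREQ EXAC_SAS_FREQ EXAC_OTH_FREQ EXOMISER_VARIANT_SCORE "
    "EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE "
    "EXOMISER_GENE_COMBINED_SCORE CONTRIBUTING_VARIANT"
)

# Hypothetical mapping from frequency source to the column-name prefix one
# might expect. Only the EVS_ and EXAC_ prefixes are seen in the real header.
SOURCE_PREFIXES = {
    "EVS": "EVS_",
    "EXAC": "EXAC_",
    "GNOMAD": "GNOMAD_",
    "1KG": "1KG_",
    "UK10K": "UK10K_",
}


def columns_by_source(header_line, source_prefixes):
    """Group header columns by the frequency source whose prefix they carry."""
    # split() handles both tab-separated files and the space-separated paste above.
    columns = header_line.lstrip("#").split()
    return {
        source: [c for c in columns if c.upper().startswith(prefix)]
        for source, prefix in source_prefixes.items()
    }


def missing_sources(header_line, source_prefixes):
    """Return the sources that have no column in the header, sorted by name."""
    found = columns_by_source(header_line, source_prefixes)
    return sorted(source for source, cols in found.items() if not cols)


if __name__ == "__main__":
    print(missing_sources(HEADER, SOURCE_PREFIXES))
```

Run against the header above, this reports the sources with no matching column, which matches what is described here: EVS and ExAC are present, the others are not.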

Cheers,

seru71 avatar Mar 26 '18 09:03 seru71

> GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?

No, you don't need to do that, they are part of the existing distribution. You can see them in the HTML output.

Given the inflexibility of TSV, we're considering a new JSON output for the upcoming release, which will contain the newer data sources.

julesjacobsen avatar Apr 18 '18 13:04 julesjacobsen

Thank you for the answer, @julesjacobsen. Indeed, I can see them in the HTML output. So the TSV output has only a subset of the annotation columns present in the HTML?

seru71 avatar Apr 19 '18 12:04 seru71

Correct, TSV doesn't contain all the data. How are you trying to use this? Is it part of an informatics pipeline or for display to clinicians? As I said previously, we're looking at JSON, as it is more amenable to having data added without breaking other people's parsers. What would be your preference?

julesjacobsen avatar Apr 19 '18 13:04 julesjacobsen

I have been trying it out, attracted by the possibility of annotating variants with the REMM score. I looked at the TSV first, because it was easier to filter the variants there.
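As an illustration of that kind of filtering, here is a small sketch that keeps only rows whose `REMM` column meets a threshold. The column name comes from the header quoted above; the function name, the assumption that non-numeric values such as `.` mark a missing score, and the sample rows in the usage part (made-up positions and scores) are all hypothetical.

```python
import csv
import io


def rows_with_min_remm(tsv_text, min_score):
    """Return rows (as dicts) whose REMM value is a number >= min_score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        try:
            score = float(row["REMM"])
        except (KeyError, TypeError, ValueError):
            # No REMM column, short row, or a non-numeric placeholder: skip it.
            continue
        if score >= min_score:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Synthetic example data, not real Exomiser output.
    sample = (
        "#CHROM\tPOS\tREF\tALT\tREMM\n"
        "13\t100\tA\tG\t0.95\n"
        "13\t200\tC\tT\t.\n"
        "13\t300\tG\tA\t0.10\n"
    )
    for row in rows_with_min_remm(sample, 0.5):
        print(row["#CHROM"], row["POS"], row["REMM"])
```

For a real file, the text would be read from the `.variants.tsv` output instead of the inline sample.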

JSON is great for programmatic use, but not so convenient to manipulate in a Unix shell. Having both would be awesome.

seru71 avatar Apr 20 '18 14:04 seru71

Do you just want the REMM score for a variant? If so, tabix (http://www.htslib.org/doc/tabix.html) would be a better choice than running the whole of Exomiser. Running Exomiser just to annotate variants isn't really what it was designed to do, as it will take a lot of time and RAM.

julesjacobsen avatar Apr 20 '18 15:04 julesjacobsen

For annotating variants without prioritization, Jannovar might be a better choice.

Jannovar can annotate with several other sources, such as dbNSFP. ReMM is not implemented directly yet but, if needed, I can easily add this function. Because ReMM only needs the position in the genome, querying it directly with tabix is always the fastest option (no alt allele comparison is needed, unlike with CADD, for example).
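The difference between a position-only lookup and an allele-aware one can be sketched as follows. This is only an illustration: the file name `ReMM.tsv.gz`, the function names, and the assumed column order for allele-level rows (chromosome, position, ref, alt, then scores) are hypothetical. The only real interface used is the tabix command line form `tabix <file> <chrom>:<start>-<end>`.

```python
def tabix_position_query(score_file, chrom, pos):
    """Build the argument list for a single-position tabix region query."""
    # The chromosome name must match the naming used in the indexed file
    # (for example "13" versus "chr13").
    region = f"{chrom}:{pos}-{pos}"
    return ["tabix", score_file, region]


def filter_by_allele(lines, ref, alt):
    """Keep only rows matching ref and alt.

    Needed for allele-level scores, where one position has several rows.
    Assumes tab-separated rows laid out as chrom, pos, ref, alt, scores...
    """
    matched = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[2] == ref and fields[3] == alt:
            matched.append(fields)
    return matched


if __name__ == "__main__":
    # Position taken from the example name in this issue (chromosome 13).
    cmd = tabix_position_query("ReMM.tsv.gz", "13", 29233225)
    print(" ".join(cmd))
    # To run it (requires tabix and an indexed file on disk):
    #   import subprocess
    #   out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For a position-level score the tabix output can be used as-is; for an allele-level score the returned lines would additionally go through something like `filter_by_allele`.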

visze avatar Apr 20 '18 15:04 visze

JSON would be a great output format for us.

DGMichael avatar Apr 20 '18 17:04 DGMichael

@julesjacobsen annotating with REMM wasn't the sole purpose. I also wanted to try it out on a few undiagnosed WGS samples where we're looking for some new clues.

I agree that using it to annotate variants with one score is overkill. For annotation I have mostly been using Annovar, so I could easily convert the REMM db into an Annovar annotation file. Using Exomiser, I killed two birds with one stone :)

seru71 avatar Apr 23 '18 07:04 seru71

@seru71 Cool, that's exactly the right use-case! @DGMichael good to hear.

julesjacobsen avatar Apr 24 '18 09:04 julesjacobsen