Decomposition plots are empty
I'm running SigProfilerExtractor (v1.1.7, Python 3.9.2) and get all expected outputs, except that the *_Decomposition_Plots.pdf files are all empty. I get 3 SBS signatures and 2 ID signatures that are decomposed to COSMIC signatures, according to the De_Novo_map_to_COSMIC_ID83/SBS96.csv files.
For example:
De_Novo_map_to_COSMIC_ID83.csv
De novo extracted, Global NMF Signatures, L1 Error %, L2 Error %, KL Divergence, Cosine Similarity, Correlation
Signature 83-A, Signature ID2 (64.90%) & Signature ID7 (35.10%), 42.26, 22.15, 0.353, 0.98, 0.97
Signature 83-B, Signature ID1 (55.22%) & Signature ID12 (44.78%), 62.98, 20.18, 1.728, 0.98, 0.98
The command I am running is:
sig.sigProfilerExtractor("vcf", "results", "/path/to/VCFs", reference_genome="dog", opportunity_genome="dog", context_type="96,DINUC,ID", exome=True, minimum_signatures=1, maximum_signatures=25, nmf_replicates=500, cpu=12)
Thanks, Kim
Dear Kim,
Thanks so much for your question and sorry for our late reply. Could you please test if you are still having this issue with the most recent v1.1.10?
Hi Marcos,
I've tested my data with v1.1.10 and the plots are still empty. However, when running with the test data from the Quick Start Example here, the plots were fine. I am running the analysis on canine data after installing a custom reference using SigProfilerMatrixGenerator. Looking at the log file, though, I see that this genome is not supported in SigProfilerExtractor:
The selected opportunity genome is canfam3.1. COSMIC signatures are available only for GRCh37/38, mm9/10 and rn6 genomes. So, the opportunity genome is reset to GRCh37.
Denovo Fitting .....
|████████████████████████████████████████| 87/87 [100%] in 1.0s (88.66/s)
Decomposing De Novo Signatures .....
The context-96 decomposition plots pages were not able to be generated.
The context-96 decomposition plots pages were not able to be generated.
The context-96 decomposition plots pages were not able to be generated.
So, the opportunity genome is defaulting to GRCh37. Is this the reason the plots are not generated? Is there a way I can analyse my canine data? According to this thread, one can create a custom database, but I cannot find the documentation for this.
Thanks!
Hi again Kim,
Thanks for the test, and I'm sorry you are still experiencing this issue. As you have noticed, the canfam3.1 reference genome is supported as the reference_genome (to generate the input mutational matrix via SigProfilerMatrixGenerator) but not as the opportunity_genome (because there are no COSMIC reference signatures available for this genome build). However, defaulting to GRCh37 should not be affecting the decomposition plot generation.
It would be great if you could send us your input matrix, job metadata file and the command used in order to be able to reproduce the issue. Happy to follow up by email if you prefer at [email protected].
Regarding the previous issue you mentioned, this is now outdated since we have moved the decomposition functionality to the new SigProfilerAssignment tool (in particular to the decompose_fit function). You should use your de novo extracted signatures as the signatures parameter and your custom database as the signature_database.
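For reference, the decompose_fit call described above might look roughly like the sketch below. It assumes SigProfilerAssignment is installed; the wrapper function name and all paths passed to it are hypothetical placeholders, and exome=True is an assumption carried over from the original extraction command.

```python
def decompose_against_custom_database(samples, output, de_novo_signatures, custom_database):
    """Decompose de novo signatures against a custom reference database.

    All four arguments are file or directory paths chosen by the user
    (hypothetical placeholders in this sketch).
    """
    # Imported inside the function so this module can be loaded
    # even on machines where SigProfilerAssignment is not installed.
    from SigProfilerAssignment import Analyzer as Analyze

    Analyze.decompose_fit(
        samples,                             # input mutational matrix
        output,                              # output directory
        signatures=de_novo_signatures,       # de novo signatures from the extraction
        signature_database=custom_database,  # custom (e.g. dog-normalized) reference signatures
        exome=True,                          # assumption: exome data, as in the original run
    )
```

The file formats expected for the signatures and signature_database arguments are described in the SigProfilerAssignment documentation and should be checked there before running.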
I hope this helps, and please let me know if you have further questions.
Thank you, Marcos. I will email you some example data and I'll give SigProfilerAssignment a try.
Just a quick follow-up question regarding the custom database. To re-normalize the human WGS signatures to dog, would I do the following? For each of the 96 trinucleotide contexts, count the number seen in the dog (exome) and human (WGS) genomes, then find the ratio of human/dog for each trinucleotide as a scaling factor. I will then take the values in COSMIC_v3.2_SBS_GRCh38.txt and multiply by the appropriate factor. Finally, I will divide each value by the column sum for each signature in order to use this as the custom reference signature database. Is this correct? Thanks!
This is exactly the process that is needed. In fact, you have the context counts for the human genome available in the SigProfilerMatrixGenerator repo here. For the dog (exome) context counts, you can generate the files by using this script (please consider using the --genome and --exome flags).
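The rescaling steps discussed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names and the toy counts are hypothetical, and channel labels are assumed to follow the "A[C>A]A" convention. The sketch multiplies each channel by the target/source count ratio (dog over human), which is the direction that moves a signature from human to dog trinucleotide frequencies; swap the two count arguments if the opposite convention is intended.

```python
def channel_to_trinucleotide(channel):
    """Map an SBS96 channel label such as 'A[C>A]T' to its reference trinucleotide 'ACT'."""
    return channel[0] + channel[2] + channel[6]


def renormalize_signature(signature, source_counts, target_counts):
    """Rescale one signature from the source genome to the target genome.

    signature:      dict mapping channel label -> probability (sums to 1)
    source_counts:  dict mapping trinucleotide -> count in the source genome (e.g. human WGS)
    target_counts:  dict mapping trinucleotide -> count in the target genome (e.g. dog exome)
    """
    scaled = {}
    for channel, value in signature.items():
        tri = channel_to_trinucleotide(channel)
        # Scale by the relative abundance of this context in the target genome.
        scaled[channel] = value * target_counts[tri] / source_counts[tri]
    # Divide by the column sum so the signature is a probability vector again.
    total = sum(scaled.values())
    return {channel: value / total for channel, value in scaled.items()}


# Toy example with two channels only (made-up numbers, not real genome counts).
toy_signature = {"A[C>A]A": 0.5, "T[C>T]G": 0.5}
toy_human_counts = {"ACA": 100, "TCG": 50}
toy_dog_counts = {"ACA": 100, "TCG": 100}
toy_result = renormalize_signature(toy_signature, toy_human_counts, toy_dog_counts)
```

In practice the same function would be applied to every signature column of the COSMIC file, using the real context counts for both genomes.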
We will follow up by email regarding the decomposition plots issue. For the moment, please continue using mutational matrices as input for your analysis, which avoids this problem.
Thanks!
Dear Kim,
We have found a minor bug leading to the issue with the decomposition plots. It has been fixed in the newest version v1.1.13. I hope this helps, and please feel free to reopen the ticket if you have further questions. Thanks again for your interest!