Running tool with custom CNV calls
Hi,
I'm having issues trying to run SigProfiler for CNV data using custom CNV calls. We have an in-house pipeline for CNV calls that is similar to the ASCAT approach, tailored to our WGS data. I tried to match the format of my calls to what ASCAT_NGS includes in their documentation (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6097604/).
The columns I provided are (in this order): 'sample', 'segment_number', 'chromosome', 'start_position', 'end_position', 'major_normal', 'minor_normal', 'major_tumor', 'minor_tumor'. This would correspond to what ASCAT_NGS says is the output format for the 'copynumber.caveman.csv' file.
I'm running Sigprofiler as follows:
from SigProfilerExtractor import sigpro as sig

def main_function():
    segment_file = "/home/kbrar/MOCHA_Jun_2024/somatic/cnv/adjcopies_segments/ascat_format_output.csv"
    sig.sigProfilerExtractor(
        "seg:ASCAT_NGS",
        "/home/kbrar/MOCHA_Jun_2024/somatic/cnv/sigprofiler_CNV_Jul2024",
        segment_file,
        reference_genome="GRCh38",
        opportunity_genome="GRCh38",
    )

if __name__ == "__main__":
    main_function()
However, when I run sigprofiler as above, I get a KeyError. If I try to add the "Tumour TCN", "Normal BCN", and "Tumour BCN" columns as the errors indicate the tool is asking for, I get a KeyError with 'sample', which is definitely a column in the input CSV file. I've pasted that below.
Would you be able to clarify the exact input format needed for "ASCAT_NGS" type of input, or the specific columns required for any of the input types? I am able to adjust our input to whichever columns SigProfiler expects, but it is just unclear what exact columns should be provided for each respective input. If this could be clarified, I would be able to adjust my input column names to what is expected. Thanks!!
************** Reported Current Memory Use: 0.5 GB *****************
Traceback (most recent call last):
  File "/home/kbrar/miniforge3/envs/sigprofiler/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sample'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/kbrar/python_shell_scripts/sigprofiler_cnv_run.py", line 10, in
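The KeyError: 'sample' in the traceback above comes from a pandas index lookup, which typically means the label was not among the parsed names. One possible cause (an assumption, not something confirmed in this thread) is that the file is split on a different delimiter than the one it was written with, so the whole header ends up as a single column. A minimal sketch for checking this with the standard library; the helper name missing_columns and the example row are hypothetical, and the column list is simply the one given in the post:

```python
import csv
import io

# Column names taken from the post above. Whether SigProfiler expects exactly
# these names for "seg:ASCAT_NGS" is not confirmed in this thread.
EXPECTED_COLUMNS = [
    "sample", "segment_number", "chromosome", "start_position", "end_position",
    "major_normal", "minor_normal", "major_tumor", "minor_tumor",
]

def missing_columns(text, delimiter):
    """Return the expected column names that are absent from the header
    when the first line of `text` is split on `delimiter`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    return [name for name in EXPECTED_COLUMNS if name not in header]

# Hypothetical tab-separated input with one made-up segment row.
example = "\t".join(EXPECTED_COLUMNS) + "\n" + "S1\t1\t1\t100\t200\t1\t1\t2\t1\n"

print(missing_columns(example, "\t"))  # header split on tabs
print(missing_columns(example, ","))   # same header split on commas
```

If the second call reports every column as missing while the first reports none, the file content and the delimiter used to read it do not match.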
I should also mention that in some instances I get "KeyError: '1:het:100kb-1Mb'" instead.
I was able to get it to work if I round all the copy numbers to whole numbers - is this required for SigProfiler?
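For reference, the rounding workaround described above could look roughly like the sketch below. It assumes a tab-separated table with the four copy-number columns named in the original post; the function name round_copy_numbers and the example row are hypothetical, and this is only an illustration of the workaround, not a statement of what SigProfiler requires.

```python
import csv
import io

# Copy-number columns named in the original post; assumed to hold numeric values.
COPY_NUMBER_COLUMNS = ["major_normal", "minor_normal", "major_tumor", "minor_tumor"]

def round_copy_numbers(text, delimiter="\t"):
    """Return the table as text with the copy-number columns rounded to whole numbers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for column in COPY_NUMBER_COLUMNS:
            # Python's round() sends exact .5 values to the nearest even integer.
            row[column] = str(round(float(row[column])))
        writer.writerow(row)
    return out.getvalue()

# Hypothetical one-segment input with fractional copy numbers.
example = (
    "sample\tsegment_number\tchromosome\tstart_position\tend_position\t"
    "major_normal\tminor_normal\tmajor_tumor\tminor_tumor\n"
    "S1\t1\t1\t100\t200\t1.02\t0.97\t2.6\t0.4\n"
)
print(round_copy_numbers(example))
```

The other columns are passed through unchanged, so the rounded table keeps the same header and column order as the input.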
Hi @kbrar4013,
You should not need to round copy numbers to whole numbers as this is not required for SigProfiler.
@azhark2, could you please confirm the format required for ASCAT_NGS? It would be great to add some examples to SigProfilerMatrixGenerator for the different formats. Thanks!
Hi @kbrar4013,
Thanks for reaching out and sorry for the delay.
If you are still having issues, can you please attach copynumber.caveman.csv (or at least the first few lines)?
Thanks, Azhar
Hi Azhar,
I was able to get the tool to work without issues after rounding the copy numbers. I saw that rounding may not be necessary, so it would be great if you could help me figure out how to use the tool with raw, unrounded copy number values. I know ASCAT output is normally rounded, so perhaps that was the reason?
I also had a separate quick question: is there a difference between using SigProfilerAssignment to decompose signatures to the COSMIC reference signatures versus the output from SigProfilerExtractor that provides a folder with the suggested solution decomposed to COSMIC? Thanks again!
I don't actually have a copynumber.caveman.csv file, as the data is not actually from ASCAT but our own internal caller. I've attached a few lines of our data though, which I formatted to match ASCAT format. ascat_truncated_output.tsv.zip
The SigProfilerAssignment output for decompose_fit and the suggested solution decomposed to COSMIC in the SigProfilerExtractor output directory should be the same.
@azhark2 bumping this again so that you may see it and confirm the file format for ASCAT_NGS.
Hi, following up again. I can still only use the tool with rounded copy numbers. Could you check the file I uploaded and let me know whether that format is correct?