smudgeplot.py hetmers not producing kmerpairs_text.smu (.trim.ktab only)
hetmers step has completed, but not resulting in .smu file?
smudgeplot.py hetmers -L 12 -t 4 -o kmerpairs -tmp $TMP --verbose Fastk_Table
Running smudgeplot v0.4.0dev
Task: hetmers
Calling: hetmers (PloidyPlot kmer pair search) -okmerpairs -e12 -T4 -v -P/nesi/nobackup/ga02470/acanthoxyla/github/new-illumina-smudgeplots/AXG/fastk2/tmp Fastk_Table
The input table is untrimmed and not symmetric
Trimming k-mers in table with count < 12
Output table ./.trim.ktab already exists, continue? yes
Making trimmed table symmetric
Starting to count covariant pairs
Done!
This outputs a file called .symx.ktab, but not the .smu file that I was expecting.
I thought maybe we don't have enough coverage for the -L 12 option, so I also tried -L 6, with the same result.
Any suggestions why I am not generating the .smu file?
Options used for the prior step:
FastK -v -t4 -k21 -M16 -T4 -P$TMP -c AXG_novaseq_R*.fq -NFastk_Table
Hi Gemma, the expected log looks like this:
Running smudgeplot v0.4.0dev
Task: hetmers
Calling: hetmers (PloidyPlot kmer pair search) -otest -e10 -T4 -v Fastk_Table
The input table is untrimmed and not symmetric
Trimming k-mers in table with count < 10
Making trimmed table symmetric
Starting to count covariant pairs
Count complete, plotting
About to save stuff
Saving stuff
Done!
Yours is missing these lines:
Count complete, plotting
About to save stuff
Saving stuff
Done!
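If you save the hetmers output to a file, a small shell helper can flag which of those end-of-run lines never appeared. This is a hypothetical sketch, not part of smudgeplot: the function name `missing_hetmers_lines` and the file name `hetmers.log` are made up, and the three expected strings are copied from the reference log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of smudgeplot): print each end-of-run line that a
# normal hetmers log contains but the given log does not. Returns 1 if any is missing.
missing_hetmers_lines() {
    local log="$1" line status=0
    for line in "Count complete, plotting" "About to save stuff" "Saving stuff"; do
        if ! grep -qF -- "$line" "$log"; then
            echo "missing: $line"
            status=1
        fi
    done
    return "$status"
}
# Usage (2>&1 also captures anything written to stderr):
#   smudgeplot.py hetmers -L 12 -t 4 -o kmerpairs -tmp "$TMP" --verbose Fastk_Table 2>&1 | tee hetmers.log
#   missing_hetmers_lines hetmers.log
```

Redirecting stderr into the same file also answers the question of whether anything was printed there before the run ended.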
I am not sure why, but I suspect something is wrong with the k-mer database. If you make a k-mer histogram (it should take only seconds), does it look sane? Something like this should do the job:
Histex -G Fastk_Table > kmer.hist
Hi Kamil,
I tried your suggestion: Histex -G Fastk_Table > kmer.hist
Then I plotted kmer.hist with GenomeScope.
The model didn't fit well, but as far as I can tell it looks "normal".
Do you have any further suggestions for what I could try to generate the .smu file?
So far I have these intermediate files in the dir:
Fastk_Table.hist Fastk_Table.ktab
.Fastk_Table.ktab.2 .Fastk_Table.ktab.4 .Fastk_Table.ktab.1 .Fastk_Table.ktab.3
..symx.ktab.1 .symx.ktab
@gemmacol That's puzzling. Gene and I can't think of a good reason this could or should happen. Did you by any chance have anything written to stderr? @thegenemyers thinks there should be something; it should not crash silently.
Would you be able to upload the .ktab file as well as all the hidden .Fastk_Table.ktab.* files? We can't figure out what exactly is wrong from the information we have...
I have re-created the issue, this time with a single Slurm script. Here are the two files you asked for, as well as the script and the log files gathered after re-running it:
- The .symx.ktab: symx.zip
- FastK_Table.ktab (the hidden files were too large to upload; is this what you were asking for?): FastK_Table.zip
- The exact script that was used: AXG.52229199.zip
And here is a screenshot of the resulting files in reverse chronological order (.smu still missing):
Many thanks, Gemma
I am afraid the hidden files are needed for the full k-mer table. @thegenemyers?
Here are the hidden files as a google drive link: https://drive.google.com/drive/folders/12keGxYN5jkY8t--Rksq_sZ3FrcXXnDuj?usp=sharing
Hello,
I have the same issue with three samples of Salix polaris (467, EB30, EB31): no .smu files are written. These three samples are expected to be hexaploid.
Here are their GenomeScope profiles:
My log files look like this:
Total Resources: 58:38.544u 9:01.752s 23:45.554w 284.8% 17MB
Running smudgeplot v0.3.0 oriel
Task: hetmers
Calling: PloidyPlot -oEB31.kmerpairs -e8 -T8 -v EB31.FastK_Table
The input table is untrimmed and not symmetric
Trimming k-mers in table with count < 8
Making trimmed table symmetric
Starting to count covariant pairs
Done!
Running smudgeplot v0.3.0 oriel
Task: plot
Calling: smudgeplot_plot.R -i "EB31.kmerpairs_text.smu" -o "EB31.trial_run" -col_ramp "viridis"
######################
## INPUT PROCESSING ##
######################
Error: The input file not found. Please use --help to get help
Execution halted
Done!
Would you have any advice regarding these samples?
Hi @KamilSJaron , @gemmacol , @vincianem ,
I’m encountering the same issue. Have you been able to figure it out?
Cheers, Abdo
I haven't found a solution yet and have been meaning to get back to this topic.
I am checking with Gene!
@abdo3a and @vincianem, does either of you have a dataset that generates this error that is public, or not too large to share? It would be good to have a smaller dataset to debug the problem.
Gemma, why did you use -c when making the database? That is homopolymer compression. I'm not sure what it would do to the k-mer pair search, but it is quite possibly the source of the bug (Gene is digging into the code to figure out what exactly is going on).
Hi Kamil & Gene,
Thanks for looking into it again. Ah, good question. I was following some advice from your email of 21/11/24: "You might want to consider to use homopolymer compression when constructing the k-mer database, it might help to reduce errors." I could try repeating it without this option and see whether smudgeplot still produces the same error.
-Gemma
Aha, it was me after all. I might have led you astray with this! If you want a quick fix, I would rerun the analysis without homopolymer compression and see where that gets you.
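The quick fix can be scripted roughly as below: the same FastK call without -c, then hetmers, then a check that `<prefix>_text.smu` really exists, since in this thread hetmers printed "Done!" without writing it. This is a sketch under assumptions: the function names `smu_ready` and `rerun_without_hpc` are hypothetical, and the other options are copied from the commands posted above. Running it in a fresh directory, so no `.trim`/`.symx` tables from the compressed run are lying around, is an extra precaution rather than something confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the workaround: rebuild the FastK table WITHOUT -c
# (homopolymer compression), rerun hetmers, then verify the .smu file exists.

# Succeeds only if <prefix>_text.smu exists and is non-empty.
smu_ready() {
    [ -s "${1}_text.smu" ]
}

# Usage: rerun_without_hpc 'AXG_novaseq_R*.fq' "$TMP" kmerpairs
rerun_without_hpc() {
    local reads_glob="$1" tmp="$2" prefix="$3"
    # Same FastK options as the original run, minus -c.
    # $reads_glob is left unquoted on purpose so the shell expands the pattern.
    FastK -v -t4 -k21 -M16 -T4 -P"$tmp" $reads_glob -NFastk_Table || return 1
    smudgeplot.py hetmers -L 12 -t 4 -o "$prefix" -tmp "$tmp" --verbose Fastk_Table || return 1
    # hetmers printed "Done!" even when no .smu appeared, so check the file itself.
    if smu_ready "$prefix"; then
        echo "OK: ${prefix}_text.smu written"
    else
        echo "ERROR: ${prefix}_text.smu is missing or empty" >&2
        return 1
    fi
}
```

Checking the output file rather than the exit status is deliberate: the silent failure reported here did not necessarily produce a non-zero exit.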
Hi, using the latest version of smudgeplot fixed the .smu issue in my case, so I don't know if my dataset would still be useful.
Hi Kamil, removing the -c option from fastk solved it. I now have the .smu file. Thanks!
Hi Kamil, thanks for checking. Using the latest version of smudgeplot fixed the issue for me.
Thanks, I am glad we have a workaround until this gets fixed completely!