smudgeplot.py hetmers not producing kmerpairs_text.smu (.trim.ktab only)

The hetmers step has completed, but it did not produce a .smu file.

smudgeplot.py hetmers -L 12 -t 4 -o kmerpairs -tmp $TMP --verbose Fastk_Table

Running smudgeplot v0.4.0dev
Task: hetmers
Calling: hetmers (PloidyPlot kmer pair search) -okmerpairs -e12 -T4 -v -P/nesi/nobackup/ga02470/acanthoxyla/github/new-illumina-smudgeplots/AXG/fastk2/tmp Fastk_Table

The input table is untrimmed and not symmetric

Trimming k-mers in table with count < 12
Output table ./.trim.ktab already exists, continue? yes

Making trimmed table symmetric

Starting to count covariant pairs

Done!

This produces a file called .symx.ktab, but not the .smu file that I was expecting.

I thought we might not have enough coverage for the -L 12 option, so I also tried -L 6, with the same result.

Any suggestions as to why the .smu file is not being generated?

Options used for prior step: FastK -v -t4 -k21 -M16 -T4 -P$TMP -c AXG_novaseq_R*.fq -NFastk_Table

Dec 03 '24 02:12 gemmacol

Hi Gemma, the expected log looks like this:

Running smudgeplot v0.4.0dev
Task: hetmers
Calling: hetmers (PloidyPlot kmer pair search)  -otest -e10 -T4 -v Fastk_Table

  The input table is untrimmed and not symmetric

  Trimming k-mers in table with count < 10

  Making trimmed table symmetric

  Starting to count covariant pairs

  Count complete, plotting

  About to save stuff

  Saving stuff

Done!

Yours is missing

  Count complete, plotting

  About to save stuff

  Saving stuff

Done!

I am not sure why, but I suspect it's something wrong with the k-mer database. If you make a k-mer histogram (that should take just seconds), does it look sane? Something like this should do the job...

Histex -G Fastk_Table > kmer.hist

Dec 04 '24 11:12 KamilSJaron

Hi Kamil,

I tried your suggestion (Histex -G Fastk_Table > kmer.hist) and then plotted kmer.hist with GenomeScope. The model didn't fit well, but as far as I can tell the histogram looks "normal".

[image: linear_plot]

Do you have any further suggestions for what I could try in order to generate the .smu file?

So far I have these intermediate files in the dir:

Fastk_Table.hist Fastk_Table.ktab

.Fastk_Table.ktab.2 .Fastk_Table.ktab.4 .Fastk_Table.ktab.1 .Fastk_Table.ktab.3

..symx.ktab.1 .symx.ktab

Dec 06 '24 01:12 gemmacol

@gemmacol That's puzzling. Gene and I can't think of a good reason why this could or should happen. Did you by any chance get anything written to stderr? @thegenemyers thinks there should be something; it should not crash silently.

Would you be able to upload the .ktab as well as all the .Fastk_Table.ktab.* files for us? We are unable to figure out what exactly is wrong with the information we have.

Dec 06 '24 10:12 KamilSJaron
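
A minimal sketch of one way to check for stderr output: run the step through a small wrapper that keeps stdout and stderr in separate files and records the exit status. The function name `run_logged` and the log file names are made up for this example; the wrapper works for any command and does not depend on smudgeplot itself.

```shell
# Run a command, storing stdout and stderr in separate log files.
# Usage: run_logged <log_prefix> <command> [args...]
# (run_logged and the *.log names are illustrative, not part of smudgeplot.)
run_logged() {
    log_prefix=$1
    shift
    status=0
    "$@" > "${log_prefix}.stdout.log" 2> "${log_prefix}.stderr.log" || status=$?
    # Append the exit status so a silent failure still leaves a trace.
    echo "exit status: $status" >> "${log_prefix}.stderr.log"
    return "$status"
}

# Example with the command from the original post (not executed here):
#   run_logged hetmers smudgeplot.py hetmers -L 12 -t 4 -o kmerpairs -tmp "$TMP" --verbose Fastk_Table
```

After a run, an stderr log that holds only the `exit status` line means the step really did stop without any message.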

I have re-created the issue this time with a single slurm script and here provide the two files you asked for as well as the script and the log files gathered after re-running it:

This is the .symx.ktab: symx.zip
This is FastK_Table.ktab (the hidden files were too large to upload; is this what you were asking for?): FastK_Table.zip
Here is the exact script that was used: AXG.52229199.zip

And here a screenshot of the resulting files in time-reversed order (.smu still missing):

Many thanks, Gemma

Dec 08 '24 23:12 gemmacol

I am afraid the hidden files are needed for the full k-mer table. @thegenemyers?

Dec 09 '24 15:12 KamilSJaron
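
Since the visible .ktab file is only part of the table, here is a rough sketch for collecting every file that belongs to one table before sharing it. It assumes the layout shown in the directory listing earlier in the thread (a visible `<name>.ktab` plus hidden `.<name>.ktab.N` parts); the function name `list_ktab_parts` is made up for this example.

```shell
# Print the visible <name>.ktab file plus the hidden .<name>.ktab.N parts
# found in <dir>. Usage: list_ktab_parts <dir> <name>
# (list_ktab_parts is an illustrative helper, not a FastK tool.)
list_ktab_parts() {
    dir=$1
    name=$2
    for f in "$dir/$name.ktab" "$dir/.$name.ktab."*; do
        # An unmatched glob stays literal, so keep only paths that exist.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example (not executed here):
#   list_ktab_parts . Fastk_Table
```

With GNU tar, the list can be piped straight into an archive, for example `list_ktab_parts . Fastk_Table | tar -czf Fastk_Table_full.tar.gz -T -` (the archive name is arbitrary).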

Here are the hidden files as a google drive link: https://drive.google.com/drive/folders/12keGxYN5jkY8t--Rksq_sZ3FrcXXnDuj?usp=sharing

Dec 09 '24 21:12 gemmacol

Hello,

I have the same issue with three samples of Salix polaris (467, EB30, EB31): the .smu files are not written. These three samples are expected to be hexaploids.

Here are their GenomeScope profiles: [images: 467_transformed_linear_plot, EB30_transformed_linear_plot, EB31_transformed_linear_plot]

My log files look like this:

Total Resources:  58:38.544u  9:01.752s  23:45.554w  284.8%  17MB
Running smudgeplot v0.3.0 oriel
Task: hetmers
Calling: PloidyPlot  -oEB31.kmerpairs -e8 -T8 -v EB31.FastK_Table

  The input table is untrimmed and not symmetric

  Trimming k-mers in table with count < 8

  Making trimmed table symmetric

  Starting to count covariant pairs

Done!
Running smudgeplot v0.3.0 oriel
Task: plot
Calling: smudgeplot_plot.R -i "EB31.kmerpairs_text.smu" -o "EB31.trial_run" -col_ramp "viridis"

######################
## INPUT PROCESSING ##
######################
Error: The input file not found. Please use --help to get help
Execution halted

Done!

Would you have any advice regarding these samples?

Dec 12 '24 15:12 vincianem
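
Because hetmers prints "Done!" even when nothing was written, a pipeline can check for the output before calling the plot step. The sketch below assumes the output is named `<prefix>_text.smu`, as in the plot call in the log above; the function name `smu_ready` is made up for this example.

```shell
# Return 0 only if <prefix>_text.smu exists and is non-empty.
# Usage: smu_ready <prefix>   (the value passed to hetmers via -o)
# (smu_ready is an illustrative helper, not part of smudgeplot.)
smu_ready() {
    smu_file="${1}_text.smu"
    if [ -s "$smu_file" ]; then
        return 0
    fi
    echo "Missing or empty: $smu_file" >&2
    return 1
}

# Example (not executed here): only continue to plotting if the check passes.
#   smu_ready EB31.kmerpairs || exit 1
```

Stopping here gives a clearer error than the "input file not found" message from the plotting script.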

Hi @KamilSJaron , @gemmacol , @vincianem ,

I’m encountering the same issue. Have you been able to figure it out?

Cheers, Abdo

Jan 16 '25 20:01 abdo3a

I have not found a solution yet and have been meaning to get back to this topic.

Jan 29 '25 06:01 gemmacol

I am checking with Gene!

Jan 29 '25 18:01 KamilSJaron

@abdo3a and @vincianem does either of you have a dataset that generates this error that is public / or not too giant to share? It would be good to have a smaller dataset to debug the problem.

Feb 05 '25 15:02 KamilSJaron

Options used for prior step: FastK -v -t4 -k21 -M16 -T4 -P$TMP -c AXG_novaseq_R*.fq -NFastk_Table

Gemma, why did you use -c when making the database? That option turns on homopolymer compression. I am not sure what that would do to the k-mer pair search, but it is quite possibly the source of the bug (Gene is digging into the code to figure out what exactly is going on).

Feb 05 '25 21:02 KamilSJaron

Hi Kamil & Gene,

Thanks for looking into it again. Ah, good question. I was following some advice from your email of 21/11/24: "You might want to consider to use homopolymer compression when constructing the k-mer database, it might help to reduce errors." I could try repeating it without this option and see whether it produces the same error going into smudgeplot.

-Gemma

Feb 06 '25 06:02 gemmacol

Aha, it was me after all. I might have led you astray with this! If you want a quick fix, I would rerun the analysis without homopolymer compression and see where that gets you.

Feb 06 '25 08:02 KamilSJaron
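
A sketch of the rerun, using the two commands from the original post with only the `-c` flag dropped from the FastK call. The helpers `run` and `rerun_without_c` and the `DRY_RUN` switch are made up for this example; by default the commands are only printed so they can be checked before anything is executed.

```shell
# Print a command when DRY_RUN is 1 (the default here), otherwise execute it.
# (run, rerun_without_c and DRY_RUN are illustrative, not part of FastK/smudgeplot.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: rerun_without_c <tmp_dir> <read files...>
rerun_without_c() {
    tmp_dir=$1
    shift
    # Same FastK options as in the original post, minus -c.
    run FastK -v -t4 -k21 -M16 -T4 "-P$tmp_dir" "$@" -NFastk_Table
    # Same hetmers call as in the original post.
    run smudgeplot.py hetmers -L 12 -t 4 -o kmerpairs -tmp "$tmp_dir" --verbose Fastk_Table
}

# Dry run: prints the two commands without executing them.
rerun_without_c "${TMP:-/tmp}" AXG_novaseq_R*.fq
```

Setting `DRY_RUN=0` before the call executes the commands instead of printing them.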

Hi, using the latest version of smudgeplot fixed the .smu issue in my case, so I am not sure my dataset would still be useful.

Feb 06 '25 16:02 vincianem

Hi Kamil, removing the -c option from FastK solved it. I now have the .smu file. Thanks!

Feb 10 '25 07:02 gemmacol

Hi Kamil, thanks for checking. Using the latest version of smudgeplot fixed the issue for me.

Feb 10 '25 12:02 abdo3a

Thanks, I am glad we have a workaround until this gets fixed completely!

Feb 11 '25 23:02 KamilSJaron
