I am not able to interpret the results

Open H1889 opened this issue 7 months ago • 5 comments

Hi, thanks for such a nice tool. I have sequenced several single-cell samples of a protist species.

I am having trouble understanding my smudgeplot. I used the following commands to generate it:

FastK -v -t4 -k40 -M16 -T16 78bam_[12].fastq.gz -N78
smudgeplot.py hetmers -L 12 -t 8 -o 78 --verbose 78.ktab
smudgeplot.py all -o 78_SMUDGE 78_text.smu

I am running version 0.4.0.

and it looks like this:

Image

and the GenomeScope 2 histogram looks like this:

Image

For instance, another sample:

Image

Image

The other samples look very similar.

Only two of the samples look more or less OK:

Image

Image

and this one:

Image

Image

How should I understand my smudgeplots?

Thank you very much.

H1889 • Jun 26 '25 10:06

Hi, sequencing protists is hard!

If you look at the examples here https://pubmed.ncbi.nlm.nih.gov/39890468/ we show what "nice" and "messy" spectra look like. In the first two cases, you did not sequence your target well. There might be sequences from your target, but it was definitely not sequenced cleanly on its own. With protists, the most likely reason is contamination, which is very likely overtaking the runs...

The latter two look a bit better, but they are still rather messy. The third shows two peaks, and to me it's rather unclear which of them is your protist (they can't both be, because then they would be following the rules of stoichiometry; see that paper for more details). I suspect the smudgeplot would not show you much if the right bump is a bacterium (more likely given the genome size) and the left one is your target (you filtered out everything with coverage <12x, so that means you filtered out most of your target before making the smudgeplots).
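To make the two points above concrete, here is a minimal Python sketch. It is not part of smudgeplot or GenomeScope, and the function names, the 15% tolerance, and the example coverages are all hypothetical. The first function checks whether two peak coverages sit at roughly an integer ratio (for example 1n and 2n), which is what peaks from a single genome would be expected to do. The second estimates what fraction of a peak's k-mers survives a minimum-coverage cutoff such as `-L 12`, under the simplifying assumption that coverage is Poisson-distributed (real k-mer spectra are usually broader, so treat the result as a rough guide only).

```python
import math


def peaks_follow_stoichiometry(cov_low, cov_high, tolerance=0.15):
    """Return True if cov_high is close to an integer multiple (2x, 3x, ...) of cov_low.

    The tolerance is relative to the nearest integer multiple and is an
    arbitrary illustrative choice, not a value taken from any tool.
    """
    ratio = cov_high / cov_low
    nearest = round(ratio)
    return nearest >= 2 and abs(ratio - nearest) <= tolerance * nearest


def fraction_kept(mean_cov, min_cov):
    """Fraction of k-mers with coverage >= min_cov, assuming Poisson coverage."""
    below = sum(
        math.exp(-mean_cov) * mean_cov ** k / math.factorial(k)
        for k in range(min_cov)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Hypothetical peak positions, NOT read from the plots in this thread.
    print(peaks_follow_stoichiometry(20, 40))  # peaks at 1n and 2n of one genome
    print(peaks_follow_stoichiometry(20, 31))  # peaks unlikely to be one genome
    # A peak centred below the cutoff loses most of its k-mers ...
    print(round(fraction_kept(8, 12), 3))
    # ... while a peak well above the cutoff is essentially untouched.
    print(round(fraction_kept(40, 12), 3))
```

If the lower-coverage peak is the target, lowering the cutoff below that peak's coverage before extracting k-mer pairs is the obvious thing to try, at the cost of letting more error k-mers through.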

The good news for you is that ploidy is usually not an issue for protists (as long as it is not an amoeba or something). Even when they are polyploid, they usually have small enough genomes to assemble nicely. I would just try to stitch the genomes together and run the result through BlobToolKit to see what you actually have. I would recommend starting with those that show peaks, because for those you have at least a bit of an expectation for coverage and genome size.

Hope this helps.

KamilSJaron • Jun 27 '25 09:06

Thank you very, very much for your comments. I will read that paper; it looks like it could be quite useful. I will do what you suggested. Anyway, could the protist genome be haploid (or monoploid)? There is only one peak and a very low heterozygosity, ranging from 0.0008 to 0.01. What is your opinion? All the best

H1889 • Jun 27 '25 10:06

Did you sequence a culture? It might be inbred anyway - meaning, the ploidy does not matter, you would see only one haplotype anyway.

KamilSJaron • Jun 27 '25 13:06

The samples are single-cell. Greetings

H1889 • Jun 27 '25 16:06

What do you mean by single cell? Do you mean literally a single cell, amplified and sequenced? I am not aware of any technology that would allow us to meaningfully do that... How did you do it?!

Or do you mean that you had a cell suspension and you had individual cells tagged for sequencing? That I am sure would work, but I have never seen any data like this, so would be quite curious to hear about it...

KamilSJaron • Jun 27 '25 20:06