
Hexaploid interpretation and genome size inconsistency

Open Homap opened this issue 8 months ago • 1 comments

Hello,

I am sorry for posting this here, as it is more of a GenomeScope issue than a smudgeplot one. We know from flow cytometry that these samples are hexaploid. I have two questions:

  1. How does one interpret hexaploid samples in terms of autopolyploidy vs. allopolyploidy from the histogram?
  2. An interesting observation is that all the analyses report the haploid genome size of about 450 Mb (across the tetraploid and hexaploid samples) while the genome assembly is about 780 Mb. Where does this discrepancy come from? For one of the hexaploid samples as attached, the haploid genome size is estimated to be about 230 Mb, considerably smaller than the genome assembly. I don't know how one should explain these differences.

Image

Image

I appreciate your help very much and sorry again if this post does not belong to this issue list. Thank you! Homa

Homap avatar May 19 '25 10:05 Homap

Hi Homa,

The second model does not fit the histogram; see this tutorial about Mercurialis: https://github.com/KamilSJaron/k-mer-approaches-for-biodiversity-genomics/wiki/Low-coverage#mercurialis

The estimated genome size is the monoploid genome size. (I avoid "haploid" so that it is not confused with haploid in the sense of the reduced genome, which in a tetraploid species still contains two chromosome sets.) So I would expect your total genome to be 6*430 Mb. Of course, homologous regions might collapse during assembly (in fact, most assemblers try to collapse them as much as possible), so you are expected to end up with less.
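To make the arithmetic concrete, here is a minimal sketch using only the numbers quoted in this thread (430 Mb monoploid estimate, ploidy 6, 780 Mb assembly). The function and variable names are illustrative and are not part of GenomeScope or smudgeplot.

```python
# Numbers quoted in this thread, in Mb. Names are illustrative only.
MONOPLOID_MB = 430   # monoploid (1n) genome size estimate
PLOIDY = 6           # hexaploid, per flow cytometry
ASSEMBLY_MB = 780    # reported assembly span


def total_genome_mb(monoploid_mb, ploidy):
    """Total genome size if every chromosome set were kept separate."""
    return monoploid_mb * ploidy


def assembly_in_monoploid_units(assembly_mb, monoploid_mb):
    """How many monoploid genomes' worth of sequence the assembly spans."""
    return assembly_mb / monoploid_mb


total = total_genome_mb(MONOPLOID_MB, PLOIDY)
copies = assembly_in_monoploid_units(ASSEMBLY_MB, MONOPLOID_MB)

print(f"Uncollapsed total: {total} Mb")
print(f"Assembly spans about {copies:.2f} monoploid genomes")
```

An assembly between one and six monoploid genomes in span would be consistent with partial collapse of homologous regions, assuming the 1n peak was placed correctly.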

These collapses are related to genetic diversity. You see one large peak where you fit the 1n coverage. If that is correct, you would expect a lot more uncollapsed regions, because it would mean the genome is very, very heterozygous.

Of course, the other alternative is that your model did not converge well, perhaps a convergence problem similar to this: https://github.com/KamilSJaron/k-mer-approaches-for-biodiversity-genomics/wiki/Very-homozygous-diploid. In that case the genome size would be estimated to be ~double and heterozygosity would be rather small; you would expect an allo- origin if that model is right...
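The "~double" follows from the genome size estimate scaling inversely with the coverage assigned to the 1n peak. A small sketch of that rescaling, assuming the total k-mer count stays fixed and only the 1n coverage changes; the helper name is hypothetical and the 430 Mb value is the estimate quoted above:

```python
def rescale_size_estimate(size_mb, coverage_factor):
    """Rescale a genome size estimate when the 1n coverage changes.

    coverage_factor is (alternative 1n coverage) / (fitted 1n coverage).
    The estimate is roughly total k-mers / 1n coverage, so it scales
    as 1 / coverage_factor.
    """
    if coverage_factor <= 0:
        raise ValueError("coverage_factor must be positive")
    return size_mb / coverage_factor


# If the true 1n coverage were half of the fitted value,
# the monoploid estimate would double.
fitted_mb = 430
alternative_mb = rescale_size_estimate(fitted_mb, 0.5)
print(f"Fitted: {fitted_mb} Mb, alternative model: {alternative_mb:.0f} Mb")
```

Comparing each candidate size against the assembly span is one quick way to see which placement of the 1n peak is more plausible.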

Allo- vs auto- is a long discussion. But I would say: consider all possible models that look sane given your data (Where could the 1n peak be? What would that mean for heterozygosity and genome size? Any doubts about ploidy?) and then interrogate your assembly until it confidently supports one of the models.

KamilSJaron avatar May 20 '25 08:05 KamilSJaron