Demography.from_demes(): ancestral population sizes and migration rates missing
Hi,
I'm having issues converting my Demes graphs to msprime Demography objects; both population sizes and migration rates are missing unexpectedly. My Demes graphs match what I think I'm trying to specify, so it looks to me like a bug in msprime.Demography.from_demes() rather than in Demes.
I first encountered this problem with msprime 1.2.0, and upgraded to 1.3.0 in a fresh environment, with the exact same behaviour. The versions of key packages for this example are:
demes 0.2.3 pyhd8ed1ab_0 conda-forge
demesdraw 0.4.0 pyhd8ed1ab_0 conda-forge
msprime 1.3.0 py312h0767fef_0 conda-forge
python 3.12.1 hdf0ec26_1_cpython conda-forge
tskit 0.5.6 py312hf635c46_0 conda-forge
Missing population sizes
I've specified a Demes graph with three current demes (a, b, c) and their ancestors (ab_anc and abc_anc):
b = demes.Builder(time_units="generations")
b.add_deme("abc_anc", epochs=[{"start_size":1e6, "end_time":1e5}])
b.add_deme("ab_anc", ancestors=["abc_anc"], epochs=[{"start_size":1e6, "end_time":1e4}])
b.add_deme("a", ancestors=["ab_anc"], epochs=[{"start_size":1e6}])
b.add_deme("b", ancestors=["ab_anc"], epochs=[{"start_size":1e6}])
b.add_deme("c", ancestors=["abc_anc"], epochs=[{"start_size":1e6}])
graph = b.resolve()
The graph looks like what I was trying to create:
And indeed population sizes are what we specified:
[(pop_idx, pop.epochs[0].start_size) for pop_idx, pop in enumerate(graph.demes)]
[(0, 1000000.0),
(1, 1000000.0),
(2, 1000000.0),
(3, 1000000.0),
(4, 1000000.0)]
But when converting to an msprime Demography, population sizes are not preserved for the ancestral populations:
msprime.Demography.from_demes(graph)
This is not just a table formatting error either:
[Population(initial_size=0, growth_rate=0, name='abc_anc', description='', extra_metadata={}, default_sampling_time=100000.0, initially_active=False, id=0),
Population(initial_size=0, growth_rate=0, name='ab_anc', description='', extra_metadata={}, default_sampling_time=10000.0, initially_active=False, id=1),
Population(initial_size=1000000.0, growth_rate=0, name='a', description='', extra_metadata={}, default_sampling_time=0.0, initially_active=True, id=2),
Population(initial_size=1000000.0, growth_rate=0, name='b', description='', extra_metadata={}, default_sampling_time=0.0, initially_active=True, id=3),
Population(initial_size=1000000.0, growth_rate=0, name='c', description='', extra_metadata={}, default_sampling_time=0.0, initially_active=True, id=4)]
I tried also specifying "end_size"...
b = demes.Builder(time_units="generations")
b.add_deme("abc_anc", epochs=[{"start_size":1e6, "end_time":1e5}])
b.add_deme("ab_anc", ancestors=["abc_anc"], epochs=[{"start_size":1e6, "end_time":1e4}])
b.add_deme("a", ancestors=["ab_anc"], epochs=[{"start_size":1e6, "end_size":1e6}])
b.add_deme("b", ancestors=["ab_anc"], epochs=[{"start_size":1e6, "end_size":1e6}])
b.add_deme("c", ancestors=["abc_anc"], epochs=[{"start_size":1e6, "end_size":1e6}])
graph2 = b.resolve()
...but that made no difference.
Missing migration rates Adding some migration to the demes specification:
b = demes.Builder(time_units="generations")
b.add_deme("abc_anc", epochs=[{"start_size":1e6, "end_time":1e5}])
b.add_deme("ab_anc", ancestors=["abc_anc"], epochs=[{"start_size":1e6, "end_time":1e4}])
b.add_deme("a", ancestors=["ab_anc"], epochs=[{"start_size":1e6}])
b.add_deme("b", ancestors=["ab_anc"], epochs=[{"start_size":1e6}])
b.add_deme("c", ancestors=["abc_anc"], epochs=[{"start_size":1e6}])
b.add_migration(rate=1e-5, source="a", dest="b")
b.add_migration(rate=1e-5, source="b", dest="a")
b.add_migration(rate=1e-5, source="c", dest="ab_anc")
graph3 = b.resolve()
demesdraw.tubes(graph3)
Again, demes builds the graph that I intended to specify:
But the c -> ab_anc migration rate is missing:
Hi @simonharnqvist -- thanks for opening the issue. I believe everything is working as expected, and those missing population sizes and migration rates will appear as the populations come into existence.
If we specify demog = msprime.Demography.from_demes(graph3), and then look at demog.events, we should see
[PopulationParametersChange(time=10000.0, initial_size=1000000.0, growth_rate=None, population='ab_anc'),
PopulationSplit(time=10000.0, derived=['a', 'b'], ancestral='ab_anc'),
MigrationRateChange(time=10000.0, rate=0, source='b', dest='a'),
MigrationRateChange(time=10000.0, rate=0, source='a', dest='b'),
MigrationRateChange(time=10000.0, rate=1e-05, source='ab_anc', dest='c'),
PopulationParametersChange(time=100000.0, initial_size=1000000.0, growth_rate=None, population='abc_anc'),
PopulationSplit(time=100000.0, derived=['c', 'ab_anc'], ancestral='abc_anc'),
MigrationRateChange(time=100000.0, rate=0, source='ab_anc', dest='c')]
For example, population ab_anc is initialized at time zero with population size 0, since it won't exist until the split event involving populations a and b. At that time (10000), the population size is set the split event takes place. The appropriate migration rates are also turned on and off in the next few entries in the demog.events list (noting that the meaning of "source" and "dest" are reversed, since msprime looks backwards in time).
Hope this helps explain the behavior, but please let us know if there seem to still be issues or something else could be clarified.
I think @apragsdale is correct here, and it's just a quirk of how msprime represents these models. The 'debug()' method should resolve these problems - hopefully this section of the docs will clarify: https://tskit.dev/msprime/docs/stable/demography.html#debugging-tools
Does that help?
Thanks @apragsdale and @jeromekelleher for getting back to me so quickly. You are (of course) right that the population sizes aren't lost (and clearly I could have checked for that very easily - thanks for your patience).
Unless there is a logic to why models specified directly as a Demography object are displayed differently to those generated from Demes, can I suggest that the representations are standardised at some point to both show the full Demography table with all the parameter values? I find it very helpful to have the whole model represented, as it is when creating an msprime-native Demography.
That's an enhancement suggestion, however - so I'll close this bug report now. Thanks both!
What was your resolution here @simonharnqvist - does the debug() output show what you expect? I'm not sure there's a difference between demes and non-demes models, I think it's probably just how msprime does things generally. Maybe this is something we need to be clearer about in the docs?
@jeromekelleher - both .events and .debug() shows that msprime indeed creates the model I want
I'm not sure I've understood your point correctly here - this is not the behaviour if I specify a model directly in msprime. Let's create the first model without migration directly as an msprime Demography:
msprime_demography = msprime.Demography()
msprime_demography.add_population(initial_size=1e6, name="abc_anc")
msprime_demography.add_population(initial_size=1e6, name="ab_anc")
msprime_demography.add_population(initial_size=1e6, name="a")
msprime_demography.add_population(initial_size=1e6, name="b")
msprime_demography.add_population(initial_size=1e6, name="c")
msprime_demography.add_population_split(time=1e6, ancestral="abc_anc", derived=["ab_anc", "c"])
msprime_demography.add_population_split(time=1e5, ancestral="ab_anc", derived=["a", "b"])
msprime_demography.sort_events()
Here the representation of the Demography object includes all the population sizes (which IMO is far less confusing):
If I convert this to a Demes graph and then back to a Demography...
demesgraph = msprime_demography.to_demes()
msprime.Demography.from_demes(demesgraph)
... the representation now loses its population sizes:
If this is intended or expected behaviour, then explaning this in the docs would be very helpful - and perhaps clarifying to users whether this would have any effects on downstream simulations.
Ah thanks - that is confusing! I suspect there may be some slight semantic differences here in the Demes model vs msprime. Let's keep this open to see if we can do any better.