Refactor rng state so that simulation classes maintain internal rng
As discussed in https://github.com/quantumlib/Stim/pull/346 and https://github.com/quantumlib/Stim/pull/352, we want simulation and sampling classes to own their rng. This impacts FrameSimulator and TableauSimulator. DemSampler already maintains its own rng. The fix for TableauSimulator will be in https://github.com/quantumlib/Stim/pull/346 and the fix for FrameSimulator will be in the PR for https://github.com/quantumlib/Stim/issues/306.
I think there is a nontrivial decision to make here. Do we want to seed our pseudorandom number generators via `mersenne_twister_engine(result_type value)` or via `mersenne_twister_engine(Sseq& s)`?
Seeding with a `result_type` value is easier to implement because we are just passing around a 64-bit value, and we can initialize our `std::mt19937_64` in the simulator constructors.
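For reference, here is a minimal sketch of the two seeding styles using only the standard library. The helper names are made up for illustration and are not part of Stim.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: seed from a single 64-bit value, which is cheap to
// store and pass around.
inline std::mt19937_64 make_rng_from_value(std::uint64_t seed) {
    return std::mt19937_64(seed);
}

// Hypothetical helper: seed through a seed sequence. The engine constructor
// takes the sequence by non-const reference, so it has to be a named object.
inline std::mt19937_64 make_rng_from_seed_seq(std::uint32_t a, std::uint32_t b) {
    std::seed_seq seq{a, b};
    return std::mt19937_64(seq);
}

// Usage: both styles are deterministic, so equal seeds give equal engines.
inline void check_seeding_is_reproducible() {
    assert(make_rng_from_value(42) == make_rng_from_value(42));
    assert(make_rng_from_seed_seq(1, 2) == make_rng_from_seed_seq(1, 2));
}
```

Either way the result is an ordinary `std::mt19937_64`; the difference is only in what has to be carried around before the engine is constructed.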
Currently, we are using `Sseq&`, which adds entropy to the seeds themselves. If we want to keep doing this and have the simulation classes own their RNG, then we are going to need to do one of the following:
- Pass around `std::mt19937_64` objects directly. I don't think this is ideal, since they are implemented like this: `struct { result_type _M_x[state_size]; size_t _M_p; }`. This means that every time we need to pass a `std::mt19937_64` we need to do a 2.5kB copy, which isn't ideal.
- Pass around references to `std::mt19937_64` objects. This is what we currently do.
- Pass around a reference to a `std::seed_seq` (or maintain one globally :sweat_smile:). This feels the same as 3.
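To put a number on the copy in the first option, here is a rough sketch. The 312-word state size is fixed by the standard, while the exact `sizeof` depends on the standard library; `peek_next` is a hypothetical helper used only to show by-value passing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// The standard fixes the state at 312 words of 64 bits for std::mt19937_64.
// The exact sizeof is implementation-defined (libstdc++ adds an index field).
constexpr std::size_t kStateBytes =
    std::mt19937_64::state_size * sizeof(std::uint64_t);  // 312 * 8 = 2496

// Hypothetical helper: taking the engine by value copies the whole state.
inline std::uint64_t peek_next(std::mt19937_64 rng_copy) {
    return rng_copy();  // advances only the copy
}

inline void check_copy_leaves_original_untouched() {
    std::mt19937_64 rng(123);
    std::uint64_t peeked = peek_next(rng);
    assert(peeked == rng());  // original was still at its first output
    assert(sizeof(std::mt19937_64) >= kStateBytes);
}
```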
On a more philosophical level, I'm not really convinced we need to use `std::seed_seq` for scientific applications. Without coupling with an external information source, there is no way to distinguish between a good pseudorandom number generator and a real entropy source. Since we are doing physical simulations, we don't really care if someone is able to "crack" our generator. At least, that is what I have been assuming for the past few years :sweat_smile: :sweat_smile: :sweat_smile:
We can use std::unique_ptr if we need to pass the generator around a lot and want to have cheap move semantics. In general I would expect the generator to be created with the simulator, so there wouldn't be much passing around.
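A minimal sketch of the `std::unique_ptr` idea, assuming C++14 for `std::make_unique`. `HeapRng` is a hypothetical type, not something that exists in Stim.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <random>
#include <utility>

// Hypothetical owner type: the engine lives on the heap, so moving the
// owner transfers one pointer instead of copying about 2.5 kB of state.
struct HeapRng {
    std::unique_ptr<std::mt19937_64> engine;

    explicit HeapRng(std::uint64_t seed)
        : engine(std::make_unique<std::mt19937_64>(seed)) {}

    std::uint64_t next() {
        assert(engine != nullptr);  // a moved-from HeapRng has no engine
        return (*engine)();
    }
};

inline void check_move_is_pointer_transfer() {
    HeapRng a(7);
    std::mt19937_64 *address = a.engine.get();
    HeapRng b(std::move(a));
    assert(b.engine.get() == address);  // same engine object, no state copy
    assert(a.engine == nullptr);
}
```

The trade-off is one heap allocation and a pointer indirection per draw, which only pays off if the generator really does get moved around a lot.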
Keep in mind most of these simulators allocate a `simd_bit_table`, and the minimum size of that thing is 256x256 bits = 8 KiB. A 2.5 KiB copy at initialization is no big deal. We can try to arrange things so that the compiler is able to figure out it can do construction in-place with no copying, but even if it doesn't recognize that, it's a negligible cost compared to the actual simulations. Like, taking a million shots from a distance 21 surface code run for 21 rounds will produce a terabyte of data. We don't have to care about an extra kilobyte of copying during setup in that context.
Yes, that makes sense. I was overthinking the amount of copying required. It's easy to arrange things so that the constructors copy and all associated static functions take references, like this: https://github.com/quantumlib/Stim/pull/354
Removing rng sharing for TableauSimulator breaks the TableauSimulator.correlated_error test, but everything else is fine!
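Roughly, the pattern of "constructors copy, static functions take references" could look like the sketch below. `ToySimulator` and its methods are hypothetical stand-ins and do not reflect the actual class layout in the linked PR.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>

// Hypothetical simulator: not Stim's actual class layout.
struct ToySimulator {
    std::mt19937_64 rng;  // owned by value, one copy made at construction

    explicit ToySimulator(std::mt19937_64 initial_rng)
        : rng(std::move(initial_rng)) {}

    // Static helper borrows whichever generator the caller provides.
    static bool sample_bit(std::mt19937_64 &borrowed_rng) {
        return (borrowed_rng() & 1) != 0;
    }

    // Member function lends the simulator's own generator to the helper.
    bool measure_random() { return sample_bit(rng); }
};

inline void check_simulator_rng_is_independent() {
    std::mt19937_64 external(11);
    ToySimulator sim(external);  // copies; `external` is not shared
    sim.measure_random();
    assert(external == std::mt19937_64(11));  // caller's generator unchanged
    assert(!(sim.rng == external));           // simulator's copy advanced
}
```

Because the simulator no longer advances the caller's generator, any test that relied on the two staying in lockstep would need to be updated.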
Hmm, I'm not exactly sure what the best approach would be for handling methods like `PauliString::random`, the other stabilizer randomization methods, and the corresponding python classes. The python classes already initialize a new rng every time a randomization method is called, so if we just initialized a `std::mt19937_64` on the stack inside methods like `PauliString::random`, it wouldn't really have much impact outside of the stim tests. There don't appear to be any places where simulator classes call `PauliString::random` and might want to pass in their own rng.
One-shot methods should still take a reference. When calling them from python, use an externally seeded RNG.
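As a sketch of that split: the one-shot function borrows a generator by reference, and a thin wrapper (what a binding layer might do) seeds a fresh one from outside the program. Both function names are hypothetical, and `std::random_device` is just one assumed source for the external seed.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical stand-in for a one-shot method such as PauliString::random:
// it borrows the caller's generator and returns n letters from "IXYZ".
inline std::string random_pauli_letters(std::size_t n, std::mt19937_64 &rng) {
    static const char kLetters[] = {'I', 'X', 'Y', 'Z'};
    std::string out;
    out.reserve(n);
    for (std::size_t k = 0; k < n; k++) {
        out.push_back(kLetters[rng() & 3]);
    }
    return out;
}

// Hypothetical wrapper, as a language binding might do it: seed a fresh
// generator from the OS (32 bits of seed here) and lend it to the method.
inline std::string random_pauli_letters_fresh(std::size_t n) {
    std::random_device device;
    std::mt19937_64 rng(device());
    return random_pauli_letters(n, rng);
}

inline void check_one_shot_is_reproducible_with_same_seed() {
    std::mt19937_64 a(3);
    std::mt19937_64 b(3);
    assert(random_pauli_letters(10, a) == random_pauli_letters(10, b));
}
```

Keeping the reference parameter means C++ callers and tests can still pass a fixed-seed generator, while callers that don't care get fresh randomness from the wrapper.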
Anyway, @viathor has dibs on this, since he got the ball rolling with exposing rng seed for the TableauSimulator python object. Thanks for all the discussion about it! Very informative