Allow getting cells, surfaces, materials by ID in constant-ish time
Description
This problem came about because I am making an openmc model wrapper that wraps an existing model allows easily adding things to it and then running. This relies a lot on loading a lot of surface/cell/material numbers from a yaml file and then creating a lot of pointers to the corresponding objects in the wrapper to allow the model to be quickly modified. Initially this was making hundreds of calls to model.geometry.get_all_cells() and was very slow. So I changed to making this call once and then using that dict repeatedly. However I don't like this solution much due to mutation.
model = openmc.model.Model.from_xml(...)
cells = model.geometry.get_all_cells()
cell = openmc.Cell(1000)
model.geometry.root_universe.add_cell(cell)
cells[1000]
This should lead to a KeyError (haven't verified yet but assuming so).
I dealt with problem for MontePy, and the problem you run into is that cell numbers can mutate outside of the dictionaries control and leads to various mutation issues.
We solved this with our own custom collections that have an internal dictionary cache that isn't trusted
The algorithm used to find an object by number/id quickly is to have an internal __num_cache.
- As items are added they are added to the cache:
def append(self, item):
self._items.append(item)
self.__num_cache[item.id] = item
- When an item is requested the item from the dictionary with that id. A
KeyErroris treated as a cache miss. - The
cell.idis verified as being theidrequested. If it doesn't match this is treated as a cache miss - In case of a cache miss an O(N) search is done over all items, and the entire cache is updated along the way.
- The cache is updated anytime there is reason to iterate over the whole collections.
proprosal
- Add a new wrapper collection for all numbered objects
- introduce to it a
__getitem__andgetmethod with the above caching system - Keep the
get_all_cellsmethods
Alternatives
Status quo, with user caching misses and paranoia.
In my implementation right after full initialization I make sure to del cells because I know it cannot be trusted anymore.
Compatibility
This would change some data types under the hood which could break some more fancy user scripts.
Further design work is likely needed to determine the full extent of the scope of the possible API changes.
@pshriwise this is what I was thinking of for statepoint.tallies in #2671.