Simulation Model-specific extensions
## Description
For many simulation models with a large userbase, we could think about organizing a set of SmartRedis extensions that aggregate the operations that are both necessary and common for moving data into and out of the model. This is perhaps a slightly more model-by-model approach than the more general #225 solution proposed by @spartee, i.e. we could leave aggregate as an abstract method intended to be populated by an extension of Client.
This might be a more realistic approach than trying to derive a general solution. Instead, model developers could implement these methods from their better-informed point of view.
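To make that concrete, here is a minimal sketch of the pattern, assuming the existing smartredis Client as the base class; the ModelClient/MOM6Client names, the aggregate signature, and the "prefix.name" key convention are all hypothetical, not part of the current API:

```python
from smartredis import Client


class ModelClient(Client):
    """Hypothetical base class for model-specific SmartRedis extensions."""

    def aggregate(self, names, prefixes):
        """Gather the tensors in `names` posted under each key prefix.

        Left abstract here; each model extension populates it with its
        own logic for reassembling the pieces (grid tiles, ensemble
        members, etc.) into something meaningful for that model.
        """
        raise NotImplementedError


class MOM6Client(ModelClient):
    """Hypothetical MOM6-specific extension of the base client."""

    def aggregate(self, names, prefixes):
        # Naive illustration: fetch every requested tensor from every
        # rank/ensemble prefix, assuming a "<prefix>.<name>" key layout.
        return {
            prefix: {name: self.get_tensor(f"{prefix}.{name}") for name in names}
            for prefix in prefixes
        }
```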
## Justification
Users of a simulation model will need to perform common tasks that may require model-specific solutions.
## Implementation Strategy
This is more of a high-level design question whose solution should answer the following:
- [ ] How do we provide clear direction for model developers/power users to naturally extend the functionality of existing SmartRedis objects?
- [ ] How do we functionally organize these extensions? Do they live in their own repo, within this repo, or should they 'live' within the numerical model code/analysis packages?
- [ ] How do we delineate between needs that are common across a wide swath of models (and so belong in the SmartRedis base classes) and needs that belong in model extensions?
- [ ] How do we coordinate testing and development for the widely-used extensions?
I like the idea of having model-by-model extensions, but I have a few worries:
- How would they be packaged and tested? (similar to two of your points)
> How do we functionally organize these extensions? Do they live in their own repo, within this repo, or should they 'live' within the numerical model code/analysis packages?
>
> How do we coordinate testing and development for the widely-used extensions?
Say, for instance, we made an extension for SmartRedis in MOM6 or NEMO: would we have to find some way to add those models to a CI? Would they live in the MOM6 source code? I would like to keep what we can in a place where we have nightly integration and unit tests that don't require building a simulation code.
We have a similar situation with the smartsim-lammps repo, where we implemented a "dump" (I/O) style with SmartRedis. We've been wanting to contribute that back upstream for a while now, but we wanted to make sure we had the ability to test it regularly. One thought was to build it into a container for ease of testing.
I'm already thinking that we could use the extras_require field for some of the less model-specific extensions (i.e. pip install smartredis[xarray] for #224).
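For example, a rough sketch of what that could look like in setup.py; the extra names and dependency lists here are only illustrative:

```python
# setup.py (sketch): expose optional, less model-specific extensions as
# pip extras, e.g. `pip install smartredis[xarray]`.
from setuptools import find_packages, setup

setup(
    name="smartredis",
    packages=find_packages(),
    extras_require={
        "xarray": ["xarray"],            # helpers discussed in #224
        "analysis": ["xarray", "dask"],  # hypothetical grouping of analysis deps
    },
)
```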
- What languages?
Would there be a need to implement such features in languages other than Python? I was imagining that this feature would live on the Python side, as that's where we expect users to do analysis. Would there need to be helpers in the compiled languages as well?
I tend to think that the work in #225 should be general enough to serve as a base for implementing some of these more "tightly integrated" solutions. The biggest reason is that we can write performant code to pull the data with multiprocessing or possibly even MPI. I don't want to expect users to write performant multiprocessing extensions that use shared memory or locking primitives on shared data structures.
It's my thought that the aggregate method could be a useful starting point where we abstract away some of the performance details.
> How do we provide clear direction for model developers/power users to naturally extend the functionality of existing SmartRedis objects?
This is another reason I think a base method to build on is important. We can provide documentation and examples for this method, showing how to use it to create something simulation/workload-specific.
> Say, for instance, we made an extension for SmartRedis in MOM6 or NEMO: would we have to find some way to add those models to a CI? Would they live in the MOM6 source code? I would like to keep what we can in a place where we have nightly integration and unit tests that don't require building a simulation code.
100% agree that this is a tricky case, since the CI strategy might be different for projects which are distributed across a number of centers. If we held the client extensions within CrayLabs rather than the model code, we could probably get away with a simple unit test by storing the grid for a given model and making sure we can always reconstruct it. Since the definition of the grid is such a fundamental part of most models, it doesn't change THAT often, so we'd just have to keep track of when they might do a major refactor.
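As a rough idea of what such a unit test could look like (the fixture path and tensor key are made up, and this assumes the Python client's put_tensor/get_tensor with a test database reachable via the SSDB environment variable):

```python
import numpy as np
from smartredis import Client


def test_grid_round_trip():
    """Store a reference copy of the model grid and check that it can
    always be reconstructed from the database."""
    # Reference grid saved alongside the test (hypothetical fixture file).
    reference_grid = np.load("tests/fixtures/mom6_grid.npy")

    client = Client(cluster=False)  # connects to the database named by SSDB
    client.put_tensor("mom6_grid", reference_grid)

    reconstructed = client.get_tensor("mom6_grid")
    np.testing.assert_allclose(reconstructed, reference_grid)
```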
> Would there be a need to implement such features in languages other than Python? I was imagining that this feature would live on the Python side, as that's where we expect users to do analysis. Would there need to be helpers in the compiled languages as well?

I definitely think there will be a need for that, but in that case I think the onus should be on the model side, since that will be a more rapidly iterating part of the code. Similar to your argument for being performant on the Python/analysis side of things, model developers are better positioned to make such extensions performant on their end.
> It's my thought that the aggregate method could be a useful starting point where we abstract away some of the performance details.
I agree. I think that base method could just return a list/dictionary of all the requested objects that might have been posted for a given subset of clients (be it a decomposition within a single simulation or across the entire ensemble). The model-side dev would be responsible for constructing something useful from that, but I agree that we're best situated to make that gather performant on our side. We already know that doing the naive thing of looping over all the possible key prefixes is VERY slow, and I think we can exercise some basic functionality to help parallelize the gather.
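A rough sketch of that kind of parallel gather, assuming a "prefix.name" key convention and one connection per worker process; the function names and defaults are placeholders rather than an agreed-upon design:

```python
from concurrent.futures import ProcessPoolExecutor

from smartredis import Client


def _fetch_prefix(prefix, names):
    """Pull every requested tensor posted under a single client prefix."""
    client = Client(cluster=False)  # each worker opens its own connection
    return prefix, {name: client.get_tensor(f"{prefix}.{name}") for name in names}


def aggregate(names, prefixes, max_workers=8):
    """Gather `names` across all `prefixes` (ranks or ensemble members)
    in parallel rather than looping over the prefixes serially."""
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_fetch_prefix, prefix, names) for prefix in prefixes]
        for future in futures:
            prefix, tensors = future.result()
            results[prefix] = tensors
    return results
```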