
Cache randomization calls

Open · alwilson opened this issue 2 years ago · 4 comments

Adds a cache of the objects generated by do_randomize/randomize calls (VariableBoundVisitor, RandInfo, Randomizer, field_model_l, constraint_l, Boolector, and Boolector expressions), keyed on a string consisting of: the randstate object ID, the PrettyPrintModel output with values included, constraint names with their enable bit, and variable names with their rand_mode.
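
Roughly, the key is composed like this (a standalone sketch; attribute names like `pretty_print`, `enabled`, and `rand_mode` are illustrative stand-ins, not the actual pyvsc internals):

    def build_call_hash(randstate, field_models, constraints):
        # Sketch of the cache-key composition described above; duck-typed,
        # with illustrative attribute names rather than pyvsc's real ones.
        parts = [str(id(randstate))]                                          # randstate object ID
        parts += [fm.pretty_print(print_values=True) for fm in field_models]  # model with values
        parts += [f"{c.name}:{int(c.enabled)}" for c in constraints]          # constraint name + enable bit
        parts += [f"{f.name}:{int(f.rand_mode)}" for f in field_models]       # variable name + rand_mode
        return "".join(parts)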

Skips caching on calls using ModelGenerator or distribution constraints: ModelGenerator because every call introduces a new soft constraint (or at least that's what I saw in testing), and distribution constraints because it's expensive to generate the bounds, array constraints, and distribution constraints on every call just to decide not to cache.

Uses copy.deepcopy on field_model_l and constraint_l to keep constraint generation/expansion and Boolector objects independent. Everything is deepcopied except the val object inside FieldScalarModel, so that value lookups through the original rand_obj still work, and the var Boolector object, which cannot be deepcopied or pickled. The do_randomize call also manipulates FieldArrayModel, adding a variable (latest_field_l) and a function (get_field_l) to help return field_l. Some variables hold references to their parents, which I think causes deepcopy to reach much further than intended, to the point of copying the entire parent rand_obj. I'm not sure deepcopy is the right long-term approach, but it was an easy way to fork the state and stop it from being shared or overwritten.
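
The FieldScalarModel override follows the standard copy-protocol idiom, roughly as in this simplified sketch (the actual diff's version appears in the TODO list later in this thread):

    import copy

    class FieldScalarModel:
        def __deepcopy__(self, memo):
            cls = self.__class__
            result = cls.__new__(cls)
            memo[id(self)] = result
            for k, v in self.__dict__.items():
                if k == 'val':
                    # Share the val object so value lookups through the
                    # original rand_obj still work
                    setattr(result, k, v)
                elif k == 'var':
                    # Boolector objects can't be deepcopied or pickled
                    result.var = None
                else:
                    setattr(result, k, copy.deepcopy(v, memo))
            return result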

Also adds a test for pre_randomize to exercise disabling/enabling constraints and make sure the caching doesn't mix them up.
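
The shape of that test is roughly the following sketch (illustrative names, not the actual test code):

    import vsc

    @vsc.randobj
    class my_item(object):
        def __init__(self):
            self.a = vsc.rand_uint8_t()
            self.toggle = True

        @vsc.constraint
        def small_c(self):
            self.a < 10

        def pre_randomize(self):
            # Flip the constraint on and off between randomize() calls; with
            # caching in play, the two configurations must stay distinct.
            self.small_c.constraint_mode(self.toggle)
            self.toggle = not self.toggle

    it = my_item()
    for _ in range(4):
        it.randomize()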

Results in a modest speed-up on test_perf.py (taking the 3rd run to avoid import overhead):

master     - 17.0s
rand_cache -  8.0s

On test_dist.py, caching should be skipped, although the PrettyPrintModel call is still needed to make that decision. I've seen these runs vary by about 1s, so the pretty-print overhead is hopefully small enough to work for now; a faster way to inspect the constraints would make that check orders of magnitude faster.

master     - 15.7s
rand_cache - 14.9s

alwilson avatar Feb 16 '23 00:02 alwilson

@alwilson, thanks for all the work you've done on this! This looks like a sizable body of work, and it's going to take me a little while to review it and get my head around it. While I'm doing so, could you collect the set of open issues that you're aware of (I saw a few 'TODO' notes in the diff)? It might also be interesting to try the cache-enabled PyVSC out with the riscv-dv project to get another data point on improvements. We will need to determine whether to have caching on by default; to this end, can you collect timing for running the test suite with cache enabled and with cache disabled?

Thanks again for your work towards improving PyVSC's performance!

mballance avatar Feb 16 '23 01:02 mballance

@mballance The riscv-dv pygen tests are looking like a good exercise of these changes! The FieldArrayModel hack I had to copy references to field_l still seems to be broken and I have more changes that seem to resolve it. The tests I tried also explode in memory usage and get slower as memory grows. It looks like their testbench deepcopies rand_obj in places, which I think is multiplying the randomization caches which already deepcopy too much. So I think there's some deepcopy work that needs to be done to prevent cases like this from happening. Also only caching after the 2nd or so call would help on memory usage for one-off calls never seen again.
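
That last idea could look something like this minimal sketch (a miss counter gating cache admission; the names are hypothetical, not from the diff):

    from collections import Counter

    class RandCache:
        """Only admit an entry after it has been requested ADMIT_AFTER times,
        so one-off randomize calls never pay the deepcopy/memory cost."""
        ADMIT_AFTER = 2

        def __init__(self):
            self.entries = {}
            self.misses = Counter()

        def get(self, call_hash):
            return self.entries.get(call_hash)

        def maybe_put(self, call_hash, make_entry):
            self.misses[call_hash] += 1
            if self.misses[call_hash] >= self.ADMIT_AFTER:
                # Only now pay for the deepcopy of the models
                self.entries[call_hash] = make_entry()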

I can summarize those TODO / HACK comments in another comment.

alwilson avatar Feb 16 '23 17:02 alwilson

I tried out riscv-dv's pygen / pyflow, but I've only gotten one of the tests to run successfully. I tried it on master as well and only got the first test mentioned in the README to work. I knocked down the number of instances and instructions to run a bit faster, but the variance between successive runs is pretty wide, so it's hard to measure any improvement. I did add some debug statements and saw that only some randomization calls could be cached. I'll keep looking into it, but I suspect it's the dist constraints and with constraints preventing reuse.

    python3 run.py --test=riscv_arithmetic_basic_test --simulator=pyflow --steps=gen

Do you know what state the riscv-dv tests are in? I'm wondering if I picked the wrong tests or just happened to test the tree at a bad time. The failures I saw were solver failures that I didn't look into very far.

After this latest commit I've cleaned up more of the TODO / HACK comments, since I was so liberal with them. I've listed them below along with their file/function. Most of the concerns are around deepcopy: syncing the FieldScalarModel and FieldArrayModel copies made by deepcopy with the original references so that retrieving model values works, while still being able to roll back those changes on the next randomize call. I also have some concerns about abusing PrettyPrintModel to get a hashable string for caching; there should be a faster way, and one that doesn't depend on a human-readable format that might change in the future to suit debug needs. There are also some scattered notes on the randomize function itself, since there seem to be unused features (disabled for now?) that would complicate caching, and possibly unused self.btor references.

field_scalar_model.py:
def __deepcopy__(self, memo):
    # TODO This is a workaround for the deepcopy in do_randomize somewhere getting
    #      access to Boolector objects, which in turn can't be deepcopied.
    result.var = None

randomizer.py:
class Randomizer(RandIF):
    def __init__(self, randstate, debug=0, lint=0, solve_fail_debug=0, 
        # TODO Reset btor cache after so many uses to circumvent Boolector instance
        #      slowing down as more expressions permanently become part of the model/formula.
        self.btor_cache_uses = 100
        
    def randomize(self, ri : RandInfo, bound_m : Dict[FieldModel,VariableBoundModel], cache_enabled: bool):
        # TODO What is going on with max_fields? It would probably
        #      break this caching setup.
        #max_fields = 20

                btor = Boolector()
                # TODO Is self.btor used anywhere?
                # self.btor = btor

                    # TODO: Is this part of a disabled feature to solve randsets together?
                    # rs_i += 1
                    if n_fields > max_fields or rs.order != -1:
                        break

                # TODO Is self.btor used anywhere?
                # self.btor = btor

                # TODO Does some of this need to be done while caching, too?
                if not cache_enabled:
                    reset_v = DynamicExprResetVisitor()
                    for f in rs.all_fields():
                        f.set_used_rand(False, 0)
                        f.dispose() # Get rid of the solver var, since we're done with it
                        f.accept(reset_v)
                    # for f in rs.nontarget_field_s:
                    #     f.dispose()
                    for c in rs.constraints():
                        c.accept(reset_v)
                    RandSetDisposeVisitor().dispose(rs)
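
For the `btor_cache_uses` TODO above, one possible shape is a use counter that rebuilds the Boolector instance once it has been reused enough times (a sketch, not part of the diff; `build_fn` is a hypothetical callable that reconstructs the solver from the cached models):

    class BtorCacheEntry:
        def __init__(self, build_fn, max_uses=100):
            self.build_fn = build_fn   # hypothetical: rebuilds btor + expressions from cached models
            self.max_uses = max_uses
            self.uses = 0
            self.btor = build_fn()

        def get(self):
            # Expressions asserted into a Boolector formula accumulate
            # permanently, so drop the instance and rebuild it once it has
            # served max_uses solves.
            if self.uses >= self.max_uses:
                self.btor = self.build_fn()
                self.uses = 0
            self.uses += 1
            return self.btor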

do_randomize:
            # HACK Clear out field_l in FieldArrayModel from previous cache
            for fm in field_model_l:
                if hasattr(fm, 'field_l'):
                    for f in fm.field_l:
                        if hasattr(f, 'field_l') and hasattr(f, 'old_field_l'):
                            # Revert to original value
                            f.field_l = f.old_field_l
                        elif hasattr(f, 'field_l'):
                            # Save off old, original value
                            f.old_field_l = f.field_l

            # Create a unique string for this call based on object ids and mode bits
            # TODO Is there more than rand_mode and constraint_mode to cover here?
            # TODO Can we cache the base constraints so that with constraints have a prebuilt
            #      model and such to build off of?
            call_hash = Randomizer.get_pretty_call_hash(randstate, field_model_l_og, constraint_l_og)

            # Skip dist constraints b/c they cost building bounds and array/dist constraints first
            # HACK What's the best way to detect if there are dist constraints?
            if ' dist { ' in call_hash:
                cache_enabled = False

            # Make copy of field and constraint models, together to keep FieldScalarModels the same
            # TODO The deepcopy() in FieldScalarModel keeps the val reference, is that the best way?
            if cache_enabled:
                (field_model_l, constraint_l) = copy.deepcopy((field_model_l, constraint_l))

        # HACK Fill out field_l in FieldArrayModels so that array lookups work in model
        if cache_enabled:
            field_model_l = Randomizer.randomize_cache[call_hash].field_model_l
            for fm_new, fm_og in zip(field_model_l, field_model_l_og):
                if hasattr(fm_og, 'field_l'):
                    for f_new, f_og in zip(fm_new.field_l, fm_og.field_l):
                        if hasattr(f_og, 'field_l'):
                            f_og.field_l = f_new.field_l


def get_pretty_call_hash(randstate, field_model_l, constraint_l):
            # TODO This pretty print is an expensive call. Need a better way
            #      to construct a unique ID/hash that doesn't depend on
            #      object lifetimes. Can some of this be cached?
            call_hash += ModelPrettyPrinter.print(fm, print_values=True)
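
One cheap mitigation for the key's size (though not for the pretty-print cost itself) would be to digest the composed string with the standard library, e.g.:

    import hashlib

    def digest_call_hash(call_hash: str) -> str:
        # Collapse the large pretty-printed key into a fixed-size digest so
        # the cache doesn't retain the full string per entry.
        return hashlib.sha1(call_hash.encode('utf-8')).hexdigest()

Note that this would break the substring-based dist detection shown earlier, which scans the raw string, so that check would need to run before digesting.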

alwilson avatar Feb 17 '23 20:02 alwilson

Yeah, the runtimes for at least that one riscv-dv test I'm running are all over the place; it's hard to compare even across several runs.

riscv_arithmetic_basic_test

1000 instructions - 1 iteration - runtime (s)
master     avg: 24.6 - 18.6, 26.1, 23.1, 30.5
rand_cache avg: 21.1 -  9.7, 31.2, 13.0, 30.5

3000 instructions - 1 iteration - runtime (s)
master     avg: 61.8 - 97.5, 22.0, 111.4, 16.1
rand_cache avg: 56.2 - 95.9, 62.8,  21.0, 45.2

alwilson avatar Feb 17 '23 20:02 alwilson