Benchmark suite.
Automat should have a benchmark suite so we can measure performance overhead.
At least one Automat user has found it to cause serious slowdowns: https://github.com/meejah/txtorcon/issues/224
The thing that is slow in txtorcon is the "microdescriptor parser", which is here: https://github.com/meejah/txtorcon/blob/master/txtorcon/_microdesc_parser.py#L6
This parses a pretty simple line-based file (well, actually it's streamed from the network) that has groups of (usually 4) related lines. A line starting with "r " starts a new "router", and other lines (if present) add information to it. On a decent 3GHz Xeon this takes 2+ seconds to parse a ~20k-line file in CPython. From cProfile, most of the time is spent in MethodicalInput creating new onInput inner functions; commenting out the @preserveName and @wraps decorators speeds it up by about 5 times.
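For illustration, the input looks roughly like this (made-up data; see the txtorcon parser linked above for the real grammar) -- each "r " line drives a state transition that opens a new router entry, and each following line drives another transition that attaches details to it:

lines = [
    "r moria1 AAAA...",       # starts a new router entry
    "s Fast Running Stable",  # adds flags to it
    "w Bandwidth=9001",       # adds bandwidth info
    "p accept 80,443",        # adds port policy
    "r tor26 BBBB...",        # next router entry begins
    "s Running Valid",
    "w Bandwidth=512",
    "p reject 1-65535",
]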
So, I think a benchmark that would simulate this would just track "inputs per second" or "state transitions per second" or similar.
BTW, I tested against Python 3 and PyPy as well, with similar differences (obviously, PyPy was way faster overall).
Okay, here's a simple test case. I haven't worked it into an "actual benchmark", but I did confirm that it shows a similar ratio: ~56000 inputs/second vs ~230000 inputs/second after commenting out the @wraps and @preserveName in MethodicalInput.
import time

import automat


class Simple(object):
    """
    A one-state machine: every 'one' input triggers the 'boom' output
    and re-enters 'waiting'.
    """
    _m = automat.MethodicalMachine()

    @_m.input()
    def one(self, data):
        "some input data"

    @_m.state(initial=True)
    def waiting(self):
        "patiently"

    @_m.output()
    def boom(self, data):
        pass

    waiting.upon(
        one,
        enter=waiting,
        outputs=[boom],
    )


def transitions_per_second(machine, total):
    # Drive `total` inputs through the machine and return the rate.
    start = time.time()
    for x in range(total):
        machine.one(x)
    diff = time.time() - start
    return total / diff


print("{} transitions/s".format(transitions_per_second(Simple(), 100000)))
What do you think of the following trick in MethodicalInput.__get__:
def __get__(self, oself, type=None):
    ...
    @preserveName(self.method)
    @wraps(self.method)
    def doInput(*args, **kwargs):
        ...
    setattr(oself, self._name(), doInput)
    return doInput
That is, replace the getter on the object with its result. It works because instance attributes shadow non-data descriptors (MethodicalInput defines only __get__), so the wrapper is built just once per instance. It makes the given benchmark run about 5 times faster.
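For illustration, here is a minimal, self-contained sketch of the same caching pattern outside of Automat (CachingInput and Example are hypothetical names, not Automat's API):

from functools import wraps

class CachingInput(object):
    """A non-data descriptor (defines only __get__) that caches its
    wrapper on the instance, so the wrapper is built only once."""

    def __init__(self, method):
        self.method = method

    def __get__(self, oself, type=None):
        if oself is None:
            return self

        @wraps(self.method)
        def doInput(*args, **kwargs):
            return self.method(oself, *args, **kwargs)

        # Instance attributes shadow non-data descriptors, so this
        # __get__ never runs again for this instance.
        setattr(oself, self.method.__name__, doInput)
        return doInput

class Example(object):
    @CachingInput
    def one(self, data):
        return data

e = Example()
e.one(1)
assert "one" in vars(e)  # the wrapper was cached on first access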
Hmm. I suspect it would make it slower on PyPy though. Have you tried there?
I'm not sure if it's possible, but I think the best course would be to completely eliminate the "create a new function + decorators" work on every __get__ call. I haven't looked deeply into what that would actually take, though, so feel free to ignore me ;)
In any case, I like the trick -- if only because it gets a pretty great speed-up without any deep changes :) as long as it's also faster on PyPy.
With PyPy the benchmark improves even more than with CPython -- I tried it and got a 30-50x speedup.
Recently I had an idea that MethodicalInput() could be a decorator itself, so no tricks would be required to avoid the continual re-creation of doInput(); however, it would make the implementations of input(), output(), and state() non-uniform. A rough sketch is below. Opinions?
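For concreteness, a minimal sketch of that idea -- the wrapper is built once, when the class body executes, rather than on every attribute access (methodical_input and the transition comment are hypothetical, not Automat's actual internals):

from functools import wraps

def methodical_input(machine):
    def decorator(method):
        @wraps(method)
        def doInput(oself, *args, **kwargs):
            # ... look up the current state and run the transition ...
            return method(oself, *args, **kwargs)
        # Returning a plain function means ordinary method binding
        # applies; no new wrapper is created per access.
        return doInput
    return decorator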
> Recently I had an idea that MethodicalInput() could be a decorator itself
I don't see what benefit that would have. Can you expound? Perhaps on a dedicated ticket, since it seems totally unrelated to benchmarking?