Low performance
I love what this library does but I have a bit of an issue. I have a use-case where I need to "evaluate" the same user-defined expression in a fast loop with a few dynamic input parameters that might change on each iteration.
I have tried both the native built-in eval and the safe eval from this library in a test scenario, evaluating the same expression with a few input parameters 100000 times, and I get 50-60% worse performance with the safe eval. I think that's a fair trade-off; however, with the unsafe eval I can take a shortcut:
If I put the expression into a lambda and evaluate it that way only once, I can use the lambda to call it on each subsequent occasion with the current parameters. This is over an order of magnitude faster than re-evaluating the expression every time.
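For context, here is a minimal sketch of the lambda shortcut I mean (the expression and parameter names are just illustrative):

```python
# Example user-defined expression and its dynamic input parameters.
expr = "x * 2 + y"
param_names = ["x", "y"]

# eval() is called a single time, outside the hot loop, to build a function.
fn = eval(f"lambda {', '.join(param_names)}: {expr}", {"__builtins__": {}})

# Subsequent iterations only call the already-compiled function.
results = [fn(x=i, y=1) for i in range(5)]
print(results)  # [1, 3, 5, 7, 9]
```

The expensive parse/compile happens once; the loop only pays the cost of a plain Python function call.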
Do you think it would be possible to implement a lambda evaluator in this library that would yield similar performance gains? Do you have an approach in mind for implementing lambdas?
That is an interesting question!
It would probably also be good to mention performance vs native eval somewhere in the docs.
Would https://github.com/danthedeckie/simpleeval/pull/115 be the kind of thing you're thinking of?
I will have to test the performance with the new feature but it looks like it would be a great option.
@danthedeckie I used a short test code to compare the 2 methods.
Results on 0.9.12:

```
2.75s call <...>/test_evaluators.py::test_safe_evaluator
0.08s call <...>/test_evaluators.py::test_lambda_evaluator
```

On 0.9.13 with the preparsing:

```
1.03s call <...>/test_evaluators.py::test_safe_evaluator
0.08s call <...>/test_evaluators.py::test_lambda_evaluator
```
Looks like the performance is improved a lot with preparsing but it's still over an order of magnitude slower than the lambda hack with the standard eval unfortunately.
That's really encouraging that it's that much faster! Thank you!
I have a few ideas for making it faster still, but nothing easy, alas.
One really complex, but potentially massive improvement would be to compile to python bytecode, rather than step through the AST every time. But that's a lot of work.
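Just to illustrate the gap (not simpleeval code, and not safe on its own — simpleeval's whole point is validating what it runs): once an expression has been compiled to a code object, re-executing that bytecode with fresh names is far cheaper than re-parsing the string each time.

```python
import ast
import timeit

# Hypothetical expression, purely for illustration.
expr = "x + y * 2"

# Parse once, compile the AST to a code object once.
tree = ast.parse(expr, mode="eval")
code = compile(tree, "<expr>", "eval")

# Executing the compiled code object only needs fresh names each call.
run_compiled = lambda: eval(code, {"__builtins__": {}}, {"x": 1, "y": 2})
# Re-parsing and re-compiling from the string every call, for comparison.
run_reparsed = lambda: eval(expr, {"__builtins__": {}}, {"x": 1, "y": 2})

assert run_compiled() == run_reparsed() == 5

print("compiled:", timeit.timeit(run_compiled, number=100_000))
print("reparsed:", timeit.timeit(run_reparsed, number=100_000))
```

A safe version would have to validate the AST before compiling it, which is the hard part.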
There may well be a lot of smaller wins possible too - it's not really been designed with performance in mind. It would be really good to start a performance / benchmarking suite alongside the test suite.
Yeah, and to be clear, I think this is already faster than just calling eval every time; the eval solution is only this fast because it wraps the expression in a lambda, parses that once, and then only makes subsequent calls to the pre-parsed function. So it might be possible to get the same performance benefits by implementing lambda evaluation in this library, if that can be done without repeated calls to eval. I could post my test code later if that's helpful for setting up some performance benchmarks.
That does sound really interesting - if you could post it that would be great!
@danthedeckie
"""
run with: `pytest --durations=0 test_evaluators.py`
"""
# std
from typing import Any, Dict, Protocol, Sequence, Type
# 3rd party
from simpleeval import DEFAULT_OPERATORS, DEFAULT_NAMES, EvalWithCompoundTypes
# interface
class Evaluator(Protocol):
def __init__(self, builtins: Dict[str, Any], keys: Sequence[str], expr: str): ...
def evaluate(self, **kw) -> Any: ...
# implementations
class RegularEvaluator:
def __init__(self, builtins: Dict[str, Any], keys, expr):
self.builtins = builtins
self.expr = expr
def evaluate(self, **kw):
# pylint: disable=eval-used
return eval(self.expr, {'__builtins__': self.builtins, **kw})
class LambdaEvaluator:
def __init__(self, builtins: Dict[str, Any], keys, expr):
self.keys = keys
params = ', '.join(keys)
constructed_expr = f'lambda {params}: {expr}'
self.callable = eval(constructed_expr, {'__builtins__': builtins}) # pylint: disable=eval-used
def evaluate(self, **kw):
names = {k: v for k, v in kw.items() if k in self.keys}
return self.callable(**names)
class SafeEvaluator:
def __init__(self, builtins: Dict[str, Any], keys, expr):
self.evaluator = EvalWithCompoundTypes(
operators=DEFAULT_OPERATORS,
functions={**DEFAULT_NAMES, **builtins},
)
self.parsed = self.evaluator.parse(expr)
self.expr = expr
def evaluate(self, **kw):
self.evaluator.names = kw
return self.evaluator.eval(self.expr, previously_parsed=self.parsed)
# helper functions
def try_evaluator(cls: Type[Evaluator]):
evaluator = cls(
builtins=dict(
str=str,
result=2,
joe=lambda: 'Joe'
),
keys=[
'x',
'y',
],
expr='" ".join([str(x + y - 1 == result and joe() == "Joe" or True is None), str(False)])',
)
for _ in range(100000):
assert evaluator.evaluate(x=1, y=2) == "True False"
# tests
def test_regular_evaluator():
try_evaluator(RegularEvaluator)
def test_lambda_evaluator():
try_evaluator(LambdaEvaluator)
def test_safe_evaluator():
try_evaluator(SafeEvaluator)
Thanks @bbk-riact
Thanks @danthedeckie for implementing the separate `.parse()` feature. I love it! :) Pre-parsing the AST and reusing it really speeds up the processing. I've decided to use an automatic cache which remembers the last-used expression and last-used AST. If `expression != cached_expression`, I call `.parse()` with the new expression. Works wonderfully. <3
Closing this due to inactivity.