mathics-core Expressions with undefined variables are slow

Expressions with undefined variables are 16x slower than the equivalents defined variables.

In[1]:= x=1
Out[1]= 1

In[2]:= Print[Timing[x+1+1.4+2]]; Print[Timing[y+1+1.4+2]]; Print[Timing[1+1.4+2+x]]; Print[Timing[1+1.4+2+y]]
{0.000742186, 5.4}
{0.0123144, 4.4 + y}
{0.000772454, 5.4}
{0.0121675, 4.4 + y}
Out[2]= None

One way to fix it is to have a hash table (python dict) with the numeric variables (in the future we can expand it to have a hash table for each type of value).

If the variable isn't there and we are expecting a numeric value we stop searching for the variable in other hash tables (currently we just have the huge definitions).

To we have an order of magnitude of the performance improvement here is the time with a hardcoded y:

In[1]:= Timing[1+1.4+2+y]
Out[1]= {0.00349792, 4.4 + y}

And by hardcoded I mean:

- items = items.numerify(evaluation).get_sequence()
+ items = [Integer(1), Real(1.4), Integer(2), Symbol("y")]

The hardcoded sum is almost 3.5x (71%) faster.

Sep 27 '21 10:09 TiagoCavalcante

Wow! Yes, this is something we really need to fix sooner than later.

Sep 27 '21 12:09 rocky

I have been thinking about this and for a good improvement there should be added 2 more hash tables: the numbers one and a hash table for only delayed rules (:=).

Sep 27 '21 15:09 TiagoCavalcante

I'm in favor of trying something simple in code in a branch as a feasibility study to get a more detailed picture of how this would work and have something more concrete to discuss.

I think of this sort of like how an artist or cartoonist might do a rough sketch before working on the finished painting or cartoon.

Sep 27 '21 16:09 rocky

@TiagoCavalcante , do you have a clue about why it takes so much time to add undefined symbols? I think that before adding more caching and hash-tables, we need to understand better why the evaluator has this behavior.

Sep 28 '21 11:09 mmatera

@mmatera I think it is because it needs to check all the definitions, one way to verify this is evaluate that without adding builtins.

Sep 28 '21 11:09 TiagoCavalcante

@mmatera I think it is because it needs to check all the definitions, one way to verify this is evaluate that without adding builtins.

Thanks @TiagoCavalcante ! I would like to see us do more of this kind of activity where we think of ways to verify hypotheses before spending a lot of time implementing them. This kind of test I think is easy to do.

Sep 28 '21 11:09 rocky

@rocky @mmatera just including the folders numbers and arithfns to include as @rocky suggested gave an improvement of only 4%, while removing arithfns too, gave an improvement of 96%, so there's a lot of slowness there.

Sep 29 '21 13:09 TiagoCavalcante

@TiagoCavalcante , could you check how much the size of the dictionaries in the definitions object changes when you add/remove arithfns?

Sep 29 '21 13:09 mmatera

@mmatera 1188 all modules. Just arithfns has size 151.

Sep 29 '21 14:09 TiagoCavalcante

@mmatera apply (from Plus) is getting called twice on 1+1.4+2+x.

Sep 29 '21 15:09 TiagoCavalcante

This happens because after the first application, the expression changes, to a new Plus[...] expression. Then, as the expression has changed (the return value is not None), the evaluator tries to reevaluate the expression. It would be different if apply would result in a number. Maybe this is why to add a+b with a and b undefined costs more than to evaluate it with a=2; b=3..

Sep 29 '21 16:09 mmatera

@mmatera do you think the re-evaluation can be avoided?

Sep 29 '21 16:09 TiagoCavalcante

I think it would be possible, but not desirable. A better approach would be to accelerate the pattern matching for special symbols like Plus, Times, Power, or List, or just cache the last pattern that matched.

Sep 29 '21 16:09 mmatera