Mathics icon indicating copy to clipboard operation
Mathics copied to clipboard

Huge memory footprint of expressions

Open teaalltr opened this issue 9 years ago • 8 comments

Using ByteCount (specifically, using asizeof module), one can see a huge consumption of memory to store Mathics expressions. For example, storing List[1, 2] takes about 38x the amount MMA takes. Things get worse if using a 64 bit CPython - the amount almost doubles. Storing Range[1000000] (of course, unrolled) takes for instance 584699560 bytes - it's almost 600 MB! - while MMA takes 8000144 bytes - 8 MB ca.). I guess CPython does some caching and optimizations - in fact there's a 5-10% difference between the value from Task Manager and the value from ByteCount.

That's a serious problem at all. May pickle and compression help?

teaalltr avatar Sep 16 '16 23:09 teaalltr

While concerning, this makes sense. We have to allocate a whole lot of Integer objects.

I think a partial solution to this is packed arrays. I did some testing and I think that's how MMA handles this too:

angus@thinkpad> math                                                          ~
Mathematica 10.4.1 for Linux x86 (64-bit)
Copyright 1988-2016 Wolfram Research, Inc.

In[1]:= x = Range[1000000];

In[2]:= ByteCount[x]
Out[2]= 8000144

In[3]:= x[[1000000]] = 1   
Out[3]= 1

In[4]:= ByteCount[x]
Out[4]= 8000144

In[5]:= x[[1000000]] = y   (* Array can no longer be packed *)
Out[5]= y

In[6]:= ByteCount[x]
Out[6]= 24000064

In the long term, reducing the memory usage of the core data structures could be done by e.g. rewriting in C or possibly even just allocating raw python ints:

$> python
Python 3.5.2 (default, Jun 28 2016, 08:46:01) 
[GCC 6.1.1 20160602] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = list(range(1000000))
>>> import sys
>>> sys.getsizeof(a)
9000112

sn6uv avatar Sep 17 '16 02:09 sn6uv

Or maybe even trying to use __slots__: http://tech.oyster.com/save-ram-with-python-slots/

sn6uv avatar Sep 17 '16 02:09 sn6uv

Also, on my system things aren't quite as bad (using #560):

In[1]:= ByteCount[Range[1000000]]
Out[1]= 297396384

sn6uv avatar Sep 17 '16 02:09 sn6uv

@rocky this still isn't closed so I think it hasn't been solved.

Maybe a NumPy array would help here?

TiagoCavalcante avatar Jun 20 '21 16:06 TiagoCavalcante

I don't think this has been addressed although the side issues have.

Working out all of the details in implementing Mathics Lists as Numpy arrays seems to me tricky. You'd need to know when it gains and would need to materialize the array on demand.

rocky avatar Jun 20 '21 16:06 rocky

You'd need to know when it gains and would need to materialize the array on demand.

@rocky what do you mean by that?

Working out all of the details in implementing Mathics Lists as Numpy arrays seems to me tricky.

Yes, but the memory footprint would be much smaller. I would try to implement that, but I don't know where do it, in the class List, RowBox?

TiagoCavalcante avatar Jun 20 '21 16:06 TiagoCavalcante

You'd need to know when it gains and would need to materialize the array on demand.

@rocky what do you mean by that?

Supposed I want to implement NumpyArrays for Python lists. But there is overhead to convert a Python list into a NumpyArray. And I guess I need to convert back from Numpy to Python. This is overhead. For many small Python lists this won't be of any benefit because just the time spent to convert in and out.

Working out all of the details in implementing Mathics Lists as Numpy arrays seems to me tricky.

Yes, but the memory footprint would be much smaller. I would try to implement that, but I don't know where do it, in the class List, RowBox?

Lists are fundamental in Mathics and WL. All expressions are lists. I think this would be very difficult and honestly I am not sure I would know how to do.

rocky avatar Jun 20 '21 17:06 rocky

Supposed I want to implement NumpyArrays for Python lists. But there is overhead to convert a Python list into a NumpyArray. And I guess I need to convert back from Numpy to Python. This is overhead. For many small Python lists this won't be of any benefit because just the time spent to convert in and out.

I think on this and implement it in mathics scanner would cut the overhead, as string to list has overhead, the same happen with string to NumPy but much smaller.

Lists are fundamental in Mathics and WL. All expressions are lists.

Thanks by the information, this will help.

I think this would be very difficult and honestly I am not sure I would know how to do.

I'll try to do it, I'll have time next week, the tests are going to help a lot here (help in my success or in my fail).

TiagoCavalcante avatar Jun 20 '21 19:06 TiagoCavalcante