How to estimate memory usage?

Open okkez opened this issue 5 years ago • 4 comments

We have recently been running columnify as part of the fluent-plugin-s3 compressor (msgpack to parquet). However, columnify caused out-of-memory errors in some environments, so I want to estimate its memory usage. Alternatively, is there a way to keep memory usage constant regardless of file size?

In my tests, memory usage is proportional to file size: large files use 5 to 6 times their size in memory. For example, a 223MB msgpack file consumes about 1.3GB of memory (RSS as reported by the ps command).
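As a rough sizing aid, the observed ratio can be turned into a back-of-the-envelope estimate. This is only a sketch: estimateRSS is a hypothetical helper, and the 5x to 6x factor is the ratio measured above for one workload, not a guarantee for other schemas or inputs.

```go
package main

import "fmt"

// estimateRSS returns a rough peak-memory estimate in bytes, assuming memory
// grows linearly with input size. The factor is an observed ratio from one
// measurement, not a property guaranteed by columnify.
func estimateRSS(inputBytes int64, factor float64) int64 {
	return int64(float64(inputBytes) * factor)
}

func main() {
	const mb = 1 << 20
	input := int64(223 * mb) // the 223MB msgpack file from the report

	for _, factor := range []float64{5, 6} {
		fmt.Printf("factor %.0f: ~%d MB\n", factor, estimateRSS(input, factor)/mb)
	}
}
```

For the 223MB file this gives a range of roughly 1.1GB to 1.3GB, which is in line with the RSS reported above.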

okkez avatar Jun 29 '20 08:06 okkez

I think the part that consumes memory is the row data formatted by FormatToMap(), and we can estimate memory usage by counting the sizes of the row data.
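A minimal sketch of what counting row sizes could look like, assuming each formatted row is a map[string]interface{} holding scalars, strings, byte slices, and nested maps or slices. That shape is an assumption, and approxSize is a hypothetical helper, not part of columnify. It counts payload bytes only and ignores Go runtime overhead (interface headers, map buckets, slice headers), so it is a lower bound on real heap usage.

```go
package main

import "fmt"

// approxSize returns a rough payload size in bytes for a decoded value.
// It does not include Go runtime overhead, so real heap usage is higher.
func approxSize(v interface{}) int {
	switch x := v.(type) {
	case nil:
		return 0
	case string:
		return len(x)
	case []byte:
		return len(x)
	case bool, int8, uint8:
		return 1
	case int16, uint16:
		return 2
	case int32, uint32, float32:
		return 4
	case int, uint, int64, uint64, float64:
		return 8 // int and uint are assumed to be 64-bit here
	case []interface{}:
		n := 0
		for _, e := range x {
			n += approxSize(e)
		}
		return n
	case map[string]interface{}:
		n := 0
		for k, e := range x {
			n += len(k) + approxSize(e)
		}
		return n
	default:
		return 0 // unknown types are not counted in this sketch
	}
}

func main() {
	rows := []map[string]interface{}{
		{"id": int64(1), "name": "alice"},
		{"id": int64(2), "name": "bob", "tags": []interface{}{"a", "b"}},
	}
	total := 0
	for _, r := range rows {
		total += approxSize(r)
	}
	fmt.Printf("approx payload size: %d bytes\n", total)
}
```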

syucream avatar Jul 07 '20 12:07 syucream

I will try to report the estimation result. I'm thinking of preparing something that writes estimation logs to stderr when it's called with a -verbose flag or similar. On the other hand, I guess we can reduce memory consumption by rethinking the intermediate representation.
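One possible shape for such logging, sketched with the standard flag and runtime packages. The -verbose flag is the one proposed here and does not exist yet; memSummary is a hypothetical helper, and the buffer allocation is only a stand-in for the real conversion step.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
)

// memSummary returns a one-line summary of the Go heap at the time of the call.
func memSummary(label string) string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("[mem] %s: heap_alloc=%d bytes, sys=%d bytes, num_gc=%d",
		label, m.HeapAlloc, m.Sys, m.NumGC)
}

func main() {
	// Hypothetical flag, named after the proposal in this thread.
	verbose := flag.Bool("verbose", false, "write memory estimation logs to stderr")
	flag.Parse()

	if *verbose {
		fmt.Fprintln(os.Stderr, memSummary("before conversion"))
	}

	// Stand-in for the real conversion work: allocate and touch 64MB.
	buf := make([]byte, 64<<20)
	for i := range buf {
		buf[i] = 1
	}

	if *verbose {
		fmt.Fprintln(os.Stderr, memSummary("after conversion"))
	}
	runtime.KeepAlive(buf)
}
```

Note that runtime.MemStats reports what the Go runtime has allocated, which is related to but not the same as the RSS shown by ps.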

syucream avatar Jul 07 '20 12:07 syucream

Hmm, it looks harder than I expected ... the current parquet package depends heavily on parquet-go, and to suit it we have a redundant conversion at parquet.MarshalMap() that will consume a lot of memory ...

syucream avatar Jul 07 '20 12:07 syucream

First I will dig into the problem further using pprof. https://github.com/reproio/columnify/issues/44
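For reference, a minimal sketch of capturing a heap profile with the standard runtime/pprof package. writeHeapProfile and the heap.pprof file name are hypothetical, and the allocation loop only stands in for the real conversion; this is not how columnify is instrumented.

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap profile to the given path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // run a GC first so the profile reflects up-to-date statistics
	return pprof.WriteHeapProfile(f)
}

func main() {
	// Stand-in workload: hold about 64MB in 64KB chunks.
	data := make([][]byte, 0, 1024)
	for i := 0; i < 1024; i++ {
		data = append(data, make([]byte, 64<<10))
	}

	if err := writeHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
	runtime.KeepAlive(data)
}
```

The resulting file can then be inspected with `go tool pprof heap.pprof` to see which call sites hold the most memory.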

syucream avatar Jul 07 '20 13:07 syucream