How to estimate memory usage?
We have been running columnify as part of the fluent-plugin-s3 compressor (msgpack to parquet) for some days now, but columnify has caused out-of-memory errors in some environments. So I want to estimate columnify's memory usage. Or is there a way to keep memory usage constant regardless of the input file size?
In my investigation, memory usage is proportional to the input file size; large files use roughly 5 to 6 times the file size in memory. For example, converting a large msgpack file (223 MB) consumes about 1.3 GB of memory (RSS as reported by the ps command).
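For reference, here is a minimal sketch of how one might check the process RSS from inside a Go program on Linux (the same number ps reports) by parsing /proc/self/status. This is not part of columnify, just a way to confirm the measurements:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// readRSSKB returns the resident set size of the current process in kB
// by parsing /proc/self/status (Linux only).
func readRSSKB() (int64, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		// The line looks like: "VmRSS:    1363148 kB"
		if strings.HasPrefix(line, "VmRSS:") {
			var rss int64
			if _, err := fmt.Sscanf(strings.TrimPrefix(line, "VmRSS:"), "%d", &rss); err != nil {
				return 0, err
			}
			return rss, nil
		}
	}
	return 0, fmt.Errorf("VmRSS not found in /proc/self/status")
}

func main() {
	rss, err := readRSSKB()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("RSS: %d kB\n", rss)
}
```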
I think the main memory consumer is the formatted row data produced by FormatToMap(), so we could estimate memory usage by summing the sizes of those row records.
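As a very rough sketch of what that counting could look like, assuming the formatted rows end up as `[]map[string]interface{}` (the actual type returned by FormatToMap() may differ), something like this could walk the records and add up approximate byte sizes. The per-value overhead constants are guesses for illustration, not measurements:

```go
package main

import "fmt"

// approxValueSize returns a rough byte-size estimate for one decoded value.
// The overhead constants are illustrative guesses, not measured values.
func approxValueSize(v interface{}) int {
	const overhead = 16 // rough per-interface{} header cost
	switch t := v.(type) {
	case string:
		return overhead + len(t)
	case []byte:
		return overhead + len(t)
	case map[string]interface{}:
		size := overhead + 48 // rough map header cost
		for k, e := range t {
			size += len(k) + approxValueSize(e)
		}
		return size
	case []interface{}:
		size := overhead + 24 // rough slice header cost
		for _, e := range t {
			size += approxValueSize(e)
		}
		return size
	default:
		return overhead + 8 // numbers, bools, nil, etc.
	}
}

// approxRecordsSize sums the estimate over all formatted rows.
func approxRecordsSize(records []map[string]interface{}) int {
	total := 0
	for _, r := range records {
		total += approxValueSize(r)
	}
	return total
}

func main() {
	records := []map[string]interface{}{
		{"tag": "app.log", "time": int64(1600000000), "message": "hello"},
	}
	fmt.Printf("estimated in-memory size: %d bytes\n", approxRecordsSize(records))
}
```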
I will try to report the estimation results. I'm thinking of adding something that writes estimation logs to stderr when columnify is invoked with a -verbose flag or similar.
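A minimal sketch of that kind of stderr logging, assuming a hypothetical, not-yet-existing -verbose flag and reusing a row-size estimator like the one above:

```go
package main

import (
	"flag"
	"log"
	"os"
)

func main() {
	verbose := flag.Bool("verbose", false, "print memory usage estimation to stderr")
	flag.Parse()

	// estimatedBytes would come from something like approxRecordsSize(records)
	// after the rows have been formatted.
	estimatedBytes := 0

	if *verbose {
		logger := log.New(os.Stderr, "columnify: ", log.LstdFlags)
		logger.Printf("estimated intermediate row data size: %d bytes", estimatedBytes)
	}
}
```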
On the other hand, I guess we could reduce memory consumption by rethinking the intermediate representation.
Hmm, it looks harder than I expected ... the current parquet package depends heavily on parquet-go, and to fit its interface we have a redundant conversion in parquet.MarshalMap() that consumes a lot of memory ...
First, I will dig into the problem more using pprof. https://github.com/reproio/columnify/issues/44
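For the record, a minimal sketch of dumping a heap profile with the standard runtime/pprof package (where exactly to hook this into columnify is still an open question):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// ... run the msgpack -> parquet conversion here ...

	f, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	runtime.GC() // get up-to-date statistics before dumping the heap profile
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
	// Inspect afterwards with: go tool pprof heap.pprof
}
```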