pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Explore alternative to LBuffer for larger than 2GB buffers

Open siddharthteotia opened this issue 3 years ago • 4 comments

For off heap buffers (direct or mmap'd), we currently use Lbuffer library if the size is more than 2GB (which is supported for raw forward index by the v3 and v4 writer).

We have seen production issues on at least 2 occasions where a race condition in the library (map followed by unmap and then map) can lead to SIGSEGV where the address the code is trying to access is 3GB beyond the total virtual address space mapped for Pinot segments on the host where crash was observed.

Not easily reproducible but has happened twice in production.

Also, the library is no longer being maintained by the original author so may be there is benefit in writing our own allocator for large buffers.

TODO - add a link to the race condition issue and add more details from the production issues we observed.

siddharthteotia avatar Aug 04 '22 07:08 siddharthteotia

Relates to #8529 as this library blocks JDK17 adoption.

richardstartin avatar Aug 04 '22 10:08 richardstartin

Thanks @richardstartin. Seems like JDK17 is also another motivation to consider moving away from this library.

cc @Jackie-Jiang @walterddr

siddharthteotia avatar Aug 04 '22 15:08 siddharthteotia

@siddharthteotia Have you figured out which part of the code can have race condition? We've run into similar issues (and I remember seeing SIGSEGV before at LI), but turns out it is caused by reading a segment that is already released. Is it possible that you are reading a released segment?

Jackie-Jiang avatar Aug 04 '22 18:08 Jackie-Jiang

I would recommend to give a try to Chronicle bytes. We could use it to replace larray (and therefore be able to run with Java 17) and also to be able to map files larger than 4GBs.

Chronicle Bytes is part of the Chronicle HFT libraries and it is the only buffer library that supports 63 bits references that is maintained.

The other option would be to wait until Project Panama Foreign Memory Access API, but that may take a while and it would force use either to use modern JVMs (which doesn't seem acceptable) or to use multi-release jars (which doesn't seem easy right now given the amount of shading we use)

gortiz avatar Aug 24 '22 05:08 gortiz

I've tried Chronicle Bytes in https://github.com/apache/pinot/pull/9842, but I after fixing some errors for a couple of days, I would recommend to do not use it. Some design decision are quite different to what someone using ByteBuffers or LArrays would expect (which is even worse when the methods are not properly documented), some features are not present (like copying to a ByteBuffer from a given offset), some others are not consistent (https://github.com/OpenHFT/Chronicle-Bytes/issues/478), some mistakes derive in the JVM been killed, etc. I think it is an amazing library, but it is not maintained as a generic byte buffer library but very focused on how people from OpenHFT use it, which I totally understand but would require some time inversion from us to be able to use it.

On the other hand, I've been trying to understand why LArray cannot be used in Java 17 and it seems to be related to the way @xerial creates ByteBuffers views. To do so, LArray uses a private constructor that has changed from in Java 17: A new parameter has been added. Xerial has already dealt with that issue here so I think it is feasible to do it in LArray. That would be fantastic because it would allow us to run with modern JVMs without needing to change a bit of code.

I'm still thinking that a proper Foreign Memory Access API (like JEP-424](https://openjdk.org/jeps/424)) is the way to go, but given that the solution is still not stable, the best we can do is to help Xerial to update his library.

gortiz avatar Nov 25 '22 08:11 gortiz

I've open https://github.com/apache/pinot/issues/10783 as a PEP and https://github.com/apache/pinot/pull/10528 as a PR in order to be able to run with Java 17 (and 21!)

gortiz avatar May 26 '23 07:05 gortiz

We may want to close this issue

gortiz avatar Sep 22 '23 09:09 gortiz

I think this other issue/PEP may be interesting for readers: https://github.com/apache/pinot/issues/11656

gortiz avatar Sep 22 '23 10:09 gortiz