protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Java: use fastutil for maps and lists of primitives

Open Krumpet opened this issue 3 years ago • 5 comments

fastutil offers data structures for primitive Java types that are smaller in size and faster for reading and mutating. If protoc used e.g. IntArrayList rather than ArrayList<Integer> and provided getInt() then boxing and unboxing could be avoided.

What language does this apply to? generated Java code

Describe the problem you are trying to solve. improve read/write performance and size of proto messages

Describe the solution you'd like utilize data structures provided in fastutil (or another package providing similar benefits) and provide the richer get API so client code can work with unboxed primitives as much as possible.

Describe alternatives you've considered For smaller collections the difference is negligible, for larger ones we convert them from the source proto object so the penalty of using the regular ArrayList<Integer> is minimized

Krumpet avatar Nov 16 '22 18:11 Krumpet

Are you able to perform the experiment and share the details of your finding at this time?

esorot avatar Nov 30 '22 20:11 esorot

One example I found of intSet performance vs. HashSet<Integer>:

https://www.baeldung.com/fastutil#1-performance

I'll try to update with some tests for primitive lists as well.

Krumpet avatar Dec 01 '22 10:12 Krumpet

I created this repo to test things:

https://github.com/Krumpet/protobuf-performance

initial results testing insertion into fastutil's IntArrayList vs an ArrayList<Integer> (throughput, higher is better):

Benchmark                             (listSize)   Mode  Cnt         Score        Error  Units
IntListTest.fastutilIntListInsertion         100  thrpt    5  17714739.171 ± 596148.258  ops/s
IntListTest.fastutilIntListInsertion       10000  thrpt    5    123036.544 ±   5199.245  ops/s
IntListTest.fastutilIntListInsertion     1000000  thrpt    5      1164.040 ±     53.725  ops/s
IntListTest.genericIntListInsertion          100  thrpt    5   5128286.873 ±  61191.414  ops/s
IntListTest.genericIntListInsertion        10000  thrpt    5     37667.105 ±   1977.311  ops/s
IntListTest.genericIntListInsertion      1000000  thrpt    5        97.854 ±     19.564  ops/s

Krumpet avatar Dec 01 '22 19:12 Krumpet

I tried testing with Protobuf's internal IntList but it's implementation isn't public so I couldn't test it from outside.

Krumpet avatar Dec 01 '22 19:12 Krumpet

Is there any interest in this? This could be as simple as exposing the internal IntList which is a List<Integer> so existing users aren't affected.

Then, users who care about this performance gain can use the IntList and avoid auto-boxing.

Krumpet avatar Dec 15 '22 18:12 Krumpet

@esorot pinging to make sure this didn't fall by the wayside.

Krumpet avatar Jan 17 '23 14:01 Krumpet

Another ping :)

Krumpet avatar Feb 06 '23 17:02 Krumpet

fastutil is a 22.3MB dependency, so it's ~14x (!) larger than the already-enormous protobuf-java library.

I'm not affiliated with protobuf, but IMO public libraries should strife for zero dependencies, and adding something that large is an absolute no-go. Exposing the internal IntList would be more reasonable, but I think there are already several issues for that.

ennerf avatar Apr 16 '23 18:04 ennerf

There's the fastutil-core jar at 6MB with only some of the data structures (for ints, longs, and doubles). Exposing IntList (and other specialized primitive collections, if protobuf has them) is probably the better thing to do in this case, though. Can you link to these open issues? I'll see if I can find them myself as well.

Krumpet avatar Apr 25 '23 12:04 Krumpet

This seems like the biggest one https://github.com/protocolbuffers/protobuf/issues/3316

ennerf avatar Apr 25 '23 13:04 ennerf

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Dec 28 '23 10:12 github-actions[bot]

This, as well as the issue @ennerf linked, are still relevant. Storing collections of primitives in dedicated containers and providing an API to retrieve them as such (i.e. an IntList) would help performance. I think internally this is already used but there's no API to access the collection without boxing.

Krumpet avatar Jan 04 '24 09:01 Krumpet

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Apr 04 '24 10:04 github-actions[bot]

Yes, I would like to keep this active.

Krumpet avatar Apr 04 '24 10:04 Krumpet

Fastutil in java already works with some internal specialized IntList that doesn't box the objects within. It could expose that interface and let users get ints instead of Integers.

Krumpet avatar Apr 04 '24 10:04 Krumpet

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] avatar Jul 05 '24 10:07 github-actions[bot]

Yep, still active and relevant. Exposing a specialized primitive collection, already in use within the code, demonstrably improves performance.

Krumpet avatar Jul 05 '24 10:07 Krumpet