Java: use fastutil for maps and lists of primitives
fastutil offers data structures for primitive Java types that are smaller in size and faster for reading and mutating. If protoc used e.g. IntArrayList rather than ArrayList<Integer> and provided getInt() then boxing and unboxing could be avoided.
What language does this apply to? generated Java code
Describe the problem you are trying to solve. improve read/write performance and size of proto messages
Describe the solution you'd like
utilize data structures provided in fastutil (or another package providing similar benefits) and provide the richer get API so client code can work with unboxed primitives as much as possible.
Describe alternatives you've considered
For smaller collections the difference is negligible, for larger ones we convert them from the source proto object so the penalty of using the regular ArrayList<Integer> is minimized
Are you able to perform the experiment and share the details of your finding at this time?
One example I found of intSet performance vs. HashSet<Integer>:
https://www.baeldung.com/fastutil#1-performance
I'll try to update with some tests for primitive lists as well.
I created this repo to test things:
https://github.com/Krumpet/protobuf-performance
initial results testing insertion into fastutil's IntArrayList vs an ArrayList<Integer> (throughput, higher is better):
Benchmark (listSize) Mode Cnt Score Error Units
IntListTest.fastutilIntListInsertion 100 thrpt 5 17714739.171 ± 596148.258 ops/s
IntListTest.fastutilIntListInsertion 10000 thrpt 5 123036.544 ± 5199.245 ops/s
IntListTest.fastutilIntListInsertion 1000000 thrpt 5 1164.040 ± 53.725 ops/s
IntListTest.genericIntListInsertion 100 thrpt 5 5128286.873 ± 61191.414 ops/s
IntListTest.genericIntListInsertion 10000 thrpt 5 37667.105 ± 1977.311 ops/s
IntListTest.genericIntListInsertion 1000000 thrpt 5 97.854 ± 19.564 ops/s
I tried testing with Protobuf's internal IntList but it's implementation isn't public so I couldn't test it from outside.
Is there any interest in this? This could be as simple as exposing the internal IntList which is a List<Integer> so existing users aren't affected.
Then, users who care about this performance gain can use the IntList and avoid auto-boxing.
@esorot pinging to make sure this didn't fall by the wayside.
Another ping :)
fastutil is a 22.3MB dependency, so it's ~14x (!) larger than the already-enormous protobuf-java library.
I'm not affiliated with protobuf, but IMO public libraries should strife for zero dependencies, and adding something that large is an absolute no-go. Exposing the internal IntList would be more reasonable, but I think there are already several issues for that.
There's the fastutil-core jar at 6MB with only some of the data structures (for ints, longs, and doubles). Exposing IntList (and other specialized primitive collections, if protobuf has them) is probably the better thing to do in this case, though. Can you link to these open issues? I'll see if I can find them myself as well.
This seems like the biggest one https://github.com/protocolbuffers/protobuf/issues/3316
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago.
This, as well as the issue @ennerf linked, are still relevant. Storing collections of primitives in dedicated containers and providing an API to retrieve them as such (i.e. an IntList) would help performance. I think internally this is already used but there's no API to access the collection without boxing.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago.
Yes, I would like to keep this active.
Fastutil in java already works with some internal specialized IntList that doesn't box the objects within. It could expose that interface and let users get ints instead of Integers.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago.
Yep, still active and relevant. Exposing a specialized primitive collection, already in use within the code, demonstrably improves performance.