Optimize primitive collect methods for primitive Bags when collecting into a primitive Bag.
This is a performance bug. There should be an optimization similar to the way collect is implemented for HashBag (see below).
if (target instanceof MutableBagIterable<?>)
{
MutableBagIterable<V> targetBag = (MutableBagIterable<V>) target;
this.forEachWithOccurrences((each, occurrences) -> targetBag.addOccurrences(function.valueOf(each), occurrences));
}
I will try to solve this issue. Please assign it to me. @donraab
@shreyateeza thank you for volunteering to contribute. Consider it assigned to you. Its a long pending feature request to GitHub to allow new collaborators to be assigned. Once you contribute to Eclipse Collections I think future issues can be assigned to you. Thanks for understanding!
@donraab, but the collect method signature of primitive iterables doesn't fit the requirements. For example IntIterable:
<V, R extends Collection<V>> R collect(IntToObjectFunction<? extends V> function, R target)
MutableIntBag doesn't extend the Collection class.
Is it neccessary to add a new collect method with MutableIntBag as target parameter?
@nolequen Sorry for the delay in responding.
The existing signatures are fine. In the case of an MutableIntBag with the collect signature as you have above, the method can be overridden as follows:
default <V, R extends Collection<V>> R collect(IntToObjectFunction<? extends V> function, R target)
{
if (target instanceof MutableBagIterable<?>)
{
MutableBagIterable<V> targetBag = (MutableBagIterable<V>) target;
this.forEachWithOccurrences((each, occurrences) ->
targetBag.addOccurrences(function.valueOf(each), occurrences));
}
else
{
this.forEachWithOccurrences((each, occurrences) -> {
V value = function.valueOf(each);
for (int i = 0; i < occurrences; i++)
{
target.add(value);
}
});
}
return target;
}
This could be applied similarly across all of the other collect primitive variants (collectInt, collectLong, etc.).
@donraab thanks! Now it is clear.
Hi @shreyateeza following up, will you be working on this?
Hi @shreyateeza, any update on this? Let me know if you will be working on it.
I don't believe this is a valid optimization because of the function involved. Here is an example function that will break if the code is changed:
int prev = -1;
int count = 0;
String prettyPrintBag(int item)
{
if (item == prev)
{
count++;
}
else
{
count = 0;
}
prev = item;
return ""+item+"-"+count;
}
This is a great point @mohrezaei. The optimization would require the function to be stable and repeatable for the same input. This might mean we have some other places where we should back out similar optimizations.
@donraab @mohrezaei I don't understand how the function above will break with the change. Could you provide a example using the collection ? If possible of course, if not it's ok.
@LuizFrra try it with the current code (no optimization) and see what ends up in the target.
I got it now, thx.
Since we aren't able to determine if a function is stateless or not this optimization isn't possible right ? Do you think that is a option add a collect method with a parameter to indicate if the function is stateless ?
Yes, this optimization is flawed @LuizFrra so we should probably remove this optimization on the object side side as well.
Another option would be to add a specialized method, possibly named collectOccurrences which will only apply a Function once for each unique element, and add the appropriate number of occurrences to a target collection, which would use addOccurrences in the case of a MutableBag. We could specify in the JavaDoc then that the Function only gets applied once per unique element.
There is a method on the object side named collectWithOccurrences today that takes each unique bag element with its occurrences and will add the result of applying a function taking both as parameters to some target collection. There's no way to use this method for the same optimization.
Thx for the explication @donraab . We have these things to do:
- Remove the optimization from AbstractBag.java
- Add a method called collectOccurrences to AbstractBag and all others primitive collect methods that apply the function only once
is that right ?
Hi @LuizFrra, yes, this looks correct to me. I think once we add collectOccurrences, we should have countBy on Bag use it instead of collect so the optimization can continue for countBy. We will need to clarify that the Function will only be applied once for countBy on Bags, thus making collect the method that will need to be called if a stateful Function is required.
Does this make sense to you @motlin @mohrezaei @nikhilnanivadekar @prathasirisha?
I will try work on this at weekend.
Proposing, that we name this method collectEachOccurrences so it reflects that we are iterating/looping?
Proposing, that we name this method
collectEachOccurrencesso it reflects that we are iterating/looping?
This works for me, as it makes the distinction clearer between collectWithOccurrences.
I have an question, should I write tests for those implementations ? If so, then where could I put it ?
So if I am understanding this correctly, the current collect implementation should be the implementation of collectEachOccurrences, only running the given function once for each unique element, whereas collect should be changed to run the given function once for every occurrence of each element.
Since it has been several months since any activity on this issue, I can take a stab at this.
@donraab Is this issue still open for new contributors?