eclipse-collections icon indicating copy to clipboard operation
eclipse-collections copied to clipboard

Optimize primitive collect methods for primitive Bags when collecting into a primitive Bag.

Open donraab opened this issue 8 years ago • 22 comments

This is a performance bug. There should be an optimization similar to the way collect is implemented for HashBag (see below).

if (target instanceof MutableBagIterable<?>)
{
    MutableBagIterable<V> targetBag = (MutableBagIterable<V>) target;

    this.forEachWithOccurrences((each, occurrences) -> targetBag.addOccurrences(function.valueOf(each), occurrences));
}

donraab avatar Dec 17 '17 03:12 donraab

I will try to solve this issue. Please assign it to me. @donraab

shreyateeza avatar Jan 07 '18 07:01 shreyateeza

@shreyateeza thank you for volunteering to contribute. Consider it assigned to you. Its a long pending feature request to GitHub to allow new collaborators to be assigned. Once you contribute to Eclipse Collections I think future issues can be assigned to you. Thanks for understanding!

nikhilnanivadekar avatar Jan 07 '18 08:01 nikhilnanivadekar

@donraab, but the collect method signature of primitive iterables doesn't fit the requirements. For example IntIterable: <V, R extends Collection<V>> R collect(IntToObjectFunction<? extends V> function, R target) MutableIntBag doesn't extend the Collection class.

Is it neccessary to add a new collect method with MutableIntBag as target parameter?

nolequen avatar Feb 23 '18 17:02 nolequen

@nolequen Sorry for the delay in responding.

The existing signatures are fine. In the case of an MutableIntBag with the collect signature as you have above, the method can be overridden as follows:

default <V, R extends Collection<V>> R collect(IntToObjectFunction<? extends V> function, R target)
{
    if (target instanceof MutableBagIterable<?>)
    {
        MutableBagIterable<V> targetBag = (MutableBagIterable<V>) target;

        this.forEachWithOccurrences((each, occurrences) ->
                targetBag.addOccurrences(function.valueOf(each), occurrences));
    }
    else
    {
        this.forEachWithOccurrences((each, occurrences) -> {
            V value = function.valueOf(each);
            for (int i = 0; i < occurrences; i++)
            {
                target.add(value);
            }
        });
    }
    return target;
}

This could be applied similarly across all of the other collect primitive variants (collectInt, collectLong, etc.).

donraab avatar Mar 24 '18 05:03 donraab

@donraab thanks! Now it is clear.

nolequen avatar Mar 24 '18 09:03 nolequen

Hi @shreyateeza following up, will you be working on this?

nikhilnanivadekar avatar Sep 12 '18 04:09 nikhilnanivadekar

Hi @shreyateeza, any update on this? Let me know if you will be working on it.

donraab avatar Dec 31 '19 19:12 donraab

I don't believe this is a valid optimization because of the function involved. Here is an example function that will break if the code is changed:

int prev = -1;
int count = 0;

String prettyPrintBag(int item)
{
    if (item == prev)
    {
         count++;
    }
    else
    {
        count = 0;
    }
    prev = item;
    return ""+item+"-"+count;
}

mohrezaei avatar Aug 28 '21 21:08 mohrezaei

This is a great point @mohrezaei. The optimization would require the function to be stable and repeatable for the same input. This might mean we have some other places where we should back out similar optimizations.

donraab avatar Aug 30 '21 22:08 donraab

@donraab @mohrezaei I don't understand how the function above will break with the change. Could you provide a example using the collection ? If possible of course, if not it's ok.

LuizFrra avatar Sep 02 '21 01:09 LuizFrra

@LuizFrra try it with the current code (no optimization) and see what ends up in the target.

mohrezaei avatar Sep 02 '21 01:09 mohrezaei

I got it now, thx.

LuizFrra avatar Sep 02 '21 01:09 LuizFrra

Since we aren't able to determine if a function is stateless or not this optimization isn't possible right ? Do you think that is a option add a collect method with a parameter to indicate if the function is stateless ?

LuizFrra avatar Sep 02 '21 02:09 LuizFrra

Yes, this optimization is flawed @LuizFrra so we should probably remove this optimization on the object side side as well.

Another option would be to add a specialized method, possibly named collectOccurrences which will only apply a Function once for each unique element, and add the appropriate number of occurrences to a target collection, which would use addOccurrences in the case of a MutableBag. We could specify in the JavaDoc then that the Function only gets applied once per unique element.

There is a method on the object side named collectWithOccurrences today that takes each unique bag element with its occurrences and will add the result of applying a function taking both as parameters to some target collection. There's no way to use this method for the same optimization.

donraab avatar Sep 02 '21 04:09 donraab

Thx for the explication @donraab . We have these things to do:

  • Remove the optimization from AbstractBag.java
  • Add a method called collectOccurrences to AbstractBag and all others primitive collect methods that apply the function only once

is that right ?

LuizFrra avatar Sep 02 '21 10:09 LuizFrra

Hi @LuizFrra, yes, this looks correct to me. I think once we add collectOccurrences, we should have countBy on Bag use it instead of collect so the optimization can continue for countBy. We will need to clarify that the Function will only be applied once for countBy on Bags, thus making collect the method that will need to be called if a stateful Function is required.

Does this make sense to you @motlin @mohrezaei @nikhilnanivadekar @prathasirisha?

donraab avatar Sep 02 '21 17:09 donraab

I will try work on this at weekend.

LuizFrra avatar Sep 02 '21 22:09 LuizFrra

Proposing, that we name this method collectEachOccurrences so it reflects that we are iterating/looping?

prathasirisha avatar Sep 03 '21 21:09 prathasirisha

Proposing, that we name this method collectEachOccurrences so it reflects that we are iterating/looping?

This works for me, as it makes the distinction clearer between collectWithOccurrences.

donraab avatar Sep 03 '21 21:09 donraab

I have an question, should I write tests for those implementations ? If so, then where could I put it ?

LuizFrra avatar Sep 03 '21 22:09 LuizFrra

So if I am understanding this correctly, the current collect implementation should be the implementation of collectEachOccurrences, only running the given function once for each unique element, whereas collect should be changed to run the given function once for every occurrence of each element. Since it has been several months since any activity on this issue, I can take a stab at this.

JonathanJagt23 avatar Jun 21 '22 19:06 JonathanJagt23

@donraab Is this issue still open for new contributors?

JonathanJagt23 avatar Jun 27 '22 17:06 JonathanJagt23