eclipse-collections icon indicating copy to clipboard operation
eclipse-collections copied to clipboard

SortedSet#intersect does not always return elements from this

Open duponter opened this issue 1 year ago • 3 comments

Context

According to the JavaDocs SortedSet#intersect should always return members from this:

https://eclipse.dev/collections/javadoc/11.1.0/org/eclipse/collections/api/set/sorted/ImmutableSortedSet.html#intersect(org.eclipse.collections.api.set.SetIterable)

Description copied from interface: SortedSetIterable Returns the set of all objects that are members of both this and set. The intersection of [1, 2, 3] and [2, 3, 4] is the set [2, 3]. The output will contain instances from this, not set.

Issue

However, testing reveals that elements of the smallest of the 2 sets are returned instead. Not only contradicts this the API documentation, but this is counterintuitive as well. In addition, when replacing intersect with the select(Set::contains) combination, the same functionality works as expected.

Reproduction

Following test suite contains 4 test cases:

  • SetA.intersect(SetB) where SetA > SetB
  • SetA.intersect(SetB) where SetA < SetB
  • SetA.select(SetB::contains) where SetA > SetB
  • SetA.select(SetB::contains) where SetA < SetB

Technologies used:

  • JDK 21
  • Eclipse Collections 11.1
  • JUnit 5.10
  • AssertJ 3.25
class SortedSetIntersectTest {

    @Nested
    class SelectContains {
        @Test
        void selectContainsReturnsElementsFromTheInvokingSetWhenFirstSetIsLargerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15));
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(selectContains(first, second)).containsExactlyElementsOf(first.drop(10));
            assertThat(selectContains(second, first)).containsExactlyElementsOf(second.take(5));
        }

        @Test
        void selectContainsReturnsElementsFromTheInvokingSetWhenFirstSetIsSmallerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15)).drop(10);
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(selectContains(first, second)).containsExactlyElementsOf(first);
            assertThat(selectContains(second, first)).containsExactlyElementsOf(second.take(5));
        }

        private SortedSet<Person> selectContains(ImmutableSortedSet<Person> first, ImmutableSortedSet<Person> second) {
            return SortedSets.immutable.withAll(Person.NATURAL_COMPARATOR, first.select(second::contains)).castToSortedSet();
        }
    }

    @Nested
    class Intersect {

        @Test
        void intersectReturnsElementsFromTheSmallestSetWhenFirstSetIsLargerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15));
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(intersect(first, second)).containsExactlyElementsOf(second.take(5));   // succeeds, but should return elements from 'first'
            assertThat(intersect(second, first)).containsExactlyElementsOf(second.take(5));
//        assertThat(intersect(first, second)).containsExactlyElementsOf(first.drop(10));   // fails, but should work imho
        }
        
        @Test
        void intersectReturnsElementsFromTheSmallestSetWhenFirstSetIsSmallerThanSecond() {
            ImmutableList<Person> persons = Person.generate(20);
            ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15)).drop(10);
            ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));

            System.out.printf("Set 1 with %d elements ...%n", first.size());
            first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
            System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
            second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));

            assertThat(intersect(first, second)).containsExactlyElementsOf(first);
            assertThat(intersect(second, first)).containsExactlyElementsOf(first);            // succeeds, but should return elements from 'second'
//        assertThat(intersect(second, first)).containsExactlyElementsOf(second.take(5));   // fails, but should work imho
        }

        private SortedSet<Person> intersect(ImmutableSortedSet<Person> first, ImmutableSortedSet<Person> second) {
            return SortedSets.immutable.withAll(Person.NATURAL_COMPARATOR, first.intersect(second)).castToSortedSet();
        }
    }

    private record Person(String name, int age) {
        public static final Comparator<Person> NATURAL_COMPARATOR = Comparator.comparing(Person::name).thenComparingInt(Person::age);
        public static final Comparator<Person> NAME_ONLY_COMPARATOR = Comparator.comparing(Person::name);

        private static final ThreadLocalRandom RANDOM = ThreadLocalRandom.current();
        private static final String[] NAMES = new String[]{
            "Hye", "Robbie", "Daryl", "Ulrike", "Marivel", "Huey", "Lindsay", "Avery", "Brock", "Patti", "Makeda", "Rudolf", "Burma", "Slyvia", "Mike", "Oscar", "Cher", "Willard", "Buford", "Hobert", "Brain", "Ricky", "Jeraldine", "Lindsey", "Mia", "Carolee", "Barney", "Emmett", "Lezlie", "Cristi"
        };

        public static ImmutableList<Person> generate(int size) {
            MutableSortedSet<Person> persons = SortedSets.mutable.of(NAME_ONLY_COMPARATOR);
            while (persons.size() < size) {
                persons.with(new Person());
            }
            return persons.toImmutableList();
        }

        private static int randomAge() {
            return RANDOM.nextInt(100);
        }

        private static String randomName() {
            return NAMES[RANDOM.nextInt(NAMES.length)];
        }

        public Person() {
            this(randomName(), randomAge());
        }

        Person changeAge() {
            int newAge = IntStream.generate(Person::randomAge).filter(age -> age != this.age).findFirst().orElseThrow();
            return new Person(name, newAge);
        }
    }
}

duponter avatar Aug 09 '24 12:08 duponter

Hi! I think we might be able to fix this issue by gently removing this part of the optimization code in https://github.com/eclipse/eclipse-collections/blob/master/eclipse-collections/src/main/java/org/eclipse/collections/impl/utility/internal/SetIterables.java#L77-L82

lzcyx avatar Aug 17 '24 05:08 lzcyx

It makes sense to me. I'd like to get @mohrezaei's opinion too since there was a similar change in #1661 that raised performance concerns.

motlin avatar Aug 17 '24 19:08 motlin

So clearly the behavior and the doc are inconsistent. It should be noted that this can only happen if the set's sense of equality is different from whatever way the returned set's elements are being compared to elements from this.

So there are two ways to fix this:

  1. Keep the doc, change the behavior. This would get rid of the optimization.
  2. Keep the behavior, fix the doc.

I would generally favor (2), for the following reasons:

  • the only real benefit of this funciton is that bit of optimization. It's otherwise just an alias for a one-liner.
  • these functions (intersect, difference, etc) are borrowed from set-theory, where set elements don't have different senses of equality.

We could even be very explicit in the doc and suggest using this.select(that.contains) if someone has the requirement to preserve the memory references from this.

mohrezaei avatar Aug 17 '24 19:08 mohrezaei