SortedSet#intersect does not always return elements from this
Context
According to the JavaDocs SortedSet#intersect should always return members from this:
https://eclipse.dev/collections/javadoc/11.1.0/org/eclipse/collections/api/set/sorted/ImmutableSortedSet.html#intersect(org.eclipse.collections.api.set.SetIterable)
Description copied from interface: SortedSetIterable Returns the set of all objects that are members of both this and set. The intersection of [1, 2, 3] and [2, 3, 4] is the set [2, 3]. The output will contain instances from this, not set.
Issue
However, testing reveals that elements of the smallest of the 2 sets are returned instead. Not only contradicts this the API documentation, but this is counterintuitive as well.
In addition, when replacing intersect with the select(Set::contains) combination, the same functionality works as expected.
Reproduction
Following test suite contains 4 test cases:
- SetA.intersect(SetB) where SetA > SetB
- SetA.intersect(SetB) where SetA < SetB
- SetA.select(SetB::contains) where SetA > SetB
- SetA.select(SetB::contains) where SetA < SetB
Technologies used:
- JDK 21
- Eclipse Collections 11.1
- JUnit 5.10
- AssertJ 3.25
class SortedSetIntersectTest {
@Nested
class SelectContains {
@Test
void selectContainsReturnsElementsFromTheInvokingSetWhenFirstSetIsLargerThanSecond() {
ImmutableList<Person> persons = Person.generate(20);
ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15));
ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));
System.out.printf("Set 1 with %d elements ...%n", first.size());
first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
assertThat(selectContains(first, second)).containsExactlyElementsOf(first.drop(10));
assertThat(selectContains(second, first)).containsExactlyElementsOf(second.take(5));
}
@Test
void selectContainsReturnsElementsFromTheInvokingSetWhenFirstSetIsSmallerThanSecond() {
ImmutableList<Person> persons = Person.generate(20);
ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15)).drop(10);
ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));
System.out.printf("Set 1 with %d elements ...%n", first.size());
first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
assertThat(selectContains(first, second)).containsExactlyElementsOf(first);
assertThat(selectContains(second, first)).containsExactlyElementsOf(second.take(5));
}
private SortedSet<Person> selectContains(ImmutableSortedSet<Person> first, ImmutableSortedSet<Person> second) {
return SortedSets.immutable.withAll(Person.NATURAL_COMPARATOR, first.select(second::contains)).castToSortedSet();
}
}
@Nested
class Intersect {
@Test
void intersectReturnsElementsFromTheSmallestSetWhenFirstSetIsLargerThanSecond() {
ImmutableList<Person> persons = Person.generate(20);
ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15));
ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));
System.out.printf("Set 1 with %d elements ...%n", first.size());
first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
assertThat(intersect(first, second)).containsExactlyElementsOf(second.take(5)); // succeeds, but should return elements from 'first'
assertThat(intersect(second, first)).containsExactlyElementsOf(second.take(5));
// assertThat(intersect(first, second)).containsExactlyElementsOf(first.drop(10)); // fails, but should work imho
}
@Test
void intersectReturnsElementsFromTheSmallestSetWhenFirstSetIsSmallerThanSecond() {
ImmutableList<Person> persons = Person.generate(20);
ImmutableSortedSet<Person> first = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.take(15)).drop(10);
ImmutableSortedSet<Person> second = SortedSets.immutable.withAll(Person.NAME_ONLY_COMPARATOR, persons.drop(10).collect(Person::changeAge));
System.out.printf("Set 1 with %d elements ...%n", first.size());
first.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
System.out.printf("%n...vs Set 2 with %d elements%n", second.size());
second.forEachWithIndex((p, i) -> System.out.printf("%02d: %s%n", i, p));
assertThat(intersect(first, second)).containsExactlyElementsOf(first);
assertThat(intersect(second, first)).containsExactlyElementsOf(first); // succeeds, but should return elements from 'second'
// assertThat(intersect(second, first)).containsExactlyElementsOf(second.take(5)); // fails, but should work imho
}
private SortedSet<Person> intersect(ImmutableSortedSet<Person> first, ImmutableSortedSet<Person> second) {
return SortedSets.immutable.withAll(Person.NATURAL_COMPARATOR, first.intersect(second)).castToSortedSet();
}
}
private record Person(String name, int age) {
public static final Comparator<Person> NATURAL_COMPARATOR = Comparator.comparing(Person::name).thenComparingInt(Person::age);
public static final Comparator<Person> NAME_ONLY_COMPARATOR = Comparator.comparing(Person::name);
private static final ThreadLocalRandom RANDOM = ThreadLocalRandom.current();
private static final String[] NAMES = new String[]{
"Hye", "Robbie", "Daryl", "Ulrike", "Marivel", "Huey", "Lindsay", "Avery", "Brock", "Patti", "Makeda", "Rudolf", "Burma", "Slyvia", "Mike", "Oscar", "Cher", "Willard", "Buford", "Hobert", "Brain", "Ricky", "Jeraldine", "Lindsey", "Mia", "Carolee", "Barney", "Emmett", "Lezlie", "Cristi"
};
public static ImmutableList<Person> generate(int size) {
MutableSortedSet<Person> persons = SortedSets.mutable.of(NAME_ONLY_COMPARATOR);
while (persons.size() < size) {
persons.with(new Person());
}
return persons.toImmutableList();
}
private static int randomAge() {
return RANDOM.nextInt(100);
}
private static String randomName() {
return NAMES[RANDOM.nextInt(NAMES.length)];
}
public Person() {
this(randomName(), randomAge());
}
Person changeAge() {
int newAge = IntStream.generate(Person::randomAge).filter(age -> age != this.age).findFirst().orElseThrow();
return new Person(name, newAge);
}
}
}
Hi! I think we might be able to fix this issue by gently removing this part of the optimization code in https://github.com/eclipse/eclipse-collections/blob/master/eclipse-collections/src/main/java/org/eclipse/collections/impl/utility/internal/SetIterables.java#L77-L82
It makes sense to me. I'd like to get @mohrezaei's opinion too since there was a similar change in #1661 that raised performance concerns.
So clearly the behavior and the doc are inconsistent. It should be noted that this can only happen if the set's sense of equality is different from whatever way the returned set's elements are being compared to elements from this.
So there are two ways to fix this:
- Keep the doc, change the behavior. This would get rid of the optimization.
- Keep the behavior, fix the doc.
I would generally favor (2), for the following reasons:
- the only real benefit of this funciton is that bit of optimization. It's otherwise just an alias for a one-liner.
- these functions (
intersect,difference, etc) are borrowed from set-theory, where set elements don't have different senses of equality.
We could even be very explicit in the doc and suggest using this.select(that.contains) if someone has the requirement to preserve the memory references from this.