serializer icon indicating copy to clipboard operation
serializer copied to clipboard

Duplicate set entries with LazyHashSet.add()

Open ZockiRR opened this issue 1 year ago • 5 comments

Environment Details

  • Eclipse Serializer Version: 1.2.0
  • JDK version: 21.0.1
  • OS: Win 10
  • Used frameworks: -

Describe the bug

I am using a LazyHashSet to store a Wrapper Object, which is identified by identity. When I just use LazyHashSet.add(wrapper) it happens that the same Wrapper appears multiple times in the LazyHashSet and add returns true even tho the Wrapper is already in the set. I am currently not sure when exactly it happens, it is not always. It seems to work correctly when I do a contains check before adding.

To Reproduce

Store Objects in the LazyHashSet with no overriden equals or hashCode Method. Add the same instance multiple times. The LazyHashSet is stored with eclipse-store, but currently contains only a few objects (< 30).

Expected behavior

One Object instance only appears once in the LazyHashSet.

ZockiRR avatar Mar 17 '24 18:03 ZockiRR

Hi, I was not able to reproduce duplicates in the set yet. Could you provide a running example that reproduces the error?

hg-ms avatar Mar 19 '24 08:03 hg-ms

Hi, thanks for the efford. I can't share the whole project, but I will try to build a minimal example. Will take some time tho.

ZockiRR avatar Mar 19 '24 10:03 ZockiRR

Hi again,

here is a minimal example including a reference with contains check which is working just fine. Just execute it a few times and the test set will grow. Seems to only go wrong when the object is loaded by eclipse-store and even then not always. Example was tested/created with v1.3.1.

import org.eclipse.serializer.collections.lazy.LazyHashSet;
import org.eclipse.store.storage.embedded.types.EmbeddedStorage;
import org.eclipse.store.storage.embedded.types.EmbeddedStorageManager;

public class LazyHashSetDuplicates {
	
	public static class DataRoot {
		public LazyHashSet<Object> test = new LazyHashSet<>();
		public LazyHashSet<Object> reference = new LazyHashSet<>();
	}

	public static void main(String[] args) {
		EmbeddedStorageManager store = initEclipseStore();
		DataRoot root = (DataRoot) store.root();
		
		Object testObject = root.test.isEmpty() ? new Object() : root.test.iterator().next();
		Object referenceObject = root.reference.isEmpty() ? new Object() : root.reference.iterator().next();
		
		root.test.add(testObject);
		if (!root.reference.contains(referenceObject)) {
			root.reference.add(referenceObject);
		}
		store.storeAll(root.reference, root.test);
		System.out.println("Test set size: " + root.test.size());
		System.out.println("Test elements hashcode: " + String.join(", ", root.test.stream().mapToInt(Object::hashCode).mapToObj(String::valueOf).toList()));
		System.out.println("Reference set size: " + root.reference.size());
		System.out.println("Reference elements hashcode: " + String.join(", ", root.reference.stream().mapToInt(Object::hashCode).mapToObj(String::valueOf).toList()));
		store.shutdown();
	}
	
	private static final EmbeddedStorageManager initEclipseStore() {
		EmbeddedStorageManager store = EmbeddedStorage.start();
		if (store.root() == null) {
			store.setRoot(new DataRoot());
			store.storeRoot();
		}
		return store;
	}
}

ZockiRR avatar Mar 27 '24 21:03 ZockiRR

Hello, Many thanks for the example code, I was able to reproduce the duplicated entries.

The duplicated entries are indeed caused by the Objects hashCode method implementation that does not guarantee to return the same hash code in different executions of an application. For the current implementation of LazyHashSet it is crucial that the hash code of inserted objects is constant across all executions.

As workaround you may override the hashCode and equals methods to ensure such constant hash codes.

As improvement for future releases we should consider to change the implementation in a way that does no more require such constant hash codes.

hg-ms avatar Apr 11 '24 10:04 hg-ms

Am I right that LazyHashSet#contains() already works like that? I am currently using an explicit contains() before add() as a workaround, as shown in the example reference set, which seems to behave just fine. Either way I'd agree that overriding hashCode() and equals() is definetly the best practice here.

ZockiRR avatar Apr 12 '24 07:04 ZockiRR

Won't change the implementation. Instead added javadoc clarifying hashCode() requirements on objects added to LazyHashSet and LazyHashMap.

hg-ms avatar Sep 12 '24 09:09 hg-ms

For the sake of completeness, in case anyone falls into the same trap: This implicitly means that enum values with their default and final implementation of the hashCode() method must not be part of the key or its hash value, since the enum values, like any other object, have a different hashCode after each VM restart. As an alternative one may use the ordinal or maybe even better the names hashCode() for the hashcode of the wrapping key object. Thanks for the clarification!

ZockiRR avatar Oct 06 '24 11:10 ZockiRR