Sort is causing a loss of records
I'm incorporating FileBasedCollection into a project I'm working on. I've been trying to track down a problem, and it seems to be related to the sort mechanism dropping records. My FileBasedCollection holds a custom object type.
I'm looping through my source material and building a list that contains 11698533 records. I call .sort() on that list and end up with a list that contains 778088 records.
After some troubleshooting I have determined that the .sort is eliminating duplicates from the list.
I simplified everything down: I created a FileBasedCollection and added 3 records to the list:
Test1 Test3 Test2
I issue a sort and it puts them in the order I expect.
Test1 Test2 Test3
If I replace Test2 with a second Test1:
Test1 Test3 Test1
the output after the sort is:
Test1 Test3
So before the sort I had 3 elements in my list, and I would expect to have 3 in my list after the sort, with both of my Test1's lined up one after the other. Instead, it appears to be doing some sort of duplicate check and removing the duplicates.
What can be done to correct this?
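The FileBasedCollection uses a TreeSet to sort elements. The behaviour you observed, that duplicates are eliminated, is a consequence of adding the elements to the Set (see line 471): a TreeSet treats any two elements whose comparison returns 0 as the same element and keeps only one of them. To my knowledge, there is no standard Java collection that keeps its elements sorted as they are added and also allows duplicates. You may want to implement the Comparable interface for your objects so that compareTo does not return 0 for two distinct objects, for example by comparing object references.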
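The effect can be reproduced with the JDK alone. The following is only a minimal sketch, not the FileBasedCollection code: the class name TreeSetDuplicates and its two helper methods are made up for illustration, and the only assumption is that the sort goes through a java.util.TreeSet as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; it does not use FileBasedCollection.
class TreeSetDuplicates {

    // Sorts by copying into a TreeSet: elements that compare as equal collapse into one.
    static List<String> sortViaTreeSet(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    // Sorts a copy of the list itself: duplicates are kept.
    static List<String> sortViaList(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("Test1", "Test3", "Test1");
        System.out.println(sortViaTreeSet(records)); // [Test1, Test3]
        System.out.println(sortViaList(records));    // [Test1, Test1, Test3]
    }
}
```

The list-based sort keeps both Test1 entries, but it works entirely in memory, so it may not be a drop-in replacement when the collection is file-backed because the data does not fit in memory.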
I've been able to work around this issue by ensuring that my comparator never returns 0 for two different records, but I've looked everywhere and found nothing in the documentation that states that duplicate values will be eliminated. It would be nice if that was either documented or fixed :)
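For anyone who hits the same problem, here is a rough sketch of that kind of workaround. It is not the code from my project: the Rec type, its seq field, and the TieBreakSort class are hypothetical, and a plain TreeSet stands in for FileBasedCollection. The idea is to give every record a unique sequence number and use it as a tie-breaker, so the comparator only returns 0 when a record is compared with itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical record type: 'seq' is a unique insertion number used only as a tie-breaker.
class Rec {
    final String value;
    final long seq;

    Rec(String value, long seq) {
        this.value = value;
        this.seq = seq;
    }
}

class TieBreakSort {

    // Orders by value first; equal values fall back to the unique sequence number,
    // so two distinct records never compare as equal.
    static final Comparator<Rec> BY_VALUE_THEN_SEQ =
            Comparator.comparing((Rec r) -> r.value).thenComparingLong(r -> r.seq);

    // Sorts through a TreeSet (standing in for the collection) and returns the values in order.
    static List<String> sortedValues(List<String> input) {
        TreeSet<Rec> set = new TreeSet<>(BY_VALUE_THEN_SEQ);
        long seq = 0;
        for (String v : input) {
            set.add(new Rec(v, seq++));
        }
        List<String> out = new ArrayList<>();
        for (Rec r : set) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedValues(Arrays.asList("Test1", "Test3", "Test1"))); // [Test1, Test1, Test3]
    }
}
```

A side effect of using the insertion number as the tie-breaker is that records with equal values come out in the order they were added.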
I'll add a note in the documentation.