Fuel icon indicating copy to clipboard operation
Fuel copied to clipboard

Try Camillo's dictionaries

Open GoogleCodeExporter opened this issue 10 years ago • 8 comments

Look at mail thread titled "Migrated HashTable from ss to SmalltalkHub".

Original issue reported on code.google.com by [email protected] on 11 Feb 2013 at 4:12

GoogleCodeExporter avatar Mar 24 '15 16:03 GoogleCodeExporter

This issue has been automatically marked as stale because it has not had recent activity. It will remain open but will probably not come into focus. If you still think this should receive some attention, leave a comment. Thank you for your contributions.

stale[bot] avatar May 18 '21 20:05 stale[bot]

Might be nice to revisit at some point.

theseion avatar Oct 30 '21 17:10 theseion

I searched the subject in my email account and found it, but tried with Google to place here a URL to the mailing-list discussion and didn't find it.

However, I can summarize it: In Feb, 10th 2013, Stef sent an email announcing the repository migration of HashTable, then Sven asked what about it, and Camillo Bruni answered:

I think it's a dictionary implementation based on buckets (as any other decent dictionary implementation does...)

I've shown in my master thesis that a simple bucket-based dictionary can outperform our poor-mans single array collection dictionary in almost every operation. The only drawback is the slightly bigger size in memory.

http://scg.unibe.ch/archive/masters/Brun11a.pdf

I just looked at sthub and found a match: http://smalltalkhub.com/Moose/HashTable/ But I remembered there is gh/pharo-containers gathering this kind of projects and found a repo that should be a more recent successor: https://github.com/pharo-containers/Containers-HashTable

tinchodias avatar Nov 05 '21 20:11 tinchodias

It's neither. That HashTable is pretty much a slower version of the standard dictionary with some additional abstractions (e.g. uses a wrapping element). Cami's dicts are part of Pinocchio and Pinocchio includes the benchmark suite he describes in his thesis. Pretty nice!

http://squeaksource.com/@7M9PFWFa1IUf1xjv/jOl2eyM7

theseion avatar Nov 11 '21 20:11 theseion

I spent a bit of time analyzing FLLargeIdentityDictionary using Camillo's benchmarks. I noticed that the performance is generally comparable to Camillo's dictionaries but becomes really bad when inserting a large number of objects with little hash collisions, which is pretty much our standard case (since we use #identityHash). The issue is that we use a fixed number of buckets and these buckets can, therefore, become very large which, in turn, is an issue because we need to iterate over the entire bucket and check every entry before we know that it's a new entry.

The other issue I found, although I didn't look at it too much, is, that SmallInteger>>largeIdentityHash performs much worse then SmallInteger>>identityHash, both in terms of performance and (from what I saw in my brief look) hash distribution.

I will analyse a bit more and then maybe replace FLLargeIdentityDictionary with an adapted version of P4IdentityDictionary.

theseion avatar Nov 13 '21 14:11 theseion

This issue has been automatically marked as stale because it has not had recent activity. It will remain open but will probably not come into focus. If you still think this should receive some attention, leave a comment. Thank you for your contributions.

stale[bot] avatar Jan 12 '22 17:01 stale[bot]

Not stale

theseion avatar Jan 12 '22 19:01 theseion

This issue has been automatically marked as stale because it has not had recent activity. It will remain open but will probably not come into focus. If you still think this should receive some attention, leave a comment. Thank you for your contributions.

stale[bot] avatar Mar 13 '22 19:03 stale[bot]