
Improve binning of histograms

Open adamperer opened this issue 3 years ago • 2 comments

Consider re-implementing Vega's binning strategy: https://github.com/vega/vega/blob/72b9b3bbf912212e7879b6acaccc84aff969ef1c/packages/vega-statistics/src/bin.js#L23

adamperer, Nov 28 '22 15:11

Is there a writeup of that binning style? I found this. https://vega.github.io/vega-lite/docs/bin.html

When I implemented binning for my project, I added an extra bin below the 1st percentile and another above the 99th, which captures most outliers and leaves a fatter, higher-resolution middle. Ideally I would fold that first and last percentile into the first and last regular bins and communicate the change via mouseover text showing the bin sizes. The 1/99 bins will only ever hold 2% of total values between them, so they will never have a high bar.
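For anyone skimming, here is a rough sketch of the percentile-tail idea described above. It is not the code from that project: the function name `percentile_tail_histogram` and the `n_middle_bins` parameter are made up for illustration, and it assumes the "inclusive" (linear interpolation) quantile method from the standard library.

```python
from bisect import bisect_right
from statistics import quantiles


def percentile_tail_histogram(values, n_middle_bins=10):
    """Hypothetical sketch: one tail bin below the 1st percentile, one above
    the 99th, and equal-width bins across the range in between."""
    # Drop None and NaN (NaN is the only value that is not equal to itself).
    clean = sorted(v for v in values if v is not None and v == v)
    if len(clean) < 2 or clean[0] == clean[-1]:
        raise ValueError("need at least two distinct values")

    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cuts = quantiles(clean, n=100, method="inclusive")
    p01, p99 = cuts[0], cuts[-1]

    # Equal-width edges across the middle 98% of the data.
    step = (p99 - p01) / n_middle_bins
    middle = [p01 + i * step for i in range(n_middle_bins)] + [p99]

    # Add the two tail edges; the set drops edges that coincide.
    edges = sorted(set([clean[0]] + middle + [clean[-1]]))

    counts = [0] * (len(edges) - 1)
    for v in clean:
        # Bins are half-open [left, right); the maximum goes in the last bin.
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts, edges


counts, edges = percentile_tail_histogram(range(1000))
print(len(counts), counts[0], counts[-1])
```

The two tail bins can be much wider than the middle ones, which is why the bin sizes would need to be surfaced somewhere (for example on mouseover) rather than implied by bar width.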

paddymul, Oct 26 '23 13:10

So I don't know the name of the algorithm they use in Vega. Right now in AutoProfiler we use equal-width bins. This issue was about doing something smarter to pick the number of bins. Right now we do bins = min(unique values, 20), iirc, which is very simple, and Vega seemingly has a better approach that depends on the cardinality or range of the data.
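To make the current rule concrete, here is a minimal sketch of equal-width binning with bins = min(unique values, 20). This is an illustration of the rule as stated above, not AutoProfiler's actual source; `equal_width_histogram` and `max_bins` are hypothetical names.

```python
def equal_width_histogram(values, max_bins=20):
    """Hypothetical sketch: equal-width bins, with the bin count capped at
    min(number of unique values, max_bins)."""
    # Drop None and NaN (NaN is the only value that is not equal to itself).
    clean = [v for v in values if v is not None and v == v]
    if not clean:
        return [], []

    lo, hi = min(clean), max(clean)
    if lo == hi:
        # A constant column gets a single zero-width bin.
        return [len(clean)], [lo, hi]

    n_bins = min(len(set(clean)), max_bins)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]

    counts = [0] * n_bins
    for v in clean:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts, edges


counts, edges = equal_width_histogram([1, 2, 2, 3, 10])
print(counts)  # [4, 0, 0, 1]
```

The example shows the weakness: a single outlier at 10 stretches the range, so four of the five values land in the first bin and the middle bins stay empty.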

willeppy, Oct 28 '23 12:10