Added density reducer to bin/group/hexbin

Open wirhabenzeit opened this issue 1 year ago • 2 comments

This pull request addresses https://github.com/observablehq/plot/issues/1940; see also the discussion at https://talk.observablehq.com/t/normalised-histogram-in-observable-plot/8576

I added a new reduceDensity reducer that computes the density per series/facet. It differs from the reduceProportion reducer (with or without the facet scope, and when not supplied with a value) in two ways:

  • reduceDensity normalises by the bin size
  • reduceDensity normalises by group rather than globally or by facet.
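To make the two differences concrete, here is a minimal sketch in plain JavaScript, independent of Plot. The helper densityBySeries, its option names, and the sample data are all hypothetical and are not part of Observable Plot or of this PR; the sketch assumes equal-width bins and divides each bin count by the series total times the bin width.

```javascript
// Hypothetical helper, not part of Observable Plot: per-series density
// for equal-width bins.
function densityBySeries(data, {x, z, binWidth}) {
  // Total number of points in each series (the normalisation basis).
  const totals = new Map();
  for (const d of data) totals.set(d[z], (totals.get(d[z]) ?? 0) + 1);

  // Count points per (series, bin) pair.
  const counts = new Map();
  for (const d of data) {
    const x0 = Math.floor(d[x] / binWidth) * binWidth; // lower bin edge
    const key = `${d[z]}|${x0}`;
    const cell = counts.get(key) ?? {series: d[z], x0, x1: x0 + binWidth, count: 0};
    cell.count += 1;
    counts.set(key, cell);
  }

  // density = count / (series total * bin width)
  return Array.from(counts.values(), ({series, x0, x1, count}) => ({
    series,
    x0,
    x1,
    density: count / (totals.get(series) * binWidth)
  }));
}

// Made-up sample data: three points in series "A", one in series "B".
const sample = [
  {species: "A", mass: 1},
  {species: "A", mass: 2},
  {species: "A", mass: 7},
  {species: "B", mass: 6}
];
const bins = densityBySeries(sample, {x: "mass", z: "species", binWidth: 5});
```

With this definition the bars of each series have a total area of 1 (density times bin width, summed over bins), whereas a proportion would divide by the global or per-facet count and ignore the bin width.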

I forked the notebook https://observablehq.com/@fil/plot-normalized-histograms as https://observablehq.com/d/fb0d876105777d59 to illustrate the functionality. I also added a test plot in test/plots/density-reducer.ts.

The main change to existing code is that src/transforms/group.js, src/transforms/bin.js and src/transforms/hexbin.js now need to call the reducer scope at the group level, as in

for (const o of outputs) o.scope("group", I);
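The effect of that per-group call can be illustrated with a toy model. This is a simplified sketch and not Plot's internal code: scopedOutput and the data layout are hypothetical, and the bin-width division is left out so that only the scoping is shown. Each output records a basis when its scope is visited, and every bin in that group is then divided by it.

```javascript
// Toy model of a scoped reducer output; hypothetical, not Plot's internals.
function scopedOutput(scopeName, basisOf, reduceOf) {
  let basis; // set when the matching scope level is visited
  return {
    // Called once per scope level; only the matching level records a basis.
    scope(name, I) {
      if (name === scopeName) basis = basisOf(I);
    },
    // Called once per bin with the indices that fall into it.
    reduce(I) {
      return reduceOf(I) / basis;
    }
  };
}

// One output that counts points and normalises by the group size.
const outputs = [scopedOutput("group", (I) => I.length, (I) => I.length)];

// Two groups (series) of indices, each already split into bins.
const groups = [
  {I: [0, 1, 2, 3], bins: [[0, 1, 2], [3]]},
  {I: [4, 5], bins: [[4], [5]]}
];

const result = groups.map(({I, bins}) => {
  for (const o of outputs) o.scope("group", I); // the per-group call
  return bins.map((bin) => outputs[0].reduce(bin));
});
// result is [[0.75, 0.25], [0.5, 0.5]]
```

Because the basis is reset for every group, each series sums to 1 on its own, regardless of how many points the other series contain.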

I noticed that this new density reducer can also be used as a replacement for proportion-facet in

  • test/plots/athletes-sport-weight.ts
  • test/plots/hexbin-r.ts

where (at least to me) it also makes sense semantically.

Among the tests this would leave

  • test/plots/penguin-species-island-relative.ts

as the only place where proportion-facet is used. Here density does not make sense at the group level because of the grouping by fill.

One could also consider a more customisable reducer that allows specifying the normalisation scope, but I could not come up with a satisfying syntax. Something like

..., y: {reducer: "density", scope: "group"}, ...

feels a bit too verbose given that most reducers do not have/need any scope.

wirhabenzeit avatar Apr 10 '24 07:04 wirhabenzeit

@Fil thanks for the review, will look into it! One question regarding

"I think we can extend the concept to weighted density when the original channel is a value?"

What do you mean by this? Something like proportion-facet, but per series? What is the expected output of something like

Plot.binX({y: "density"}, {x: "xVar", y: "yVar", "stroke": "strokeVar"})

in this case? Is it the sum of the values in the bin for a given series, divided by the sum over the whole series, i.e.

sum( d.y | d.x in bin, d.z = z ) / sum( d.y | d.z = z )

or something else? I am not sure that normalising by bin area makes sense in this case.

wirhabenzeit avatar Apr 16 '24 10:04 wirhabenzeit

Normalizing with weights applies, for example, when each data point is a city representing anywhere from 1,000 to 1 million inhabitants. You want the same chart you would get with one point per inhabitant.
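One way to read this is sketched below in plain JavaScript. The helper weightedDensity, the field names, and the tiny populations are made up for illustration, and dividing by the bin width is an assumption, since the thread leaves that question open. The sketch sums weights per bin, divides by the total weight times the bin width, and checks that the result matches the unweighted density of the data expanded to one row per inhabitant.

```javascript
// Hypothetical helper: weighted density over equal-width bins, one series.
function weightedDensity(data, {x, weight, binWidth}) {
  // Total weight of the series (the normalisation basis).
  const total = data.reduce((s, d) => s + d[weight], 0);

  // Sum of weights per bin, keyed by the lower bin edge.
  const sums = new Map();
  for (const d of data) {
    const x0 = Math.floor(d[x] / binWidth) * binWidth;
    sums.set(x0, (sums.get(x0) ?? 0) + d[weight]);
  }

  // Assumption: divide by bin width as well as by the total weight.
  return new Map(Array.from(sums, ([x0, s]) => [x0, s / (total * binWidth)]));
}

// Made-up cities, with population as the weight...
const cities = [
  {elevation: 10, population: 3},
  {elevation: 120, population: 1}
];

// ...and the same data expanded to one row per inhabitant, each of weight 1.
const inhabitants = cities.flatMap((c) =>
  Array.from({length: c.population}, () => ({elevation: c.elevation, population: 1}))
);

const options = {x: "elevation", weight: "population", binWidth: 100};
const byCity = weightedDensity(cities, options);
const byInhabitant = weightedDensity(inhabitants, options);
```

Both maps hold the same density for every bin, which is the "same chart as one point per inhabitant" property described above.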

Fil avatar Apr 16 '24 11:04 Fil