Bar chart: handle overlapping bars
When a loaded dataset has multiple records with the same value for the x attribute, the bar chart just draws all of those bars on top of each other. Ideally the bar chart should separate these into subplots (this may need an additional option).
I feel like the default should be to draw multiple bars for the same value, side by side. Making subplots requires more assumptions, i.e. that all values are duplicated the same number of times, with a different secondary attribute value for each.
Yes, sorry, that's what I meant by "subplot"—not heavy separations with their own axes; just putting multiple bars side by side, with a little padding between categories so that it's clear that bars from neighboring categories don't flow together.
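To make the side-by-side layout concrete, here is a rough sketch of how bar positions inside one category band could be computed. The function name `bar_offsets` and the `band_width` and `padding` parameters are hypothetical, made up for illustration; they are not part of the existing bar chart.

```python
def bar_offsets(n_bars, band_width=1.0, padding=0.2):
    """Lay out n_bars side by side inside one category band.

    Returns (offsets, bar_width), where each offset is the left edge of a bar
    measured from the left edge of the band. `padding` is the fraction of the
    band left empty, split evenly between the two sides, so bars from
    neighboring categories never touch. All names here are hypothetical.
    """
    if n_bars < 1:
        raise ValueError("n_bars must be at least 1")
    if not 0 <= padding < 1:
        raise ValueError("padding must be in [0, 1)")
    usable = band_width * (1 - padding)   # width actually covered by bars
    bar_width = usable / n_bars
    start = band_width * padding / 2      # half of the padding on the left
    offsets = [start + i * bar_width for i in range(n_bars)]
    return offsets, bar_width
```

With two bars and 20% padding, each bar takes 40% of the band and a 10% gap remains on either side of the pair.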
Looking this over a bit more carefully, I think this issue and #420 are slightly different.
- Case 1. (#420) x-values are distinct per record, but y-values are in separate columns and we want to plot each column, like how LineChart handles multiple y-values.
- Case 2. (this issue) A general case where x-values are not distinct (like say using petal-width in the Iris data as x-values where some happen to match up).
- Case 3. (no current issue) x-values are not distinct for one column, but a secondary column can be used as the within-group delineator. For example, in the stock data, the records are (date, symbol, price). Using date as the x-value will have multiple bars at the same location, but further adding a "group" parameter as stock symbol could split out the bars by symbol for each date.
I see Case 2 as quite difficult to support, because each group of bars could have a different cardinality in general, which is a scale nightmare. A histogram or box plot would be better at summarizing groups of differing sizes.
Supporting Case 1 means turning the y parameter into a list. This seems feasible.
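As a sketch of what a list-valued y could mean for Case 1, each record can be expanded into one bar per y column. The helper name `expand_y_columns` and the output keys `x`, `series`, and `value` are assumptions for illustration only.

```python
def expand_y_columns(records, x, y):
    """Turn wide records into one (x, series, value) entry per y column.

    `y` may be a single column name or a list of column names, so existing
    single-column usage keeps working. Helper and key names are hypothetical.
    """
    y_cols = [y] if isinstance(y, str) else list(y)
    return [
        {"x": record[x], "series": col, "value": record[col]}
        for record in records
        for col in y_cols
    ]
```

Each distinct x-value then gets exactly `len(y)` bars, so every group has the same cardinality.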
Supporting Case 3 means adding an optional x-axis column parameter called something like "group". Also feasible.
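For Case 3, a rough sketch of how a "group" column could split the bars at each x-value. The function name `bars_by_group` is hypothetical, and filling missing combinations with `None` is just one possible choice.

```python
from collections import defaultdict


def bars_by_group(records, x, y, group):
    """Arrange long-format records into one row of bars per x-value.

    Returns (groups, table): `groups` is the sorted list of distinct group
    values, and `table` maps each x-value to a list of y-values in that same
    group order. A missing (x, group) combination yields None, so every
    x-value gets the same number of bar slots. Names are hypothetical.
    """
    groups = sorted({record[group] for record in records})
    by_x = defaultdict(dict)
    for record in records:
        by_x[record[x]][record[group]] = record[y]
    table = {xv: [row.get(g) for g in groups] for xv, row in by_x.items()}
    return groups, table
```

For stock-style records of (date, symbol, price), this gives one slot per symbol at every date, which keeps the within-group layout uniform.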
Supporting 1 and 3 at the same time seems difficult and would require a multi-tier grouping. I'd propose we only support one or the other and give a warning/error if both are supplied.
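The "one or the other" rule could be checked up front, something like the sketch below. The function name `resolve_series_mode` and the returned mode strings are made up for illustration, and raising an error rather than warning is just one of the two options mentioned.

```python
def resolve_series_mode(y, group=None):
    """Decide how bars are split, rejecting multiple y columns plus a group.

    Returns "multi_y" (Case 1), "group" (Case 3), or "single" for a plain
    bar chart. Function name and mode strings are hypothetical.
    """
    y_cols = [y] if isinstance(y, str) else list(y)
    if not y_cols:
        raise ValueError("at least one y column is required")
    if len(y_cols) > 1 and group is not None:
        raise ValueError(
            "multiple y columns and a group column cannot be combined; "
            "supply only one of them"
        )
    if group is not None:
        return "group"
    if len(y_cols) > 1:
        return "multi_y"
    return "single"
```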
I agree with supporting only one at a time, but allowing both cases to be addressed separately. Further agreed that case 2 is pretty intractable.
(In reply to Jeff Baumes's comment of Oct 7, 2016, 10:29 PM, which appears in full above.)