How to properly apply y limits on the entire plot in a stacked bar plot?
I have this data, which represents the sales decomposition from high-level to granular, left to right (hence the values within each index group should add up to "total_revenue"):
import numpy as np
import polars as pl
from lets_plot import *

data = np.array([['total_revenue', 137576.4, 0],
                 ['non-offer_revenue', 136261.41, 1],
                 ['offer_revenue', 1314.99, 1],
                 ['non-offer_revenue', 136261.41, 2],
                 ['baseline_revenue', 24.81, 2],
                 ['incremental_sales', 1290.18, 2]])
df = pl.DataFrame(data, schema={'sales_type': str, 'sales_value': float, 'index': int})
However, due to the nature of the data, there is a drastic difference in the values of sales_value. Hence, when visualizing it via a stacked bar plot as follows:
(ggplot(df)
 + geom_bar(aes(x='index', fill='sales_type', y='sales_value'), stat='identity',
            tooltips=layer_tooltips(['sales_value']).title('^fill'),
            labels=layer_labels(['sales_value'])
 )
 + theme(title=element_text(face='bold'), axis_title='blank', axis_ticks='blank', axis_text_x='blank')
 + labs(title='Offer Sales Decomposition', fill='Sales Type:')
)
Some of the stacked bars on the top are way too small compared to the bars below:
I tried to "zoom in" on the small stacked bars at the top by raising the lower y limit with + ylim(120000, 138000), but that applied the y limit constraint to each bar segment individually, making the segments with small y values disappear instead!
I tried adding + scale_y_continuous(breaks={f'{i}': i for i in range(120000, 138001, 1000)}) but that didn't shift the y scale up either.
I can't apply a log scale on the y values either because then the stacked values won't add up to the same total amount.
Any workaround for this?
Hi, yes, it seems limits aren't working for the y-axis in bar charts. See the related issue: https://github.com/JetBrains/lets-plot-kotlin/issues/219
However, stacked bars create a discrepancy between users' expectations and how scale limits work. By definition, scale limits drop the data points that lie beyond the limits (i.e. all revenues below 120,000 in your example), which is rather unexpected here.
As a workaround try applying coordinate system limits: + coord_cartesian(ylim=(120000, 138000))
Hey @alshan !
Thanks for your prompt response, the workaround worked exactly as expected!
Though, what's the difference between adding + ylim(ymin, ymax) and + coord_cartesian(ylim=(ymin, ymax))? How are they different under the hood? Is it that coord_cartesian() changes the viewport to the specified y-axis limits, while ylim() sort of acts like a threshold limit on the individual layers of a plot?
That's right, "coord limits" work as a visual zooming.
With "scale limits", the key difference is that setting these limits discards all data outside the range.
Here is the relevant chapter: Zooming into a plot with coord_cartesian().
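To make the distinction concrete, here is a small plain-Python sketch of the two behaviors. It is only a conceptual model, not lets-plot's actual implementation; the helper names scale_limits and coord_limits are made up for illustration.

```python
# Conceptual model only; not how lets-plot is implemented internally.
values = [136261.41, 24.81, 1290.18]  # segments of the index=2 bar
ymin, ymax = 120000, 138000


def scale_limits(values, lo, hi):
    """Scale limits: drop every value outside [lo, hi] before stacking."""
    return [v for v in values if lo <= v <= hi]


def coord_limits(values, lo, hi):
    """Coord limits: stack everything, then clip each segment to the viewport."""
    visible, bottom = [], 0.0
    for v in values:
        top = bottom + v
        visible.append((max(bottom, lo), min(top, hi)))
        bottom = top
    return visible


kept = scale_limits(values, ymin, ymax)    # only the large segment survives
zoomed = coord_limits(values, ymin, ymax)  # all three segments remain
```

With scale limits the two small segments are discarded because their own values fall below 120000; with coord limits all three segments are kept and only the visible window changes.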
@alshan Hey!
One more related issue: When attaching numerical labels to corresponding stacked bars via labels=layer_labels(['sales_value']), it works fine, but upon altering the viewport limits via coord_cartesian(ylim=(min, max)), the labels on the lowest bars are not visible anymore because they sink to the very bottom, while the highest bars' labels are positioned at the very top, which affects the readability of the plot.
I tried to manually add the labels via geom_text(), but it requires adding a custom "height" column to the dataframe (in this stacked bar plot's case) and some tinkering around. Plus, I believe its vjust property is not working as expected: the docs say "vjust : vertical text alignment. Possible values: ... or number between 0 (‘bottom’) and 1 (‘top’)." However, the only numerical values that actually take effect are 0 and 1; numbers in between are simply ignored.
While the labels property is very useful and convenient, we have no control over the label's placement/positioning.
Adding what's equivalent to the vertical-align property in CSS would be very helpful, and maybe the nudge_y property from geom_text() as well!
Hi, regarding annotation labels on bars - congrats, you nailed a bug :) : https://github.com/JetBrains/lets-plot/issues/981
As for geom_text(), the key thing you have to do is to use the "group" aesthetic to achieve labels stacking. On a stacked bar-chart that is.
Consider this example: https://nbviewer.org/github/JetBrains/lets-plot-docs/blob/master/source/examples/cookbook/position_stack.ipynb
Note geom_label(aes(..., group="year"), ... in Out [14], [15].
However, the only numerical values that actually take effect are 0 and 1; numbers in between are simply ignored.
Could you provide a minimal example? In the demo above value 0.5 works as expected (see Out [15])
While the labels property is very useful and convenient, we have no control over the label's placement/positioning.
Adding what's equivalent to the vertical-align property in CSS would be very helpful, and maybe the nudge_y property from geom_text() as well!
Yes, we are planning to expand the annotations API in this direction. Hopefully sooner rather than later.
Hey @alshan !
Regarding the use of the group aesthetic, to be honest, I still don't quite understand how it works; I understand it's supposed to logically separate and group the layers on a plot based on the group value, but I don't understand why you wouldn't just use color or fill instead.
Anyways, when adding it in the example mentioned in the first post:
(ggplot(df)
 + geom_bar(aes(x='index', fill='sales_type', y='sales_value'), stat='identity',
            tooltips=layer_tooltips(['sales_value']).title('^fill'),
 )
 + theme(title=element_text(face='bold'), axis_title='blank', axis_ticks='blank', axis_text_x='blank')
 + labs(title='Offer Sales Decomposition', fill='Sales Type:')
 + geom_text(
     aes(label='sales_value', group='sales_type'),
     label_format='{0.2f}k',
     position='stack',
 )
)
It doesn't quite do anything (even with adding vjust for position).
The way I'm doing it right now is by adding a "height" column to my dataframe (using the df from above):
df = df.with_columns(pl.cum_sum('sales_value').over('index').alias('height'))
I then use that height in geom_text aesthetic to "properly" position the text labels as such:
+ coord_cartesian(ylim=(136100, 137600))
+ geom_text(
    aes(x='index', label='sales_value', y='height'),
    label_format='{0.2f}k',
)
)
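For readers not using polars, the "height" column above is just a running total within each index group. A minimal plain-Python sketch (the rows list and the stack_heights helper are hypothetical names, and it assumes segments are stacked in row order):

```python
from collections import defaultdict

# Same rows as the dataframe in the first post: (sales_type, sales_value, index).
rows = [
    ('total_revenue', 137576.4, 0),
    ('non-offer_revenue', 136261.41, 1),
    ('offer_revenue', 1314.99, 1),
    ('non-offer_revenue', 136261.41, 2),
    ('baseline_revenue', 24.81, 2),
    ('incremental_sales', 1290.18, 2),
]


def stack_heights(rows):
    """Return the cumulative stack top for each row, restarting per index group."""
    running = defaultdict(float)
    heights = []
    for _, value, index in rows:
        running[index] += value
        heights.append(running[index])
    return heights


heights = stack_heights(rows)
```

Each entry in heights is the y coordinate of the top of the corresponding segment, which is what the y='height' mapping in geom_text relies on.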
And regarding the use of vjust, I was referring to the vjust property within the geom_text() function itself:
+ geom_text(
    aes(x='index', label='sales_value', y='height'),
    label_format='{0.2f}k',
    vjust=1,
)
that way, it doesn't work as expected:
- the value of 0 (corresponding to 'bottom') is not behaving as expected compared to a value of 1 ('top')
- it only understands either 0 or 1; values in between are simply treated as the default. However, using position=position_stack(vjust=.5) does work
Thanks a lot!
but I don't understand why you wouldn't just use color or fill instead.
You are right, as long as you map color aesthetic in geom_text() layer (color or fill in geom_label()) on a discrete variable, it will also create groups.
However, if you don't want to map text color to a variable, then you will have to use the "group" aesthetic.
In your code snippet:
+ geom_text(
    aes(label='sales_value', group='sales_type'),
    label_format='{0.2f}k',
    position='stack',
)
You forgot to add label coordinates: aes(..., x='index', y='sales_value')
Alternatively, you can move this mapping from "bar" to the root:
ggplot(df, mapping=aes(x='index', y='sales_value'))
so that both layers could share it.
And regarding the use of vjust, I was referring to the vjust property within the geom_text() function itself:
As far as I can see you don't have position="stack" here. Thus vjust has no effect.
The way I'm doing it right now is by adding a "height" column to my dataframe
Clever trick :) Hopefully you won't need it.
Hey @alshan ,
Thanks for your help and comments as usual!
Sorry, I know this discussion is quite old, but I would like to point out that in this case, where the stacked bar values (heights) differ drastically, the current best solution for displaying each bar's value (given that labels=layer_labels(['sales_value']) doesn't work with coord_cartesian() yet) is to use a mix of coord_cartesian(), aes(group=""), and position="stack". Here is what the complete code looks like (with df defined the same as in the first comment):
(ggplot(df)
 + geom_bar(
     aes(x='index', fill='sales_type', y='sales_value'), stat='identity',
     tooltips=layer_tooltips(['sales_value']).title('^fill')
 )
 + geom_text(
     aes(label='sales_value', group='sales_type', y='sales_value', x='index'),
     label_format='{0.2f}k',
     position="stack",
     vjust='top',
     size=6
 )
 + coord_cartesian(ylim=(136100, 137650))
 + theme(title=element_text(face='bold'), axis_title='blank', axis_ticks='blank', axis_text_x='blank')
 + labs(title='Offer Sales Decomposition', fill='Sales Type:')
)
Which produces the following plot:
However, this is still suboptimal.
Would it perhaps be possible for labels=layer_labels() on bar plots to work the same way it does for geom_pie(), where the label annotations for small slices are displayed outside of the geom itself (such as in here)?
(P.S. when adding nudge_y to the code snippet above (no matter the value), some label placements disappear)
Hi @OSuwaidi, you can try to fix this overlapping using vjust in position_stack:
geom_text(position=position_stack(vjust=0.9))
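As a rough mental model (assuming position_stack follows the ggplot2 convention, which is not confirmed in this thread), vjust inside position_stack places each label at a fraction of its own segment: 0 is the segment's bottom, 0.5 its middle, 1 its top. A plain-Python sketch with a hypothetical stacked_label_y helper:

```python
def stacked_label_y(values, vjust=1.0):
    """Label y for each stacked segment: segment bottom + vjust * segment height."""
    ys, bottom = [], 0.0
    for v in values:
        ys.append(bottom + vjust * v)
        bottom += v
    return ys


segments = [136261.41, 24.81, 1290.18]  # the index=2 bar from the first post
tops = stacked_label_y(segments, vjust=1.0)     # labels at segment tops
middles = stacked_label_y(segments, vjust=0.5)  # labels centered in segments
```

Under this model, a value such as 0.9 moves every label slightly below its segment's top, which is why it can help with labels crowding at the upper edge.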
labels=layer_labels(['sales_value']) doesn't work with coord_cartesian(), yet
This is fixed already - will work in the next release.
Would it perhaps be possible for labels=layer_labels() on bar plots to work the same way it does for geom_pie(), where the label annotations for small slices are displayed outside of the geom itself (such as in here)?
Maybe. We couldn't figure out how to handle this situation on bar charts.
(P.S. when adding nudge_y to the code snippet above (no matter the value), some label placements disappear)
Thanks, this is likely a bug.
Hi @OSuwaidi , the issue with "scale limits" was fixed in v4.3.0. Note however that "coord system limits" is still a better way to zoom charts.
Also, since v4.3.0, you don't have to set the upper limit here : + coord_cartesian(ylim=(136100, 137650))
Try just: + coord_cartesian(ylim=(136100, None)).
Hey @alshan 👋🏼!
Big, great changes in v4.3.0!
Yet to try out and test all of them. The coord_cartesian(ylim=()) works beautifully with layer_labels() now. And while I like the feature idea of coord_cartesian(ylim=(136100, None)) to not explicitly define an upper limit (especially important in dynamic settings), in my experience, it tends to add too much white space above the bar plot, it's quite unpredictable.
Great changes overall!
in my experience, it tends to add too much white space above the bar plot, it's quite unpredictable.
The space above is the effect of the scale "expand". By default, a continuous Y-scale has a multiplicative expand of 0.05 (i.e. scale domain * 0.05).
You can try to remove multiplicative expand and set additive expand.
For example, scale_y_continuous(expand=[0, 1000]) will create $1000 worth of space above the bars.
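A quick back-of-the-envelope check of why the default can look like a lot of white space on a zoomed chart. This assumes the Y-scale domain still spans the full stack, from 0 up to total_revenue, even when coord_cartesian zooms in; that assumption and the expand_padding helper are illustrative, not taken from the lets-plot source:

```python
def expand_padding(domain_min, domain_max, mult=0.05, add=0.0):
    """Padding added at one end of the scale: domain span * mult + add."""
    return (domain_max - domain_min) * mult + add


# Default: multiplicative expand of 0.05, no additive part.
default_pad = expand_padding(0, 137576.4)

# expand=[0, 1000]: no multiplicative part, fixed additive padding.
additive_pad = expand_padding(0, 137576.4, mult=0, add=1000)
```

Under that assumption the default adds roughly 6,879 units above the bars, several times the height of a viewport that only covers 136100 to about 137600, whereas expand=[0, 1000] adds a fixed 1000.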