plotnine
geom_bar (stacked bars) is slow
First, thanks for the nice package.
While working on a problem, I noticed that plotnine's geom_bar is slow, especially for stacked bar plots. Here is an example:
```python
from plotnine import ggplot, geom_bar, aes
import pandas as pd
import random

n = 4000
data = pd.DataFrame(
    {
        "a": range(0, n),
        "u": random.sample(range(1, 2 * n), n),
        "v": random.sample(range(1, 2 * n), n),
        "w": random.sample(range(1, 2 * n), n),
        "x": random.sample(range(1, 2 * n), n),
        "y": random.sample(range(1, 2 * n), n),
    }
)
# Long format: one row per (a, sty) pair, 5 * n rows in total
data1 = pd.melt(data, id_vars=["a"], var_name="sty", value_name="value")
p = ggplot() + geom_bar(aes(x="a", weight="value", fill="sty"), data=data1)

# IPython/Jupyter magic: times repeated saves of the plot to PDF
%timeit p.save("test.pdf")
```
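To see how many rectangles this plot has to draw, it helps to look at the shape of the melted frame. The sketch below uses a tiny hypothetical frame (n = 3 instead of 4000): melting the five value columns gives n × 5 rows, and because each (a, sty) pair occurs exactly once, geom_bar's default count stat ends up with one stacked segment per row. With n = 4000 that is 20,000 segments in a single PDF.

```python
import pandas as pd

# Hypothetical small wide frame with the same columns as the example above.
n = 3
wide = pd.DataFrame({
    "a": range(n),
    "u": [1, 2, 3],
    "v": [4, 5, 6],
    "w": [7, 8, 9],
    "x": [10, 11, 12],
    "y": [13, 14, 15],
})

# melt stacks the five value columns into long form: n * 5 rows.
long_df = pd.melt(wide, id_vars=["a"], var_name="sty", value_name="value")

# Each (a, sty) pair is unique, so there is one bar segment per row.
segments = long_df.groupby(["a", "sty"]).size()
print(len(long_df), len(segments))  # 15 15
```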
On my machine, it takes about 30 seconds to make the plot, while similar code in R takes only about 5 seconds. The R code is below.
```r
library(ggplot2)
library(data.table)
library(rbenchmark)

n = 4000
data = data.table(a=1:n,
                  u=runif(n), v=runif(n), w=runif(n), x=runif(n), y=runif(n))
data1 = melt(data, id.vars="a", variable.name="sty", value.name="value")
p = ggplot() + geom_bar(aes(x=a, weight=value, fill=sty), data=data1)
benchmark(ggsave("test1.pdf", p), replications=10)
```
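The two measurements are not reported the same way: %timeit prints the mean time per loop, while rbenchmark's `elapsed` column is, as far as I know, the total over all replications (10 here). For a like-for-like comparison outside IPython, the standard-library `timeit` module can time each run separately. A minimal sketch; `time_callable` is a hypothetical helper, and the workload below is a stand-in for `lambda: p.save("test.pdf")` from the Python example.

```python
import timeit


def time_callable(func, repeat=10):
    """Run func once per repetition; return the seconds taken by each run."""
    return timeit.repeat(func, number=1, repeat=repeat)


# Stand-in workload; replace with lambda: p.save("test.pdf") in practice.
timings = time_callable(lambda: sum(range(100_000)), repeat=3)

# sum(timings) is the figure comparable to rbenchmark's total elapsed time;
# the mean is comparable to what %timeit reports per loop.
print(f"total {sum(timings):.4f}s, mean {sum(timings) / len(timings):.4f}s")
```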
I see the same issue with a stacked bar plot built like this:
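```python
from plotnine import (aes, geom_bar, ggplot, ggtitle, scale_fill_manual,
                      scale_x_discrete, xlab, ylab)

# df (columns var, result, op) and cbbPalette (a list of colours) are defined elsewhere
p = (ggplot(df, aes(x='var', y='result', fill='op', label='op'))
     + geom_bar(stat='identity', position='stack')
     + ggtitle("title")
     + xlab("xlabel")
     + ylab("ylabel")
     + scale_fill_manual(values=cbbPalette)
     + scale_x_discrete(labels=""))
```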
Saving the figure to a BinaryIO object with p.save(buf, verbose=False) can take several minutes, and the time grows with the size of df. Is there any workaround for this?
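One possible mitigation, assuming df contains several rows per (var, op) pair and that result is non-negative: with stat='identity' and position='stack', each row becomes its own rectangle, so summing duplicate rows first keeps every bar's total height while drawing one rectangle per (var, op) pair. This is a sketch, not a confirmed fix; `collapse_stacked_rows` is a hypothetical helper, and if every pair is already unique it changes nothing.

```python
import pandas as pd


def collapse_stacked_rows(df, x, fill, y):
    """Sum y over duplicate (x, fill) pairs so each pair yields one segment.

    Assumes non-negative y: position='stack' stacks negative values
    separately, and summing mixed signs would change the picture.
    """
    return df.groupby([x, fill], as_index=False, observed=True)[y].sum()


# Hypothetical data: the first two rows share var="a", op="p".
df = pd.DataFrame({
    "var": ["a", "a", "a", "b"],
    "op": ["p", "p", "q", "p"],
    "result": [1.0, 2.0, 3.0, 4.0],
})
small = collapse_stacked_rows(df, "var", "op", "result")
print(small)
```

The collapsed frame can then be passed to ggplot in place of df, with the rest of the plot unchanged.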