How to use countplot() in plotly with VAEX data frame?

Open bhargav-inthezone opened this issue 4 years ago • 9 comments

Could someone please give me the equivalent Plotly code for this Seaborn call?

sns.countplot(x='Census_ProcessorClass', hue='HasDetections', data=df_train)
plt.show()

Both columns are int64.

bhargav-inthezone avatar Jun 09 '21 09:06 bhargav-inthezone

This is basically px.histogram.

nicolaskruchten avatar Jun 09 '21 12:06 nicolaskruchten

df_train is a Vaex DataFrame. When I tried this:

fig = px.histogram(df_train, x='Census_ProcessorClass', color='HasDetections', barmode='relative')
fig.show()

I am getting this ValueError:

ValueError Traceback (most recent call last)
in
----> 1 fig = px.histogram(df_train, x ='Census_ProcessorClass' , color = 'HasDetections', barmode = 'relative')
      2 fig.show()

/opt/conda/lib/python3.7/site-packages/plotly/express/_chart_types.py in histogram(data_frame, x, y, color, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, hover_name, hover_data, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, marginal, opacity, orientation, barmode, barnorm, histnorm, log_x, log_y, range_x, range_y, histfunc, cumulative, nbins, title, template, width, height)
    454 histnorm=histnorm, histfunc=histfunc, cumulative=dict(enabled=cumulative),
    455 ),
--> 456 layout_patch=dict(barmode=barmode, barnorm=barnorm),
    457 )
    458

/opt/conda/lib/python3.7/site-packages/plotly/express/_core.py in make_figure(args, constructor, trace_patch, layout_patch)
   1859 apply_default_cascade(args)
   1860
-> 1861 args = build_dataframe(args, constructor)
   1862 if constructor in [go.Treemap, go.Sunburst] and args["path"] is not None:
   1863 args = process_dataframe_hierarchy(args)

/opt/conda/lib/python3.7/site-packages/plotly/express/_core.py in build_dataframe(args, constructor)
   1376
   1377 df_output, wide_id_vars = process_args_into_dataframe(
-> 1378 args, wide_mode, var_name, value_name
   1379 )
   1380

/opt/conda/lib/python3.7/site-packages/plotly/express/_core.py in process_args_into_dataframe(args, wide_mode, var_name, value_name)
   1181 if argument == "index":
   1182 err_msg += "\n To use the index, pass it in directly as df.index."
-> 1183 raise ValueError(err_msg)
   1184 elif length and len(df_input[argument]) != length:
   1185 raise ValueError(

ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0] but received: Census_ProcessorClass

bhargav-inthezone avatar Jun 09 '21 13:06 bhargav-inthezone

Try converting your Vaex df to a Pandas one to see if that resolves things?

nicolaskruchten avatar Jun 09 '21 14:06 nicolaskruchten

Yeah Nic, I am pretty sure it will resolve the issue, but converting my data into a pandas DataFrame will take a lot of time and memory. I think my system may crash.

I am looking for a more efficient way. Is there any method to make a Vaex DataFrame acceptable to Plotly?

bhargav-inthezone avatar Jun 09 '21 14:06 bhargav-inthezone

PX doesn't natively accept Vaex data frames at the moment, no. Part of the reason for that is that for plots like these histograms, it doesn't do Python-side aggregation: all the data is sent to the browser for aggregation, so there's a bit of an upper bound on the dataset size that px.histogram can handle anyway.
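To illustrate what Python-side aggregation could look like, here is a minimal sketch, assuming the two columns can be read out as plain Python sequences. It counts each (category, hue) pair first, so only the small table of counts is handed to the plot. The helper names `count_pairs` and `make_count_bar` are hypothetical and not part of Plotly Express, and `px.bar` with `barmode="group"` is swapped in for `px.histogram` because the counting is already done.

```python
from collections import Counter


def count_pairs(xs, hues):
    """Count each (x, hue) pair and return the counts as long-form columns."""
    counts = Counter(zip(xs, hues))
    keys = sorted(counts)
    return {
        # Cast to str so the values are treated as discrete categories.
        "x": [str(k[0]) for k in keys],
        "hue": [str(k[1]) for k in keys],
        "count": [counts[k] for k in keys],
    }


def make_count_bar(xs, hues):
    """Build a grouped bar chart from the pre-aggregated counts (needs plotly)."""
    # Imported lazily so count_pairs stays usable without plotly installed.
    import plotly.express as px

    table = count_pairs(xs, hues)
    return px.bar(x=table["x"], y=table["count"], color=table["hue"], barmode="group")
```

For example, `make_count_bar([1, 1, 2, 1], [0, 1, 0, 0]).show()` would draw one bar per distinct pair. How the two sequences are extracted from a Vaex DataFrame is left open here.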

nicolaskruchten avatar Jun 09 '21 14:06 nicolaskruchten

See https://github.com/plotly/plotly.py/issues/2649 for more details

nicolaskruchten avatar Jun 09 '21 14:06 nicolaskruchten

Thanks, I will check this.

bhargav-inthezone avatar Jun 09 '21 15:06 bhargav-inthezone

Hey, after a lot of trial and error, I think I found a better way. Check this code; it worked:

fig = px.histogram(x=df_train['Census_ProcessorClass'].tolist(), color=df_train['HasDetections'].tolist())
fig.show()

bhargav-inthezone avatar Jun 09 '21 16:06 bhargav-inthezone

I found a much better method:

# Note: 'Census_ProcessorClass' != 'None' compares two string literals,
# so Python evaluates it to True before Vaex ever sees it.
df_train.select(df_train['Census_ProcessorClass'], 'Census_ProcessorClass' != 'None')
x_axis = df_train.evaluate(df_train['Census_ProcessorClass'], selection=True)
color_axis = df_train.evaluate(df_train['HasDetections'], selection=True)

%%time
fig = px.histogram(x=x_axis, color=color_axis, width=300, height=400)
fig.show()

CPU times: user 761 ms, sys: 33.1 ms, total: 794 ms
Wall time: 811 ms

bhargav-inthezone avatar Jun 10 '21 11:06 bhargav-inthezone