
Data Summary - Mean, Median and Standard Deviation usage

Open MsSQLGirl opened this issue 4 years ago • 2 comments

Hello,

Data Summary works well in summarizing categorical data, i.e. showing the number of categories and which values are more popular. For example, below it shows the number of State values (i.e. 3 categories) and when you hover you get "TEXAS" as the most popular state. image

In the above example, I also have a numerical column, EventCount. However, the descriptive stats (mean, median, and standard deviation) do not seem to be applied to it. Is this expected?

Sample notebook for repro: https://github.com/MsSQLGirl/jubilant-data-wizards/blob/main/Simple%20Demo/DotNetInteractive%20Notebooks/DotNetConfDemo2021.ipynb

Thanks!

MsSQLGirl avatar Dec 27 '21 18:12 MsSQLGirl

I'll need to make sure numeric data is properly typed in the dataset passed to the Data Summary view.

You should see those values and a bar chart similar to the lon/lat fields below instead: https://observablehq.com/@randomfractals/data-table-viewer?dataUrl=https://raw.githubusercontent.com/vega/vega-datasets/master/data/us-state-capitals.json

image

@MsSQLGirl I'll see if other sample datasets with typed data behave the same way. Most likely I'll have to infer date and numeric fields in CSV, JSON, or JSON array data when it is loaded in the Data Summary renderer. That means inspecting the first 10 rows or so and converting numeric and date string fields to proper JS number or Date types for the summary stats to work.
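
For illustration, here is a minimal sketch of the kind of inference step described above, assuming rows arrive as plain objects with string values. The names `inferFieldTypes` and `convertRows` are hypothetical and not part of the Data Summary renderer, the sample values are made up, and only ISO 8601 strings are treated as dates to avoid misreading free text.

```javascript
// Hypothetical helpers for illustration; not the renderer's actual code.
const SAMPLE_SIZE = 10; // "the first 10 rows or so"

// Accept only ISO 8601 dates or date-times (an assumption of this sketch).
const ISO_DATE =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

function isNumeric(value) {
  if (typeof value === 'number') return Number.isFinite(value);
  return typeof value === 'string' &&
    value.trim() !== '' &&
    Number.isFinite(Number(value));
}

function isIsoDate(value) {
  return typeof value === 'string' &&
    ISO_DATE.test(value) &&
    !Number.isNaN(Date.parse(value));
}

// Infer 'number', 'date', or 'string' for each field from the leading rows.
function inferFieldTypes(rows, sampleSize = SAMPLE_SIZE) {
  const sample = rows.slice(0, sampleSize);
  const types = {};
  if (sample.length === 0) return types;
  for (const field of Object.keys(sample[0])) {
    // Ignore missing values so a few blanks do not force a 'string' type.
    const values = sample
      .map((row) => row[field])
      .filter((v) => v !== null && v !== undefined && v !== '');
    if (values.length > 0 && values.every(isNumeric)) types[field] = 'number';
    else if (values.length > 0 && values.every(isIsoDate)) types[field] = 'date';
    else types[field] = 'string';
  }
  return types;
}

// Return new row objects with numeric and date strings converted.
// Caveat: a non-numeric value beyond the sampled rows becomes NaN.
function convertRows(rows, types = inferFieldTypes(rows)) {
  return rows.map((row) => {
    const typed = { ...row };
    for (const [field, type] of Object.entries(types)) {
      const value = row[field];
      if (value === null || value === undefined || value === '') continue;
      if (type === 'number') typed[field] = Number(value);
      else if (type === 'date') typed[field] = new Date(value);
    }
    return typed;
  });
}

// Made-up sample rows using the State and EventCount column names from this issue.
const rows = [
  { State: 'TEXAS', EventCount: '120', EventDate: '2021-12-27' },
  { State: 'KANSAS', EventCount: '87', EventDate: '2021-12-28' },
];
const typedRows = convertRows(rows);
console.log(typedRows);
```

Sampling only the leading rows keeps the check cheap on large outputs, at the cost of mistyping a column whose later rows differ from the first few.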

I'll investigate. Thanks for pointing this out.

RandomFractals avatar Dec 28 '21 12:12 RandomFractals

Will try using d3.autoType when parsing CSV data from cell output to create proper number and date type values:

https://github.com/d3/d3-dsv#autoType

RandomFractals avatar Jan 28 '22 11:01 RandomFractals