StatsBase.jl

[Feature Request] Add Standard Deviation to summarystats()?

Status: Open. noamsgl opened this issue 4 years ago • 12 comments

Feature Request: Add STD to summarystats()...

https://github.com/JuliaStats/StatsBase.jl/blob/a0e6f1e807a84a09b5f74431bb0099f4aaed5ae0/src/scalarstats.jl#L555

noamsgl, May 20 '21

I agree that it's surprising not to print the standard deviation here. Do you feel like making a pull request to add it?

nalimilan, Aug 31 '21

@nalimilan, is this issue dormant, or is it still open for contributions?

I was also wondering: how about a summarystat() that works like R's summary function? Like R's version, it would return different statistics for string/factor types.

itsdebartha, Mar 25 '23

We need more opinions to decide what's best. In DataFrames (https://github.com/JuliaData/DataFrames.jl/pull/2459), we decided not to report standard deviations and quartiles by default so that the output fits within the screen width: one has to call describe(df, :detailed) to get them. Here screen width isn't a problem and we already report quartiles, so maybe we could print the standard deviation too.

@bkamins @pdeffebach What do you think?

nalimilan, Mar 26 '23

In general I almost always want to see std, so I would like to have this change. The only issue is that it would be breaking. I am not sure what decision would be best. Maybe we can consider it to be mildly breaking and go for it?

bkamins, Mar 26 '23

I agree, I almost always want std. I would maybe call it mildly breaking? It's really only useful in interactive work.

pdeffebach, Mar 26 '23

That would only change the printing, so that's considered non-breaking I think?

nalimilan, Mar 28 '23

Printing would change for describe, but summarystats returns an object that stores the values. The struct would need a new field, so, e.g., code that serializes it would break. See https://github.com/JuliaStats/StatsBase.jl/blob/master/src/scalarstats.jl#L858

bkamins, Mar 28 '23
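A minimal Julia sketch (not part of the original thread) of the distinction between printed output and the stored object. It assumes StatsBase's `summarystats` returns a `SummaryStats` value with `mean`, `median`, and `nobs` fields, and it computes the standard deviation separately with `Statistics.std`, which works whether or not the installed StatsBase version stores one. The sample data is made up for illustration.

```julia
using Statistics   # provides std
using StatsBase    # provides summarystats

# Hypothetical sample data, chosen so the results are easy to check by hand.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# summarystats returns an object whose fields hold the computed values;
# describe/show only control how those values are printed.
s = summarystats(x)
println(s.mean, " ", s.median, " ", s.nobs)

# Compute the standard deviation alongside the summary, independent of
# whether this StatsBase version includes it in the summary object.
sd = std(x)   # sample standard deviation (n - 1 denominator)
println(sd)
```

Code that reads fields by name, as above, keeps working when a new field is added. Code that builds the struct positionally or relies on its exact layout (for example, previously serialized values) is the part that could break.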

Do we consider that adding a new field to an object is breaking though? That sounds quite restrictive.

nalimilan, Apr 21 '23

OK - let us add it.

bkamins, Apr 21 '23

Would it be OK if I went ahead and tried making a PR for this addition?

itsdebartha, Apr 21 '23

Sure.

bkamins, Apr 21 '23

Created #858

itsdebartha, Apr 21 '23