Have pandas dataframe to store slices data

Open giamic opened this issue 3 years ago • 2 comments

One of the best-known libraries for statistical analysis is pandas. It provides a nifty interface to datasets, essentially implemented as a table where rows represent different instances in the dataset and columns represent different properties.

Pandas builds on numpy arrays to provide many convenience functions for statistical analysis. It is easy to compute the mean and standard deviation of different quantities, group the dataset by values, select only the data that satisfy certain conditions, and even visualise the data in plots. If we want to give users the ability to do meaningful statistical analysis easily and programmatically, I think pandas is the way to go.
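To make the operations above concrete, here is a minimal sketch on a made-up table of slices. The column names (`piece`, `chord`, `duration`) and the values are assumptions for illustration only, not the actual When-in-Rome slice format.

```python
import pandas as pd

# Hypothetical slice data: one row per slice.
# Column names and values are illustrative, not the real schema.
slices = pd.DataFrame(
    {
        "piece": ["A", "A", "A", "B", "B"],
        "chord": ["I", "V", "I", "I", "IV"],
        "duration": [1.0, 2.0, 1.0, 4.0, 2.0],
    }
)

# Average and (sample) standard deviation of a quantity.
mean_duration = slices["duration"].mean()
std_duration = slices["duration"].std()

# Group the dataset by a value and aggregate within each group.
total_by_chord = slices.groupby("chord")["duration"].sum()

# Select only the rows that satisfy a condition.
long_slices = slices[slices["duration"] >= 2.0]

print(mean_duration, std_duration)
print(total_by_chord)
print(long_slices)
```

Plotting works along the same lines (for example `total_by_chord.plot(kind="bar")`), but it needs matplotlib installed, so I have left it out of the sketch.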

The downside is that pandas is fairly bulky, so it is a heavy dependency to add. However, it is easy to install with both conda and pip.

giamic avatar Dec 14 '22 12:12 giamic

Thanks for this. I'm minded to go for it. Also, with https://github.com/MarkGotham/When-in-Rome/pull/73, we're already doing so ;) Any dissenting voices @malcolmsailor, @napulen? I'd also welcome your views @jonnybluesman on all the current dev., especially as it pertains to integration with ChoCo. Thanks all.

MarkGotham avatar Dec 14 '22 14:12 MarkGotham

Indeed, I've been using pandas all along during machine learning experiments. If useful, a pandas-exported collection of TSV files can be found here

These files contain more information than needed, but they illustrate how all the slice and annotation data can be combined into a single TSV file.
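As a rough sketch of how such a file could be loaded back into pandas: the columns below (`offset`, `notes`, `annotation`) are invented for this example and do not reflect the actual contents of those TSV files. The TSV text is inlined so the snippet runs on its own; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical TSV content: one row per slice, with its annotation alongside.
# Column names and values are made up for illustration.
tsv_text = (
    "offset\tnotes\tannotation\n"
    "0.0\tC4 E4 G4\tI\n"
    "1.0\tB3 D4 G4\tV\n"
    "2.0\tC4 E4 G4\tI\n"
)

# read_csv handles TSV when the separator is set to a tab.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Count how often each annotation label occurs across the slices.
counts = df["annotation"].value_counts()

print(df)
print(counts)
```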

napulen avatar Jan 03 '23 01:01 napulen