SynopticPy

Rewrite using Polars

Open blaylockbk opened this issue 1 year ago • 2 comments

I'm inclined to re-write this using Polars. I love Polars!

  • [x] Load data into Polars DataFrames.
  • [ ] Timeseries data will be returned in one DataFrame rather than a list of DataFrames. (Use categorical dtype for some columns like STID, TIMEZONE, etc.)

blaylockbk • May 08 '24 15:05

Note to self:

Saving a JSON copy of the returned data for 18 stations, all variables, for 1 month takes ~25 MB on disk. The same data organized into a Polars DataFrame and saved to Parquet is 131 KB.

blaylockbk • Aug 29 '24 05:08

I'm making great progress on this. I need a to-do list.

Code

  • [x] Data will be provided in long format by default. Add optional argument to pivot the data.
  • [ ] add optional argument to return data as Pandas data frame (for those users who prefer pandas, but I'm telling you that I am fully on the polars bandwagon; I don't use pandas anymore)
  • [ ] add optional argument with_latency
  • [ ] basic plotting for summary; seaborn will be an optional dependency

Docs

  • [x] rewrite docs with new examples
  • [ ] examples of doing rolling/resample windows
  • [ ] examples of pivot long to wide format
  • [ ] examples of plotting with seaborn
  • [ ] show users how to save to parquet (and the benefits of doing so)
  • [ ] rewrite readme

GitHub

  • [x] explain the big overhaul to users. The entire package is a breaking change; it's practically a new package. Reasons: improve maintainability, I wanted to learn Polars, I am learning class inheritance, and a long-format DataFrame makes more sense to me.

blaylockbk • Sep 06 '24 13:09