SynopticPy
Rewrite using Polars
I'm inclined to re-write this using Polars. I love Polars!
- [x] Load data into Polars DataFrames.
- [ ] Timeseries data will be returned in one DataFrame rather than a list of DataFrames. (Use the categorical dtype for some columns like STID, TIMEZONE, etc.)
Note to self:
Saving a JSON copy of the returned data for 18 stations, all variables, for 1 month takes ~25 MB on disk. The same data organized into a Polars DataFrame and saved to Parquet is 131 KB.
I'm making great progress on this. I need a to-do list:
Code
- [x] Data will be provided in long format by default. Add optional argument to pivot the data.
- [ ] add optional argument to return data as a pandas DataFrame (for users who prefer pandas, though I am fully on the Polars bandwagon and don't use pandas anymore)
- [ ] add optional argument with_latency
- [ ] basic plotting for summary; seaborn will be an optional dependency
Docs
- [x] rewrite docs with new examples
- [ ] examples of doing rolling/resample windows
- [ ] examples of pivot long to wide format
- [ ] examples of plotting with seaborn
- [ ] show users how to save to parquet (and the benefits of doing so)
- [ ] rewrite readme
GitHub
- [x] explain big overhaul to users. The entire package is a breaking change; it's practically a new package. Reasons: improve maintainability, I wanted to learn Polars, I am learning class inheritance, and a long-format DataFrame makes more sense to me.