Add new primitive: Butterworth filter
Description
A low-pass filter (in this example, a Butterworth filter) for preprocessing time series data.
The outcome of such a filter should be similar to that of the moving aggregations, but the number of samples is not reduced, which might improve the performance of the pipeline.
It takes an array of the data to be filtered as input and returns a filtered array of the same length.
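To make the idea concrete, here is a minimal sketch of such a filter built on `scipy.signal`. The helper name `butter_lowpass`, the default order and cutoff, and the use of zero-phase filtering via `filtfilt` are assumptions for illustration, not necessarily what the branch implements.

```python
import numpy as np
from scipy import signal


def butter_lowpass(x, order=4, cutoff=0.1):
    """Hypothetical helper: zero-phase low-pass Butterworth filter.

    cutoff is the normalized critical frequency Wn, between 0 and 1,
    where 1 corresponds to the Nyquist frequency.
    """
    # Design the filter as numerator (b) and denominator (a) coefficients.
    b, a = signal.butter(order, cutoff, btype="low", analog=False, output="ba")
    # Apply it forward and backward so the output has no phase shift.
    return signal.filtfilt(b, a, x)


# Synthetic example: a slow sine wave with additive noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)

filtered = butter_lowpass(noisy)

# Unlike a moving aggregation, the number of samples is unchanged.
print(noisy.shape, filtered.shape)
```

The filtered array has the same length as the input, which is the main difference from a downsampling aggregation.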
What I Did
I started implementing this primitive for testing purposes in the butterworth branch on my fork, which you can check out.
Concretely, I added a Primitive JSON file and a custom function in timeseries_preprocessing.py.
Any feedback on the primitive itself and the implementation would be highly appreciated.
Thanks for the proposal @AlexanderGeiger
Some thoughts and comments:
- Perhaps it would be better to split this in two parts and make these First Type (JSON only) function primitives.
We could have:
  - `scipy.signal.butter.ba.json`, which points at `scipy.signal.butter`, has `N`, `Wn` and `btype` as tunable hyperparameters and `output='ba'` and `analog=False` as fixed hyperparameters, and which takes no inputs but returns `b` and `a`, which will be set as context variables.
  - `scipy.signal.filtfilt.json`, which points at `scipy.signal.filtfilt`, has `axis` as a fixed hyperparameter and `padtype` and `padlen` as tunable hyperparameters, and which takes `a`, `b` and `X` as inputs and returns `X`.
Optionally, we would add scipy.signal.butter.zpk in the future if needed.
Doing this, no python code is needed and both primitives can be freely combined with other options.
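As a rough illustration of how the two JSON-only primitives would chain together, the sketch below mimics the pipeline context with a plain dict. The hyperparameter values and the `context` dict are assumptions made up for this example; only the `scipy.signal.butter` and `scipy.signal.filtfilt` calls are real.

```python
import numpy as np
from scipy import signal

# Hypothetical hyperparameter values, for illustration only.
butter_hyperparameters = {
    "N": 4, "Wn": 0.1, "btype": "low",   # tunable
    "analog": False, "output": "ba",     # fixed
}
filtfilt_hyperparameters = {
    "axis": 0,                           # fixed
    "padtype": "odd", "padlen": None,    # tunable
}

# A stand-in for the pipeline context, holding a single-column series.
rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((200, 1)), axis=0)
context = {"X": X}

# Step 1: takes no input and adds b and a to the context.
context["b"], context["a"] = signal.butter(**butter_hyperparameters)

# Step 2: reads b, a and X from the context and replaces X.
context["X"] = signal.filtfilt(
    context["b"], context["a"], context["X"], **filtfilt_hyperparameters
)

print(context["X"].shape)
```

Because the filter design and the filtering are separate steps, the second one could in principle be reused with coefficients produced by a different design function.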
- If we do not make them JSON Only and we build a custom python function, consider:
  - not sorting the time series and not even requiring a time index: a single sequence without a time index can also be processed, assuming that it has already been sorted. This would enable, for example, using this primitive right after a downsampling made by timeseries_preprocessing.time_series_aggregation, which outputs X and the time index as two different variables.
  - supporting both numpy arrays and pandas DataFrames. This is not mandatory, but if possible, it's better if primitives support both types of inputs.
  - making the output match the input format: if you are given a DataFrame with a time index column, return a DataFrame with a time index column. If you are given a 1d numpy array, return a 1d numpy array.
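If the custom Python function route is taken, the points above could look roughly like the sketch below. The function name `butterworth_filter`, its defaults and the `time_column` argument are hypothetical; the data is assumed to be already sorted.

```python
import numpy as np
import pandas as pd
from scipy import signal


def butterworth_filter(X, order=4, cutoff=0.1, time_column=None):
    """Hypothetical sketch: low-pass filter that preserves the input format.

    X is assumed to be already sorted. time_column, if given, names a
    DataFrame column that is passed through without being filtered.
    """
    b, a = signal.butter(order, cutoff, btype="low")

    if isinstance(X, pd.DataFrame):
        out = X.copy()
        # Filter every column except the optional time index column.
        value_columns = [c for c in X.columns if c != time_column]
        values = X[value_columns].to_numpy(dtype=float)
        out[value_columns] = signal.filtfilt(b, a, values, axis=0)
        return out

    # Anything else is treated as array-like and returned as a numpy array.
    return signal.filtfilt(b, a, np.asarray(X, dtype=float), axis=0)


rng = np.random.default_rng(0)
values = rng.standard_normal(100)

# 1d numpy array in, 1d numpy array out.
filtered_array = butterworth_filter(values)

# DataFrame with a time index column in, DataFrame with the same columns out.
df = pd.DataFrame({"timestamp": np.arange(100), "value": values})
filtered_df = butterworth_filter(df, time_column="timestamp")
```

No sorting and no time index are required here, so the function could also run directly on the `X` returned by an earlier aggregation step.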
Thanks for the feedback, @csala. I like the first approach and will try to implement the primitive this way.