Missing data
I received the following question by email*:
Dear Romain
thanks for this toolkit. Can TSlearn handle missing data - quite a big problem in time series analysis of Earth Observation (EO) data ... my field ?
I am not 100% sure what is implied by "handle missing data", but I can try to formulate an answer:
- tslearn does not have a missing data imputation module;
- however, tslearn can provide methods that do not rely on the assumption that the series to be compared are observed at the same time stamps. For example, if only the ordering of elements matters, one could use Dynamic Time Warping. Having a look at our user guide is likely to provide some input on this (at least I hope so).
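To make the Dynamic Time Warping remark concrete, here is a minimal pure-Python sketch of the textbook DTW recursion. It is an illustration only, not tslearn's implementation: the function name `dtw_distance` and the two toy series are assumptions made for this example. In practice one would more likely call `tslearn.metrics.dtw` than write this loop.

```python
import math

def dtw_distance(s1, s2):
    """Textbook DTW between two 1-D sequences (squared local cost, sqrt at the end)."""
    n, m = len(s1), len(s2)
    # cost[i][j] = minimal cumulated cost of aligning s1[:i] with s2[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

# series_b is series_a with two observations dropped, so the two series
# have different lengths and are not observed at the same time stamps.
series_a = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
series_b = [0.0, 1.0, 3.0, 1.0, 0.0]

print(dtw_distance(series_a, series_b))
```

Only the ordering of the samples is used here: no time stamps are passed in, and the two series do not need to have the same length.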
*I can no longer answer questions regarding tslearn by email, so please post your questions as a GitHub issue to maximize your chances of getting an answer.
Missing data is not incompatible with variable-length time series. You can have a time series whose length is 80 with no missing data and another time series whose length is 60 with missing data. Toy example:
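A minimal sketch of what such a pair of series could look like, assuming plain Python lists and NaN as the marker for a missing observation. The values, the positions of the gaps, and the helper `n_missing` are made up for illustration.

```python
import math

# Series of length 80, fully observed.
full_series = [math.sin(0.1 * t) for t in range(80)]

# Series of length 60 in which three observations are missing (NaN).
gappy_series = [math.cos(0.1 * t) for t in range(60)]
for t in (10, 25, 26):
    gappy_series[t] = math.nan

def n_missing(series):
    """Count the NaN entries of a series."""
    return sum(1 for x in series if math.isnan(x))

for name, s in [("full", full_series), ("gappy", gappy_series)]:
    print(name, "length:", len(s), "missing:", n_missing(s))
```

Length and missingness vary independently here: the shorter series is the one with gaps, and nothing would prevent the longer one from having gaps as well.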
How is it not compatible? Can't you easily distinguish between "missing values" and "padding" by the location of the NaN? If it's at the end -> padding, in the middle -> missing value
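The position-based rule suggested here can be sketched as follows, assuming a series that was padded with trailing NaN up to a common length. The function `split_nans` is hypothetical and is not part of tslearn.

```python
import math

def split_nans(padded):
    """Apply the rule 'trailing NaN = padding, interior NaN = missing value'.

    Returns (true_length, missing_positions).
    """
    true_length = len(padded)
    # Strip the trailing run of NaN: under this rule it is all padding.
    while true_length > 0 and math.isnan(padded[true_length - 1]):
        true_length -= 1
    # Any NaN left before that point is treated as a missing observation.
    missing = [i for i in range(true_length) if math.isnan(padded[i])]
    return true_length, missing

nan = math.nan
padded = [1.0, nan, 3.0, 4.0, nan, nan]
print(split_nans(padded))  # (4, [1])
```

The rule has one blind spot: if the last real observations of a series are themselves missing, they are indistinguishable from padding and get silently dropped.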
I would advocate for two different values (maybe np.nan and np.inf) to highlight the difference. But as Romain said, there is no imputation module for the moment, so NaNs are just used as padding values.
I said "not incompatible", so I think that we agree on this ^^