tslearn

Missing data

Open rtavenar opened this issue 5 years ago • 3 comments

I received the following question by email:*

Dear Romain

Thanks for this toolkit. Can tslearn handle missing data? It is quite a big problem in time series analysis of Earth Observation (EO) data, my field.

I am not 100% sure what is implied by "handle missing data", but I can try to formulate an answer:

  • tslearn does not have a missing data imputation module.
  • However, tslearn does provide methods that do not assume the series being compared are observed at the same time stamps. For example, if only the ordering of elements matters, one could use Dynamic Time Warping. Our user guide is likely to provide some input on this (at least I hope so).
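
To make the Dynamic Time Warping point concrete, here is a minimal pure-Python sketch of the textbook DTW recursion. It is an illustration only, not tslearn's implementation (the library exposes its own version as `tslearn.metrics.dtw`), and the name `dtw_distance` and the toy series are made up for this example. It shows that two series of different lengths can be compared as long as the ordering of their elements is meaningful.

```python
import math


def dtw_distance(s1, s2):
    """Textbook O(n*m) DTW between two 1-D sequences (squared-difference local cost)."""
    n, m = len(s1), len(s2)
    # cost[i][j] holds the best cumulative cost of aligning s1[:i] with s2[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (s1[i - 1] - s2[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # s1 advances, s2 element is repeated
                cost[i][j - 1],      # s2 advances, s1 element is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])


# Hypothetical toy data: b is a time-stretched copy of a, sampled at different stamps.
a = [0.0, 1.0, 2.0, 3.0]
b = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]
print(dtw_distance(a, b))
```

Because `b` only repeats values of `a` without changing their order, the warping path aligns them perfectly and the distance is 0.0, even though the two series have 4 and 6 samples respectively.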

*I can no longer answer questions about tslearn by email, so please post your questions as a GitHub issue to maximize your chances of getting an answer.

rtavenar, Jun 29 '20 15:06

Missing data is not incompatible with variable-length time series. You can have a time series whose length is 80 with no missing data and another time series whose length is 60 with missing data. Toy example:

[Image: screenshot of the toy example, 2020-07-02 09:25:34]

johannfaouzi, Jul 02 '20 07:07

How is it not compatible? Can't you easily distinguish between "missing values" and "padding" by the location of the NaN? If it's at the end -> padding, in the middle -> missing value
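
A rough sketch of that location-based rule, assuming a 1-D series stored as a plain list of floats with NaN as the only marker (the helper name `split_padding_and_missing` is hypothetical and is not part of tslearn):

```python
import math


def split_padding_and_missing(series):
    """Return (true_length, missing_indices) for a NaN-padded 1-D series.

    Assumption: trailing NaNs are padding, NaNs before the last observed
    value are missing observations.
    """
    true_length = 0
    for idx, value in enumerate(series):
        if not math.isnan(value):
            # Everything up to the last observed value belongs to the series.
            true_length = idx + 1
    missing = [i for i in range(true_length) if math.isnan(series[i])]
    return true_length, missing


nan = float("nan")
padded = [1.0, nan, 3.0, 4.0, nan, nan]
length, missing = split_padding_and_missing(padded)
print(length, missing)
```

Here the series is read as having length 4 with one missing value at index 1. One limitation of this rule: a value that is missing at the very end of a series cannot be told apart from padding.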

GillesVandewiele, Jul 02 '20 07:07

I would advocate for two different values (maybe np.nan and np.inf) to highlight the difference. But as Romain said, there is no imputation module for the moment, so NaN values are only used for padding.

I said "not incompatible", so I think we agree on this ^^
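
To illustrate the two-value idea, here is a hedged sketch in plain Python. It is a hypothetical encoding, not something tslearn implements (as noted above, tslearn only uses NaN, and only for padding). The constants `MISSING` and `PADDING` and the helper names are assumptions made for this example, and the scheme presumes the real data never contains infinite values.

```python
import math

MISSING = float("nan")  # an observation that exists in time but was not recorded
PADDING = float("inf")  # a position beyond the end of the series


def pad_series(series, target_length):
    """Right-pad a series with PADDING up to target_length."""
    return list(series) + [PADDING] * (target_length - len(series))


def true_length(padded):
    """Number of positions that belong to the series (observed or missing)."""
    return sum(1 for value in padded if not math.isinf(value))


def missing_indices(padded):
    """Positions marked as missing observations."""
    return [i for i, value in enumerate(padded) if math.isnan(value)]


# The last real position is missing, which a single NaN marker could not express.
s = pad_series([1.0, MISSING, 3.0, MISSING], 6)
print(true_length(s), missing_indices(s))
```

With two distinct markers, a missing value at the end of a series stays distinguishable from padding: the example is read as length 4 with missing values at indices 1 and 3.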

johannfaouzi, Jul 02 '20 08:07