
Reduce memory usage of the DTW warping path matrix by using scipy sparse matrices.

Open · PercyLau opened this issue 5 years ago · 5 comments

I am sometimes frustrated that tslearn cannot handle extremely large time series datasets because it runs out of memory, e.g., 100k time series samples, each with over 1000 ticks.

A simple solution is to use scipy.sparse.coo_matrix instead of the usual dense numpy arrays in _subgradient_valence_warping to store the DTW warping matrices list_w_k. I think this improvement would help in most big-data cases.

There might be other possible improvements for reducing memory and computation costs, and I would be happy to tackle them if convenient.
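To make the memory argument concrete, here is a small sketch (not from the tslearn codebase) comparing the storage of one warping matrix as a dense array versus a scipy.sparse.coo_matrix. A DTW path between two series of length sz visits at most on the order of 2*sz cells of the sz x sz warping matrix, so the sparse form needs O(sz) storage instead of O(sz^2). The diagonal toy path below stands in for a real path returned by dtw_path():

```python
import numpy as np
from scipy.sparse import coo_matrix

sz = 1000
# Toy path: in practice this would come from tslearn's dtw_path().
path = [(i, i) for i in range(sz)]
rows, cols = zip(*path)

# Sparse warping matrix: stores only the visited (i, j) cells.
w_sparse = coo_matrix((np.ones(len(path)), (rows, cols)), shape=(sz, sz))
w_dense = w_sparse.toarray()

print(w_dense.nbytes)  # 8_000_000 bytes for a single 1000x1000 float64 matrix
print(w_sparse.data.nbytes + w_sparse.row.nbytes + w_sparse.col.nbytes)
```

For a 1000-point series the dense matrix already costs 8 MB per sample, while the sparse form stays in the tens of kilobytes.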

PercyLau avatar Oct 09 '20 06:10 PercyLau

I think this could be a great addition indeed! What would the impact on the runtime be if you use sparse instead of dense matrices? Perhaps we could make this (dense/sparse) configurable by the user?

GillesVandewiele avatar Oct 10 '20 09:10 GillesVandewiele

I think this could be a great addition indeed! What would the impact on the runtime be if you use sparse instead of dense matrices?

The first purpose is to reduce the memory consumption of the majorize-minimize algorithm used for computing centroids. The current implementation may run out of memory when the input time series dataset becomes large, say n samples of length t, since each sample requires t^2 space to store one DTW warping matrix. These warping matrices are row-sparse: each row contains only a few non-zero elements. A row-sparse representation is therefore well suited for this case.
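As a minimal illustration of why the representation can be swapped (my sketch, not tslearn code): the arithmetic DBA performs with a warping matrix W_k, such as multiplying it with the k-th series, gives the same result whether W_k is a dense array or a CSR matrix, so only the storage changes:

```python
import numpy as np
from scipy.sparse import csr_matrix

sz, d = 5, 2
# Toy DTW path as (row, col) index pairs.
path = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 4)]
rows, cols = zip(*path)

# Same warping matrix, dense and row-sparse (CSR).
w_dense = np.zeros((sz, sz))
w_dense[rows, cols] = 1.0
w_csr = csr_matrix((np.ones(len(path)), (rows, cols)), shape=(sz, sz))

x = np.arange(sz * d, dtype=float).reshape(sz, d)  # one (sz, d) time series
print(np.allclose(w_dense @ x, w_csr @ x))  # True: identical update term
```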

Indeed, this may benefit runtime as well. Besides using sparse matrices, I also accelerated dtw_path() with numba, making it behave somewhat like a static function in C++. Putting them together, we observe a 20%-40% speedup when computing the DBA average path, and an order-of-magnitude memory reduction, for 1000 samples of 1000-length, 10-feature time series.

Perhaps we could make this (dense/sparse) configurable by the user?

Sure, that is not a big issue. But I am curious: are there any advantages to a dense data structure for the DTW warping matrix?

PercyLau avatar Oct 11 '20 09:10 PercyLau

Hi,

So, you are talking about the DBA implementation, right? (This was not clear in the first post, but your last post seems to clarify it.) I would be interested in anyone willing to showcase the improvement one can get, but I am afraid the memory consumption implied by storing the path in a dense matrix is the same as what we already have inside the DTW algorithm to store partial DTW costs, so I suspect the improvement is likely to be marginal. But I might be wrong.

rtavenar avatar Oct 12 '20 06:10 rtavenar

Yes, I am talking about the DBA implementation. In my use case, i.e., time series clustering and classification over large datasets, the DBA implementation dominates tslearn's runtime. To be more specific: I found that the DBA implementation sometimes runs out of memory because of the dense matrices in list_w_k, as shown here:

https://github.com/tslearn-team/tslearn/blob/e78ba4b0594acf3b3a901054e941d2dcea57975c/tslearn/barycenters.py#L325-L334

I think barycenter_size and the input sz_k can be extremely large in the real world, so the current dense list_w_k is inefficient for arithmetic operations. After replacing the dense numpy matrices with scipy.sparse.csr_matrix, RAM usage drops by orders of magnitude, although runtime only decreases when sz_k and barycenter_size are large enough, e.g., over 10000, which is expected. When sz_k and barycenter_size are below 100, the runtime is at most twice as long with sparse matrices. But I think we can speed up the sparse path with numba and native C.
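A sketch of the replacement I mean (the helper name path_to_sparse_w is mine, not tslearn's): build each W_k directly as a CSR matrix from the (i, j) pairs of the DTW path, so the dense (barycenter_size x sz_k) intermediate array is never allocated:

```python
import numpy as np
from scipy.sparse import csr_matrix

def path_to_sparse_w(path, barycenter_size, sz_k):
    """Hypothetical helper: turn a DTW path (list of (i, j) pairs) into
    a CSR warping matrix without a dense intermediate array."""
    rows = np.array([i for (i, _) in path])
    cols = np.array([j for (_, j) in path])
    data = np.ones(len(path))
    return csr_matrix((data, (rows, cols)), shape=(barycenter_size, sz_k))

path = [(0, 0), (1, 0), (2, 1), (3, 2)]  # toy path
w_k = path_to_sparse_w(path, barycenter_size=4, sz_k=3)
print(w_k.nnz)  # 4 stored entries instead of 12 dense cells
```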

Here is a minimal script to reproduce my results.

import timeit

import numpy as np

import tslearn.barycenters

n, sz, d = 100, 10000, 10  # also try 100, 1000, 10 or 100, 100, 10
rng = np.random.RandomState(123)

def mm():
    ts = rng.randn(n, sz, d)
    tslearn.barycenters.dtw_barycenter_averaging(ts)

res = timeit.repeat("mm()", "from __main__ import mm", number=1)
print(res)

The remaining problem is that the scipy.sparse module is not compatible with numba, so accelerating DBA is not so easy (see https://github.com/numba/numba-scipy/issues/29). Perhaps we need to re-implement the sparse matrix ourselves to leverage numba.
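One workaround sketch (my assumption about how this could be done, not an existing tslearn feature): numba cannot compile scipy.sparse types, but it handles plain numpy arrays fine, so a jitted kernel can take the raw CSR arrays (data, indices, indptr) directly. The kernel below is shown without the @numba.njit decorator so it runs even where numba is not installed; decorating it is the only change needed:

```python
import numpy as np

def csr_matvec(data, indices, indptr, x, n_rows):
    """Sparse matrix-vector product on raw CSR arrays; numba-friendly
    since it only touches numpy arrays and scalar loops."""
    out = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):  # nonzeros of row i
            out[i] += data[k] * x[indices[k]]
    return out

# CSR arrays encoding the matrix [[1, 0], [0, 2]].
data = np.array([1.0, 2.0])
indices = np.array([0, 1])
indptr = np.array([0, 1, 2])
x = np.array([3.0, 4.0])
print(csr_matvec(data, indices, indptr, x, 2))  # [3. 8.]
```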

PercyLau avatar Oct 15 '20 04:10 PercyLau

A related question: there is a subgradient-based version of DBA averaging in tslearn, and subgradient-based algorithms are well suited to handling large datasets. But I have not found any other tslearn module that imports this subgradient-based barycenter algorithm. Perhaps that is because it often outputs a weird DBA loss, which increases instead of decreasing. Any ideas about this subgradient-based algorithm?

PercyLau avatar Oct 29 '20 03:10 PercyLau