tslearn
[HELP Request] How to use multivariate softDTW barycenter?
I have a problem similar to the one asked in #87, i.e., my dataset looks as follows, where each row belongs to a sequence (the `sequence` column) and represents one point in time:
| sequence | x0 | y0 | z0 | x1 | y1 | z1 | x2 | y2 | z2 | ... | z17 | x18 | y18 | z18 | x19 | y19 | z19 | x20 | y20 | z20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01.avi | 0.709314 | 0.595564 | 0.0 | 0.654333 | 0.577397 | 0.014865 | 0.599553 | 0.606103 | 0.006795 | ... | -0.091750 | 0.636288 | 0.801407 | -0.114206 | 0.620451 | 0.853457 | -0.118281 | 0.608881 | 0.899898 | -0.119697 |
| 01.avi | 0.719498 | 0.590370 | 0.0 | 0.638796 | 0.589439 | 0.032860 | 0.589824 | 0.661405 | 0.037496 | ... | -0.060961 | 0.666572 | 0.975270 | -0.060586 | 0.655359 | 1.046923 | -0.047032 | 0.649768 | 1.090954 | -0.036204 |
| 01.avi | 0.765117 | 0.503310 | 0.0 | 0.688468 | 0.490631 | 0.007665 | 0.633087 | 0.559083 | -0.004564 | ... | -0.094739 | 0.706533 | 0.864469 | -0.105250 | 0.692404 | 0.913337 | -0.092633 | 0.683403 | 0.929996 | -0.081646 |
| 02.avi | 0.847238 | 0.123516 | 0.0 | 0.802507 | 0.064425 | 0.013631 | 0.762274 | 0.058606 | 0.016534 | ... | -0.046354 | 0.735729 | 0.338295 | -0.041054 | 0.740595 | 0.307241 | -0.028013 | 0.758677 | 0.274317 | -0.017921 |
| 02.avi | 0.837651 | 0.125420 | 0.0 | 0.792646 | 0.065793 | 0.023380 | 0.755813 | 0.068872 | 0.030797 | ... | -0.033827 | 0.746565 | 0.347425 | -0.025514 | 0.745967 | 0.349261 | -0.013675 | 0.754414 | 0.329223 | -0.005807 |
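One way to get a table like this into the `(n_ts, sz, d)` layout that tslearn works with is to group the rows by `sequence` and stack each group into one multivariate series, padding shorter series with NaN rows at the end. The sketch below assumes the data is a pandas DataFrame whose rows are already in time order within each sequence; `frame_to_dataset` is a hypothetical helper, and only the first three coordinate columns of the rows shown above are used to keep it short. Passing the per-sequence arrays to `tslearn.utils.to_time_series_dataset` should give the same padded layout.

```python
import numpy as np
import pandas as pd


def frame_to_dataset(df, id_col="sequence"):
    """Stack a long-format frame (one row per time step) into a NaN-padded
    array of shape (n_ts, max_len, n_features), one series per id."""
    feature_cols = [c for c in df.columns if c != id_col]
    # sort=False keeps sequences in order of first appearance; groupby also
    # keeps the original row order (assumed to be time order) inside a group
    series = [g[feature_cols].to_numpy(dtype=float)
              for _, g in df.groupby(id_col, sort=False)]
    max_len = max(len(s) for s in series)
    out = np.full((len(series), max_len, len(feature_cols)), np.nan)
    for i, s in enumerate(series):
        out[i, :len(s)] = s  # shorter series keep NaN rows at the end
    return out


# x0, y0, z0 of the five rows shown in the table
df = pd.DataFrame({
    "sequence": ["01.avi", "01.avi", "01.avi", "02.avi", "02.avi"],
    "x0": [0.709314, 0.719498, 0.765117, 0.847238, 0.837651],
    "y0": [0.595564, 0.590370, 0.503310, 0.123516, 0.125420],
    "z0": [0.0, 0.0, 0.0, 0.0, 0.0],
})
X = frame_to_dataset(df)
print(X.shape)  # (2, 3, 3)
```

With all 63 coordinate columns the last dimension would be 63 instead of 3.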
According to the documentation (https://tslearn.readthedocs.io/en/stable/variablelength.html?highlight=variable%20lenght), I can do something like this:
```python
import numpy as np
from tslearn.utils import to_time_series_dataset

# Each top-level list is one time series.
# Each inner list is one time step holding 3 feature values,
# so the number of inner lists is the length of that series.
X = to_time_series_dataset([
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # time series of length 3
    [[1, 1, 1], [1, 1, 1]],             # time series of length 2
    [[4, 4, 4], [4, 4, 4]],             # time series of length 2
])
y = np.array([0, 0, 1])

print(X.shape)
# (3, 3, 3), i.e. (n_ts, max length, n_features)
print(X)
# [[[ 1.  1.  1.]
#   [ 1.  1.  1.]
#   [ 1.  1.  1.]]
#
#  [[ 1.  1.  1.]
#   [ 1.  1.  1.]
#   [nan nan nan]]
#
#  [[ 4.  4.  4.]
#   [ 4.  4.  4.]
#   [nan nan nan]]]
```
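In this padded layout a shorter series is simply followed by all-NaN time steps, so its real length can be read back from the array. A minimal NumPy sketch, assuming padding only ever appears at the end of a series; `series_lengths` is a hypothetical helper, and tslearn's `ts_size` should report the same value for a single series.

```python
import numpy as np


def series_lengths(X):
    """Count the time steps of each padded series that are not pure NaN
    padding. X has shape (n_ts, max_len, d)."""
    present = ~np.isnan(X).all(axis=2)  # True where a time step holds data
    return present.sum(axis=1)


nan_row = [np.nan] * 3
X = np.array([
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 1, 1], nan_row],
    [[4, 4, 4], [4, 4, 4], nan_row],
], dtype=float)
print(series_lengths(X))  # [3 2 2]
```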
The conversion to a time-series dataset looks right, but the following call raises an error:
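```python
from tslearn.barycenters import softdtw_barycenter
from tslearn.utils import ts_zeros
initial_barycenter = ts_zeros(sz=5)
bar = softdtw_barycenter(X, init=initial_barycenter)
```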
```
...
...
...
~/anaconda3/envs/pytorch-build/lib/python3.8/site-packages/sklearn/metrics/pairwise.py in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
    260     paired_distances : distances betweens pairs of elements of X and Y.
    261     """
--> 262     X, Y = check_pairwise_arrays(X, Y)
    263 
    264     # If norms are passed as float32, they are unused. If arrays are passed as

~/anaconda3/envs/pytorch-build/lib/python3.8/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X, Y, precomputed, dtype, accept_sparse, force_all_finite, copy)
    151                              (X.shape[0], X.shape[1], Y.shape[0]))
    152     elif X.shape[1] != Y.shape[1]:
--> 153         raise ValueError("Incompatible dimension for X and Y matrices: "
    154                          "X.shape[1] == %d while Y.shape[1] == %d" % (
    155                              X.shape[1], Y.shape[1]))

ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 1 while Y.shape[1] == 3
```
What am I doing wrong?
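The last line of the traceback compares `X.shape[1] == 1` with `Y.shape[1] == 3`, i.e. the number of features. `ts_zeros(sz=5)` falls back to its default `d=1`, so `initial_barycenter` has shape `(5, 1)`, while every series in `X` has 3 features. The NumPy sketch below reproduces that check with a hypothetical `squared_euclidean_cost` helper that stands in for the pairwise-distance step (it is not tslearn's code). If that reading is right, `init=ts_zeros(sz=5, d=3)`, or leaving `init=None` so that a Euclidean barycenter is used as the starting point, should avoid the error while keeping the data multivariate.

```python
import numpy as np


def squared_euclidean_cost(Z, S):
    """Pairwise squared Euclidean costs between the time steps of Z
    (shape (sz_z, d)) and S (shape (sz_s, d)). Both need the same d."""
    if Z.shape[1] != S.shape[1]:
        raise ValueError(
            f"Incompatible dimension: Z.shape[1] == {Z.shape[1]} "
            f"while S.shape[1] == {S.shape[1]}"
        )
    diff = Z[:, np.newaxis, :] - S[np.newaxis, :, :]
    return (diff ** 2).sum(axis=2)  # shape (sz_z, sz_s)


series = np.ones((3, 3))      # one series: 3 time steps, 3 features
bad_init = np.zeros((5, 1))   # same shape as ts_zeros(sz=5)
good_init = np.zeros((5, 3))  # same shape as ts_zeros(sz=5, d=3)

try:
    squared_euclidean_cost(bad_init, series)
except ValueError as err:
    print(err)  # Incompatible dimension: Z.shape[1] == 1 while S.shape[1] == 3

cost = squared_euclidean_cost(good_init, series)
print(cost.shape)  # (5, 3)
```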
I think the nested structure of the sequences is not supported. You can either compute a barycenter for each time-series set separately, or treat the whole set as seven time series instead of nesting them.
Pattern 1:
```python
X1 = to_time_series_dataset([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
X2 = to_time_series_dataset([[1, 1, 1], [1, 1, 1]])
X3 = to_time_series_dataset([[4, 4, 4], [4, 4, 4]])
bar = [softdtw_barycenter(X, init=initial_barycenter) for X in [X1, X2, X3]]
```
Pattern 2:
```python
X = to_time_series_dataset(
    [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [4, 4, 4], [4, 4, 4]]
)
bar = softdtw_barycenter(X, init=initial_barycenter)
```
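For reference, `to_time_series_dataset` reads a flat list of equal-length lists such as `[[1,1,1], [1,1,1], [1,1,1]]` as several univariate series of length 3, so Pattern 1's `X1` has shape `(3, 3, 1)` and Pattern 2's `X` has shape `(7, 3, 1)`; that `d=1` is what lines up with `ts_zeros(sz=5)`. The NumPy lines below mimic only that layout with a hypothetical `as_univariate_dataset` helper (not part of tslearn) to make the shapes explicit. Note that the three values of each inner list then act as consecutive time steps rather than as three features.

```python
import numpy as np


def as_univariate_dataset(rows):
    """Lay out equal-length lists as univariate series: (n_ts, sz, 1)."""
    return np.asarray(rows, dtype=float)[:, :, np.newaxis]


X1 = as_univariate_dataset([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
X_all = as_univariate_dataset([[1, 1, 1]] * 5 + [[4, 4, 4]] * 2)
print(X1.shape, X_all.shape)  # (3, 3, 1) (7, 3, 1)
```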