[HELP Request] How to use multivariate softDTW barycenter?

Open RafayAK opened this issue 4 years ago • 1 comments

I have a problem similar to the one asked in issue 87: my data set looks as follows, where each row is a point in time and the sequence column says which sequence the row belongs to:

sequence	x0	y0	x1	y1	z1	x2	y2	z2	...	z17	x18	y18	z18	x19	y19	z19	x20	y20	z20
01.avi	0.709314	0.595564	0.654333	0.577397	0.014865	0.599553	0.606103	0.006795	...	-0.091750	0.636288	0.801407	-0.114206	0.620451	0.853457	-0.118281	0.608881	0.899898	-0.119697
01.avi	0.719498	0.590370	0.638796	0.589439	0.032860	0.589824	0.661405	0.037496	...	-0.060961	0.666572	0.975270	-0.060586	0.655359	1.046923	-0.047032	0.649768	1.090954	-0.036204
01.avi	0.765117	0.503310	0.688468	0.490631	0.007665	0.633087	0.559083	-0.004564	...	-0.094739	0.706533	0.864469	-0.105250	0.692404	0.913337	-0.092633	0.683403	0.929996	-0.081646
02.avi	0.847238	0.123516	0.802507	0.064425	0.013631	0.762274	0.058606	0.016534	...	-0.046354	0.735729	0.338295	-0.041054	0.740595	0.307241	-0.028013	0.758677	0.274317	-0.017921
02.avi	0.837651	0.125420	0.792646	0.065793	0.023380	0.755813	0.068872	0.030797	...	-0.033827	0.746565	0.347425	-0.025514	0.745967	0.349261	-0.013675	0.754414	0.329223	-0.005807
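
A long-format table like this one can be regrouped into one list of time steps per sequence before conversion. The following is only a minimal sketch, not tslearn code: the helper name group_by_sequence is made up for illustration, and the rows keep just the first three feature columns (x0, y0, x1) of the table to stay short. The resulting nested list is the kind of input that to_time_series_dataset accepts.

```python
def group_by_sequence(rows):
    """Group (sequence_id, feature_vector) rows into one list of time steps per sequence.

    Row order within each sequence is preserved, so the rows are assumed
    to be sorted by time already.
    """
    grouped = {}
    for seq_id, features in rows:
        grouped.setdefault(seq_id, []).append(list(features))
    return grouped


# Shortened rows: only x0, y0 and x1 from the table are kept (illustrative choice).
rows = [
    ("01.avi", (0.709314, 0.595564, 0.654333)),
    ("01.avi", (0.719498, 0.590370, 0.638796)),
    ("01.avi", (0.765117, 0.503310, 0.688468)),
    ("02.avi", (0.847238, 0.123516, 0.802507)),
    ("02.avi", (0.837651, 0.125420, 0.792646)),
]

grouped = group_by_sequence(rows)
nested = list(grouped.values())               # one entry per sequence
lengths = [len(series) for series in nested]  # time steps per sequence
print(lengths)  # [3, 2]

# `nested` could then be passed to tslearn.utils.to_time_series_dataset,
# which pads the shorter series with NaN. That call is left out here so the
# sketch runs without tslearn installed.
```

Each inner list is one time step and each top-level entry is one sequence, so sequences of different lengths stay separate until padding.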

According to the documentation (https://tslearn.readthedocs.io/en/stable/variablelength.html?highlight=variable%20lenght), I can do something like this:

import numpy as np
from tslearn.barycenters import softdtw_barycenter
from tslearn.utils import to_time_series_dataset, ts_zeros

# Each top-level entry is one time series.
# Each inner list is one time step with 3 features, so the number of inner lists is the length of that series.
X = to_time_series_dataset([
    [[1,1,1], [1,1,1], [1,1,1]],   # time series of length 3
    [[1,1,1], [1,1,1]],   # time series of length 2
    [[4,4,4], [4,4,4]],   # time series of length 2
])
y = np.array([0, 0, 1])

X.shape
>> (3, 3, 3)

X
>> array([[[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]],

       [[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [nan, nan, nan]],

       [[ 4.,  4.,  4.],
        [ 4.,  4.,  4.],
        [nan, nan, nan]]])

The time-series conversion seems fine, but I get an error when I run the following:

initial_barycenter = ts_zeros(sz=5)
bar = softdtw_barycenter(X, init=initial_barycenter)

...
...
...
~/anaconda3/envs/pytorch-build/lib/python3.8/site-packages/sklearn/metrics/pairwise.py in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
    260     paired_distances : distances betweens pairs of elements of X and Y.
    261     """
--> 262     X, Y = check_pairwise_arrays(X, Y)
    263 
    264     # If norms are passed as float32, they are unused. If arrays are passed as

~/anaconda3/envs/pytorch-build/lib/python3.8/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X, Y, precomputed, dtype, accept_sparse, force_all_finite, copy)
    151                              (X.shape[0], X.shape[1], Y.shape[0]))
    152     elif X.shape[1] != Y.shape[1]:
--> 153         raise ValueError("Incompatible dimension for X and Y matrices: "
    154                          "X.shape[1] == %d while Y.shape[1] == %d" % (
    155                              X.shape[1], Y.shape[1]))

ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 1 while Y.shape[1] == 3

What am I doing wrong?

Oct 14 '21 12:10 RafayAK

I think the nested structure of the sequences is not supported. You can either compute a barycenter for each set of time series separately, or pass the data as seven separate time series in one flat dataset instead of nesting them.

Pattern 1:
X1 = to_time_series_dataset([[1,1,1], [1,1,1], [1,1,1]])
X2 = to_time_series_dataset([[1,1,1], [1,1,1]])
X3 = to_time_series_dataset([[4,4,4], [4,4,4]])
# One barycenter per set of time series
bar = [softdtw_barycenter(Xi, init=initial_barycenter) for Xi in [X1, X2, X3]]

Pattern 2:
X = to_time_series_dataset(
    [
        [1,1,1], [1,1,1], [1,1,1], [1,1,1], [1,1,1], [4,4,4], [4,4,4]
    ]
)
bar = softdtw_barycenter(X, init=initial_barycenter)

Oct 25 '21 12:10 masatakashiwagi