Broadcasting dimensions with `xr.Dataset`

Open nicrie opened this issue 2 years ago • 0 comments

Passing an `xr.Dataset` as input whose variables span multi-dimensional sample and feature dimensions causes the dimensions to be broadcast, which yields components with inflated dimensions. The broadcast entries are filled with NaN and the results themselves seem right. Ideally, however, this broadcasting should not happen.

In a nutshell, instead of obtaining components like the following

xarray.Dataset
    Dimensions: (sample1: 2, feature1: 2, feature2: 3)
    Coordinates:  
        sample1  (sample1)  int64  1 2
        feature1  (feature1)  <U1  'a' 'b'
        feature2  (feature2)  int64  0 1 2
    Data variables:
        da1  (sample1, feature1, feature2)   int64    0 1 2 3 4 5 6 7 8 9 10 11
        da2  (sample1, feature1)   int64    0 3 6 9
    Indexes: (3)
    Attributes: (0)

we currently get

xarray.Dataset
    Dimensions: (sample1: 2, feature1: 2, feature2: 3)
    Coordinates:
        sample1 (sample1)  int64 1 2
        feature1 (feature1)   <U1  'a' 'b'
        feature2  (feature2)  int64  0 1 2
    Data variables:
        da1  (sample1, feature1, feature2)  int64   0 1 nan 3 ... 9 10 nan
        da2  (sample1, feature1, feature2)  int64   nan nan 0 nan ... 6 nan nan 9
    Indexes: (3)
    Attributes: (0)

This arises from a potential inconsistency in xarray's `to_stacked_array()`/`to_unstacked_dataset()` methods (see discussion).
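The symptom can be detected by comparing each variable's dimensions against the input. The sketch below is only an illustration under stated assumptions: it rebuilds the expected dataset from the listing above, and it uses `broadcast_like` as a stand-in for the inflation (this repeats values rather than inserting NaN, and it is not the xeofs code path). The helper name `inflated_dims` is hypothetical.

```python
import numpy as np
import xarray as xr

# Rebuild the expected dataset from the listing above.
da1 = xr.DataArray(
    np.arange(12).reshape(2, 2, 3),
    dims=("sample1", "feature1", "feature2"),
    coords={"sample1": [1, 2], "feature1": ["a", "b"], "feature2": [0, 1, 2]},
    name="da1",
)
# da2 only spans (sample1, feature1); its values are 0, 3, 6, 9.
da2 = da1.isel(feature2=0, drop=True).rename("da2")
expected = xr.Dataset({"da1": da1, "da2": da2})


def inflated_dims(reference, result):
    """Hypothetical helper: map each variable to the dimensions it gained."""
    report = {}
    for name in reference.data_vars:
        extra = set(result[name].dims) - set(reference[name].dims)
        if extra:
            report[name] = sorted(extra)
    return report


# Stand-in for the inflated output: da2 is broadcast against da1.
# Assumption: this only mimics the extra dimension, not the NaN fill.
inflated = expected.copy()
inflated["da2"] = expected["da2"].broadcast_like(expected["da1"])

report = inflated_dims(expected, inflated)
print(report)
```

A check of this kind could serve as a regression test once the broadcasting is avoided: an empty report would mean every component kept the dimensions of its input variable.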

nicrie, Aug 11 '23 12:08