tspreprocess
Lexicographical sort of column "time" after compression
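The "time" column holds the bins, encoded as strings such as bin_0.0. Because these are strings, they sort lexicographically (bin_100.0 comes before bin_11.0), which makes it hard to sort by the column and to plot the results. What about renaming "time" to "bin" and providing numeric bin numbers?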
In general, one would like to pass the dataframe to tsfresh, so the "time" column should be ordered accordingly.
| id | feature_agg_autocorrelation_f_agg_"mean" | feature_agg_autocorrelation_f_agg_"median" | feature_agg_autocorrelation_f_agg_"var" | time |
|---|---|---|---|---|
| 0 | -0.006695 | -0.031946 | 0.031041 | bin_0.0 |
| 0 | 0.003307 | 0.002723 | 0.015377 | bin_1.0 |
| 0 | -0.019875 | -0.020356 | 0.016519 | bin_10.0 |
| 0 | -0.010753 | -0.026369 | 0.021735 | bin_100.0 |
| 0 | 0.011816 | 0.019509 | 0.010336 | bin_101.0 |
| 0 | -0.012836 | -0.012418 | 0.038740 | bin_102.0 |
| 0 | -0.013034 | -0.008422 | 0.008983 | bin_103.0 |
| 0 | -0.015615 | -0.015442 | 0.022139 | bin_104.0 |
| 0 | -0.011075 | 0.006340 | 0.018839 | bin_105.0 |
| 0 | -0.012528 | -0.002204 | 0.014608 | bin_106.0 |
| 0 | 0.003264 | -0.012552 | 0.012001 | bin_107.0 |
| 0 | -0.008267 | -0.013056 | 0.031777 | bin_108.0 |
| 0 | -0.014031 | -0.026050 | 0.011954 | bin_109.0 |
| 0 | -0.027372 | -0.028189 | 0.012125 | bin_11.0 |
| 0 | -0.006538 | -0.016846 | 0.020991 | bin_110.0 |
| 0 | 0.028912 | -0.002320 | 0.018458 | bin_111.0 |
| 0 | -0.011757 | -0.021368 | 0.040606 | bin_112.0 |
| 0 | -0.014773 | -0.022101 | 0.013958 | bin_113.0 |
| 0 | -0.010944 | -0.001797 | 0.028481 | bin_114.0 |
| 0 | -0.016143 | -0.028406 | 0.007117 | bin_115.0 |
| 0 | -0.013865 | -0.021711 | 0.011233 | bin_116.0 |
| 0 | -0.009488 | 0.007354 | 0.008971 | bin_117.0 |
| 0 | -0.014187 | -0.017223 | 0.044131 | bin_118.0 |
| 0 | -0.013005 | -0.005250 | 0.011614 | bin_119.0 |
| 0 | -0.011601 | 0.010453 | 0.016970 | bin_12.0 |
| 0 | -0.012738 | -0.004333 | 0.012729 | bin_120.0 |
| 0 | -0.013266 | -0.016564 | 0.007020 | bin_121.0 |
| 0 | -0.015038 | -0.042097 | 0.024701 | bin_122.0 |
| 0 | -0.012776 | -0.004399 | 0.016492 | bin_123.0 |
| 0 | -0.012934 | -0.018298 | 0.017719 | bin_124.0 |
| ... | ... | ... | ... | ... |
| 9 | -0.017292 | -0.010434 | 0.007727 | bin_72.0 |
| 9 | -0.009239 | 0.000410 | 0.007263 | bin_73.0 |
| 9 | -0.050343 | -0.035553 | 0.016307 | bin_74.0 |
| 9 | -0.016550 | -0.019668 | 0.007808 | bin_75.0 |
| 9 | -0.015879 | -0.034310 | 0.014253 | bin_76.0 |
| 9 | -0.019754 | -0.037949 | 0.018174 | bin_77.0 |
| 9 | -0.016839 | -0.005070 | 0.016695 | bin_78.0 |
| 9 | -0.015295 | -0.005584 | 0.012654 | bin_79.0 |
| 9 | -0.015647 | -0.016262 | 0.008907 | bin_8.0 |
| 9 | -0.010676 | -0.014450 | 0.010222 | bin_80.0 |
| 9 | -0.003566 | 0.010439 | 0.009648 | bin_81.0 |
| 9 | 0.008290 | 0.015121 | 0.009266 | bin_82.0 |
| 9 | -0.004448 | -0.014874 | 0.007668 | bin_83.0 |
| 9 | -0.012481 | -0.017615 | 0.012226 | bin_84.0 |
| 9 | -0.018334 | -0.007268 | 0.009883 | bin_85.0 |
| 9 | -0.017429 | -0.029421 | 0.009856 | bin_86.0 |
| 9 | -0.000159 | 0.010534 | 0.008968 | bin_87.0 |
| 9 | -0.003924 | -0.022100 | 0.018910 | bin_88.0 |
| 9 | 0.008415 | 0.019052 | 0.020014 | bin_89.0 |
| 9 | -0.012393 | -0.000086 | 0.010260 | bin_9.0 |
| 9 | 0.006285 | 0.020495 | 0.012573 | bin_90.0 |
| 9 | -0.010193 | -0.008106 | 0.008721 | bin_91.0 |
| 9 | -0.016792 | -0.009178 | 0.012188 | bin_92.0 |
| 9 | 0.008476 | 0.020195 | 0.010278 | bin_93.0 |
| 9 | 0.005893 | 0.007117 | 0.008789 | bin_94.0 |
| 9 | -0.008254 | -0.010829 | 0.017784 | bin_95.0 |
| 9 | 0.004660 | 0.014164 | 0.009694 | bin_96.0 |
| 9 | 0.011764 | -0.004501 | 0.010030 | bin_97.0 |
| 9 | -0.017136 | -0.026493 | 0.011077 | bin_98.0 |
| 9 | 0.013644 | 0.033041 | 0.008518 | bin_99.0 |
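The row order in the table above is what plain string comparison gives for labels of this shape. A minimal sketch in pure Python, using made-up labels in the same `bin_<float>` format, compares the lexicographic order with a numeric one:

```python
# Hypothetical labels in the same "bin_<float>" format as the "time" column.
labels = [f"bin_{float(i)}" for i in range(12)] + ["bin_100.0"]

# Default string sorting compares character by character,
# so "bin_10.0" and "bin_100.0" land before "bin_11.0" and "bin_2.0".
lexicographic = sorted(labels)

# Sorting on the numeric part of the label restores the bin order.
numeric = sorted(labels, key=lambda s: float(s.split("_", 1)[1]))

print(lexicographic[:5])
print(numeric[:5])
```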
After renaming "time" to "bin" and storing numeric values in the column, the dataframe could be passed to tsfresh like this:
extract_features(compressed_df, column_id="id", column_sort="bin")
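Until the naming is changed in the library, the conversion can be done by hand. The following is only a sketch of a possible workaround, not tspreprocess code: the helper `time_to_bin` and the toy dataframe (including its `value` column) are invented for illustration, and it assumes every label has the form `bin_<number>`.

```python
import pandas as pd

# Toy stand-in for the compressed dataframe; the values are placeholders.
compressed_df = pd.DataFrame({
    "id": [0, 0, 0, 0],
    "value": [0.1, 0.2, 0.3, 0.4],
    "time": ["bin_0.0", "bin_1.0", "bin_10.0", "bin_2.0"],
})

def time_to_bin(df):
    """Hypothetical helper: replace the string "time" column by an integer "bin" column."""
    out = df.copy()
    # Strip the "bin_" prefix, then go through float because the labels look like "10.0".
    out["bin"] = (
        out["time"].str.replace("bin_", "", regex=False).astype(float).astype(int)
    )
    out = out.drop(columns="time")
    # Sort by id first, then by the numeric bin number.
    return out.sort_values(["id", "bin"]).reset_index(drop=True)

fixed = time_to_bin(compressed_df)
print(fixed)

# The result could then be passed on, for example:
# extract_features(fixed, column_id="id", column_sort="bin")
```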
I am fine with changing the naming of the bins if we also change the name of the id column to bin column afterwards.
Tiny correction: The id column stays the same, "time" is changed to "bin".