tspreprocess Lexicographical sort of column "time" after compression

The "time" column holds bin labels encoded as strings such as bin_0.0. Because strings sort lexicographically (bin_10.0 comes before bin_2.0), it is hard to sort by this column or to plot against it. What about renaming "time" to "bin" and storing plain bin numbers?

In general, one would like to pass the dataframe to tsfresh, so the "time" column should be ordered accordingly.

id	feature_agg_autocorrelation_f_agg_"mean"	feature_agg_autocorrelation_f_agg_"median"	feature_agg_autocorrelation_f_agg_"var"	time
0	-0.006695	-0.031946	0.031041	bin_0.0
0	0.003307	0.002723	0.015377	bin_1.0
0	-0.019875	-0.020356	0.016519	bin_10.0
0	-0.010753	-0.026369	0.021735	bin_100.0
0	0.011816	0.019509	0.010336	bin_101.0
0	-0.012836	-0.012418	0.038740	bin_102.0
0	-0.013034	-0.008422	0.008983	bin_103.0
0	-0.015615	-0.015442	0.022139	bin_104.0
0	-0.011075	0.006340	0.018839	bin_105.0
0	-0.012528	-0.002204	0.014608	bin_106.0
0	0.003264	-0.012552	0.012001	bin_107.0
0	-0.008267	-0.013056	0.031777	bin_108.0
0	-0.014031	-0.026050	0.011954	bin_109.0
0	-0.027372	-0.028189	0.012125	bin_11.0
0	-0.006538	-0.016846	0.020991	bin_110.0
0	0.028912	-0.002320	0.018458	bin_111.0
0	-0.011757	-0.021368	0.040606	bin_112.0
0	-0.014773	-0.022101	0.013958	bin_113.0
0	-0.010944	-0.001797	0.028481	bin_114.0
0	-0.016143	-0.028406	0.007117	bin_115.0
0	-0.013865	-0.021711	0.011233	bin_116.0
0	-0.009488	0.007354	0.008971	bin_117.0
0	-0.014187	-0.017223	0.044131	bin_118.0
0	-0.013005	-0.005250	0.011614	bin_119.0
0	-0.011601	0.010453	0.016970	bin_12.0
0	-0.012738	-0.004333	0.012729	bin_120.0
0	-0.013266	-0.016564	0.007020	bin_121.0
0	-0.015038	-0.042097	0.024701	bin_122.0
0	-0.012776	-0.004399	0.016492	bin_123.0
0	-0.012934	-0.018298	0.017719	bin_124.0
...	...	...	...	...
9	-0.017292	-0.010434	0.007727	bin_72.0
9	-0.009239	0.000410	0.007263	bin_73.0
9	-0.050343	-0.035553	0.016307	bin_74.0
9	-0.016550	-0.019668	0.007808	bin_75.0
9	-0.015879	-0.034310	0.014253	bin_76.0
9	-0.019754	-0.037949	0.018174	bin_77.0
9	-0.016839	-0.005070	0.016695	bin_78.0
9	-0.015295	-0.005584	0.012654	bin_79.0
9	-0.015647	-0.016262	0.008907	bin_8.0
9	-0.010676	-0.014450	0.010222	bin_80.0
9	-0.003566	0.010439	0.009648	bin_81.0
9	0.008290	0.015121	0.009266	bin_82.0
9	-0.004448	-0.014874	0.007668	bin_83.0
9	-0.012481	-0.017615	0.012226	bin_84.0
9	-0.018334	-0.007268	0.009883	bin_85.0
9	-0.017429	-0.029421	0.009856	bin_86.0
9	-0.000159	0.010534	0.008968	bin_87.0
9	-0.003924	-0.022100	0.018910	bin_88.0
9	0.008415	0.019052	0.020014	bin_89.0
9	-0.012393	-0.000086	0.010260	bin_9.0
9	0.006285	0.020495	0.012573	bin_90.0
9	-0.010193	-0.008106	0.008721	bin_91.0
9	-0.016792	-0.009178	0.012188	bin_92.0
9	0.008476	0.020195	0.010278	bin_93.0
9	0.005893	0.007117	0.008789	bin_94.0
9	-0.008254	-0.010829	0.017784	bin_95.0
9	0.004660	0.014164	0.009694	bin_96.0
9	0.011764	-0.004501	0.010030	bin_97.0
9	-0.017136	-0.026493	0.011077	bin_98.0
9	0.013644	0.033041	0.008518	bin_99.0
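
As a workaround until the naming changes, the bin labels can be parsed into numbers on the user side. The following is only a minimal sketch with pandas: the small dataframe and its "value" column are made up for illustration, and it assumes every label has the form bin_<number> as in the output above.

```python
import pandas as pd

# Hypothetical miniature of the compressed dataframe shown above;
# the "value" column stands in for the real feature columns.
compressed_df = pd.DataFrame({
    "id": [0, 0, 0, 0],
    "value": [0.1, 0.2, 0.3, 0.4],
    "time": ["bin_0.0", "bin_1.0", "bin_10.0", "bin_2.0"],
})

# Strip the "bin_" prefix and parse the rest as a number
# (assumes all labels look like "bin_<number>").
compressed_df["bin"] = (
    compressed_df["time"]
    .str.replace("bin_", "", regex=False)
    .astype(float)
    .astype(int)
)

# Sort numerically within each id and drop the string column.
sorted_df = (
    compressed_df.drop(columns="time")
    .sort_values(["id", "bin"])
    .reset_index(drop=True)
)
print(sorted_df)
```

With a numeric "bin" column, sorting and plotting follow the actual bin order instead of the string order.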

Aug 12 '17 17:08 nikhase

After renaming "time" to "bin" and storing numeric values in that column, the dataframe can be passed to tsfresh:

from tsfresh import extract_features
extract_features(compressed_df, column_id="id", column_sort="bin")

Aug 12 '17 18:08 nikhase

I am fine with changing the naming of the bins if we also change the name of the id column to bin column afterwards.

Aug 14 '17 10:08 MaxBenChrist

Tiny correction: The id column stays the same, "time" is changed to "bin".

Aug 14 '17 10:08 nikhase