qlib source data loses precision after dump

source data:

date,open,close,high,low,volume,change,turnover
2000-01-04,600000.41,600000.42,600000.43,600000.44,600000.45,600000.46,600000.47
2000-01-05,600000.51,600000.52,600000.53,600000.55,600000.55,600000.56,600000.57
2000-01-06,600000.61,600000.62,600000.63,600000.66,600000.66,600000.66,600000.67

dump result (dump_all, then print features):

The source data changes after the dump because of this line in the source code:

np.array(_df[field]).astype("<f").tofile(fp)

It converts float64 to little-endian single-precision float (float32). Why is it done this way?
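To make the effect concrete, here is a minimal sketch that round-trips the "open" column from the sample data through the same astype("<f") conversion. It uses plain NumPy only, not qlib's dump code, and the file name open.bin is a hypothetical choice for illustration.

```python
import os
import tempfile

import numpy as np

# "open" column from the sample CSV above
open_prices = np.array([600000.41, 600000.51, 600000.61], dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp_dir:
    bin_path = os.path.join(tmp_dir, "open.bin")  # hypothetical file name
    # Same conversion as the quoted line: float64 -> little-endian float32
    with open(bin_path, "wb") as fp:
        open_prices.astype("<f").tofile(fp)
    loaded = np.fromfile(bin_path, dtype="<f")

# float32 has a 24-bit significand, so near 600000 adjacent values are 2**-4 apart
step = float(np.spacing(np.float32(600000.0)))

for before, after in zip(open_prices, loaded):
    print(f"{before:.2f} -> {float(after):.4f}")
print("float32 step near 600000:", step)
```

Near 600000 a float32 can only land on multiples of 0.0625, so every price is snapped to the nearest such multiple when it is written.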

Aug 29 '22 16:08 lerit

It should be a display bug, you can index to a single variable to see the value.

Aug 30 '22 10:08 Chaoyingz

@Chaoyingz Thank you for your reply. That explains only part of it. When I inspect the value in VS Code (df.iloc[0:1, [0]]), I get another value, and when I save the result to CSV to check it, I get yet another:

This is driving me crazy:
source value: 600000.41
dump and print value: 600000.4375
dump and save to CSV value: 600000.44

Which one can I trust?
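For what it is worth, 600000.4375 and 600000.44 can both be renderings of one and the same stored float32. The sketch below uses plain NumPy only (no qlib, no pandas) and assumes the value was stored as float32, as the astype("<f") line suggests.

```python
import numpy as np

stored = np.float32(600000.41)  # the value after conversion to float32

exact = float(stored)    # exact value of that float32, widened to float64
shortest = str(stored)   # shortest decimal that still identifies this float32

print(exact)     # 600000.4375
print(shortest)  # 600000.44

# Parsing the short form and converting back gives the identical float32
same_value = np.float32(float(shortest)) == stored
print(same_value)
```

So the printed value and the CSV value describe the same stored number; only the original 600000.41 is no longer recoverable from it.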

Aug 31 '22 00:08 lerit

This is because the dump process stores values as float32, which loses precision. You can try changing the dtype in the dump_bin file. However, precision is also lost when reading with D.features; to solve this, the precision limit would also have to be changed in file_storage.start_index, file_storage.__getitem__, and data.expression. It would be possible to expose the precision as a setting in the config file, which would give users the flexibility to change it. If you are interested in this, you are welcome to submit your changes.
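As a rough illustration of why the write side and the read side have to change together, here is a sketch with a configurable dtype. The helpers dump_column and load_column are hypothetical and are not qlib's dump_bin or file_storage code; "<f4" is the same dtype as "<f", and "<f8" is little-endian float64.

```python
import os
import tempfile

import numpy as np


def dump_column(values, path, dtype="<f4"):
    """Write a 1-D column as raw binary in the given dtype (hypothetical helper)."""
    np.asarray(values, dtype=np.float64).astype(dtype).tofile(path)


def load_column(path, dtype="<f4"):
    """Read a column back; dtype must match the one used when writing."""
    return np.fromfile(path, dtype=dtype)


# "close" column from the sample CSV above
close_prices = [600000.42, 600000.52, 600000.62]

with tempfile.TemporaryDirectory() as tmp_dir:
    path32 = os.path.join(tmp_dir, "close_f4.bin")
    path64 = os.path.join(tmp_dir, "close_f8.bin")
    dump_column(close_prices, path32, dtype="<f4")
    dump_column(close_prices, path64, dtype="<f8")
    as_f4 = load_column(path32, dtype="<f4")
    as_f8 = load_column(path64, dtype="<f8")
    # Reading float64 bytes as float32 yields twice as many meaningless values
    mismatched = load_column(path64, dtype="<f4")

print("float32 round trip:", [float(v) for v in as_f4])
print("float64 round trip:", as_f8.tolist())
print("values read with the wrong dtype:", len(mismatched))
```

The float64 files are twice as large, and any reader that still assumes float32 will misinterpret them, which is why changing the dump dtype alone is not enough.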

Nov 21 '22 09:11 SunsetWolf

This issue is stale because it has been open for three months with no activity. Remove the stale label or comment on the issue otherwise this will be closed in 5 days

Feb 19 '23 12:02 github-actions[bot]