qlib

Source data loses precision after dump

Open · lerit opened this issue 3 years ago · 2 comments

Source data:

date,open,close,high,low,volume,change,turnover
2000-01-04,600000.41,600000.42,600000.43,600000.44,600000.45,600000.46,600000.47
2000-01-05,600000.51,600000.52,600000.53,600000.55,600000.55,600000.56,600000.57
2000-01-06,600000.61,600000.62,600000.63,600000.66,600000.66,600000.66,600000.67

Result after running dump_all and printing the features: [image]

The source data changes after the dump because of this line in the source code:

np.array(_df[field]).astype("<f").tofile(fp)

It converts float64 to little-endian single-precision float (float32). Why is this done?
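A minimal sketch of what that line does to the values in the issue, assuming NumPy is installed. "<f" is NumPy's little-endian 4-byte float (float32), which has a 24-bit significand, so near 600000 adjacent representable values are 0.0625 apart and every source value gets rounded to the nearest multiple of 0.0625.

```python
import numpy as np

# Source values from the issue (pandas reads them as float64).
source = np.array([600000.41, 600000.51, 600000.61], dtype=np.float64)

# Same conversion as the dump code: little-endian float32 ("<f" == "<f4").
as_f32 = source.astype("<f")

# float32 keeps a 24-bit significand. Numbers in [2**19, 2**20) are spaced
# 2**(19 - 23) = 0.0625 apart, so anything near 600000 is rounded to a
# multiple of 0.0625.
step = np.spacing(np.float32(600000.41))
print(step)  # 0.0625

for x64, x32 in zip(source, as_f32):
    # float(x32) widens losslessly, showing the exact value that gets stored.
    print(f"{x64:.2f} -> {float(x32)!r}")
# 600000.41 -> 600000.4375
# 600000.51 -> 600000.5
# 600000.61 -> 600000.625
```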

lerit · Aug 29 '22 16:08

It is probably a display bug; you can index a single value to see its actual value.

Chaoyingz · Aug 30 '22 10:08

@Chaoyingz Thank you for your reply. That is only part of it. When I display the value in VS Code (df.iloc[0:1, [0]]), I get a different value: [image]. I also tried saving the result to CSV to check the value: [image]

This is driving me crazy:
- source value: 600000.41
- dumped, then printed: 600000.4375
- dumped, then saved to CSV: 600000.44

Which one can I trust?
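A short sketch, assuming NumPy, of how the three numbers relate: 600000.4375 is the exact float32 actually stored, 600000.44 is the shortest decimal string that maps back to that same float32 (what NumPy's str() produces, and plausibly what ends up in the CSV), and 600000.41 can no longer be recovered from the dumped file.

```python
import numpy as np

# The value written to the .bin file by astype("<f") for the source 600000.41.
stored = np.float32(600000.41)

# Exact binary value of the stored float32, widened losslessly to float64.
exact = float(stored)
print(exact)  # 600000.4375

# NumPy's str() gives the shortest decimal string that maps back to the same
# float32, so the value can look like 600000.44 when rendered as text.
shortest = str(stored)
print(shortest)  # 600000.44

# Different decimal inputs collapse onto the same float32, so the original
# 600000.41 cannot be told apart from 600000.44 after the dump.
print(np.float32(600000.41) == np.float32(600000.44))  # True
```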

lerit · Aug 31 '22 00:08

This happens because the dump process stores values as float32, which loses precision. You can try changing this precision in the dump_bin file. However, precision is also lost when loading data with D.features; to solve that, you also need to change the precision in file_storage.start_index, file_storage.__getitem__, and data.expression. It would be possible to make the precision configurable in the config file, which would let users change it as needed. If you are interested in this, you are welcome to submit your changes.
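One way to make the precision configurable, sketched with two hypothetical helpers: dump_field and load_field are illustrative names, not qlib functions. The sketch writes a bare array and does not reproduce qlib's actual .bin layout; the point is that the writer and the reader must use the same dtype, which is why the reply mentions changing both dump_bin and the code that reads the files.

```python
import os
import tempfile

import numpy as np


# Hypothetical helpers, NOT qlib's API: the precision is a parameter (for
# example one read from a config file) instead of a fixed "<f".
def dump_field(values, path, dtype="<f4"):
    """Write values to a raw binary file using the given little-endian dtype."""
    np.asarray(values, dtype=np.float64).astype(dtype).tofile(path)


def load_field(path, dtype="<f4"):
    """Read values back; dtype must match the one used when dumping."""
    return np.fromfile(path, dtype=dtype)


source = np.array([600000.41, 600000.51, 600000.61])

with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for dtype in ("<f4", "<f8"):
        path = os.path.join(tmp, f"close.{dtype[1:]}.bin")
        dump_field(source, path, dtype=dtype)
        results[dtype] = load_field(path, dtype=dtype)
        print(dtype, results[dtype].tolist())
        if dtype == "<f8":
            # Reading a float64 file as float32 misinterprets the bytes:
            # you get twice as many values, none of them meaningful.
            mismatched = load_field(path, dtype="<f4")
# <f4 [600000.4375, 600000.5, 600000.625]
# <f8 [600000.41, 600000.51, 600000.61]
```

Switching to "<f8" doubles the size of every file (8 bytes per value instead of 4), which is one trade-off to weigh before changing a default.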

SunsetWolf · Nov 21 '22 09:11

This issue is stale because it has been open for three months with no activity. Remove the stale label or comment on the issue; otherwise, it will be closed in 5 days.

github-actions[bot] · Feb 19 '23 12:02