
Add support for category data type

Open Chaoyingz opened this issue 3 years ago • 0 comments

Description

See https://github.com/microsoft/qlib/issues/1249. This PR adds support for storing data of string or category type.

If a column is of string or category type, its values are stored in the following steps:

  1. Store the list of unique values of the column (path: qlib_dir/categories/column_name.txt).
  2. Convert each value in the column to its index in that list.
  3. Store the resulting index column in bin format.
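The three storage steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PR's code: `dump_category_column` is a hypothetical helper, the categories file is assumed to hold one value per line in order of first appearance, and the bin file is assumed to follow the layout of qlib's existing feature `.bin` files (little-endian float32, with a leading start calendar index followed by the data).

```python
from pathlib import Path

import numpy as np
import pandas as pd


def dump_category_column(
    values: pd.Series, qlib_dir: Path, column: str, bin_path: Path, start_index: int = 0
) -> list:
    """Hypothetical sketch: encode a string/category column as integer codes."""
    # Work on plain Python objects so str and category dtypes behave the same.
    values = values.astype(object)

    # Step 1: unique non-missing values, one per line (assumed file format).
    categories = list(pd.unique(values.dropna()))
    cat_dir = qlib_dir / "categories"
    cat_dir.mkdir(parents=True, exist_ok=True)
    (cat_dir / f"{column}.txt").write_text("\n".join(map(str, categories)), encoding="utf-8")

    # Step 2: replace each value with its index in the list; missing values become NaN.
    index_of = {v: i for i, v in enumerate(categories)}
    codes = values.map(index_of).astype("float64").to_numpy()

    # Step 3: prepend the start index and write little-endian float32 (assumed layout).
    bin_path.parent.mkdir(parents=True, exist_ok=True)
    np.hstack([[start_index], codes]).astype("<f4").tofile(str(bin_path))
    return categories
```

Because the codes are stored as floats, missing values can be kept as NaN just like in numeric feature columns.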

You can query a string or category column in two ways:

  1. Query the index: D.features(instruments, ["$column"])
  2. Query the value (using the Cat operator): D.features(instruments, ["Cat($column)"])
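Querying with Cat maps the stored indices back to the original values. The lookup itself can be sketched in plain pandas as below; `decode_categories` is a hypothetical helper, not qlib's Cat implementation, and it assumes the codes arrive as floats (NaN for missing) and the category list is already loaded in the order it was written.

```python
import pandas as pd


def decode_categories(codes: pd.Series, categories: list) -> pd.Series:
    """Hypothetical sketch of a Cat-style lookup: index -> original value."""
    # Codes are floats such as 0.0 or 1.0; cast to int before indexing the list.
    # Missing values (NaN) map to None instead of raising an error.
    return codes.map(lambda c: categories[int(c)] if pd.notna(c) else None)
```

For example, decode_categories(pd.Series([0.0, 1.0, 2.0]), ["buy", "sell", "hold"]) yields "buy", "sell", "hold".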

For detailed usage, see the test case (tests/test_category_data.py).

Motivation and Context

How Has This Been Tested?

  • [ ] Pass the test by running pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • [x] If you are adding a new feature, test it with your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests: [screenshot]

Types of changes

  • [ ] Fix bugs
  • [x] Add new feature
  • [ ] Update documentation

Notes

Related issue: https://github.com/microsoft/qlib/issues/232.

Chaoyingz, Aug 17 '22 08:08