Getting the calendar for high-frequency data fails
🐛 Bug Description
The program reports "ValueError: could not convert string to Timestamp" when obtaining a high-frequency calendar. If the data has a frequency of 1 day, it works fine.
To Reproduce
import qlib
qlib.init(provider_uri="qlib_data/30min", dataset_cache=None, custom_ops=[], expression_cache=None, region=qlib.config.REG_US)
from qlib.data import D
trade_calendar = D.calendar(start_time="2000-01-01", end_time="2100-12-31", freq="30min")
Expected Behavior
Screenshot
[3723:MainThread](2022-07-07 00:59:52,410) INFO - qlib.Initialization - [config.py:413] - default_conf: client.
[3723:MainThread](2022-07-07 00:59:52,634) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[3723:MainThread](2022-07-07 00:59:52,634) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/data/hf/qlib_data/30min')}
[3723:MainThread](2022-07-07 00:59:53,406) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: could not convert string to Timestamp].
File "./test.py", line 7, in <module>
trade_calendar = D.calendar(start_time=START_TIME, end_time="2100-12-31", freq="day", future=False)
File "/data/hf/.venv/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 1146, in calendar
return Cal.calendar(start_time, end_time, freq, future=future)
File "/data/hf/.venv/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 90, in calendar
_calendar, _calendar_index = self._get_calendar(freq, future)
File "/data/hf/.venv/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 173, in _get_calendar
_calendar = np.array(self.load_calendar(freq, future))
File "/data/hf/.venv/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 659, in load_calendar
backend_obj = self.backend_obj(freq=freq, future=future).data
File "/data/hf/.venv/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/storage/file_storage.py", line 132, in data
np.array(list(map(pd.Timestamp, _calendar))), self._freq_file, self.freq, self.region
File "pandas/_libs/tslibs/timestamps.pyx", line 1399, in pandas._libs.tslibs.timestamps.Timestamp.__new__
File "pandas/_libs/tslibs/conversion.pyx", line 408, in pandas._libs.tslibs.conversion.convert_to_tsobject
File "pandas/_libs/tslibs/conversion.pyx", line 652, in pandas._libs.tslibs.conversion._convert_str_to_tsobject
ValueError: could not convert string to Timestamp
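The traceback above ends in pd.Timestamp being applied to each calendar entry, so one way to narrow the problem down is to check which entries pandas rejects. The following is only a sketch: the helper name find_unparseable_entries is made up for illustration, and it assumes the calendar is a plain-text file with one timestamp per line (the path in the usage comment is hypothetical and may differ in your setup).

```python
from typing import Iterable, List, Tuple

import pandas as pd


def find_unparseable_entries(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every non-empty line pd.Timestamp rejects.

    Hypothetical diagnostic helper; it is not part of qlib.
    """
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            # Skip blank lines instead of reporting them.
            continue
        try:
            pd.Timestamp(text)
        except (ValueError, TypeError):
            bad.append((lineno, text))
    return bad


# Usage (assumed path; adjust to your provider_uri and frequency):
# with open("qlib_data/30min/calendars/30min.txt") as f:
#     print(find_unparseable_entries(f)[:10])
```

If this returns a non-empty list, the reported entries show what the calendar file contains in place of clean timestamps.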
Environment
Linux x86_64 Linux-4.18.0-147.el8.x86_64-x86_64-with-glibc2.2.5 #1 SMP Wed Dec 4 21:51:45 UTC 2019
Python version: 3.8.6 (default, Oct 22 2020, 17:03:03) [GCC 9.3.0]
Qlib version: 0.8.6.99
numpy==1.23.0 pandas==1.4.3 scipy==1.8.1 requests==2.28.1 sacred==0.8.2 python-socketio==5.7.0 redis==4.3.4 python-redis-lock==3.7.0 schedule==1.1.0 cvxpy==1.2.1 hyperopt==0.1.2 fire==0.4.0 statsmodels==0.13.2 xlrd==2.0.1 plotly==5.9.0 matplotlib==3.5.2 tables==3.7.0 pyyaml==6.0 mlflow==1.27.0 tqdm==4.64.0 loguru==0.6.0 lightgbm==3.3.2 tornado==6.2 joblib==1.1.0 fire==0.4.0 ruamel.yaml==0.17.21
Additional Notes
I ran into this problem as well:
File "/home/anaconda3/envs/Qlib/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 90, in calendar
_calendar, _calendar_index = self._get_calendar(freq, future)
File "/home/anaconda3/envs/Qlib/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 173, in _get_calendar
_calendar = np.array(self.load_calendar(freq, future))
File "/home/anaconda3/envs/Qlib/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 672, in load_calendar
return [pd.Timestamp(x) for x in backend_obj]
File "/home/anaconda3/envs/Qlib/lib/python3.8/site-packages/pyqlib-0.8.6.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 672, in
I'm not sure whether this is a qlib problem. I used the downloaded cn_data_1min and added freq: "1min" to the YAML, and I keep getting this error too.
import qlib
from qlib.data import D
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data_1min", region="cn")
inst = D.list_instruments(D.instruments("all"), freq="1min", as_list=True)
df = D.features(inst[:100], ["$close"], freq="1min")
Same problem here.

My workaround:
Locate data.py:
.\Lib\site-packages\qlib\data\data.py
Modify the return value of load_calendar:
def load_calendar(self, freq, future):
    import re  # needed for the cleanup in the return statement
    try:
        backend_obj = self.backend_obj(freq=freq, future=future).data
    except ValueError:
        if future:
            get_module_logger("data").warning(
                f"load calendar error: freq={freq}, future={future}; return current calendar!"
            )
            get_module_logger("data").warning(
                "You can get future calendar by referring to the following document:"
            )
            backend_obj = self.backend_obj(freq=freq, future=False).data
        else:
            raise
    # Strip stray characters so each entry can be converted to a Timestamp
    return [pd.Timestamp(re.sub(r"[\[\],']", "", x)) for x in backend_obj]
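The cleanup in the return statement can be tried on its own, outside qlib. This is a minimal sketch: the function name clean_calendar_entry and the sample string are made up for illustration, and it assumes the bad entries differ from valid timestamps only by brackets, commas and single quotes, as the workaround implies.

```python
import re

import pandas as pd

# Characters the workaround removes: square brackets, commas, single quotes.
_STRAY_CHARS = re.compile(r"[\[\],']")


def clean_calendar_entry(raw: str) -> pd.Timestamp:
    """Remove stray characters from one calendar entry, then parse it.

    Hypothetical helper mirroring the workaround; it is not part of qlib.
    """
    return pd.Timestamp(_STRAY_CHARS.sub("", raw).strip())


# An entry that looks like the string form of a one-element list.
ts = clean_calendar_entry("['2020-01-02 09:30:00']")
print(ts)
```

Editing an installed package is fragile, so this is better treated as a stopgap until the installed qlib version contains the upstream fix.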
This is fixed now. Please refer to the main branch: https://github.com/microsoft/qlib/blob/main/qlib/data/storage/file_storage.py#L105