Issue parsing zh-Hant locale
Hi.
This doesn't work:
dateparser.parse('2020年9月1日 下午6:25', languages=['zh-Hant'])
This works:
dateparser.parse('2020年9月1日 下午6:25', languages=['zh'])
This also works:
dateparser.parse('2020年9月1日 下午6:25', languages=['zh-Hant', 'zh'])
This is odd, because all the data appears to be present in https://github.com/scrapinghub/dateparser/blob/master/dateparser_data/cldr_language_data/date_translation_data/zh-Hant.json, yet parsing only succeeds when "zh" is also listed.
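Until this is sorted out, the workaround can be wrapped in a small helper. This is only a sketch: `with_base_language` and `parse_with_fallback` are hypothetical names of my own, not part of dateparser, and the only dateparser call used is `dateparser.parse` with the `languages` argument, as in the snippets above.

```python
def with_base_language(code):
    """Return the language list to pass to dateparser for a given code.

    A script-specific code such as 'zh-Hant' is followed by its base
    language ('zh'); a plain code such as 'zh' is returned on its own.
    """
    base = code.split('-')[0]
    return [code] if base == code else [code, base]


def parse_with_fallback(text, code):
    """Parse `text`, trying `code` first and then its base language."""
    # Imported here so the helper above has no dependency on dateparser.
    import dateparser
    return dateparser.parse(text, languages=with_base_language(code))
```

With this, `parse_with_fallback('2020年9月1日 下午6:25', 'zh-Hant')` makes the same call as the third snippet above, the one that works for me.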
Using the latest version (1.0.0) with Python 3.8.
Thank you!
Hi,
The zh.yaml file has additional simplification rules: https://github.com/Workable/python-dateparser/blob/master/dateparser_data/supplementary_language_data/date_translation_data/zh.yaml#L41-L49
I've had a similar problem with simplified Chinese, and adding these rules into the zh_Hans.yaml did the trick.
However, we would probably need a native Chinese speaker to work on a fix.
Edit: shouldn't zh and zh_Hans be the same language with the same rules? Simplified script is the default for Chinese, according to the CLDR: https://st.unicode.org/cldr-apps/v#/zh_Hans
Edit: I've opened a separate issue for simplified Chinese. It isn't the same problem, since traditional Chinese is different, but the two may be related.