Issue parsing ZH-Hant locale

Open • croqaz opened this issue 5 years ago • 1 comment

Hi.

This doesn't work:
dateparser.parse('2020年9月1日 下午6:25', languages=['zh-Hant'])

This works:
dateparser.parse('2020年9月1日 下午6:25', languages=['zh'])

This also works:
dateparser.parse('2020年9月1日 下午6:25', languages=['zh-Hant', 'zh'])

It's weird because I can see all the info in https://github.com/scrapinghub/dateparser/blob/master/dateparser_data/cldr_language_data/date_translation_data/zh-Hant.json , but it also needs "zh" just to make it work.
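
The workaround from the third call above can be generalized as a rough sketch: for every script-specific code, also pass its base language as a fallback. The helper name `with_base_languages` is hypothetical and is not part of dateparser; it only builds the list that would be handed to the `languages` argument.

```python
def with_base_languages(codes):
    """Return the given codes followed by any missing base languages.

    Hypothetical helper, not a dateparser API. The base language is
    assumed to be the part before the first '-' or '_' in the code.
    """
    result = []
    # Keep the caller's codes first, in order, without duplicates.
    for code in codes:
        if code not in result:
            result.append(code)
    # Append each base language (e.g. 'zh' for 'zh-Hant') as a fallback.
    for code in codes:
        base = code.replace("_", "-").split("-")[0]
        if base not in result:
            result.append(base)
    return result


languages = with_base_languages(["zh-Hant"])
```

Here `languages` ends up as `['zh-Hant', 'zh']`, the same list that worked in the third call, so it could be passed as `languages=languages` to `dateparser.parse`. This only hides the problem on the caller's side and does not explain why 'zh-Hant' fails on its own.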

Using latest 1.0.0 version, Python 3.8.

Thank you!

croqaz, Feb 05 '21 14:02

Hi,

The zh.yaml file has additional simplification rules: https://github.com/Workable/python-dateparser/blob/master/dateparser_data/supplementary_language_data/date_translation_data/zh.yaml#L41-L49

I've had a similar problem with simplified Chinese, and adding these rules into the zh_Hans.yaml did the trick. However, we would probably need a native Chinese speaker to work on a fix.

Edit: shouldn't zh and zh_Hans be the same language with the same rules? Simplified script is the default for Chinese, according to the CLDR: https://st.unicode.org/cldr-apps/v#/zh_Hans

Edit: I've opened another issue for simplified Chinese. It's not the same problem, since traditional Chinese is different, but the two may be related.

Merinorus, Jun 20 '25 15:06