locale.normalize returns non-existing locale
When running:
>>> locale.normalize ('en')
'en_US.ISO8859-1'
On the system in question this locale does not exist:
% locale -a
C
C.UTF-8
de_AT
de_AT.iso88591
de_AT.utf8
de_DE.utf8
en_GB
en_GB.iso88591
en_GB.iso885915
en_US.utf8
POSIX
In the environment I have:
% env | grep LOCAL
XTERM_LOCALE=en_US.UTF-8
% env | grep LANG
GDM_LANG=en_US.utf8
LANG=en_US.UTF-8
% env | grep LC
LC_ALL=en_US.UTF-8
These are all correct and supported on the system. I have no idea where locale.normalize gets the non-existent encoding for that locale from.
The documentation tells us:
locale.normalize(localename)
    Returns a normalized locale code for the given locale name. The returned locale code is formatted for use with setlocale().
But using the returned locale with setlocale I get:
>>> x = locale.normalize ('en')
>>> x
'en_US.ISO8859-1'
>>> locale.setlocale (locale.LC_NUMERIC, x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/locale.py", line 610, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
Your environment
Debian bullseye 11.4
% python3 --version
Python 3.9.2
Also reproducible with
% python3.10
Python 3.10.5 (main, Jul 21 2022, 08:51:05) [GCC 10.2.1 20210110] on linux
normalize() has nothing to do with what is available on the current system. Instead, it uses the locale.locale_alias dict and some logic to return a normalized value, which is not necessarily the most commonly used value, the most modern one, or one that is available on most systems. In this case, the alias for en was updated in 2004.
I will close this as not a bug unless anyone wants to add any comments.
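To illustrate the point about locale.locale_alias, here is a minimal sketch that compares the raw alias table entry with what normalize() returns. The helper name explain_normalize is made up for this example and is not part of the locale module. Neither call consults the locales installed on the system.

```python
import locale


def explain_normalize(name):
    """Hypothetical helper: return (alias table entry, normalize() result).

    Both values come from static data shipped with Python; the locales
    installed on the machine (see `locale -a`) are never inspected.
    """
    alias_entry = locale.locale_alias.get(name.lower())
    return alias_entry, locale.normalize(name)


entry, normalized = explain_normalize('en')
print('alias table entry:', entry)
print('normalize() result:', normalized)
```

For a bare name such as en, the two values should be identical, which is why the result does not change from one machine to another.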
On Thu, Nov 10, 2022 at 11:39:34AM -0800, andrei kulakov wrote:
normalize() doesn't have anything to do with what is available on current system. Instead it uses locale.locale_alias dict and some logic to return a normalized value, which doesn't mean it's the most commonly used or available on most systems, or most modern value. In this case alias for en was updated in 2004.
But the documentation claims "Returns a normalized locale code for the given locale name. The returned locale code is formatted for use with setlocale()."
I have clearly shown that using the returned value with setlocale throws an error. This should never happen according to the documentation.
So my feeling is: It's a bug. It returns a value that is NOT suitable for use with setlocale.
So maybe you'd have to fix the "some logic" here. One way that might give a better heuristic is to prefer UTF-8 locales over Latin-1 ones these days. That wouldn't fix it in the strong sense, but it would make it more usable in most cases.
The real fix, of course, would be to make it use the supported locales on the system.
Ralf
Dr. Ralf Schlatterbeck, Open Source Consulting
Tel: +43/2243/26465-16, www: www.runtux.com
Reichergasse 131, A-3411 Weidling
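As a user-side workaround in the spirit of the two suggestions above (prefer UTF-8, then check what the system actually supports), something like the following sketch might work. The function first_supported is hypothetical and not part of the locale module. It assumes that probing with setlocale() is acceptable in the calling program; setlocale() changes process-wide state, so this is not safe to run while other threads depend on the locale. The way the UTF-8 candidate is built is a simplification that drops any @modifier.

```python
import locale


def first_supported(name, category=locale.LC_NUMERIC):
    """Hypothetical helper: return the first candidate that setlocale()
    accepts on this system, or None if none of them is supported."""
    normalized = locale.normalize(name)
    # Simplification: keep only the language/territory part, drop the
    # encoding (and any @modifier), then try UTF-8 first.
    base = normalized.split('.', 1)[0]
    candidates = [base + '.UTF-8', normalized]

    saved = locale.setlocale(category)  # query the current setting
    try:
        for candidate in candidates:
            try:
                locale.setlocale(category, candidate)
            except locale.Error:
                continue  # not installed on this system, try the next one
            return candidate
        return None
    finally:
        # Always restore whatever was active before probing.
        locale.setlocale(category, saved)


print(first_supported('en'))
```

On a system like the one in the report, where en_US.utf8 is installed and en_US.ISO8859-1 is not, the UTF-8 candidate would be the one expected to succeed.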