Commands executed on remote hosts have their locale unexpectedly switched
When I ssh to a remote host manually and run locale I get the following result:
$ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
However, when I launch this task on the same host as the same user using Rex:
#!/usr/bin/env perl
use 5.026;
use strict;
use warnings;
use Rex -feature => [qw/1.3/];
task 'test_locale' => sub {
    my $out = run 'locale';
    say 'locale on remote host: ' . $out;
};
auth for => 'test_locale', user => 'myuser';
I get the following result:
% rex -H target.host test_locale
[2019-09-23 11:19:29] INFO - Running task test_locale on target.host
locale on remote host: LANG=en_US.utf8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
This suggests that Rex enforces a certain locale for remote connections, which can lead to hard-to-debug consequences (one real example: if a Postgres database is created from inside such a Rex connection, it will have its locale set to C, and some collation operations will behave strangely).
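To make the collation point concrete, here is a minimal shell sketch (nothing Rex-specific, and the variable name c_sorted is only for illustration). It sorts four lines with LC_ALL=C, which is what a command sees when it inherits the forced locale:

```shell
# Sort the same four lines under the C locale,
# i.e. the locale a command inherits from the Rex connection.
c_sorted=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)

printf '%s\n' "$c_sorted"
# prints A, B, a, b (one per line): plain byte order,
# so every uppercase letter sorts before every lowercase one
```

Under a language-aware locale such as en_US.utf8 the upper- and lowercase forms of a letter would normally sort next to each other, which is the ordering most people expect from a database.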
This behavior is completely unexpected and surprising because the documentation for 'run' https://metacpan.org/pod/Rex::Commands::Run does not say a word about locale switching.
I think Rex should not touch the user's locale settings unless explicitly instructed to do so.
Currently Rex heavily relies on the C locale internally to have reproducible and parseable output from the various commands it executes while doing its job (especially for sudo commands), so I'm afraid there's little chance to change that completely.
Instead, the documentation should of course be clearer about this, and there should be feature flags controlling this behavior, ultimately giving more control to the user.
A particularly interesting solution would be to ensure Rex can still use whatever locale it prefers for its internal usage, and to provide a way to easily specify any other locale for a single command and/or for a block of commands.
Currently it can be overridden only for single commands like this:
- my $out = run 'locale', no_locales => TRUE;
- my $out = run 'locale', no_locales => TRUE, env => { LC_ALL => 'set_some_other_locale' };
For the Postgres database creation use case specifically, I would go for either:
- one of the examples from above for the run command which initializes the database, and specify all the LC_* variables in the env hash for the command
- use the --locale and/or --lc-* CLI options of the postgres initdb command explicitly with the run call, so the environment variables don't matter, whether Rex enforces the C locale internally or not
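As a rough sketch of the second option, the command line passed to run could look like the one built below. The data directory and locale name are placeholder values I made up; --pgdata, --encoding and --locale are regular initdb options. The snippet only prints the command so it can be checked before it is actually used:

```shell
# Hypothetical values: adjust both to the target host.
PGDATA_DIR=/var/lib/postgresql/data
DB_LOCALE=en_US.UTF-8

# Spell the locale out on the command line so that whatever
# LC_* / LANG values the connection exports no longer matter.
initdb_cmd="initdb --pgdata=$PGDATA_DIR --encoding=UTF8 --locale=$DB_LOCALE"

# Print instead of executing, so the command can be reviewed first.
printf '%s\n' "$initdb_cmd"
```

The individual --lc-collate, --lc-ctype, etc. options can be appended in the same way if only some categories should differ from --locale.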
It is completely understandable that to have a parseable output from certain commands (like sudo), Rex would invoke them with a known locale value.
It is, however, completely unclear to me why this locale switching for internal use is not done on a per-command basis (i.e. $ LC_ALL=C sudo foo whatever) and is instead applied to the entire connection, which leaks into user commands as well.
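For reference, a small shell sketch of what the per-command form does (this is plain shell, not how Rex is implemented, and the variable names are only for the example). The assignment prefix changes LC_ALL for the one child process and leaves the surrounding session alone:

```shell
# Remember what the surrounding session has, to compare afterwards.
before=${LC_ALL-unset}

# The VAR=value prefix applies to this single child process only.
inner=$(LC_ALL=C sh -c 'printf %s "$LC_ALL"')

after=${LC_ALL-unset}

printf 'inside the command: %s\n' "$inner"
printf 'session before: %s, after: %s\n' "$before" "$after"
```

The child sees LC_ALL=C, while the before and after values are identical, so nothing leaks into later user commands.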
Yes, I agree it feels a bit weird that the current codebase seems to set locales in the Shell and Exec interfaces.
My only idea currently is that IIRC all the internal run calls have been converted to internal i_run calls a while ago, and if that's the case, there might be a chance to push the locale settings into the i_run wrapper from the Shell/Exec interfaces. I have no idea how to write proper tests for locales yet, though.
I'm putting the help wanted label on this one for now, and this probably would need to be dissected into many smaller tasks that can be addressed separately (documentation, testing, specification of new desired behavior, rewrite, new features, new feature flags, etc.).