b2-sdk-python

Error when b2 writes non-ASCII to stdout

Open tom-- opened this issue 6 years ago • 5 comments

The command

b2 sync --noProgress --keepDays 14 /home/data/v2 b2://backup/ > /path/to/logfile

produced this error on stderr.

WARNING:b2sdk.sync.report:could not output the following line with encoding None on stdout due 
to 'ascii' codec can't encode character u'\xfa' in position 40: ordinal not in range(128): 
upload files/Playlist-for-Derecho-a-la-música-9-15-19.csv

I think this is not my fault as a user of the b2 command.

If b2 uses ASCII on stdout then it should take care to not send any non-ASCII characters to stdout to avoid this python error.

Better than that is to use UTF-8 for stdout, at least when it is not connected to a term.

tom-- • Sep 14 '19 17:09

It seems that you are using a slightly misconfigured terminal. The b2sdk sync reporter outputs the names of transferred files on stdout, but if stdout cannot accept (for example) UTF-8 and the file names are encoded in UTF-8, B2CLI cannot do its job; that is why the warning is generated.

I think this happens because Python is unable to detect the encoding (it says the encoding is "None"). Could you please run a simple command with --verbose and show us the output, so that we can see the encoding settings it reports? (Please remove any sensitive information.)

ppolewicz • Sep 15 '19 19:09

I'm not using a terminal. I said so in the OP.

The error case is when stdout from b2 is piped to a file. And the command is not even run from a term, it is run from a cron.hourly script.

According to what I read about Python, when stdout is piped to a file, Python does not assign a codec to that file pointer. (That seems perfectly reasonable to me.) So unless you set it to something different, it's going to be "None".

In such a case I would say it is up to b2 to either set a codec or to use the "None" codec that is Python's default correctly and not send to it characters that cause an error.
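
One way to check what encoding the interpreter assigns to a redirected stdout is to ask a child process directly. The sketch below is only illustrative: the helper name `piped_stdout_encoding` is made up here, and it runs the current Python interpreter rather than b2 itself. Note that on Python 3 a piped stdout gets a locale-derived encoding instead of the "None" seen in the warning above (whose `u'\xfa'` repr suggests Python 2). `PYTHONIOENCODING` is a standard CPython environment variable; whether setting it in the cron script is an acceptable workaround is an assumption.

```python
import os
import subprocess
import sys

# Tiny child program: report the encoding Python picked for its own stdout.
CHILD = "import sys; sys.stdout.write(str(sys.stdout.encoding))"


def piped_stdout_encoding(extra_env=None):
    """Run a child interpreter with stdout piped and return its stdout encoding.

    Hypothetical helper for illustration; not part of b2 or b2sdk.
    """
    env = dict(os.environ)
    # Start from a clean slate so the parent's override does not leak in.
    env.pop("PYTHONIOENCODING", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,  # stdout is a pipe, not a terminal
        env=env,
        check=True,
    )
    return result.stdout.decode("ascii")


if __name__ == "__main__":
    print("default when piped:   ", piped_stdout_encoding())
    print("with PYTHONIOENCODING:", piped_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))
```

If the first line reports an ASCII-only encoding (or "None" on Python 2), non-ASCII file names cannot be written to the redirected stream without an explicit encoding.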

spinitron • Sep 25 '19 19:09

Sorry, I used my personal GitHub account for the OP by mistake. @tom-- and @spinitron are both me.

spinitron • Sep 25 '19 20:09

If the CLI silently swallows non-ASCII characters during output, it can report wrong things: for example, two transferred files whose names differ only by a non-ASCII character would be indistinguishable in the output.

I think we should set the output encoding to UTF-8 if Python cannot determine it (and sets it to None). @bwbeach, is that a good idea?
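
A minimal sketch of that fallback, written for Python 3 and not taken from b2sdk: the function names `needs_utf8_fallback` and `with_utf8_fallback` are hypothetical, and treating an ASCII-only encoding the same as a missing one is an assumption that goes slightly beyond the proposal above.

```python
import codecs
import io


def needs_utf8_fallback(encoding):
    """Return True when a stream has no encoding or an ASCII-only one."""
    if encoding is None:
        return True
    try:
        # codecs.lookup normalizes aliases such as "ANSI_X3.4-1968" to "ascii".
        return codecs.lookup(encoding).name == "ascii"
    except LookupError:
        return True


def with_utf8_fallback(stream):
    """Return a UTF-8 text stream over the same bytes if `stream` cannot
    be trusted with non-ASCII text; otherwise return `stream` unchanged."""
    if not needs_utf8_fallback(getattr(stream, "encoding", None)):
        return stream
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # No underlying binary buffer to wrap; leave the stream alone.
        return stream
    return io.TextIOWrapper(buffer, encoding="utf-8", line_buffering=True)
```

A caller could apply this once at startup with `sys.stdout = with_utf8_fallback(sys.stdout)`. On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` is a shorter way to get a similar effect.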

ppolewicz • Sep 25 '19 22:09

That would fix my problem since file names are encoded in UTF-8 on my systems. Binary strings would be more general since 1) that's what output pipes/files are, and 2) filenames on Linux filesystems are binary so that users can use any encoding.
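
To illustrate the binary-output idea, here is a rough Python 3 sketch, assuming a POSIX system; `write_name` is a made-up helper, not something b2 provides. It relies on `os.fsencode`, which turns a file name back into the exact bytes the filesystem stores, including bytes that are not valid UTF-8.

```python
import io
import os


def write_name(binary_out, name):
    """Write a file name to a binary stream as its raw filesystem bytes.

    Hypothetical helper: no text codec is involved, so whatever encoding
    the user chose for the name passes through untouched.
    """
    binary_out.write(os.fsencode(name) + b"\n")


if __name__ == "__main__":
    import sys

    # A Latin-1 encoded name: valid on Linux, but not valid UTF-8.
    raw_name = b"caf\xe9.csv"
    # os.fsdecode yields the str form that os.listdir() would return for it.
    write_name(sys.stdout.buffer, os.fsdecode(raw_name))
```

The trade-off is that the log file then carries no encoding guarantee at all; readers have to know or guess how each name was encoded.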

spinitron • Sep 26 '19 12:09