nmos-cpp icon indicating copy to clipboard operation
nmos-cpp copied to clipboard

Logging Boost error messages as UTF-8

Open aholzinger opened this issue 6 years ago • 15 comments

By default Boost error messages when using Visual Studio are ANSI encoded and not UTF-8. To force Boost to emmit strings in UTF-8 the #define BOOST_SYSTEM_USE_UTF8 is needed. It could easily be added to Development/cmake/NmosCppCommon.cmake after line 147 like this: add_definitions(/DBOOST_SYSTEM_USE_UTF8)

Unfortunately the console also needs a hint that printed text is UTF-8. I had to fiddle a little bit, but finally I found a portable solution:

#include <clocale>

static bool make_utf8()
{
    std::string locale = setlocale(LC_CTYPE, "");
    if (!locale.empty()) {
        auto dot = locale.find_last_of('.');
        if (dot != std::string::npos) {
            locale = locale.substr(0, dot) + ".utf8";
            char* result = setlocale(LC_CTYPE, locale.c_str());
            return result != nullptr;
        }
    }
    return false;
}

After initialising the log:

if (!make_utf8())
{
    slog::log<slog::severities::error>(gate, SLOG_FLF) << "Setting console to UTF-8 mode failed";
}

Tested under Windows and Linux.

aholzinger avatar Jan 16 '20 15:01 aholzinger

@aholzinger, thanks for the thorough investigation! Please would you give some examples of what happened before and after you made this change?

garethsb avatar Jan 16 '20 16:01 garethsb

A couple of questions:

  • Presumably we need to also build Boost libraries (those which aren't header-only) and C++ REST SDK with BOOST_SYSTEM_USE_UTF8?
  • Could we achieve the same effect for logging via a call to e.g. std::cerr.imbue(...) (getting its current locale first if required) rather than setting the global locale with setlocale?

garethsb avatar Jan 22 '20 08:01 garethsb

  1. Good question: I really don't know, but I guess it would make sense. On my dev machine I did not compile boost with BOOST_SYSTEM_USE_UTF8 and the ASIO error messages are correct ("... ungültig ..."), but I can't imagine that compiled Boost code can emmit correct UTF-8 messages if Boost is not compiled with BOOST_SYSTEM_USE_UTF8.
  2. I'm not an expert. I made a short recherche and I doubt that it can be done via imbue, but what about this? https://www.boost.org/doc/libs/master/index.html Well it needs upcoming Boost version 1.73.

aholzinger avatar Jan 22 '20 09:01 aholzinger

Hi Axel,

Sorry, did you mean to put in a more specific link in 2?

For 1, I think it may require std::ios_base::sync_with_stdio(false) before any I/O or calling std::cerr.imbue(...) but then I think it should work... (e.g. see answer to similar question on SO, though it's possible Windows Console needs something different...)

garethsb avatar Jan 22 '20 09:01 garethsb

Sorry.

https://www.boost.org/doc/libs/master/libs/nowide/doc/html/index.html

aholzinger avatar Jan 22 '20 09:01 aholzinger

It sounds like defining BOOST_SYSTEM_USE_UTF8 is a good idea, but should be done consistently through whole application code, not just nmos-cpp.

Have you tried adding /DBOOST_SYSTEM_USE_UTF8 to the CMake variable CMAKE_CXX_FLAGS? CMake GUI makes this easy, or the default value can be overridden in the command line with something like /DCMAKE_CXX_FLAGS:STRING="/DBOOST_SYSTEM_USE_UTF8 /DWIN32 /D_WINDOWS /W3 /GR /EHsc" (so as to keep the default symbol definitions and flags (though we forcibly increase the warning level for nmos-cpp)).

I have followed Boost.Nowide development, but I'd prefer not to require latest Boost in the library itself, maybe could be optionally used in nmos-cpp-node example app. Of course, the setlocale solution, or imbue solution if that works, also need to be done at application, not library, level. This could also be demonstrated in nmos-cpp-node example app.

garethsb avatar Jan 22 '20 10:01 garethsb

Yes, BOOST_SYSTEM_USE_UTF8 needs to be set everywhere.

Yes, setting BOOST_SYSTEM_USE_UTF8 should work, but I think it should go to CMakeLists.txt (or to one of the .cmake files) finally.

I think for the nmos-cpp lib itself UTF-8 is not an issue, because only if the text is sent to a console the issue comes up and then it's the responsibility of the console application to handle this. So yes, handling this in the example apps is fine.

aholzinger avatar Jan 22 '20 10:01 aholzinger

Yes, setting BOOST_SYSTEM_USE_UTF8 should work, but I think it should go to CMakeLists.txt (or to one of the .cmake files) finally.

If it is possible some users may not want this behaviour, we need to keep some way to turn it on and off. So maybe just documenting how to to set it with CMake command line is enough?

I think for the nmos-cpp lib itself UTF-8 is not an issue, because only if the text is sent to a console the issue comes up and then it's the responsibility of the console application to handle this. So yes, handling this in the example apps is fine.

To decide this, we also have to consider whether exceptions and messages from other libraries will also use UTF-8. Generally this is probably safe assumption, but I am not certain yet.

garethsb avatar Jan 22 '20 10:01 garethsb

If it is possible some users may not want this behaviour, we need to keep some way to turn it on and off. So maybe just documenting how to to set it with CMake command line is enough?

What would they loose? On Linux the #define does nothing. On Windows those with US/latin codepage won't notice adifference, would they? On Windows those with a different codepage than US/latin won't have garbled error messages anymore ==> improvement.

So why not generally define it?

To decide this, we also have to consider whether exceptions and messages from other libraries will also use UTF-8. Generally this is probably safe assumption, but I am not certain yet.

In general, portable projects that I know use UTF-8 always and the Windows port has to handle this. This puts a burden on the Windows port, but unfortunately Microsoft back then opted for UTF-16LE and since then didn't do their homework to add a UTF-8 mode (adding some #define UNICODE_UTF8 to tchar.h). Like this (with something like #define UNICODE_UTF8) all this hassle wouldn't exist and Windows would take care to convert back and forth from/to UTF-8/UTF-16LE as it already does from/to ANSI/UTF-16LE when you choose the A-functions (if you don't #define UNICODE the macros expand to the A-functions like i.e. CreateFile expands to CreateFileA).

Long story short, I think defining BOOST_SYSTEM_USE_UTF8 will only improve the situation. One that doesn't want UTF-8 (on Windows) has to do the same as with any other portable library, convert to/from ANSI or UTF16-LE (depending whether he/she uses #define UNICODE or not.

aholzinger avatar Jan 22 '20 10:01 aholzinger

Thanks, Axel, good points, and I agree that the direction of travel even for Windows libraries is char with UTF-8. I am often working in the narrow world of ASCII, and these issues don't arise, but of course, I have to handle Japanese deployment/integration regularly also, so I'll use that as test environment to confirm.

garethsb avatar Jan 22 '20 11:01 garethsb

Switching the console to UTF-8 in a portable way (tested on Linux and Windows):

#include <clocale>

int main(int argc, char* argv[])
{
    // set locale for console to UTF-8, because boost is emitting
       UTF-8 error messages when BOOST_SYSTEM_USE_UTF8 is defined
#ifdef BOOST_SYSTEM_USE_UTF8
    std::setlocale(LC_CTYPE, ".UTF-8");
#endif
...

Could also being moved to a static function somewhere in the boost helpers to avoid the ugly #ifdef in main.

aholzinger avatar Mar 15 '20 14:03 aholzinger

Which versions of Windows and Visual Studio did you test on? I have to consider several generations of compiler/standard libraries and OS unfortunately.

garethsb avatar Mar 16 '20 07:03 garethsb

Sorry, overlooked that.

WIndows 10 x64 Visual Studio 2019

I could test on Visual Studio 2017 also.

Different Windows version will be hard. I currently have no 8.1, 8 or 7 running anymore :-(

aholzinger avatar Mar 23 '20 09:03 aholzinger

This will have to wait until I have time to fully test on multiple Linux, Windows and macOS platforms.

garethsb avatar Mar 23 '20 09:03 garethsb

No problem, take your time.

aholzinger avatar Mar 23 '20 09:03 aholzinger