"invalid multibyte sequence" reading generated PO file

Open delmicio opened this issue 9 years ago • 4 comments

When I try to open the generated PO file, it is corrupted and I can't open it with, for example, Poedit. The error is "invalid multibyte sequence".

I think it is a problem with the encoding.

This is the code I use to generate the file:

use Gettext\Extractors;
use Gettext\Translations;

if(!isset(Extractors\PhpCode::$functions['_'])) {
    Extractors\PhpCode::$functions['_'] = '__';
}

$translations = new Translations();
$translations->setLanguage('en');
$translations->setHeader('Report-Msgid-Bugs-To', '[email protected]');

$files = array_merge(
    glob(__DIR__.'/*.php'),
    glob(__DIR__.'/*.PHP'),
    glob(__DIR__.'/*.html'),
    glob(__DIR__.'/*.HTML'),
    glob(__DIR__.'/*.htm'),
    glob(__DIR__.'/*.HTM')
);
foreach ($files as $key => $file) {
    if (strpos($file, '/vendor') === 0) continue;
    $translations->addFromPhpCodeFile($file);
}

//And then, export all translations in a single .po file
$translations->toPoFile('file.po');

I should mention that not all of my project files are UTF-8 encoded. Maybe this is the problem?

I'm using v3.

delmicio, Sep 20 '16 21:09

I've never had this problem with SublimeText, so maybe it's caused by the mixed file encodings in your project (I don't know). SublimeText lets you reopen a file with a specific encoding, so reopening it as UTF-8 shouldn't report this error.

oscarotero, Sep 21 '16 09:09

@oscarotero I don't really care much about SublimeText; it was just a comment. I've solved it by not allowing any encoding other than UTF-8.

$filecontent = file_get_contents($file);
// Try UTF-8 first: every byte sequence is valid ISO-8859-1, so listing it first would match everything.
$encoding = mb_detect_encoding($filecontent, 'UTF-8, ISO-8859-1', true);
if ($encoding && $encoding != 'UTF-8') {
    // Convert anything that is not already UTF-8.
    $filecontent = mb_convert_encoding($filecontent, 'UTF-8', $encoding);
    $encoding = 'UTF-8';
}
if ($encoding == 'UTF-8') {
    $translations->addFromPhpCodeString($filecontent);
}

But this makes it slower. Maybe there is already an option for this, like xgettext has with --from-code=UTF-8.
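
A possible variant of the conversion step, as a minimal sketch only: it validates the bytes as UTF-8 with `mb_check_encoding` and converts only when that check fails. The helper name `toUtf8` and the ISO-8859-1 fallback are assumptions for illustration and are not part of the Gettext library. A file in some other legacy encoding would still be converted incorrectly, and whether this is actually faster has not been measured.

```php
<?php
// Hypothetical helper (not part of the Gettext library).
// Returns $content unchanged when it is already valid UTF-8; otherwise
// converts it, assuming the bytes are in $fallback (ISO-8859-1 by default).
function toUtf8(string $content, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }
    return mb_convert_encoding($content, 'UTF-8', $fallback);
}
```

The result could then be passed to `$translations->addFromPhpCodeString(...)` in place of the raw file contents.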

delmicio, Sep 21 '16 14:09

@delmicio This is a problem with your source files and the way you create the catalog.

  • By default, GNU Gettext and Poedit expect your input strings to be ASCII only.
  • When you create a .po file where the input strings have non-ASCII characters, you must set an appropriate header.

So you must either fix your source files or set the correct header before creating the .po file:

$translations->setHeader("Content-Type", "text/plain; charset=UTF-8");
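
To see whether a generated catalog is consistent with its own header, something like the following could be used. This is a sketch under assumptions: `checkPoEncoding` is a hypothetical helper, not a Gettext or Poedit function, and it only looks for the charset declared in the Content-Type header line and checks that the file's bytes are valid in that charset.

```php
<?php
// Hypothetical helper (not part of the Gettext library).
// Reports the charset declared in a PO file's Content-Type header and
// whether the whole content is valid in that charset.
function checkPoEncoding(string $po): array
{
    // In a PO file the header appears as an escaped line such as
    // "Content-Type: text/plain; charset=UTF-8\n" inside the first msgstr.
    $charset = null;
    if (preg_match('/Content-Type:[^"\n]*charset=([A-Za-z0-9_.:-]+)/i', $po, $m)) {
        $charset = $m[1];
    }

    $valid = false;
    if ($charset !== null) {
        try {
            $valid = mb_check_encoding($po, $charset);
        } catch (\ValueError $e) {
            // PHP 8 throws when the charset name is unknown to mbstring.
            $valid = false;
        }
    }

    return ['charset' => $charset, 'valid' => $valid];
}
```

For example, `checkPoEncoding(file_get_contents('file.po'))` returning `'valid' => false` would suggest that the strings in the file do not match the declared charset, which is the situation where converting the source files first would help.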

lxg, Jun 29 '17 11:06

In theory, this header is added by default. https://github.com/oscarotero/Gettext/blob/master/src/Translations.php#L109

oscarotero, Jun 29 '17 16:06