[2.1.4]: upgrade from SMF 2.0.18 fails if the forum is in Slovenian language and UTF-8

Open gregorklaric opened this issue 1 year ago • 9 comments

Basic Information

Upgrade from SMF 2.0.18 hangs in step 6 if the forum uses the Slovenian language and UTF-8. I get this error: Notice: Undefined index: charset_detected in /home3/fungifun/public_html/gobe/forum/upgrade.php on line 3143

I think the problem is that the upgrade script hardcodes a charset for each language (starting at line 3083, // Figure out what charset we should be converting from...), whereas in SMF 2.0 a language can be either in that encoding or in UTF-8 (as in my case). See https://wiki.simplemachines.org/smf/UTF-8_Readme
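A minimal sketch of the mismatch described above, not the upgrader's real code. The $lang_charsets name appears later in this thread, but the map entries and both helper functions here are hypothetical, and so is the idea of checking db_character_set at this point.

```php
<?php
// Hypothetical excerpt of a language => charset map; the real table in upgrade.php may differ.
$lang_charsets = array(
	'slovenian' => 'ISO-8859-2',
	'croatian' => 'ISO-8859-2',
);

// Hypothetical helper: guesses the source charset from the language name alone.
function guess_charset_by_language(array $lang_charsets, string $language): string
{
	return $lang_charsets[$language] ?? 'ISO-8859-1';
}

// Hypothetical helper: trusts the forum's own charset setting first,
// and only falls back to the language-based guess.
function guess_charset_with_setting(array $lang_charsets, string $language, ?string $db_character_set): string
{
	if ($db_character_set !== null && strtolower($db_character_set) === 'utf8')
		return 'UTF-8';

	return guess_charset_by_language($lang_charsets, $language);
}

// A Slovenian forum already converted to UTF-8 gets two different answers.
echo guess_charset_by_language($lang_charsets, 'slovenian'), "\n";
echo guess_charset_with_setting($lang_charsets, 'slovenian', 'utf8'), "\n";
```

The first call ignores the forum's actual encoding, which is the situation I suspect I ran into.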

The same problem might also be present in the upgrade.php for SMF 3.0.

See also https://www.simplemachines.org/community/index.php?topic=588397.0;topicseen for more details and a workaround.

Steps to reproduce

  1. Have a 2.0.18 forum in the Slovenian language that was converted to UTF-8 after install
  2. Upgrade to 2.1 as described in https://wiki.simplemachines.org/smf/Upgrading

Expected result

Upgrader works and upgrades the forum to 2.1.

Actual result

Upgrade is stuck in step 6 of the upgrade process.

Version/Git revision

2.1.4

Database Engine

MySQL

Database Version

5.7.23-23

PHP Version

7.4

Logs

No response

Additional Information

No response

gregorklaric avatar Mar 03 '24 08:03 gregorklaric

It is generally a good idea to update the language packs as well, since you are updating literally everything else before running the upgrade. Were you still using language packs from 2.0, or did you update the languages as well? 2.1 uses only UTF-8, so I think this should not have been an issue if you did update the languages. Alternatively, you can switch to English during the upgrade, which means using the updated language files included in the upgrade package.

LexArma avatar Oct 15 '24 04:10 LexArma

did you update the languages as well?

Yes, I copied the 2.1 Slovenian language files to the server as well before the upgrade.

Alternatively, you can switch to English during the upgrade, which means using the updated language files included in the upgrade package.

I think this would be bad, because in that case there would be no character conversion in the database if the database wasn't in UTF-8 yet. Remember, the content in the database is in Slovenian, not in English.

Have you checked https://www.simplemachines.org/community/index.php?topic=588397.0;topicseen for more details and a workaround I used?

gregorklaric avatar Oct 15 '24 06:10 gregorklaric

Didn't check this right now, but I do believe the DB character set in 2.0 is determined by a line in Settings.php, not by the installed language packs.

LexArma avatar Oct 15 '24 16:10 LexArma

Yes, indeed, but depending on this setting the upgrader decides which character set it is converting from, and this was wrong. If you take a look at the post you will get the whole picture.

gregorklaric avatar Oct 15 '24 18:10 gregorklaric

I'll take a look into this.

sbulen avatar Oct 15 '24 20:10 sbulen

I cannot reproduce the issue. I don't think there's a bug with the 2.1 upgrader here. However, I do think there was an issue with the initial 2.0 utf8 conversion. This left the db & settings in a contradictory state that confused the upgrader.

I say that because if the 2.0 utf8 conversion were successful, the 2.1 upgrader would have bypassed the utf8 conversion.

Steps taken:

  • Installed 2.0.19, using Latin1
  • Changed default language & installed the Slovenian 2.0 language pack
  • Converted to UTF8 in 2.0
  • Installed the Slovenian utf8 2.0 language pack
  • Copied some Slovenian text
  • Tested to confirm 2.0 UTF8 is OK... (see screenshot 1)
  • Upgraded to 2.1.4, in Slovenian, using the 2.1 Slovenian language pack (see screenshot 3)
  • Tested to confirm 2.1 OK... (see screenshot 2)

Upgrader notes:

  • The utf8 conversion utility in 2.0 will set a flag indicating UTF8. If found, the utf8 conversion is skipped. https://github.com/SimpleMachines/SMF/blob/19a9de63d8b895c6f931137bba09c3d5a79caf52/other/upgrade.php#L2955
  • Only if the flag is NOT set, will the upgrader proceed... (We have run into issues in the past where folks did manual utf8 conversions at the db level and were not aware they needed to change the flags...)

So... If the utf8 conversion was performed in 2.0, there will be no conversion in the 2.1 upgrader.
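As a rough sketch of the skip logic in the notes above: this is not the upgrader's code (see the link for that). The helper name is made up, and using db_character_set as the flag is an assumption for illustration; the real check may look at more than one flag.

```php
<?php
// Hypothetical helper: decides whether the upgrader's utf8 conversion step can be skipped.
// Assumption: db_character_set === 'utf8' stands in for "the 2.0 conversion set its flag".
function should_skip_utf8_conversion(?string $db_character_set): bool
{
	return $db_character_set !== null && strtolower($db_character_set) === 'utf8';
}

// Flag set by the 2.0 conversion utility: the step is skipped.
var_dump(should_skip_utf8_conversion('utf8'));

// Flag missing, e.g. after a manual db-level conversion: the upgrader converts anyway.
var_dump(should_skip_utf8_conversion(null));
```

The second case is the one that bites people: the data is already utf8, but nothing tells the upgrader that.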

Some notes on running the upgrader in other languages:

  • You can, in fact, run the upgrader in languages other than English (see screenshot 3).
  • Keep in mind that the language packs, including for the upgrader, are significantly different across 2.0 & 2.1... You cannot run the 2.1 upgrader with a 2.0 language pack.
  • So you have two options: (1) run it in English, or (2) install the 2.1 language pack that matches the forum default language prior to executing the upgrader.
  • Caveat: there is a bug reported on long running upgrades where sometimes it reverts back to English. (https://github.com/SimpleMachines/SMF/issues/7157)

[Screenshots: slov_20_text, slov_21_text, slov_21_upgrade_1, slov_21_upgrade_2, slov_21_upgrade_3]

sbulen avatar Oct 15 '24 22:10 sbulen

OTOH:

I do get the error:

Warning: Undefined array key "charset_detected" in D:\wamp64\www\84slovutf8\upgrade.php on line 3143

This happens if the database is in latin2_general_ci, with a db_character_set of "latin2", and I upgrade while leaving the default language as Slovenian & pre-installing the 2.1 Slovenian language pack.

I.e., if I DO NOT do the utf8 upgrade in 2.0, and attempt to use the 2.1 upgrader to update the forum from latin2 to utf8, the error is reproducible.

sbulen avatar Oct 16 '24 01:10 sbulen

Did one last test... Left everything in latin2, changed the forum default to English, and ran the upgrader.

I was hoping this would bypass the (likely unnecessary) old ISO tweaks and avoid the undefined error... If so, we could just go back to telling folks to run the upgrader in English...

The good news is that it ran thru to completion. The bad news is that it corrupted the text: [image]
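A small sketch of how this kind of corruption can happen, assuming the upgrader treated the latin2 data as ISO-8859-1 (I have not confirmed that this is what it did). The helper is hypothetical and hand-encodes UTF-8 so that it needs no extensions.

```php
<?php
// Hypothetical helper: encodes a code point below U+0800 as UTF-8 by hand,
// which is enough for this demo and avoids depending on mbstring or iconv.
function utf8_from_codepoint(int $cp): string
{
	if ($cp < 0x80)
		return chr($cp);

	return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// The single stored byte 0xE8 maps to a different character in each charset.
$byte_e8 = array(
	'ISO-8859-1' => 0x00E8, // e with grave accent
	'ISO-8859-2' => 0x010D, // c with caron, common in Slovenian
);

// Converting with the right source charset keeps the Slovenian letter.
$as_latin2 = utf8_from_codepoint($byte_e8['ISO-8859-2']);

// Converting with the wrong source charset silently yields a different letter.
$as_latin1 = utf8_from_codepoint($byte_e8['ISO-8859-1']);

echo bin2hex($as_latin2), ' vs ', bin2hex($as_latin1), "\n";
```

Neither conversion raises an error, which would be consistent with the upgrader completing and the text still being wrong.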

So... This does need more looking into.

Where gregor's instincts were correct is that the 3083+ logic is sketchy... Also I don't think there's a clean 2.0 => 2.1 upgrade for non-utf8 Slovenian at the moment. Some form of manual intervention is needed.

And also yes, it's possible the same logic is in the 3.0 upgrader (???)... We had been talking about simplifying the upgrader and assuming folks are upgrading from 2.1, if so, there is no utf8 conversion... I am not sure where that stands.

sbulen avatar Oct 16 '24 02:10 sbulen

On a side note, Croatian is misspelled in the script:

[image]

gregorklaric avatar Oct 16 '24 06:10 gregorklaric

The upgrader reads $upcontext['charset_detected'] and sets it in the same statement. The problem is that nothing has assigned it before that line, so the read on the right-hand side hits an undefined key:

		$upcontext['charset_detected'] = (isset($lang_charsets[$language]) && isset($charsets[strtr(strtolower($upcontext['charset_detected']), array('utf' => 'UTF', 'iso' => 'ISO'))])) ? $lang_charsets[$language] : 'ISO-8859-1';

I think this is because of how the original code looked before the move into the upgrader (https://github.com/SimpleMachines/SMF/pull/3100/files#diff-b4a500cbcf7af4ffe935b9a1ef72bb9b0a3f7b061f554327e56cb8ae8385bf30L440). There, we first did this:

$context['charset_detected'] = $txt['lang_character_set'];

Then we ran the line above.

So we could try to fix this by doing the same thing (and global $txt).
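Roughly, that idea could look like the sketch below. The arrays are tiny stand-ins with assumed values, not the real tables from upgrade.php, and this is not necessarily what the eventual fix does.

```php
<?php
// Illustrative stand-ins (assumed values); the real arrays in upgrade.php are larger.
$lang_charsets = array('slovenian' => 'ISO-8859-2');
$charsets = array('ISO-8859-2' => 'latin2', 'UTF-8' => 'utf8');
$txt = array('lang_character_set' => 'UTF-8');
$language = 'slovenian';
$upcontext = array();

// Seed the key first, as the pre-move code did with $context.
// Inside a function this would also need: global $txt;
$upcontext['charset_detected'] = $txt['lang_character_set'];

// Same expression as quoted above; the read on the right-hand side is now defined.
$upcontext['charset_detected'] = (isset($lang_charsets[$language]) && isset($charsets[strtr(strtolower($upcontext['charset_detected']), array('utf' => 'UTF', 'iso' => 'ISO'))])) ? $lang_charsets[$language] : 'ISO-8859-1';

echo $upcontext['charset_detected'], "\n";
```

This only shows that seeding the key avoids the undefined-index notice; it does not claim the detected charset is right for every forum.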

jdarwood007 avatar Jun 26 '25 03:06 jdarwood007

Fixed in #8805

I really wish GitHub's feature for automatically closing issues worked more reliably.

Sesquipedalian avatar Aug 30 '25 17:08 Sesquipedalian