mb_strwidth behavior changed

Open logistiker opened this issue 3 years ago • 3 comments

Because of the following commit, the documentation for mb_strwidth should be updated: https://github.com/php/php-src/commit/d8c785b894e1a4ed9793d71cad02330cb0034faa

As of PHP 8.1, this function returns an updated width for some multibyte characters, such as emoji. For example:

For PHP <= 8.0:

echo mb_strwidth('🏆');

returns: 1

For PHP >= 8.1:

echo mb_strwidth('🏆');

returns: 2
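
Code that has to run on both sides of this change could probe the interpreter instead of comparing version strings. The following is only a sketch: the function name emoji_counts_as_wide() is hypothetical, and the choice of U+1F3C6 as the probe character is an assumption based on the emoji ranges discussed in this thread.

```php
<?php
// Hypothetical helper (not part of mbstring): detect which width table is in use.
function emoji_counts_as_wide(): bool
{
    // Assumption: U+1F3C6 is one of the emoji code points whose width
    // changed from 1 to 2 when the East Asian Width table was updated.
    return mb_strwidth("\u{1F3C6}", 'UTF-8') === 2;
}

printf(
    "PHP %s measures the probe emoji as %s\n",
    PHP_VERSION,
    emoji_counts_as_wide() ? 'wide (2 columns)' : 'narrow (1 column)'
);
```

Probing the behavior directly avoids hard-coding a version boundary, which matters if the table changes again in a later release.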

logistiker, Aug 29 '22 21:08

Both mb_strwidth() and mb_strimwidth() have changed behavior in PHP 8.1. As you pointed out, php/php-src@d8c785b updated eaw_table.h to reflect the Unicode 13.0 East Asian Width (EAW) properties.

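Note that the "width" these functions report is calculated per code point from the East Asian Width table, so it does not match how modern Unicode sequences are displayed. A flag or a ZWJ-joined family emoji is a single grapheme, yet mb_strwidth() adds up the width of every code point in it.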

<?php

$emojis = ['🏆', '🇺🇸', '👪', '👨‍👨‍👦', '👨‍👩‍👧‍👦'];

var_dump(array_map(fn($emoji) => [
    'emoji' => $emoji,
    'mb_strwidth()' => mb_strwidth($emoji),
    'grapheme_strlen()' => grapheme_strlen($emoji)
], $emojis));

https://3v4l.org/Cvc36U
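
If a closer approximation of rendered width is needed, one option is to measure per grapheme cluster rather than per code point. This is a rough sketch, not what mbstring does: estimate_display_width() is a hypothetical name, it assumes a cluster is as wide as its widest code point, and it relies on PCRE's \X for segmentation, which may differ from the intl extension's result depending on the PCRE version.

```php
<?php
// Hypothetical helper: estimate columns per grapheme cluster, not per code point.
function estimate_display_width(string $text): int
{
    // With the /u modifier, \X matches one extended grapheme cluster.
    // preg_match_all() returns false when the input is not valid UTF-8.
    if (preg_match_all('/\X/u', $text, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    $width = 0;
    foreach ($matches[0] as $cluster) {
        // Assumption: a cluster is as wide as its widest code point.
        $widths = array_map(
            fn(string $char): int => mb_strwidth($char, 'UTF-8'),
            mb_str_split($cluster, 1, 'UTF-8')
        );
        $width += max($widths);
    }

    return $width;
}

var_dump(estimate_display_width('abc'));
```

Real terminals and fonts disagree about emoji sequences, so the result should be treated as an estimate.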

zonuexe, Sep 02 '22 12:09

A simple way to make the manual match actual behavior is to reproduce the definitions from EastAsianWidth.txt, but that would be difficult to maintain in the future.

The fullwidth characters are: U+1100-U+115F, U+11A3-U+11A7, U+11FA-U+11FF, U+2329-U+232A, U+2E80-U+2E99, U+2E9B-U+2EF3, U+2F00-U+2FD5, U+2FF0-U+2FFB, U+3000-U+303E, U+3041-U+3096, U+3099-U+30FF, U+3105-U+312D, U+3131-U+318E, U+3190-U+31BA, U+31C0-U+31E3, U+31F0-U+321E, U+3220-U+3247, U+3250-U+32FE, U+3300-U+4DBF, U+4E00-U+A48C, U+A490-U+A4C6, U+A960-U+A97C, U+AC00-U+D7A3, U+D7B0-U+D7C6, U+D7CB-U+D7FB, U+F900-U+FAFF, U+FE10-U+FE19, U+FE30-U+FE52, U+FE54-U+FE66, U+FE68-U+FE6B, U+FF01-U+FF60, U+FFE0-U+FFE6, U+1B000-U+1B001, U+1F200-U+1F202, U+1F210-U+1F23A, U+1F240-U+1F248, U+1F250-U+1F251, U+20000-U+2FFFD, U+30000-U+3FFFD. All other characters are halfwidth characters.
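
To illustrate how such a range list translates into a lookup, here is a minimal sketch. The constant and function names are hypothetical, and the table holds only a handful of the ranges quoted above, so it is not a substitute for mb_strwidth().

```php
<?php
// Hypothetical, deliberately incomplete subset of the ranges listed above.
const LISTED_FULLWIDTH_RANGES = [
    [0x1100, 0x115F],
    [0x3000, 0x303E],
    [0x3041, 0x3096],
    [0x4E00, 0xA48C],
    [0xAC00, 0xD7A3],
    [0xFF01, 0xFF60],
];

// Hypothetical helper: is the first character of $char inside one of the ranges?
function is_listed_fullwidth(string $char): bool
{
    if ($char === '') {
        return false;
    }

    $codePoint = mb_ord($char, 'UTF-8');
    if ($codePoint === false) {
        return false; // not a valid UTF-8 character
    }

    foreach (LISTED_FULLWIDTH_RANGES as [$start, $end]) {
        if ($codePoint >= $start && $codePoint <= $end) {
            return true;
        }
    }

    return false;
}

var_dump(is_listed_fullwidth('あ'), is_listed_fullwidth('A'));
```

A linear scan is enough for a short table; a full table sorted by start value would be a better fit for binary search.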

My patch for doc-ja
diff --git a/reference/mbstring/functions/mb-strwidth.xml b/reference/mbstring/functions/mb-strwidth.xml
index 5a9b33c03..b3928927b 100644
--- a/reference/mbstring/functions/mb-strwidth.xml
+++ b/reference/mbstring/functions/mb-strwidth.xml
@@ -26,9 +26,40 @@
  <para>
    全角文字は次のとおりです。
    <literal>U+1100</literal>-<literal>U+115F</literal>、
-   <literal>U+11A3</literal>-<literal>U+11A7</literal>、
-   <literal>U+11FA</literal>-<literal>U+11FF</literal>、
+   <literal>U+231A</literal>-<literal>U+231B</literal>、
    <literal>U+2329</literal>-<literal>U+232A</literal>、
+   <literal>U+23E9</literal>-<literal>U+23EC</literal>、
+   <literal>U+23F0</literal>、
+   <literal>U+23F3</literal>、
+   <literal>U+25FD</literal>-<literal>U+25FE</literal>、
+   <literal>U+2614</literal>-<literal>U+2615</literal>、
+   <literal>U+2648</literal>-<literal>U+2653</literal>、
+   <literal>U+267F</literal>、
+   <literal>U+2693</literal>、
+   <literal>U+26A1</literal>、
+   <literal>U+26AA</literal>-<literal>U+26AB</literal>、
+   <literal>U+26BD</literal>-<literal>U+26BE</literal>、
+   <literal>U+26C4</literal>-<literal>U+26C5</literal>、
+   <literal>U+26CE</literal>、
+   <literal>U+26D4</literal>、
+   <literal>U+26EA</literal>、
+   <literal>U+26F2</literal>-<literal>U+26F3</literal>、
+   <literal>U+26F5</literal>、
+   <literal>U+26FA</literal>、
+   <literal>U+26FD</literal>、
+   <literal>U+2705</literal>、
+   <literal>U+270A</literal>-<literal>U+270B</literal>、
+   <literal>U+2728</literal>、
+   <literal>U+274C</literal>、
+   <literal>U+274E</literal>、
+   <literal>U+2753</literal>-<literal>U+2755</literal>、
+   <literal>U+2757</literal>、
+   <literal>U+2795</literal>-<literal>U+2797</literal>、
+   <literal>U+27B0</literal>、
+   <literal>U+27BF</literal>、
+   <literal>U+2B1B</literal>-<literal>U+2B1C</literal>、
+   <literal>U+2B50</literal>、
+   <literal>U+2B55</literal>、
    <literal>U+2E80</literal>-<literal>U+2E99</literal>、
    <literal>U+2E9B</literal>-<literal>U+2EF3</literal>、
    <literal>U+2F00</literal>-<literal>U+2FD5</literal>、
@@ -36,34 +67,109 @@
    <literal>U+3000</literal>-<literal>U+303E</literal>、
    <literal>U+3041</literal>-<literal>U+3096</literal>、
    <literal>U+3099</literal>-<literal>U+30FF</literal>、
-   <literal>U+3105</literal>-<literal>U+312D</literal>、
+   <literal>U+3105</literal>-<literal>U+312F</literal>、
    <literal>U+3131</literal>-<literal>U+318E</literal>、
-   <literal>U+3190</literal>-<literal>U+31BA</literal>、
+   <literal>U+3190</literal>-<literal>U+31BF</literal>、
    <literal>U+31C0</literal>-<literal>U+31E3</literal>、
-   <literal>U+31F0</literal>-<literal>U+321E</literal>、
-   <literal>U+3220</literal>-<literal>U+3247</literal>、
-   <literal>U+3250</literal>-<literal>U+32FE</literal>、
-   <literal>U+3300</literal>-<literal>U+4DBF</literal>、
-   <literal>U+4E00</literal>-<literal>U+A48C</literal>、
+   <literal>U+31F0</literal>-<literal>U+31FF</literal>、
+   <literal>U+3200</literal>-<literal>U+321E</literal>、
+   <literal>U+3220</literal>-<literal>U+3229</literal>、
+   <literal>U+322A</literal>-<literal>U+3247</literal>、
+   <literal>U+3250</literal>-<literal>U+A48C</literal>、
    <literal>U+A490</literal>-<literal>U+A4C6</literal>、
    <literal>U+A960</literal>-<literal>U+A97C</literal>、
    <literal>U+AC00</literal>-<literal>U+D7A3</literal>、
-   <literal>U+D7B0</literal>-<literal>U+D7C6</literal>、
-   <literal>U+D7CB</literal>-<literal>U+D7FB</literal>、
    <literal>U+F900</literal>-<literal>U+FAFF</literal>、
    <literal>U+FE10</literal>-<literal>U+FE19</literal>、
-   <literal>U+FE30</literal>-<literal>U+FE52</literal>、
-   <literal>U+FE54</literal>-<literal>U+FE66</literal>、
-   <literal>U+FE68</literal>-<literal>U+FE6B</literal>、
-   <literal>U+FF01</literal>-<literal>U+FF60</literal>、
-   <literal>U+FFE0</literal>-<literal>U+FFE6</literal>、
-   <literal>U+1B000</literal>-<literal>U+1B001</literal>、
+   <literal>U+FE30</literal>-<literal>U+FF60</literal>、
+   <literal>U+FFE0</literal>-<literal>U+FFE1</literal>、
+   <literal>U+FFE2</literal>、
+   <literal>U+FFE3</literal>、
+   <literal>U+FFE4</literal>、
+   <literal>U+FFE5</literal>-<literal>U+FFE6</literal>、
+   <literal>U+16FE0</literal>-<literal>U+16FE1</literal>、
+   <literal>U+16FE2</literal>、
+   <literal>U+16FE3</literal>、
+   <literal>U+16FE4</literal>、
+   <literal>U+16FF0</literal>-<literal>U+16FF1</literal>、
+   <literal>U+17000</literal>-<literal>U+187F7</literal>、
+   <literal>U+18800</literal>-<literal>U+18AFF</literal>、
+   <literal>U+18B00</literal>-<literal>U+18CD5</literal>、
+   <literal>U+18D00</literal>-<literal>U+18D08</literal>、
+   <literal>U+1AFF0</literal>-<literal>U+1AFF3</literal>、
+   <literal>U+1AFF5</literal>-<literal>U+1AFFB</literal>、
+   <literal>U+1AFFD</literal>-<literal>U+1AFFE</literal>、
+   <literal>U+1B000</literal>-<literal>U+1B0FF</literal>、
+   <literal>U+1B100</literal>-<literal>U+1B122</literal>、
+   <literal>U+1B150</literal>-<literal>U+1B152</literal>、
+   <literal>U+1B164</literal>-<literal>U+1B167</literal>、
+   <literal>U+1B170</literal>-<literal>U+1B2FB</literal>、
+   <literal>U+1F004</literal>、
+   <literal>U+1F0CF</literal>、
+   <literal>U+1F18E</literal>、
+   <literal>U+1F191</literal>-<literal>U+1F19A</literal>、
    <literal>U+1F200</literal>-<literal>U+1F202</literal>、
-   <literal>U+1F210</literal>-<literal>U+1F23A</literal>、
+   <literal>U+1F210</literal>-<literal>U+1F23B</literal>、
    <literal>U+1F240</literal>-<literal>U+1F248</literal>、
    <literal>U+1F250</literal>-<literal>U+1F251</literal>、
-   <literal>U+20000</literal>-<literal>U+2FFFD</literal>、
-   <literal>U+30000</literal>-<literal>U+3FFFD</literal>。
+   <literal>U+1F260</literal>-<literal>U+1F265</literal>、
+   <literal>U+1F300</literal>-<literal>U+1F320</literal>、
+   <literal>U+1F32D</literal>-<literal>U+1F335</literal>、
+   <literal>U+1F337</literal>-<literal>U+1F37C</literal>、
+   <literal>U+1F37E</literal>-<literal>U+1F393</literal>、
+   <literal>U+1F3A0</literal>-<literal>U+1F3CA</literal>、
+   <literal>U+1F3CF</literal>-<literal>U+1F3D3</literal>、
+   <literal>U+1F3E0</literal>-<literal>U+1F3F0</literal>、
+   <literal>U+1F3F4</literal>、
+   <literal>U+1F3F8</literal>-<literal>U+1F3FA</literal>、
+   <literal>U+1F3FB</literal>-<literal>U+1F3FF</literal>、
+   <literal>U+1F400</literal>-<literal>U+1F43E</literal>、
+   <literal>U+1F440</literal>、
+   <literal>U+1F442</literal>-<literal>U+1F4FC</literal>、
+   <literal>U+1F4FF</literal>-<literal>U+1F53D</literal>、
+   <literal>U+1F54B</literal>-<literal>U+1F54E</literal>、
+   <literal>U+1F550</literal>-<literal>U+1F567</literal>、
+   <literal>U+1F57A</literal>、
+   <literal>U+1F595</literal>-<literal>U+1F596</literal>、
+   <literal>U+1F5A4</literal>、
+   <literal>U+1F5FB</literal>-<literal>U+1F5FF</literal>、
+   <literal>U+1F600</literal>-<literal>U+1F64F</literal>、
+   <literal>U+1F680</literal>-<literal>U+1F6C5</literal>、
+   <literal>U+1F6CC</literal>、
+   <literal>U+1F6D0</literal>-<literal>U+1F6D2</literal>、
+   <literal>U+1F6D5</literal>-<literal>U+1F6D7</literal>、
+   <literal>U+1F6DD</literal>-<literal>U+1F6DF</literal>、
+   <literal>U+1F6EB</literal>-<literal>U+1F6EC</literal>、
+   <literal>U+1F6F4</literal>-<literal>U+1F6FC</literal>、
+   <literal>U+1F7E0</literal>-<literal>U+1F7EB</literal>、
+   <literal>U+1F7F0</literal>、
+   <literal>U+1F90C</literal>-<literal>U+1F93A</literal>、
+   <literal>U+1F93C</literal>-<literal>U+1F945</literal>、
+   <literal>U+1F947</literal>-<literal>U+1F9FF</literal>、
+   <literal>U+1FA70</literal>-<literal>U+1FA74</literal>、
+   <literal>U+1FA78</literal>-<literal>U+1FA7C</literal>、
+   <literal>U+1FA80</literal>-<literal>U+1FA86</literal>、
+   <literal>U+1FA90</literal>-<literal>U+1FAAC</literal>、
+   <literal>U+1FAB0</literal>-<literal>U+1FABA</literal>、
+   <literal>U+1FAC0</literal>-<literal>U+1FAC5</literal>、
+   <literal>U+1FAD0</literal>-<literal>U+1FAD9</literal>、
+   <literal>U+1FAE0</literal>-<literal>U+1FAE7</literal>、
+   <literal>U+1FAF0</literal>-<literal>U+1FAF6</literal>、
+   <literal>U+20000</literal>-<literal>U+2A6DF</literal>、
+   <literal>U+2A6E0</literal>-<literal>U+2A6FF</literal>、
+   <literal>U+2A700</literal>-<literal>U+2B738</literal>、
+   <literal>U+2B739</literal>-<literal>U+2B73F</literal>、
+   <literal>U+2B740</literal>-<literal>U+2B81D</literal>、
+   <literal>U+2B81E</literal>-<literal>U+2B81F</literal>、
+   <literal>U+2B820</literal>-<literal>U+2CEA1</literal>、
+   <literal>U+2CEA2</literal>-<literal>U+2CEAF</literal>、
+   <literal>U+2CEB0</literal>-<literal>U+2EBE0</literal>、
+   <literal>U+2EBE1</literal>-<literal>U+2F7FF</literal>、
+   <literal>U+2F800</literal>-<literal>U+2FA1D</literal>、
+   <literal>U+2FA1E</literal>-<literal>U+2FA1F</literal>、
+   <literal>U+2FA20</literal>-<literal>U+2FFFD</literal>、
+   <literal>U+30000</literal>-<literal>U+3134A</literal>、
+   <literal>U+3134B</literal>-<literal>U+3FFFD</literal> 。
    他のすべての文字は半角の文字です。
   </para>
  </refsect1>

zonuexe, Sep 02 '22 13:09

but difficult to maintain in the future.

Yeah. I wonder whether we should refer to https://github.com/php/php-src/blob/master/ext/mbstring/libmbfl/mbfl/eaw_table.h instead.

cmb69, Sep 02 '22 13:09