Mixed RTL and LTR content is hard to read in text inputs
This issue was created automatically by a script.
Bug 1500333
Bug Reporter: Mahtab Alam [:alamM] <[email protected]>
CC: @amire80, @flodolo, @guerojeff, @mathjazz, [email protected], [email protected], [email protected]
See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1602426
Created attachment 9018524 Screenshot (61).png
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0
Steps to reproduce:
To use the XML tag and the external argument as they are, I clicked on them, but they got mixed up with one another.
Actual results:
Both got mixed with one another.
Expected results:
They should have remained separate.
Comment Author: @flodolo
Sorry but I don't understand what the bug is about.
Is it an issue with Pontoon? Is it an error with translation? If it's Pontoon, we need to move this bug, and you should provide a bit more information on what you did, and what the expected behavior was.
Comment Author: Mahtab Alam [:alamM] <[email protected]>
(In reply to Francesco Lodolo [:flod] from comment #1)
Sorry but I don't understand what the bug is about.
Is it an issue with Pontoon? Is it an error with translation? If it's Pontoon, we need to move this bug, and you should provide a bit more information on what you did, and what the expected behavior was.
Yes! This is with Pontoon. It's not a translation error. As you can see in the attached screenshot, the XML tag and the external argument are separate in the original string, but in the translated string and in the translation panel they got mixed.
Comment Author: @flodolo
I'm looking at the string in a text editor, and it seems correct to me? https://pontoon.mozilla.org/ur/firefox/browser/browser/preferences/preferences.ftl/?search=extension-controlled-privacy-containers&string=178407
ایک ایکسٹینشن , {$name}, کو کنٹینر ٹیب کی ضرورت ہے۔
Comment Author: Mahtab Alam [:alamM] <[email protected]>
(In reply to Francesco Lodolo [:flod] from comment #3)
I'm looking at the string in a text editor, and it seems correct to me? https://pontoon.mozilla.org/ur/firefox/browser/browser/preferences/preferences.ftl/?search=extension-controlled-privacy-containers&string=178407
ایک ایکسٹینشن ,
{$name}, کو کنٹینر ٹیب کی ضرورت ہے۔
Putting it in a text editor corrects only the XML tag and the external argument, but the Urdu translation gets mismatched.
Comment Author: @mathjazz
Steps to reproduce:
- Type "a" in the textarea.
- Click on the XML placeable: "<img data-l10n-name="icon"/>".
You get this in the textarea:
</"a<img data-l10n-name="icon
Mahtab, thanks for the report! What would be the expected value in the textarea after you insert the placeable?
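One possible reading of the result above, offered as a sketch rather than a confirmed diagnosis: in a right-to-left textarea, the trailing characters `"/>` have no strong direction of their own, so the Unicode bidirectional algorithm may place them on the RTL side of the preceding Latin run. The snippet below only prints the bidi classes from Python's standard `unicodedata` module; the helper name `bidi_classes` is made up for illustration and is not part of Pontoon.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs from the Unicode Character Database."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    # Logical order as typed: "a", then the placeable.
    typed = 'a<img data-l10n-name="icon"/>'
    for ch, cls in bidi_classes(typed):
        # "L" is strong left-to-right; "ON" and "CS" are weak or neutral
        # classes whose display direction depends on the surrounding text.
        print(repr(ch), cls)
```

Letters such as `a` are reported as strong left-to-right, while the quote, slash, and angle brackets are not, which is consistent with only the end of the tag appearing to move.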
Comment Author: Mahtab Alam [:alamM] <[email protected]>
The expected value should be <img data-l10n-name="icon"/> a or a <img data-l10n-name="icon"/> depending upon the context.
Comment Author: @mathjazz
Thanks!
I'm looking at how this works for Hebrew (another RTL locale), which has an approved translation in Pontoon: https://pontoon.mozilla.org/he/firefox/browser/browser/preferences/preferences.ftl/?string=178407
Which means it's also in the file: https://hg.mozilla.org/l10n-central/he/file/2c05277ec42e/browser/browser/preferences/preferences.ftl#l91
According to Comment 6, the XML tag in the file output (LTR) seems correct.
So I suspect the problem is that the string contains both, the RTL and LTR content and we force
I wonder what we can even do about this. Flagging Amir with a NI, who's been helping us with RTL issues in the past (see bug #1190566 for example).
Comment Author: Mohammed Yaseen Khan [:foxt7ot] <[email protected]>
Thanks Mahtab for raising the issue.
Yes Matjaz, your hunch is correct. The issue is because the statement contains both RTL and LTR characters. This issue was there in Pootle as well, and I suspect this is the case with other RTL locales as well.
Comment Author: @amire80
Sorry, noticed it only now.
Unfortunately, I cannot think of any way to fix this easily. It's a major inherent problem with how RTL languages work. Mixing RTL text with any kind of left-to-right code, including XML, is always a disaster. This is why translating into RTL languages in text files is so awful: in translation files every single line has some LTR text in it, so everything is jumbled. Using any web-based translation solution such as Pontoon makes it much better, because it separates the translation from the source string and from the LTR string key. However, it doesn't fix this problem completely because some code or markup is quite often embedded in the string itself, as it is in this example.
The ways to fix such things are:
- Make Pontoon have super-smart input boxes that are not just plain text, but that are able to truly separate code from text. It would be super-cool, but probably very complicated to make.
- Create aliases in RTL languages for XML element and attribute names. If it's done, then in Hebrew it would look like this:הרחבה" בשם <תמונה נתונים-תרגום-שם="סמל"/> {$שם} דורשת שימוש במגירת לשוניות." In theory, it would solve the problem, but it may introduce other problems, and it's a bit of a bottomless pit.
- The most realistic solution is to have a policy that strongly encourages developers to avoid any kind of code or markup in translatable strings, unless it's really, really needed. It would be good for translators into all languages, not only RTL ones, because it would make it easier for non-developers to translate. (For many people who grew up with the 1990s web, HTML and similar things are natural, but that's not true for everyone. There are people who could be great translators, but who have a hard time with markup languages, and reducing this problem may increase volunteers' participation.)
Comment Author: @mathjazz
Thanks for a very valuable input, Amir!
I'm lowering the priority until we find a meaningful way forward.
Comment Author: @amire80
(In reply to Matjaz Horvat [:mathjazz] from comment #10)
Thanks for a very valuable input, Amir!
Sure, happy to help any time. Sorry it took so long.
I'm lowering the priority until we find a meaningful way forward.
The most realistic way, as I mention at the end of my comment, is not so much in the area of feature development, but in the area of policies and practices for writing, reviewing, and maintaining code: strongly encourage developers to move as much code and markup out of translatable strings as possible.
Comment Author: Safa Alfulaij <[email protected]>
It might help to add a "Raw mode" like the one Pootle added. There, everything is broken apart and forced LTR so you can check tags and other stuff easily. Link: https://github.com/translate/pootle/issues/3941 There is no other way of fixing it as I see it.
Comment Author: Safa Alfulaij <[email protected]>
Created attachment 9114558 Urdu (ur) · Firefox Updated bidi algorithm.png
This is how I see it. Yes it has a problem, but not a big one.
Tbh, eliminating markup from text strings is a bad idea, absolutely bad. You create different parts, making translation much harder. Developers need to provide proper context, and mistakes occur. I believe that translators who translate applications must have at least a bit of knowledge of technical aspects like variables, placeholders, plurals, and so on.
Attached file: Screenshot_2019-12-09-Urdu-(ur)-·-Firefox.png (image/png, 53428 bytes) Description: Urdu (ur) · Firefox Updated bidi algorithm.png
Comment Author: @guerojeff
(In reply to Amir Aharoni from comment #11)
(In reply to Matjaz Horvat [:mathjazz] from comment #10)
Thanks for a very valuable input, Amir!
Sure, happy to help any time. Sorry it took so long.
I'm lowering the priority until we find a meaningful way forward.
The most realistic way, as I mention at the end of my comment, is not so much in the area of feature development, but in the area of policies and practices for writing, reviewing, and maintaining code: strongly encourage developers to move as much code and markup out of translatable strings as possible.
Thanks Amir, but in our experience it's often more effort/cost to change developer behavior. We already ask developers to be aware of how much code they're including in strings, but we don't have the resources for any strict enforcement.
Your first suggestion is consistent with how the majority of other computer-assisted translation tools handle code/tagged elements. They're condensed in the string automatically and the user has to expand them manually if they want to see or manipulate their content. Here's a good example: https://docs.sdl.com/LiveContent/content/en-US/SDL%20Trados%20Studio%20Help-v4/GUID-C6676C93-2EEF-4945-9438-905F05EF268E
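A minimal sketch of the condensing idea described above, assuming a simple regular expression is enough to find placeables. The pattern, the `[n]` token format, and the function names `condense` and `expand` are hypothetical illustrations, not how Pontoon or any particular CAT tool implements it.

```python
import re

# Hypothetical pattern: XML-like tags and Fluent-style {$variable} references.
PLACEABLE = re.compile(r"<[^<>]+>|\{\s*\$[^{}]+\}")


def condense(text):
    """Replace each placeable with a short numbered token.

    Returns the condensed text and the list of original placeables.
    """
    table = []

    def repl(match):
        table.append(match.group(0))
        return f"[{len(table)}]"

    return PLACEABLE.sub(repl, text), table


def expand(condensed, table):
    """Restore the original placeables from their numbered tokens.

    Limitation of this sketch: literal text that already looks like "[1]"
    would be mistaken for a token.
    """
    return re.sub(r"\[(\d+)\]", lambda m: table[int(m.group(1)) - 1], condensed)
```

With this, the translator would edit something like `a [1] [2]` and the full markup would be restored on save. A real editor would more likely render the tokens as non-editable widgets than as bracketed text.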
Comment Author: Md Shahbaz Alam [:shahbaz17] <[email protected]>
Hello Everyone,
After much thought, we are thinking of using the following approach in Urdu.
Pontoon: https://pontoon.mozilla.org/ur
Writing a hamza (ء) at the end of an LTR run to convert it into RTL (for the editor). Adding a hamza won't change the meaning of the sentence, because it is a symbol that doesn't add meaning to a word.
What we expect is a tool that looks for the hamza (ء) and makes it a hidden element. So, in theory, it will be in the DOM but not shown to the end user.
Comment Author: Md Shahbaz Alam [:shahbaz17] <[email protected]>
Created attachment 9156283 Without Hamza Approach, a sentence would look like this in textarea.
Attached file: Without Hamza.jpeg (image/jpeg, 44518 bytes) Description: Without Hamza Approach, a sentence would look like this in textarea.
Comment Author: Md Shahbaz Alam [:shahbaz17] <[email protected]>
Created attachment 9156285 Hamza Approach, this will be displayed correctly in textarea.
Attached file: Hamza Approach.jpeg (image/jpeg, 46511 bytes) Description: Hamza Approach, this will be displayed correctly in textarea.
Comment Author: Safa Alfulaij <[email protected]>
Why not use an RLM? Assign it to a shortcut on the keyboard and you're good to go! The hamza approach is not the proper way to "solve" this issue. RLM/LRM and all the other marks are there for this exact thing.
Comment Author: Md Shahbaz Alam [:shahbaz17] <[email protected]>
Hello Safa,
Thanks for the reference.
I agree this is not a proper approach.
But using either the left-to-right mark (U+200E) or the right-to-left mark (U+200F) in the Pontoon editor doesn't work unless the editor is designed to handle it.
Comment Author: Safa Alfulaij <[email protected]>
Hi!
I'm not sure what you mean. Pontoon doesn't need to do it itself; one can just insert it. We did that many times in Arabic translations. On Windows there are shortcuts for ZWJ/ZWNJ/LRM/RLM in the Arabic layout (Ctrl+Shift+[1-4]). On Linux you can just customize the keyboard layout, or use helper tools (Character map, etc.).
Ideally, Pontoon would implement this: https://bugzilla.mozilla.org/show_bug.cgi?id=1372861
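To illustrate what inserting directional marks by hand amounts to, here is a minimal sketch that wraps each placeable in left-to-right marks (U+200E). It uses LRM rather than RLM on the assumption that the placeables themselves are left-to-right code, and which mark fits a given string depends on the case. The regular expression and the function names are hypothetical, this is not Pontoon behavior, and whether such marks are acceptable in the stored translation is a separate decision.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible strong left-to-right character

# Hypothetical pattern: XML-like tags and Fluent-style {$variable} references.
PLACEABLE = re.compile(r"<[^<>]+>|\{\s*\$[^{}]+\}")


def wrap_placeables(text):
    """Surround each placeable with LRM so that its neutral characters
    (braces, quotes, slashes) sit between two strong LTR characters."""
    return PLACEABLE.sub(lambda m: f"{LRM}{m.group(0)}{LRM}", text)


def strip_marks(text):
    """Remove the marks again, for example before comparing with the source."""
    return text.replace(LRM, "")
```

The marks are invisible, so the wrapped string looks the same in logical order. Only the display order in a bidirectional context should change.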