superset icon indicating copy to clipboard operation
superset copied to clipboard

fix(i18n): improved Russian translation

Open goldjee opened this issue 1 year ago • 7 comments

SUMMARY

Recreated my previous PR #28371 because some PRs for other languages were approved and I wasn't able to resolve conflicts by myself. This new version only touches russian lang files and common messages.pot.

This is a pretty big change to Russian translation.

  • Extracted new strings and updated language files, as described in the documentation.
  • Normalized usage of dots at the end of sentences. The rule is simple: if given label contains only one sentence, it should not be ended with a dot. On the contrary, labels comprised of several sentences should be ended with a dot.
  • Normalized usage of "е" and "ё" symbols. As there was no "ё" pretty much everywhere and people are used to omit dots above this symbol, I have changed "ё" to "е" everywhere this symbol occurs.
  • Fixed typos and written language style issues.
  • Renamed chart to "Диаграмма" (previously it was called "График"). The reason is that "график" is a word that means "line plot". As most of visuals types are not plots, it is much more correct to call them "диаграмма" (diagram, chart) instead.
  • Normalized naming of various chart types.
  • Used English names for things where it is appropriate. For instance, "Superset" should not be translated to "Суперсет", as it is a proper noun.
  • Added quite a bunch of missing strings, corrected and confirmed some drafts.
  • Shortened some strings for the sake of brevity and to make them fit into tight spaces.

Of course, there is still a lot of work to do, but I have managed to cover the most frequently viewed UI elements and pages. Hope the team and end users will find this version alright.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

goldjee avatar May 18 '24 17:05 goldjee

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.61%. Comparing base (76d897e) to head (c442024). Report is 1094 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #28572       +/-   ##
===========================================
+ Coverage   60.48%   83.61%   +23.12%     
===========================================
  Files        1931      519     -1412     
  Lines       76236    37458    -38778     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    31321    -14793     
+ Misses      28017     6137    -21880     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 48.94% <ø> (-0.23%) :arrow_down:
javascript ?
mysql 77.15% <ø> (?)
postgres 77.29% <ø> (?)
presto 53.53% <ø> (-0.27%) :arrow_down:
python 83.61% <ø> (+20.12%) :arrow_up:
sqlite 76.73% <ø> (?)
unit 59.01% <ø> (+1.38%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar May 18 '24 17:05 codecov[bot]

@villebro @rusackas Could you please check why Python-Integration / test-mysql (pull_request) test has failed? I haven't altered Python part of code, but maybe I have still messed up somehow.

goldjee avatar May 19 '24 16:05 goldjee

@villebro @rusackas Could you please check why Python-Integration / test-mysql (pull_request) test has failed? I haven't altered Python part of code, but maybe I have still messed up somehow.

Nevermind, probably a glitch. Now it passes. Can we move forward with this PR?

goldjee avatar May 20 '24 16:05 goldjee

Maybe @artemonsh can take a look at proofreading these edits?

rusackas avatar May 20 '24 18:05 rusackas

@artemonsh, thank you for the feedback!

Right, there are still quite a bunch of empty strings. Translating, I was leaving particular string as is if:

  • I haven't encountered it in the UI and wasn't sure about the context.
  • The string could be a common term that shouldn't be translated.
  • It looked like some low-level stuff that, when translated, could confuse end user. For instance, I have intentionally left lines 81-84 that you have mentioned. I haven't stumbled upon them in the UI and, given that they are uppercase, I thought that I could break something if I provide a translation. That was the case when someone among previous contributors has translated "DELETE" string (or some other one like that) and was asked to leave it as is.
  • I wasn't sure how to properly translate a string because I am not familiar with related functionality and didn't want to mess up.

Sure there is a lot more work to do. At this point my goal was to provide at least something I was confident with to improve user experience, not the full localization. As the next step, I was planning to check specific chart settings, fill the gaps there and straighten up the terminology used. But I think it's better to deliver that in future PRs.

Regarding what word to use as "chart" term. I totally agree that's one of the core terms and it should be picked very carefully. For reference, Ozhegov's dictionary gives this definition to the term:

ГРАФИК - диаграмма, изображающая при помощи кривых количественные показатели движения, состояния чего-н. ГРАФИК is a diagram that depicts quantitative indicators of movement, state of something with curves.

Diagram is described as follows:

ДИАГРАММА - Графическое изображение соотношения каких-н. величин. ДИАГРАММА is a graphical representation of the relationship of some quantities.

Come on, "Диаграмма" is just 3 characters longer than "График", but that's much more suitable :)

Yeah, Power BI is very popular, it's my daily workhorse as well. Just checked the naming in v2.126.927.0 (February 2024), and they used "Визуализации" ("Visualizations") for visual elements of dashboards, which is even longer.

PBI

Of course, trying to incorporate new naming into Superset, I have checked that the word fits the UI space. Looked quite nice and I haven't seen any elements that were broken on my 13'' laptop screen. Let me show you some examples.

"Charts" screen Charts

"Create chart" dialog Create chart

Chart editor Chart editor

Picking a chart in Dashboard editor Dashboard editor

If there are some other spots worth checking, please tell me.

goldjee avatar May 21 '24 08:05 goldjee

Also, before moving forward, please let me adjust related commits as I've just noticed I haven't set my local Git username/email properly :/

goldjee avatar May 21 '24 09:05 goldjee

Fixed it. Sorry for the mess. I'm new to collaborative development and still figuring out how everything works.

goldjee avatar May 21 '24 10:05 goldjee

@mistercrunch @villebro I'm a little stumped on how to properly review this. The changes to the .pot file are huge, and when I run the extraction command (pybabel extract -F superset/translations/babel.cfg -o superset/translations/messages.pot -k _ -k __ -k t -k tn -k tct .) locally on this branch, the .pot file is again quite different from what's on this PR. This all makes me nervous 😅

rusackas avatar May 31 '24 23:05 rusackas

@rusackas I think that's because that command doesn't order extracted strings. Before doing anything with the .po file in this branch, I have run ./scripts/babel_update.sh as was mentioned in the documentation. So even though a lot of the strings weren't actually modified by me, the lines were moved automatically and formed a huge diff.

I have noticed that every string has a reference to the file that uses it. If the translations could be automatically arranged by these references, even alphabetically, that would result in more deterministic output of the extraction and help to identify the exact diffs. As an additional effect, it would enable simultaneous improvements of the same .po file by several contributors. As of now, any other merged PR on russian translation would end up as an integration hell for me.

goldjee avatar Jun 01 '24 17:06 goldjee

Pointing out two things, one is I move the file under scripts/translations/babel_update.sh.

The other is note the --sort-output in the command: https://github.com/apache/superset/blob/master/scripts/translations/babel_update.sh#L45

The docs had 2-3 ways of calling the pybabel commands, and i think my recent PR removed that in favor of pointing the docs to the scripts.

mistercrunch avatar Jun 03 '24 20:06 mistercrunch

@mistercrunch The first commit in this branch was based on previous workflow of updating i18n files, as described here (BTW these docs are still behind your PR #28483 for some reason). Then I've been working with that version and didn't re-run babel_update.sh even after rebasing to later state of upstream. Shall I extract strings again?

goldjee avatar Jun 04 '24 08:06 goldjee

Definitely hoping to avoid these conflict nightmares resulting from poorly sorted (and seldom updated) babel extracts.

If you can update this PR to run the script as @mistercrunch suggests (with --sort-output) then at least I should be able to run it when I pull the branch and not get a mess. This seems like it would make it approvable.

Other follow-up items might be: • Make sure the --sort-output is highly discoverable or called by another command (this sounds like it's the case now, but might warrant a quick review • Add the extraction script as a CI job, or even the pre-commit hooks, making sure that the .pot remains fresh as people add/remove/update translation calls throughout the application.

rusackas avatar Jun 04 '24 16:06 rusackas

I've run

./scripts/translations/babel_update.sh --sort-output
 pybabel update -i superset/translations/messages.pot -d superset/translations --ignore-obsolete

All i18n files were affected. Now the Python-Integration check is failing again, idk why. Could you please take a look?

goldjee avatar Jun 04 '24 20:06 goldjee

Weird... there was a test where an endpoint was returning a 200 instead of a 202. This seems like an error we saw in those tests a while back that was resolved, but I don't think this PR is as old as that. I've run the CI job again to see if it was just flakey and magically passes. If not, the PR might need a rebase, which I hope isn't too painful. Let's wait and see...

rusackas avatar Jun 04 '24 22:06 rusackas

Just a note, but from my understanding, running this

./scripts/translations/babel_update.sh --sort-output
 pybabel update -i superset/translations/messages.pot -d superset/translations --ignore-obsolete

as you mentioned @goldjee , is the same as just running

./scripts/translations/babel_update.sh

as the bash script doesn't receive any params.

mistercrunch avatar Jun 05 '24 20:06 mistercrunch

Weird... there was a test where an endpoint was returning a 200 instead of a 202

That's a known flaky test, happens on mysql only for some reason maybe one out of 20 runs or so. I'll send Superset swag to whoever fixes it!

mistercrunch avatar Jun 05 '24 20:06 mistercrunch

[I think] It should always be safe to run ./scripts/translations/babel_update.sh and merge what comes out. So I'm fine with this PR as is. As we do this, we should expect the .po files to change a lot as if you were to only add a single line at the top some files, it'll change all of the "which line of code" references.

As much as it's useful to have the references, it's a pain from a source control standpoint as the moment anyone adds a line of code, the .po files change a lot, which can lead to merge conflicts and such.

@goldjee , in terms of handling merge conflicts, I think it should always be safe to:

  • before the imminent merge conflict, set aside the .po file(s) you've altered/improved in a temp folder someplace
  • rebase/merge the upstream changes
  • copy your .poback into their place in the repo
  • re-run ./scripts/translations/babel_update.sh

I think we may want to capture this ^^^ into the docs.

Now another thing we may want to do would be to tell people submitting translation PRs to NOT run ./scripts/translations/babel_update.sh as part of their PR, so that they're actually reviewable... In this PR it seems impossible to review the changes you've made because GitHub is overwhelmed by number of changes. Even if it were to expose them, it would still be hard to review as it's a mix of legit changes + .po files bookkeeping.

I wonder if we could set the .po files to NOT hold the references (?)

mistercrunch avatar Jun 05 '24 20:06 mistercrunch

@goldjee Thanks for translations! But I still struggle with some things I can't translate. Like this in table chart: image And this in pie chart: image Do you have any ideas how I can translate it?

Atmostone avatar Jun 06 '24 11:06 Atmostone

@Atmostone My pleasure.

Do you mean that you weren't able to find corresponding lines in messages.po file? I have stumbled upon several cases like that as well. For example, at "Create chart" page there is "or" that I wasn't able to translate.

Снимок экрана 2024-06-06 в 16 54 32

I don't know how to deal with such cases yet. Probably the code of the corresponding components don't take advantage of i18n system and should be modified in prior.

Also, @mistercrunch @rusackas, will it be appropriate if I create a discussion to gather feedback like this on russian translation? I would like to tinker with it a bit more, as there is a lot of room for improvements.

goldjee avatar Jun 06 '24 13:06 goldjee

@goldjee image For this case I found some strings that can be related, then added translation, but it didn't work image Then I found this file that has only FR and ZH translations, maybe I need to modify this file (but we use superset on docker that has only compiled frontend, so I'm gonna try to install it manually and hope I can make experiments with frontend)

Atmostone avatar Jun 06 '24 13:06 Atmostone

For things like that table components and other things, we may need to feed them "translatable strings", assuming they do allow controlling those labels externally. You may have to do a bit of digging into the React component tree, and read the external libraries' documentation to figure how whether and how you can pass these translatable labels to them. If it's not supported, it may be possible to contribute such features, or there may be hacks to modify the dom - though this is not the preferred route. i18n is a worthy cause and component lib managers should welcome these changes if they're not already in there.

mistercrunch avatar Jun 06 '24 14:06 mistercrunch