
Make language-data available in MediaWiki core

Open winstonsung opened this issue 1 year ago • 8 comments

We would like to bring language-data to MediaWiki core.

Questions

Should we move this repository to Gerrit/GitLab?

Reason for Gerrit:

  • We could easily make the list of users with CR +2 rights the same as for mediawiki/core on Gerrit.
  • ~~It would be hard for integration of Depends-On on different platforms.~~
    (There's no CI injection/dependency feature for libraries in Wikimedia Gerrit.)

Reason for GitLab: Contributors aren't required to accept third party privacy policies.


The reason it should be under Gerrit instead of GitLab is the project layout decision.

This repository should fall under mediawiki/libs (i.e., named mediawiki/libs/LanguageData and included in /vendor in mediawiki/core), as it should contain PHP code, and all mediawiki/libs/ projects were on Gerrit while none of them were on GitLab.

https://www.mediawiki.org/wiki/GitLab/Migration_status


Should composer.json be exported?

Looks like we need composer.json to be exported. Should it be removed from the export-ignore entries in .gitattributes?

https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1056254
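
For anyone wanting to check which paths an archive export would drop, here is a minimal Python sketch. It assumes a simplified reading of .gitattributes (one pattern per line, glob matching via `fnmatch`, no handling of later `-export-ignore` overrides or directory semantics). The `SAMPLE` contents and both helper names are hypothetical and are not copied from this repository.

```python
import fnmatch


def export_ignored(gitattributes_text: str) -> list[str]:
    """Return the patterns that carry the export-ignore attribute."""
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "export-ignore" in attrs:
            patterns.append(pattern)
    return patterns


def is_exported(path: str, gitattributes_text: str) -> bool:
    """True if no export-ignore pattern matches the path (simplified matching)."""
    return not any(
        fnmatch.fnmatch(path, pattern.lstrip("/"))
        for pattern in export_ignored(gitattributes_text)
    )


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# development-only files
/tests export-ignore
composer.json export-ignore
"""

print(export_ignored(SAMPLE))
print(is_exported("composer.json", SAMPLE))
```

With an entry like the hypothetical one above, composer.json would be left out of `git archive` output, which is why the question of removing it from export-ignore matters for a package pulled into /vendor.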

Nikki wrote:

The language-data format doesn't support all the data they have (multiple scripts, Wikidata IDs, English names, parent language/families, etc.), and requires data that is hard to get (autonyms). I think it would need big changes if it's ever going to be useful for things other than selecting a MediaWiki interface language.

Considerations

  • BCP 47 Language/script/region/variant subtags
  • ISO codes
    • NOTE: This is actually different from BCP 47 subtags.
  • MediaWiki internal language codes
  • Wikidata IDs
  • WikiLambda ZID
  • Autonyms (language name written in its local writing system)
  • The script in which a language is written
    • Multiple scripts
  • The regions in which the language is spoken/written
  • Translations of language names
    • English names
  • Language fallback chains
  • Parent language/families
  • The writing mode of the text
    • The directionality of the text
    • The writing-mode property of the text
  • Time formats
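
To make the list above more concrete, here is a rough Python sketch of what a record covering several of these considerations might look like. Every field name, the `LanguageRecord` type, and the `fallback_chain` helper are assumptions for discussion only; this is not the current language-data format, and the sample entries are illustrative rather than authoritative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LanguageRecord:
    """Hypothetical record shape; not the existing language-data schema."""
    code: str                                  # MediaWiki internal language code
    bcp47: str                                 # BCP 47 tag, which may differ from the internal code
    autonym: Optional[str] = None              # optional, since autonyms can be hard to obtain
    english_name: Optional[str] = None
    scripts: list[str] = field(default_factory=list)    # ISO 15924 codes, possibly several
    regions: list[str] = field(default_factory=list)
    direction: str = "ltr"                     # text directionality: "ltr" or "rtl"
    wikidata_id: Optional[str] = None
    fallbacks: list[str] = field(default_factory=list)  # immediate fallback codes


def fallback_chain(code: str, records: dict[str, LanguageRecord], final: str = "en") -> list[str]:
    """Expand immediate fallbacks transitively, ending with the final fallback."""
    chain: list[str] = []
    seen = {code}
    pending = list(records[code].fallbacks) if code in records else []
    while pending:
        nxt = pending.pop(0)
        if nxt in seen:
            continue  # guards against cycles and duplicates
        seen.add(nxt)
        chain.append(nxt)
        if nxt in records:
            pending.extend(records[nxt].fallbacks)
    if final not in seen:
        chain.append(final)
    return chain


# Illustrative sample entries only.
RECORDS = {
    "de": LanguageRecord("de", "de", autonym="Deutsch", english_name="German",
                         scripts=["Latn"], wikidata_id="Q188"),
    "de-at": LanguageRecord("de-at", "de-AT", english_name="Austrian German",
                            scripts=["Latn"], fallbacks=["de"]),
    "sr": LanguageRecord("sr", "sr", english_name="Serbian",
                         scripts=["Cyrl", "Latn"]),
}

print(fallback_chain("de-at", RECORDS))
print(RECORDS["sr"].scripts)
```

Making the autonym and Wikidata ID optional, and the script a list rather than a single value, is one way to address the gaps raised in the quoted comment; whether that fits the existing data files is an open question.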

Bug: T190129

winstonsung avatar Jul 24 '24 12:07 winstonsung

We would like to bring language-data to MediaWiki core, and it would be hard for integration of Depends-On on different platforms.

There's no CI injection/dependency feature for libraries in Wikimedia gerrit. It might conceptually get built, but not for a few years.

OTOH, moving this to Wikimedia GitLab, so contributors aren't required to accept third party privacy policies, would be very easily do-able.

jdforrester avatar Jul 24 '24 13:07 jdforrester

There's no CI injection/dependency feature for libraries in Wikimedia gerrit.

Oh. Sounds bad.

winstonsung avatar Jul 24 '24 13:07 winstonsung

You didn't respond to my second point, or wait for the owners of this repo to decide before closing. Let's give them the opportunity.

jdforrester avatar Jul 24 '24 13:07 jdforrester

Given that jquery.i18n is still on GitHub, I think we could keep this repository on GitHub until we consider moving most of them.

winstonsung avatar Jul 24 '24 13:07 winstonsung

Looks like we need composer.json to be exported. Should it be removed from the export-ignore entries in .gitattributes?

https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1056254

winstonsung avatar Jul 25 '24 11:07 winstonsung

The reason it should be under Gerrit instead of GitLab is the project layout decision.

This repository should fall under mediawiki/libs (i.e., named mediawiki/libs/LanguageData and included in /vendor in mediawiki/core), as it should contain PHP code, and all mediawiki/libs/ projects were on Gerrit while none of them were on GitLab.

https://www.mediawiki.org/wiki/GitLab/Migration_status

@jdforrester

winstonsung avatar Oct 06 '24 17:10 winstonsung