Sanitize languages controlled vocabulary values

Open stevenferey opened this issue 2 years ago • 3 comments

What this PR does / why we need it:

This is a first proposal, open to suggestions, intended to settle the desired modifications before work begins on the Flyway script.

Which issue(s) this PR closes:

Closes #8243

Special notes for your reviewer:

Please provide your suggested modifications directly in the PR review.

Additional documentation:

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Languages/List_of_ISO_639-3_language_codes_(2019)

https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

stevenferey avatar Dec 20 '23 16:12 stevenferey

This PR's content can be used as a basis for discussing the following issue (which has been taken into account in the PR):

  • #8578

DS-INRAE avatar Dec 20 '23 16:12 DS-INRAE

And to further state the obvious, I was focusing on how these changes may affect metadata imports. I'm assuming that the intent behind the proposed changes to the main language names ("Slovenian", "Swahili (macrolanguage)" etc.) was how they appear in the UI menus (?). Both are important concerns, and it should be possible to reconcile them.

landreev avatar Feb 15 '24 17:02 landreev

@stevenferey I was waiting for some feedback, but then got distracted by other work and never finished looking into this (apologies). I would still like to know whether it is really necessary to change the main controlled vocabulary values, such as changing Swahili to Swahili (macrolanguage), along with a couple of other similar changes proposed in this PR. As I was saying, this can be done if there is a real need, but since changes like this cannot be handled by our normal metadata block update procedure, a direct database update via Flyway would be needed, and we generally try to avoid that. Could you please tell me what your primary use case is, and the main reason for wanting these changes (to the Swahili, Nepali and Slovene languages): a) metadata exports, b) metadata imports, or c) what is shown in the CVV menu on the edit metadata page? Adding missing ISO codes and alternative spellings, on the other hand, is not controversial at all.

landreev avatar Mar 29 '24 21:03 landreev

Hello @landreev,

We have no real need to modify the Swahili, Nepali and Slovenian language names. The goal was to be in agreement with the ISO standard, but the sources of information sometimes differ. We can indeed keep the language names unchanged and limit the changes to ISO codes and alternative spellings.

For example, the proposal for the Slovenian language would be: language Slovene 146 slv sl Slovenian
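To make the effect of such a row concrete, here is a minimal sketch of how the ISO codes and alternative spellings could resolve to the unchanged main value during a metadata import. It is not Dataverse's actual import code. The column order (field name, main value, display order, then alternates), the tab separator, the case-insensitive matching, and the names `VocabEntry`, `parse_row`, `build_lookup` and `resolve` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class VocabEntry:
    """One controlled vocabulary row (hypothetical structure)."""
    field_name: str
    value: str                   # main value, left unchanged
    display_order: int
    alternates: Tuple[str, ...]  # ISO codes and alternative spellings


def parse_row(row: str) -> VocabEntry:
    # Assumed layout, tab-separated: field, main value, display order,
    # then any number of alternates. Empty trailing cells are ignored.
    parts = row.rstrip("\n").split("\t")
    alternates = tuple(p for p in parts[3:] if p)
    return VocabEntry(parts[0], parts[1], int(parts[2]), alternates)


def build_lookup(entries: Iterable[VocabEntry]) -> Dict[str, str]:
    # Map the main value and every alternate to the main value.
    lookup: Dict[str, str] = {}
    for entry in entries:
        for name in (entry.value, *entry.alternates):
            lookup[name.casefold()] = entry.value
    return lookup


def resolve(lookup: Dict[str, str], raw: str) -> Optional[str]:
    # Case-insensitive matching is an assumption of this sketch.
    return lookup.get(raw.strip().casefold())


# The row quoted above, with tabs assumed between the cells.
ROW = "language\tSlovene\t146\tslv\tsl\tSlovenian"
LOOKUP = build_lookup([parse_row(ROW)])
```

In this sketch, `resolve(LOOKUP, "slv")`, `resolve(LOOKUP, "sl")` and `resolve(LOOKUP, "Slovenian")` all return the main value `Slovene`, and an unknown name returns `None`. Adding alternates only adds keys that point at the existing main value; the main value itself is never rewritten.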

Thanks a lot

stevenferey avatar Apr 10 '24 14:04 stevenferey

As I mentioned earlier, in place of this PR, I created my own branch and opened a new PR: https://github.com/IQSS/dataverse/pull/10481.

landreev avatar Apr 11 '24 18:04 landreev

@landreev as the new PR has been reviewed, should we close this one already :) ?

DS-INRAE avatar Apr 16 '24 07:04 DS-INRAE

> @landreev as the new PR has been reviewed, should we close this one already :) ?

Yes, we can close it now, or we can wait until #10481 is merged - I don't have a strong preference.

landreev avatar Apr 16 '24 21:04 landreev

Thank you for your feedback,

The reviews of this PR are reflected in PR #10481, so I propose to close this PR. Thanks.

stevenferey avatar Apr 17 '24 09:04 stevenferey