getLanguagesInTerritory should apply a threshold or allow consumers to do so

Open nemobis opened this issue 12 years ago • 0 comments

mw.uls.getFrequentLanguageList blindly appends whatever $.uls.data.getLanguagesInTerritory( countryCode ) returns to the list of "common languages" for a territory. Looking deeper into why the languages suggested for Italy are so wrong (https://bugzilla.wikimedia.org/62346), in addition to the issues already reported to CLDR, there is the problem that we're not applying any threshold.

For instance, CLDR tells us that hr is spoken by 0.0057 % of the population, which is probably correct, but hr nevertheless gets into the list of "common" languages, which is absurd. I know that with better data, picking the top 7-9 languages (as the compact links feature does) would hide this issue, but it would still make sense to cut the long tail, be it at a threshold of 1, 0.1 or 0.01 % of the population.

The implementation doesn't matter. Some alternatives to cutting the tail in getLanguagesInTerritory:

  1. the output could contain some data (like the population data in CLDR) so that mw.uls.getFrequentLanguageList can do a filtering on its own, or
  2. a new jquery.uls function could wrap getLanguagesInTerritory, cut the tail, and be used by mw.uls.getFrequentLanguageList.
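
To make the two alternatives concrete, here is a rough sketch in plain JavaScript. Everything in it is an assumption for illustration: the function names getLanguagesInTerritoryWithShare and getCommonLanguagesInTerritory are hypothetical and not part of jquery.uls, and the data table is a stand-in (only the hr figure comes from this report, the other values are made up).

```javascript
// Stand-in data: percentage of a territory's population speaking each language.
// Only the hr value (0.0057) is taken from this report; the rest is invented.
const populationShareByTerritory = {
	IT: { it: 95, sc: 1.7, hr: 0.0057 }
};

// Alternative 1 (hypothetical name): expose the population share along with
// each language code so that the caller can do its own filtering.
function getLanguagesInTerritoryWithShare( territory ) {
	const shares = populationShareByTerritory[ territory ] || {};
	return Object.keys( shares ).map( function ( code ) {
		return { code: code, share: shares[ code ] };
	} );
}

// Alternative 2 (hypothetical name): a wrapper that cuts the long tail and
// returns only the codes at or above the given threshold (in percent).
function getCommonLanguagesInTerritory( territory, minSharePercent ) {
	return getLanguagesInTerritoryWithShare( territory )
		.filter( function ( entry ) {
			return entry.share >= minSharePercent;
		} )
		.map( function ( entry ) {
			return entry.code;
		} );
}

// With a 0.01 % threshold, hr is dropped from the Italian list.
console.log( getCommonLanguagesInTerritory( 'IT', 0.01 ) );
```

With either approach, mw.uls.getFrequentLanguageList would only need to pick the threshold; which value is right (1, 0.1 or 0.01 %) is still open.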

nemobis, Mar 14 '14 10:03