unilex

Lexical data at Unicode

Results 11 unilex issues

Please update the SPDX identifier in data files to: `SPDX-License-Identifier: Unicode-3.0`

Relates to:
- #10
- #19
- https://github.com/unicode-org/.github/issues/15
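A minimal sketch of how such a header update could be scripted. It assumes each data file carries a single-line header containing `SPDX-License-Identifier:`. The helper name `update_spdx`, the `#` comment prefix, the sample content, and the previous identifier shown are illustrative assumptions, not taken from the repository.

```python
import re

# Matches "SPDX-License-Identifier:" plus whatever license expression
# follows it on the same line (the line ending itself is not consumed).
SPDX_RE = re.compile(r"(SPDX-License-Identifier:)[ \t]*[^\r\n]*")


def update_spdx(text: str, new_id: str = "Unicode-3.0") -> str:
    """Replace the license expression on every SPDX header line in `text`."""
    return SPDX_RE.sub(lambda m: f"{m.group(1)} {new_id}", text)


# Hypothetical file content: the old identifier and the data row are made up.
sample = "# SPDX-License-Identifier: Unicode-DFS-2016\nword\t42\n"
updated = update_spdx(sample)
print(updated, end="")
```

The substitution is idempotent, so running it again on an already updated file leaves the file unchanged.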

Your data is an impressive work which could help many, many minority and rare languages to get stronger online representation. The Wikimedia Foundation, Wikipedia, Wikidata, and @Lingua-Libre movements would love...

https://arxiv.org/abs/2205.03983 ![Screenshot_2022-06-22-00-35-49-05_40deb401b9ffe8e1df2f1cc5ba480b12](https://user-images.githubusercontent.com/1420189/174908171-c909060e-14f3-451d-bd10-2da4fe4a7411.jpg)

Add frequency
* https://portal.sina.birzeit.edu/curras

Please assign this issue to me.

Possibility to create or request a resource for Tatar:

* Corpus: Corpus of Written Tatar (Saykhunov 2021), see [here](https://www.corpus.tatar/en)
* [wordlist](https://corpus.tatar/stat_en.htm) > "[Frequency list of Tatar wordforms (case-sensitive)](https://www.corpus.tatar/stat/tatcorpus3.words_frequency_case-sensitive.bz2)"
* Corpus: [leipzig](https://cls.corpora.uni-leipzig.de/en/tat_web_2019/)...

There is a [`crawl_ca_valencia.py`](https://github.com/google/corpuscrawler/blob/master/Lib/corpuscrawler/crawl_ca_valencia.py) [within the google/corpuscrawler project](https://github.com/google/corpuscrawler/search?q=valencia), which produces a file listed in their [readme.md](https://raw.githubusercontent.com/google/corpuscrawler/master/README.md). Surprisingly, this frequency file didn't make it to UNILEX. As renowned Twitter expert...

Worthwhile project, but the corpus has lots of plain English (both Am & Br/Aus) and probably the source materials contain texts in English as well as Tok Pisin and the...

Just for reference, since I am hand-comparing the language lists of Lingualibre vs UNILEX: I observed that the following languages are not in UNILEX, possibly for various reasons. I am conscious this issue...

The [Griko language resources](https://bitbucket.org/antonis/grikoresource) include manually assigned [part of speech tags](https://bitbucket.org/antonis/grikoresource/src/897cb9d9526901e0905ef0c8330267b896a5eb15/data/projected_tags/train.projected_tags.txt?at=master&fileviewer=file-view-default). @antonisa, would you perhaps be interested in contributing this data to the Unilex project? If you’re interested, would you...

@antonisa, thanks again for your data submission! For now, I’ve tagged it as `el-Latn-u-sd-it75` which means “Greek in the Latin writing system as used in Apulia”. Is your data actually...
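To make the structure of a tag like `el-Latn-u-sd-it75` concrete, here is a rough sketch that splits it into its parts. It is a deliberately simplified reader, not a full BCP 47 parser: it only recognizes a language subtag, an optional script, an optional region, and the `sd` (subdivision) key of the `-u-` extension. The function name `parse_tag` and the result keys are hypothetical.

```python
def parse_tag(tag: str) -> dict:
    """Split a language tag into language, script, region and -u-sd- value."""
    subtags = tag.split("-")
    result = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "subdivision": None,
    }
    i = 1
    # A script subtag is exactly four letters, e.g. "Latn".
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        result["script"] = subtags[i].title()
        i += 1
    # A region subtag is two letters or three digits, e.g. "IT" or "419".
    if i < len(subtags) and (
        (len(subtags[i]) == 2 and subtags[i].isalpha())
        or (len(subtags[i]) == 3 and subtags[i].isdigit())
    ):
        result["region"] = subtags[i].upper()
        i += 1
    # Look for the "u" extension and, inside it, the "sd" key and its value.
    rest = [s.lower() for s in subtags[i:]]
    if "u" in rest:
        ext = rest[rest.index("u") + 1:]
        for key, value in zip(ext, ext[1:]):
            if key == "sd":
                result["subdivision"] = value
                break
    return result


parsed = parse_tag("el-Latn-u-sd-it75")
print(parsed)
```

For this tag, the sketch yields language `el` (Greek), script `Latn` (Latin), no region, and subdivision `it75`, which is the code the comment above glosses as Apulia.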