unilex
Lexical data at Unicode
Please update the SPDX identifier in data files to: `SPDX-License-Identifier: Unicode-3.0`. Relates to: #10, #19, https://github.com/unicode-org/.github/issues/15
Your data is an impressive work which could help many, many minority and rare languages to get stronger online representation. The Wikimedia Foundation, Wikipedia, Wikidata, and @Lingua-Libre movements would love...
https://arxiv.org/abs/2205.03983 
Add frequency * https://portal.sina.birzeit.edu/curras Assign issue to me please.
Possibility to create or request resource for Tatar: * Corpus: Corpus of Written Tatar (Saykhunov 2021), see [here](https://www.corpus.tatar/en) * [wordlist](https://corpus.tatar/stat_en.htm) > "[Frequency list of Tatar wordforms (case-sensitive)](https://www.corpus.tatar/stat/tatcorpus3.words_frequency_case-sensitive.bz2)" * Corpus: [Leipzig](https://cls.corpora.uni-leipzig.de/en/tat_web_2019/)...
There is a [`crawl_ca_valencia.py`](https://github.com/google/corpuscrawler/blob/master/Lib/corpuscrawler/crawl_ca_valencia.py) script [within the google/corpuscrawler project](https://github.com/google/corpuscrawler/search?q=valencia), which produces a file listed in their [readme.md](https://raw.githubusercontent.com/google/corpuscrawler/master/README.md). Surprisingly, this frequency file didn't make it to UNILEX. As renowned Twitter expert...
Worthwhile project, but the corpus has lots of plain English (both Am & Br/Aus) and probably the source materials contain texts in English as well as Tok Pisin and the...
Just for reference, since I'm hand-comparing the language lists of Lingualibre vs UNILEX: I observed that the following languages are not in UNILEX, possibly for various reasons. I'm conscious this issue...
The [Griko language resources](https://bitbucket.org/antonis/grikoresource) include manually assigned [part of speech tags](https://bitbucket.org/antonis/grikoresource/src/897cb9d9526901e0905ef0c8330267b896a5eb15/data/projected_tags/train.projected_tags.txt?at=master&fileviewer=file-view-default). @antonisa, would you perhaps be interested in contributing this data to the Unilex project? If you’re interested, would you...
@antonisa, thanks again for your data submission! For now, I’ve tagged it as `el-Latn-u-sd-it75` which means “Greek in the Latin writing system as used in Apulia”. Is your data actually...
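For readers unfamiliar with the tag above, here is a minimal sketch of how `el-Latn-u-sd-it75` breaks down into language, script, and subdivision. The helper name `parse_tag` is hypothetical and not part of Unilex. It handles only the simple shape seen here (language, optional script, and an `sd` key in the `u` extension) and is not a full BCP 47 parser.

```python
def parse_tag(tag: str) -> dict:
    """Split a simple language tag into language, script, and subdivision.

    Hypothetical helper for illustration only: it ignores variants,
    private-use subtags, and every "u" extension key other than "sd".
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None, "subdivision": None}

    i = 1
    # An optional four-letter script subtag may follow the language subtag.
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        result["script"] = subtags[i].title()
        i += 1

    # The singleton "u" opens the Unicode locale extension; inside it,
    # the key "sd" is followed by a subdivision code such as "it75".
    if "u" in subtags[i:]:
        u_pos = subtags.index("u", i)
        extension = subtags[u_pos + 1:]
        for key, value in zip(extension, extension[1:]):
            if key == "sd":
                result["subdivision"] = value.lower()
                break

    return result


parsed = parse_tag("el-Latn-u-sd-it75")
print(parsed)
```

Here `el` is the language (Greek), `Latn` the script (Latin), and `it75` the subdivision code that the comment above glosses as Apulia.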