Proposal: Adding support for the Catalan language.

Open Nan-Do opened this issue 2 years ago • 0 comments

I'd like to add support for the Catalan language, similar to Euskara, which Open-Assistant already supports. Catalan is a language spoken in some areas of Spain.

So far, this dataset has been proposed to support the language: https://github.com/projecte-aina/lm-catalan. It contains 52000 tokens and ~30 GB of data. If required, I could investigate other available sources, as this dataset seems to contain data only for the main dialect.

The creation of a community on the Discord server would also be highly appreciated.

Nan-Do, Apr 17 '23 03:04