Open-Assistant
Proposal: Adding support for the Catalan language.
I'd like to add support for the Catalan language, similar to Euskara, which Open-Assistant already supports. Catalan is a language spoken in some areas of Spain.
So far, this dataset has been proposed to support the language: https://github.com/projecte-aina/lm-catalan. It contains 52,000 tokens and ~30 GB of data. If required, I could investigate other available sources, as this dataset seems to contain data only for the main dialect.
Creating a community for Catalan on the Discord server would also be highly appreciated.