Dictionaries: add original language dictionary - add definition from wiktionary in the original language

Open pm3003 opened this issue 1 year ago • 6 comments

This is a feature request

When learners reach a certain level with a language, simple translations are not enough. Coincidentally, that's often when they start to read classical texts, and when LinguaCafe is most useful.

Original language dictionaries help readers understand finer nuances and slightly different senses of a word.

Request: Add original language dictionaries. I believe this can be most easily achieved by adding Wiktionary dictionaries in the original language (the German dictionary from de.wiktionary.org, the Italian one from it.wiktionary.org, etc.), with word and definition.

pm3003 avatar Feb 15 '24 12:02 pm3003

It is something that I also planned. Do you have a source for monolingual wiktionaries?

It will probably be added later, because there are some other things I want to prioritize first.

simjanos-dev avatar Feb 15 '24 13:02 simjanos-dev

Thank you very much!

Here are some comments on dictionaries. I believe the easiest is Wiktionary:

Apart from Wiktionaries (available in XML format, as raw dumps), a few years ago I used a toolchain that included sdcv (StarDict command-line version). The StarDict project ran into trouble at some point because it offered scraped copyrighted dictionaries, but it also has a handful of free dictionaries. https://github.com/Dushistov/sdcv https://github.com/huzheng001/stardict-3.

The GLAWI/ENGLAWI project has a free restructured version of Wiktionary for some languages.

The project dictmaster has a link to a relatively old list of free offline dictionaries, some explicitly free and some not explicitly free (though, for example, the American Heritage Dictionary is public domain).

Project Gutenberg has free dictionaries in full-text format that might be easily parseable (see, for example, this Welsh-English dictionary). This person has done it with Project Gutenberg's digitization of the 1913 Webster dictionary. (There's also a GNU version of it.)

The Russian website Lingvo has a lot of dictionaries in bgl/GoldenDict format, but most of them are likely copyrighted.

Regarding two languages I know well, French and German:

  • French: the very good Littré dictionary is available as XML. TLFI and the Académie Française dictionary are better and should be legally usable. I can ask the public institutions behind those two dictionaries if they can provide an offline version.
  • German: the overall reference dictionary service is dwds.de, but it has some copyrighted sources and doesn't provide an API for definitions (neither do Duden and Langenscheidt, the reference commercial dictionaries). However, the best classical dictionary (by far) for the German language is the DWB, the dictionary of the famous Grimm brothers. It's still unmatched in some aspects. Its digitization was managed by the Trier University Center for Digital Humanities, which has done quite a few other dictionaries. The DWB is public domain, and at some point it was available on CD-ROM. I believe there's a high chance they would provide the XML data if asked nicely. Meanwhile, I'll look for it on the Internet (for the DWB at least).

pm3003 avatar Feb 17 '24 00:02 pm3003

Thank you so much for the detailed information and links!

I will start with the monolingual Wiktionaries in the next couple of updates. I haven't checked the details of the XML yet, but if the definitions are easily extractable, they should be very easy to add. I'll look at all the other ones in the future, and eventually add all of them.

I love dictionaries, the more the better.

simjanos-dev avatar Feb 17 '24 09:02 simjanos-dev

It seems like the Wiktionary dumps will be very difficult to parse. The XML files do not have a "meaning" field, so I'll have to try to parse the plain formatted text somehow.
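To illustrate the problem, here is a minimal sketch of what pulling definitions out of the page text might look like. It assumes definitions are written as wikitext lines starting with `# `, which is a convention of some Wiktionary editions and not a guarantee; other editions mark meanings differently. The sample XML and the function name `extract_definitions` are made up for illustration, and the sample omits the XML namespace that real dumps declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page. Real dumps declare an XML
# namespace on <mediawiki> and are far too large to load in one string.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Haus</title>
    <revision>
      <text>== Haus ==
=== Noun ===
# A building for living in.
#: ''Das Haus ist alt.''
# A family or dynasty.
</text>
    </revision>
  </page>
</mediawiki>"""


def extract_definitions(xml_text):
    """Map each page title to the wikitext lines that look like definitions."""
    root = ET.fromstring(xml_text)
    result = {}
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text") or ""
        definitions = []
        for line in text.splitlines():
            # Assumption: "# " starts a definition; "#:" and "#*" lines
            # (examples, quotations) do not match and are skipped.
            if line.startswith("# "):
                definitions.append(line[2:].strip())
        if definitions:
            result[title] = definitions
    return result


print(extract_definitions(SAMPLE_DUMP))
```

Even in this toy case the definitions are only recognizable by formatting, and real entries also contain templates and links that would need stripping, which is why this route looks fragile.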

simjanos-dev avatar Feb 28 '24 15:02 simjanos-dev

@pm3003

Hi!

It is very difficult to parse the original XML files from wiktionary.

I found kaikki.org. It has monolingual Wiktionaries in a format that can be easily imported, so I will start with these. It has monolingual Wiktionaries in these languages:

  • Chinese
  • French
  • German
  • Russian
  • Spanish

I plan to add more and go through your sources; it will just take some time.
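As a rough sketch of why this format is easier to import: the sketch below assumes the data comes as JSON Lines, one entry per line, with a `word` field, a `pos` field, and a `senses` list whose items may carry `glosses`. Those field names are an assumption about the export and should be checked against the actual files. The sample records and the function name `load_glosses` are invented for illustration.

```python
import json

# Hypothetical sample in the assumed one-JSON-object-per-line layout.
SAMPLE_JSONL = "\n".join([
    json.dumps({"word": "maison", "pos": "noun",
                "senses": [{"glosses": ["Bâtiment servant d'habitation."]}]}),
    json.dumps({"word": "maison", "pos": "adj",
                "senses": [{"glosses": ["Fait à la maison."]},
                           {"tags": ["no-gloss"]}]}),
])


def load_glosses(lines):
    """Collect (part of speech, gloss) pairs per word from JSON Lines input."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        word = record.get("word")
        if not word:
            continue
        for sense in record.get("senses", []):
            # Senses without a "glosses" key are skipped.
            for gloss in sense.get("glosses", []):
                entries.setdefault(word, []).append((record.get("pos", ""), gloss))
    return entries


print(load_glosses(SAMPLE_JSONL.splitlines()))
```

With a real file, the same function could be fed an open file handle, since it only iterates over lines and never loads the whole file at once.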

simjanos-dev avatar Mar 26 '24 19:03 simjanos-dev

Will you consider supporting the Yomichan format (JSON)? It would give LinguaCafe access to a wealth of dictionaries made by Yomichan users in many languages, not just Japanese. It can be bring-your-own-dictionary, so you don't need to worry about copyright.
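For reference, a minimal sketch of reading one term bank file. It assumes each entry is a JSON array with the term at index 0, the reading at index 1, and a list of glossary items at index 5, which is my understanding of the term bank layout and should be verified against the schema. The sample data and the function name `read_term_bank` are hypothetical.

```python
import json

# Hypothetical term bank content in the assumed positional layout:
# [term, reading, definition tags, rules, score, glossary, sequence, term tags]
SAMPLE_TERM_BANK = json.dumps([
    ["Haus", "", "n", "", 0, ["Gebäude zum Wohnen"], 1, ""],
    ["laufen", "", "v", "", 0,
     ["sich zu Fuß fortbewegen",
      {"type": "structured-content", "content": "ignored in this sketch"}],
     2, ""],
])


def read_term_bank(raw_json):
    """Return a mapping of term to its plain-text glossary strings."""
    terms = {}
    for row in json.loads(raw_json):
        term, glossary = row[0], row[5]
        # Glossary items may be plain strings or structured objects;
        # this sketch keeps only the plain strings.
        plain = [item for item in glossary if isinstance(item, str)]
        terms.setdefault(term, []).extend(plain)
    return terms


print(read_term_bank(SAMPLE_TERM_BANK))
```

A real dictionary is distributed as an archive with several such files plus an index, so an importer would loop over the term bank files and merge the results.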

lef-est avatar Jun 28 '24 12:06 lef-est