wikitextprocessor
Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and o...
I'm processing a recent English Wikipedia dump and getting `assign to undeclared variable` errors from modules that don't have a `require ('strict');` in them. Here's stripped-down code to replicate...
In some analyzed texts we have the presence of characters such as `{{!-}}`, `}}|`, `{{{blbla bla|}}}` etc. For example in the following articles: - [Alpes-de-Haute-Provence](https://fr.wikipedia.org/wiki/Alpes-de-Haute-Provence) - [Akhenaton](https://fr.wikipedia.org/wiki/Akhenaton) - [Anubis](https://fr.wikipedia.org/wiki/Anubis) -...
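One way to quantify the leftover markup described above is to scan the extracted text for brace delimiters. The following is a minimal sketch, not part of wikitextprocessor: `find_leftover_markup` and `LEFTOVER_RE` are hypothetical names, and the check only looks for `{{`, `}}`, `{{{` and `}}}`, so it will also flag braces that legitimately belong to the article text.

```python
from __future__ import annotations

import re

# Hypothetical helper (not a wikitextprocessor API): matches template and
# template-argument delimiters. Triple braces are listed first so that
# "{{{" is reported as one token rather than "{{" followed by "{".
LEFTOVER_RE = re.compile(r"\{\{\{|\}\}\}|\{\{|\}\}")


def find_leftover_markup(text: str) -> list[tuple[int, str]]:
    """Return (offset, token) for every brace delimiter found in text."""
    return [(m.start(), m.group()) for m in LEFTOVER_RE.finditer(text)]


if __name__ == "__main__":
    sample = "Some text {{!-}} and {{{blbla bla|}}} remains."
    for offset, token in find_leftover_markup(sample):
        print(offset, token)
```

Running this over each expanded page and logging the pages with a non-empty result would give a rough list of articles to inspect by hand.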
Page: https://ru.wiktionary.org/wiki/footer Template: https://ru.wiktionary.org/wiki/Шаблон:длина_слова The "длина_слова" template calls itself if it is used for substitution, or otherwise uses the "main other" template. I think the "{{{|safesubst:}}}" in the template delays the expansion...
Page: https://en.wiktionary.org/wiki/वाक्नु Error: https://kaikki.org/dictionary/errors/details-Traceback--most-recent-call-last-----F-Plr1Wuzg.html ``` वाक्नु (Nepali verb) LUA error in #invoke('ne-conj', 'show') parent ('Template:ne-conj', {'i': 'y'}) Traceback (most recent call last): File "/home/ubuntu/temp-wiktionary/venv/lib/python3.10/site-packages/wikitextprocessor/luaexec.py", line 745, in call_lua_sandbox ret: tuple[bool,...
<ref> elements (and probably other HTML-like tags) inside list items can seemingly contain newlines
This is annoying, because of the structure of our parser. If we have the source (from comprise/English): ``` # {{...}} To [[compose]]; to [[constitute]].Traditionally, the whole comprised its parts, ......
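To illustrate the problem, here is a rough sketch of a pre-pass that glues continuation lines onto a list item while a `<ref>` is still open. It is an assumption about one possible workaround, not how the wikitextprocessor parser handles it; `join_multiline_refs`, `REF_OPEN` and `REF_CLOSE` are hypothetical names, and the regexes ignore `<nowiki>`, HTML comments, and attribute values containing `/`.

```python
from __future__ import annotations

import re

# Hypothetical patterns: an opening <ref ...> that is not self-closing,
# and a closing </ref>. "\b" keeps <references> from matching.
REF_OPEN = re.compile(r"<ref\b[^>/]*>", re.IGNORECASE)
REF_CLOSE = re.compile(r"</ref\s*>", re.IGNORECASE)


def join_multiline_refs(wikitext: str) -> list[str]:
    """Split wikitext into lines, keeping any line that falls inside an
    open <ref>...</ref> attached to the line where the <ref> started."""
    logical: list[str] = []
    depth = 0  # number of <ref> tags currently open
    for line in wikitext.split("\n"):
        if depth > 0:
            # Still inside a <ref>: this is a continuation, not a new item.
            logical[-1] += "\n" + line
        else:
            logical.append(line)
        depth += len(REF_OPEN.findall(line)) - len(REF_CLOSE.findall(line))
        depth = max(depth, 0)  # tolerate a stray </ref>
    return logical
```

With this pre-pass, a list item such as `# To compose.<ref>line one` followed by `line two</ref>` stays a single logical line, so a line-oriented list parser would not end the item at the newline.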
There are some `TEMPLATE not properly closed` debug messages in the fr edition: https://kaikki.org/frwiktionary/errors/subpage6/details-TEMPLATE-not-properly-closed.html Here is part of wikitext from page https://fr.wiktionary.org/wiki/Conjugaison:français/bayer ```{{Onglets conjugaison | onglet1 =/ba.je/ avec « y...
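When triaging messages like this, a quick independent check can help tell whether the page source itself has an unbalanced `{{`. The sketch below is a simplified stack-based scan, not the check wikitextprocessor performs; `unclosed_templates` is a hypothetical helper, and it deliberately ignores `<nowiki>`, comments, and the distinction between `{{...}}` and `{{{...}}}`.

```python
from __future__ import annotations


def unclosed_templates(wikitext: str) -> list[int]:
    """Return the offsets of every "{{" that has no matching "}}"."""
    stack: list[int] = []  # offsets of "{{" still waiting for a "}}"
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            stack.pop()
            i += 2
        else:
            # Ordinary character, or a stray "}}" with nothing open.
            i += 1
    return stack


if __name__ == "__main__":
    # Illustrative input only; not the actual page source.
    print(unclosed_templates("{{outer | x = {{inner}}"))
```

A non-empty result points at the opening braces that were never closed, which narrows down whether the message comes from the page source or from how the parser reads it.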