
Added python script for parsing xml -> tsv or json.

Open alvadia opened this issue 5 years ago • 1 comments

Added a Python script that converts XML to TSV or JSON. It takes three arguments: the input file (for example, dict.xml), the output file (for example, dict.json), and the mode (for example, json).
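As an illustration of the three-argument interface described above, here is a minimal sketch using `argparse`. The names `src`, `dst`, and `mode` are hypothetical, and the assumption that `mode` accepts exactly `tsv` or `json` is inferred from the description; the actual script may read its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments: input file, output file, mode."""
    parser = argparse.ArgumentParser(
        description="Convert a dictionary XML file to TSV or JSON."
    )
    # Hypothetical argument names; the issue only calls them "from", "to", "mode".
    parser.add_argument("src", help="input XML file, e.g. dict.xml")
    parser.add_argument("dst", help="output file, e.g. dict.json")
    # Assumption: only these two modes exist.
    parser.add_argument("mode", choices=["tsv", "json"], help="output format")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.src, args.dst, args.mode)
```

Passing an explicit list, such as `parse_args(["dict.xml", "dict.json", "json"])`, makes the function easy to try without a real command line.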

Sample format of tsv (' ' means space):
id \t root \t data \t extra \n
#header \t dictionary \t version \t revision \n
OpenCorpora \t dictionary \t <version_from_xml> \t \n
#lemmas lemma variants empty
[ \t ' ' <';'.join(attributes)> \t ' ' <';'.join(attributes)> [, ' ' <';'.join(attributes)>]* \t \n]*
#gramemes \t parent \t alias \t description \n
[ \t \t \t \n]*
#links \t from \t to \t type \n
[ \t \t \t \n]*
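To make the layout more concrete, here is a rough sketch of how the `#links` section could be produced with the standard `xml.etree.ElementTree` module. It rests on several assumptions: that links are stored as `<link id="..." from="..." to="..." type="..."/>` elements, that the first column of each row is the link id, and that fields are separated by bare tabs with no surrounding spaces. `SAMPLE_XML` and `links_to_tsv` are hypothetical names, not part of the script in this issue.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature input; the real dictionary file is far larger.
SAMPLE_XML = """<dictionary version="0.92" revision="1">
  <links>
    <link id="1" from="5" to="6" type="1"/>
    <link id="2" from="7" to="8" type="3"/>
  </links>
</dictionary>"""


def links_to_tsv(xml_text):
    """Render the #links section: one header row, then one row per <link>."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    out.write("#links\tfrom\tto\ttype\n")
    for link in root.iter("link"):
        # Assumption: the first column holds the link id.
        row = [
            link.get("id", ""),
            link.get("from", ""),
            link.get("to", ""),
            link.get("type", ""),
        ]
        out.write("\t".join(row) + "\n")
    return out.getvalue()


if __name__ == "__main__":
    print(links_to_tsv(SAMPLE_XML), end="")
```

For a full-size dictionary, `ET.iterparse` would likely be a better fit than loading the whole file into a string, since it avoids holding the whole tree in memory.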

The TSV output requires much less space. This script is only a sample; it still needs a .sh wrapper.

alvadia avatar Mar 04 '21 19:03 alvadia

#5

alvadia avatar Mar 04 '21 19:03 alvadia