Jesse de Does

Results: 11 comments by Jesse de Does

Dear John, You are right, the plain text output from ALTO is execrable. The reason is that conversion takes place indirectly, ALTO --> tokenized TEI with zoning --> plain text....

I have added content in the lib directory. Please let me know if you have any problems!

Hello all, sorry to catch up only today. The right command line for conversion from txt to TEI is (txt, not text): java -jar OpenConvert.jar -from txt -to TEI...

Thanks both!! I can install @PonteIneptique's version. I run into CUDA issues later on, but that is most likely a problem with my local machine.

Thanks again! (My machine does have CUDA, but it magically gets mixed up on system updates from time to time.)

First the easy ones:
- We fixed the validation issue found by Tomaz in one of the files
- We removed the resp statement for linguistic annotation from the annotated...

- Missing text: this has to do with text paragraphs which could not automatically be classified in the first step of the conversion from HTML to TEI. In the first...

* Using common taxonomies. We have tried to do this as much as possible now. When categories we need are missing from the common taxonomy, we add a -BE file...

Multiple speaker types indeed break the validation: ``` Error: Type error on line 332 column 49 of parlamint-lib.xsl: XTTE0780 A sequence of more than one item is not allowed as...

Summarizing:
- Gap becomes note for the unclassified paragraphs. Some way to characterize this content would be welcome. Maybe allow `subtype="problematic_content"` or something along those lines?
- We removed some...