LEMLAT3

Error while processing some input strings (batch processing)

Open gersh0m opened this issue 6 years ago • 0 comments

Three types of input cause an error ('segmentation fault') in batch mode:

  • strings containing a backslash character '\'
  • strings containing some non-ASCII characters (further investigation is needed to determine exactly which ones)
  • strings longer than 30 characters

A quick workaround is to filter the input file in advance (note that the filtered-out words would not be analyzed anyway), for example with this simple cascade of sed commands in bash:

LANG=C sed  's/\\/ /g' input_file  | sed -E  "s/[^\x00-\x7F]+/ /g" |\
 sed 's/[-_\/\$[:alnum:]]\{30,\}/ /g' > alt_input_file
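Before filtering, it may help to see how much of a given input file is affected. The following is a minimal sketch, not part of LEMLAT3: the function name `lemlat_input_report` is hypothetical, and it assumes a POSIX shell with standard `grep`, `tr` and `wc`. It counts lines containing a backslash, bytes outside the ASCII range, and lines containing a run of 30 or more characters from the same character class the sed filter uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of LEMLAT3): count the inputs in a file that
# fall into the three problem categories described in this issue.
lemlat_input_report() {
  file=$1
  # Lines containing a literal backslash (-F: fixed string, -c: count lines).
  # grep exits non-zero when nothing matches, hence "|| true".
  backslash_lines=$(LC_ALL=C grep -c -F '\' "$file" || true)
  # Bytes outside the ASCII range: delete every ASCII byte, count what is left.
  non_ascii_bytes=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c | tr -d ' ')
  # Lines with a run of 30 or more characters from the class used by the sed filter.
  long_lines=$(LC_ALL=C grep -c -E '[-_/$[:alnum:]]{30,}' "$file" || true)
  printf 'backslash_lines=%s non_ascii_bytes=%s long_lines=%s\n' \
    "$backslash_lines" "$non_ascii_bytes" "$long_lines"
}
```

After sourcing the function, `lemlat_input_report input_file` prints the three counts on one line. If all three are zero, the file contains none of the inputs that the sed filter above would replace.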

Thanks to Enrique.

gersh0m, Oct 18 '19 10:10