LEMLAT3
Error while processing some input strings (batch processing)
Three types of input cause an error ('segmentation fault') in batch mode:
- strings containing a backslash character ('\')
- strings containing some non-ASCII characters (further investigation is needed to determine exactly which ones)
- strings longer than 30 characters
A quick workaround is to filter the input file in advance (note that the filtered-out words would not be analyzed anyway), for example with this simple cascade of sed commands in bash:
LANG=C sed 's/\\/ /g' input_file | sed -E "s/[^\x00-\x7F]+/ /g" |\
sed 's/[-_\/\$[:alnum:]]\{30,\}/ /g' > alt_input_file
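After filtering, it may be worth confirming that none of the three problematic patterns remain. The following is only a sketch: the function name check_lemlat_input is hypothetical and not part of LEMLAT3, and it assumes the long-string check should use the same character class as the sed cascade above. It prints any offending lines with their line numbers.

```shell
#!/usr/bin/env bash
# check_lemlat_input FILE   (hypothetical helper, not part of LEMLAT3)
# Prints every line of FILE that still contains:
#   - a backslash,
#   - a non-ASCII byte (in the C locale, any byte that is neither a
#     control nor a printable character), or
#   - a run of 30 or more characters from the class used by the sed filter.
# Returns 0 if the file is clean, 1 if offending lines were found,
# 2 if the file cannot be read.
check_lemlat_input() {
    [ -r "$1" ] || return 2
    if LC_ALL=C grep -a -n -E \
        '\\|[^[:cntrl:][:print:]]|[-_/$[:alnum:]]{30,}' "$1"; then
        return 1
    fi
    return 0
}
```

For example, `check_lemlat_input alt_input_file && echo "clean"` prints "clean" only when nothing suspicious is left in the filtered file.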
Thanks to Enrique.