
Remove logic that strips special characters from ProgramAB input

Open: kwatters opened this issue 7 years ago • 1 comment

ProgramAB is designed to take free-form input from end users as-is, so we should not sanitize it. This ticket is to remove the logic that strips punctuation and other characters from the input. Any such handling should live in the bot-specific normal.txt and denormal.txt.
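To illustrate the difference, here is a minimal sketch rather than ProgramAB's actual code. The class `NormalizationSketch`, the regex in `stripSpecialCharacters`, and the entries in `sampleSubstitutions` are all hypothetical: the first stands in for the kind of hard-coded stripping this ticket wants removed, and the map stands in for per-bot rules of the kind that would live in normal.txt (the real file format is not reproduced here).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NormalizationSketch {

    // Hypothetical stand-in for hard-coded sanitizing: drops everything
    // that is not a letter, digit, or whitespace, for every bot and language.
    static String stripSpecialCharacters(String input) {
        return input.replaceAll("[^\\p{L}\\p{N}\\s]", "").trim();
    }

    // Hypothetical per-bot substitution rules (assumed for illustration only).
    // This bot chooses to drop sentence punctuation but keep hyphens and apostrophes.
    static Map<String, String> sampleSubstitutions() {
        Map<String, String> subs = new LinkedHashMap<>();
        subs.put("!", " ");
        subs.put("?", " ");
        return subs;
    }

    // Applies each substitution in insertion order, then collapses runs of whitespace.
    static String applySubstitutions(String input, Map<String, String> subs) {
        String result = input;
        for (Map.Entry<String, String> e : subs.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String input = "C'est un arc-en-ciel!";
        // Hard-coded stripping merges words and loses the apostrophe and hyphens.
        System.out.println(stripSpecialCharacters(input));
        // Bot-specific rules decide what is removed; the rest passes through untouched.
        System.out.println(applySubstitutions(input, sampleSubstitutions()));
    }
}
```

With the hard-coded approach every bot loses the hyphens and apostrophe whether it wants to or not; with per-bot rules, a French bot can keep them while another bot is free to choose differently.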

kwatters, Dec 03 '18 14:12

Supporting languages that use a lot of hyphens and similar punctuation is a pain with the current ProgramAB / AIML. A better approach would be to swap out how ProgramAB tokenizes text: we could replace the current tokenization with the Lucene tokenizers. This was already done to support Japanese, and we can extend it to support many more languages.

For reference, here's the Javadoc for the French analyzer: https://lucene.apache.org/core/7_5_0/analyzers-common/org/apache/lucene/analysis/fr/FrenchAnalyzer.html

kwatters, Dec 03 '18 15:12