OpeningHoursParser

Possibility to exclude certain strings from parsing

Open ges1227 opened this issue 8 years ago • 5 comments

Hey @simonpoole, I tried to parse some of the following examples:

'daily 05.00 am - 09.00 pm' and also '06.00 a.m. - 07.00 p.m.'.

Unfortunately, neither passed the parser's non-strict mode, because of the fragments 'daily' and 'a.m.'. Is support for these planned?

In the meantime, would it be possible to provide additional functionality to help us out? Perhaps an option that lets the user exclude certain strings like 'a.m.', 'daily' and others by defining them in advance?

ges1227, Oct 19 '17 14:10

In general there are two ways this could be done:

  • restarting parsing after it fails for unknown tokens (however this wouldn't result in a valid spec in many cases)
  • skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense
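To illustrate the second option, here is a minimal sketch of a tokenizer that drops predefined words before the grammar sees them. It is not the parser's actual lexer: the class name `SkipWordLexer`, the whitespace-based splitting, and the contents of the skip list are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of OpeningHoursParser.
final class SkipWordLexer {
    // Assumed skip list. A real list would only contain strings that are
    // common enough in actual data to justify special handling.
    private static final Set<String> SKIP = Set.of("daily", "and", "also");

    /** Splits on whitespace and drops every token found in the skip list. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.trim().split("\\s+")) {
            // Case-insensitive lookup, so "Daily" is skipped as well.
            if (raw.isEmpty() || SKIP.contains(raw.toLowerCase(Locale.ROOT))) {
                continue;
            }
            tokens.add(raw);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [05.00, am, -, 09.00, pm]
        System.out.println(tokenize("daily 05.00 am - 09.00 pm"));
    }
}
```

The sketch also shows the disadvantage mentioned above: anything not in the skip list still reaches the grammar unchanged.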

As to "a.m." and "p.m.", the same applies as above: there needs to be a non-negligible amount of use for adding these to make sense. We already have a bad case of diminishing returns with a lot of the special cases we handle in non-strict mode.

simonpoole, Oct 19 '17 15:10

skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense

Maybe that helps: https://github.com/opening-hours/opening_hours.js/blob/master/locales/word_error_correction.yaml

ypid, Oct 19 '17 17:10

@ges1227 I've done some work on this by skipping such tokens in lexical analysis. This works well from a purely functional point of view. Unfortunately, it makes the implementation of the parser's strict/non-strict modes rather messy and forces us to throw a Java Error instead of an Exception if we detect such a token in strict mode, which will cause validators to complain endlessly (this is due to an architectural wart of JavaCC). So I'm not quite sure if the code should really be included. I need to think about it a bit.
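For readers wondering why an Error is awkward here: in Java, `Error` is not a subclass of `Exception`, so a caller's `catch (Exception e)` does not intercept it. The sketch below demonstrates this with hypothetical stand-in classes (`LexerError`, `SyntaxException`, `ErrorVersusException`); it does not reproduce the parser's or JavaCC's real code, and wrapping the Error at the API boundary is only one possible workaround.

```java
// Hypothetical stand-ins for illustration; not the parser's real classes.
final class LexerError extends Error {
    LexerError(String message) { super(message); }
}

final class SyntaxException extends Exception {
    SyntaxException(String message) { super(message); }
}

final class ErrorVersusException {
    /** Simulates a lexer that rejects a skippable token in strict mode. */
    static void lex(String token, boolean strict) {
        if (strict && token.equals("daily")) {
            throw new LexerError("unexpected token: " + token);
        }
    }

    /** A caller that only catches Exception: a LexerError passes straight through. */
    static String parse(String token, boolean strict) {
        try {
            lex(token, strict);
            return "ok";
        } catch (Exception e) {
            return "caught exception";
        }
    }

    /** One possible workaround: convert the Error into a checked exception. */
    static String parseSafely(String token, boolean strict) throws SyntaxException {
        try {
            lex(token, strict);
            return "ok";
        } catch (LexerError e) {
            throw new SyntaxException(e.getMessage());
        }
    }
}
```

Calling `parse("daily", true)` lets the `LexerError` propagate to the caller, while `parseSafely("daily", true)` surfaces the same problem as a `SyntaxException`.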

simonpoole, Oct 22 '17 13:10

@simonpoole, @ypid Thanks for your support, it really helped me move forward! I have been experimenting with the YAML file and arrived at a rather satisfying solution. Basically, my input strings for the OpeningHoursParser are filtered first by replacing unwanted words according to the definitions in the YAML file (small example attached).

So no more worries about a messy parser; your tips helped me manage the problem. As a side note, the code is far from perfect. Maybe two YAML files (one for regexes, one for 'normal' words) would make it easier to distinguish whether a regex or a 'normal' replacement should be applied to the string.
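The attached example is not reproduced in this thread, so the following is only a sketch of how such a pre-filter might look. It assumes the replacement rules have already been loaded (for instance from the YAML file) and keeps literal and regex rules apart, as suggested above. The class name `OpeningHoursPreFilter`, its methods, and the sample rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-filter for illustration; not part of OpeningHoursParser.
final class OpeningHoursPreFilter {
    private static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(Pattern pattern, String replacement) {
            this.pattern = pattern;
            this.replacement = replacement;
        }
    }

    // Rules are applied in the order in which they were added.
    private final List<Rule> rules = new ArrayList<>();

    /** Adds a 'normal' word: both sides are quoted so nothing is read as regex syntax. */
    OpeningHoursPreFilter literal(String word, String replacement) {
        rules.add(new Rule(
                Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE),
                Matcher.quoteReplacement(replacement)));
        return this;
    }

    /** Adds a regex rule; the pattern and replacement are used as given. */
    OpeningHoursPreFilter regex(String regex, String replacement) {
        rules.add(new Rule(Pattern.compile(regex, Pattern.CASE_INSENSITIVE), replacement));
        return this;
    }

    /** Applies all rules, then collapses leftover whitespace. */
    String apply(String input) {
        String result = input;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.replacement);
        }
        return result.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        OpeningHoursPreFilter filter = new OpeningHoursPreFilter()
                .regex("\\ba\\.m\\.", "am")   // assumed rule: "a.m." -> "am"
                .regex("\\bp\\.m\\.", "pm")   // assumed rule: "p.m." -> "pm"
                .literal("daily", "");        // assumed rule: drop "daily"
        // prints 06.00 am - 07.00 pm
        System.out.println(filter.apply("daily 06.00 a.m. - 07.00 p.m."));
    }
}
```

The cleaned string would then be handed to the parser as usual. Note that a literal rule as written here also matches inside longer words, so a real rule set may prefer word-boundary regexes for short fragments.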

ges1227, Oct 26 '17 12:10

a.m. and p.m. supported via https://github.com/simonpoole/OpeningHoursParser/commit/0acfaa6f5be1d90b8b7316e9e4e98cde2f9bbc9c

simonpoole, Oct 26 '17 21:10