JSGF Grammer not working as expected
I have created a simple JSGF as follows and using it with PocketSphinx-5prealpha (Python API)
#JSGF V1.0;
grammar testGrammar;
<unit> = (METER|CENTIMETER|MILE);
<number> = (ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND)+;
public <phrases> = (WHAT IS YOUR NAME) | (<number> <unit> (EQUAL TO) [HOW MANY] <unit>) ;
Output what I am expecting (always) out of above grammer: either WHAT IS YOUR NAME or phrases like "ONE THOUSAND FIVE HUNDRED TEN METER EQUAL TO MILE" "ONE THOUSAND FIVE HUNDRED TEN METER EQUAL TO HOW MANY MILE" "ONE MILE EQUAL TO METER"
Which I am getting most of the time, but sometime I also get output like: "ONE TWO WHAT IS YOUR NAME" "THOUSAND WHAT IS YOUR NAME"
I don't want such phrases, How to avoid this ?
If I remove '+' (one-or-many) operator from below line in grammer file:
<number> = (ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND)+;
Grammer works as expected but then I cant repeat the numbers and able to use only one number at a time.
for example
"HUNDRED MILE EQUAL TO HOW MANY METER"
and not like
"FIVE HUNDRED MILE EQUAL TO HOW MANY METER"
@nshmyrev confirmed its a bug. https://sourceforge.net/p/cmusphinx/discussion/help/thread/077e2341/
hello @nshmyrev, any progress on the issue?
Sorry, no time yet
hello @nshmyrev, any workaround available for this issue?
@G10DRAS, the workaround would be to revert https://github.com/cmusphinx/sphinxbase/commit/e59cac40480bcb41cbe955e3d417502ce55a8b77
Thanks @nshmyrev, I will take a look in to code but for now I found a temporary workaround for + operator.
Workaround is -
If
<number> = ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND;
then
<number>+ = <number> [<number>*]
After update my JSGF looks like below
#JSGF V1.0;
grammar testGrammar;
<unit> = METER|CENTIMETER|MILE;
<number> = ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND;
public <phrases> = (WHAT IS YOUR NAME) | <number> [<number>*] <unit> (EQUAL TO) [HOW MANY] <unit> ;
Now it is working as expected.
I wrote the changes referred to above. I've been using them for years and haven't encountered the problem you describe. I'm trying to reproduce your issue with the latest code & haven't seen it yet.
I'm happy to try to track it down, bisecting the repo if necessary, but first I need to reproduce it.
Well generate FSG from JSGF, you will see the problem.
Look at the commit I made in my fork -- let me know if it works for you. If so, I have one more change I want to make (to make right recursion more space-efficient) and then I'll make a pull request.
is this you changed ? https://github.com/ulatekh/sphinxbase/commit/2b36dd4726fea2e59260c05378864faa2ade53ce
Actually, it's ulatekh/sphinxbase@ac8d70d189 now...I made a bug fix.
ok will try and let you know.
It's been five months. Have you had a chance to look at this? It's been working fine for me.
I have a bunch of other changes I'd like to submit, but they're being held up by this.
sorry not yet. could you generate and compare FSG for both the versions of PS and go ahead.
Unfortunately, the fix in that PR still creates incorrect grammars for several of the test cases. For instance, the JSGF for test.rightRecursion is:
<action> = stop | start;
but with the patch above it produces this FSG, which will fail to accept <action> on its own, though that is obviously accepted by the grammar:
NUM_STATES 4
START_STATE 0
FINAL_STATE 3
TRANSITION 0 1 1.000000 stop
TRANSITION 0 1 1.000000 start
TRANSITION 0 3 1.000000
TRANSITION 1 2 1.000000 and
TRANSITION 2 0 1.000000
FSG_END
My original JSGF implementation was certainly quite inefficient, but it was correct. Because I don't understand the optimization or the fix, I am going to revert the original change. The correct (though perhaps not super-efficient) way to do it is to apply the standard epsilon-removal algorithm (https://www.openfst.org/twiki/bin/view/FST/RmEpsilonDoc) to the generated FSG.