pocketsphinx

JSGF grammar not working as expected

Status: Open. G10DRAS opened this issue 9 years ago • 14 comments

I have created a simple JSGF grammar, shown below, and am using it with PocketSphinx-5prealpha (Python API):

#JSGF V1.0; 
grammar testGrammar;
<unit>     = (METER|CENTIMETER|MILE);
<number>  = (ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND)+;
public <phrases> = (WHAT IS YOUR NAME)  |  (<number> <unit> (EQUAL TO) [HOW MANY] <unit>) ;

The output I expect (always) from the above grammar is either "WHAT IS YOUR NAME" or phrases like "ONE THOUSAND FIVE HUNDRED TEN METER EQUAL TO MILE", "ONE THOUSAND FIVE HUNDRED TEN METER EQUAL TO HOW MANY MILE", or "ONE MILE EQUAL TO METER".

I get this most of the time, but sometimes I also get output like "ONE TWO WHAT IS YOUR NAME" or "THOUSAND WHAT IS YOUR NAME".

I don't want such phrases. How can I avoid this?

If I remove the '+' (one-or-more) operator from this line in the grammar file:

<number> = (ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND)+;

the grammar works as expected, but then I can't repeat numbers and can use only one number at a time. For example, "HUNDRED MILE EQUAL TO HOW MANY METER" is recognized, but "FIVE HUNDRED MILE EQUAL TO HOW MANY METER" is not.

G10DRAS, Jun 04 '16
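To make the intended language concrete, here is a minimal sketch that hand-translates the grammar above into a Python regular expression and checks the example outputs from the report. The translation and the helper name `in_grammar` are illustrative assumptions, not part of PocketSphinx; the sketch only shows which hypotheses the grammar should accept (the first three) and reject (the last two).

```python
import re

# Hypothetical hand translation of the JSGF rules into a regex over
# space-separated words; this is NOT generated by PocketSphinx/sphinxbase.
UNIT = r"(?:METER|CENTIMETER|MILE)"
NUMBER = r"(?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND)"

# public <phrases> = (WHAT IS YOUR NAME)
#                  | (<number>+ <unit> (EQUAL TO) [HOW MANY] <unit>);
PHRASES = re.compile(
    r"WHAT IS YOUR NAME"
    rf"|{NUMBER}(?: {NUMBER})* {UNIT} EQUAL TO(?: HOW MANY)? {UNIT}"
)


def in_grammar(hyp: str) -> bool:
    """Return True if the whole hypothesis is a sentence of the intended grammar."""
    return PHRASES.fullmatch(hyp) is not None


for hyp in [
    "ONE THOUSAND FIVE HUNDRED TEN METER EQUAL TO MILE",
    "ONE THOUSAND FIVE HUNDRED TEN METER EQUAL TO HOW MANY MILE",
    "ONE MILE EQUAL TO METER",
    "ONE TWO WHAT IS YOUR NAME",
    "THOUSAND WHAT IS YOUR NAME",
]:
    print(in_grammar(hyp), hyp)
```

A check like this could serve as a stopgap filter on decoder hypotheses, but it does not fix the FSG the decoder actually searches.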

@nshmyrev confirmed it's a bug: https://sourceforge.net/p/cmusphinx/discussion/help/thread/077e2341/

G10DRAS, Jun 04 '16

hello @nshmyrev, any progress on the issue?

G10DRAS, Aug 09 '16

Sorry, no time yet

nshmyrev, Aug 10 '16

Hello @nshmyrev, is there any workaround available for this issue?

G10DRAS, Sep 26 '16

@G10DRAS, the workaround would be to revert https://github.com/cmusphinx/sphinxbase/commit/e59cac40480bcb41cbe955e3d417502ce55a8b77

nshmyrev, Sep 26 '16

Thanks @nshmyrev, I will take a look into the code, but for now I have found a temporary workaround for the + operator.

The workaround: if <number> = ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND; then <number>+ can be written as <number> [<number>*], i.e. one number followed by zero or more further numbers.

After the update, my JSGF looks like this:

#JSGF V1.0; 
grammar testGrammar;
<unit>     = METER|CENTIMETER|MILE;
<number>  = ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|HUNDRED|THOUSAND;
public <phrases> = (WHAT IS YOUR NAME) | <number> [<number>*] <unit> (EQUAL TO) [HOW MANY] <unit> ;

Now it is working as expected.

G10DRAS, Sep 27 '16
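A quick sanity check that the workaround does not change the intended language: the sketch below hand-translates `<number>+` and `<number> [<number>*]` into Python regexes and compares them on every word sequence up to four words over a small, hypothetical vocabulary. It only shows that the two rule bodies describe the same sequences; it says nothing about how sphinxbase compiles either form into an FSG.

```python
import itertools
import re

# Hypothetical small vocabulary; the real <number> rule has twelve words.
WORDS = ["ONE", "TEN", "HUNDRED"]
W = "(?:" + "|".join(WORDS) + ")"

# Hand translations (an assumption, not sphinxbase output):
PLUS = re.compile(rf"{W}(?: {W})*")          # <number>+
REWRITE = re.compile(rf"{W}(?:(?: {W})*)?")  # <number> [<number>*]


def same_language(max_len=4):
    """Compare both patterns on every word sequence of up to max_len words."""
    vocab = WORDS + ["METER"]  # a non-number word supplies negative cases
    for n in range(max_len + 1):
        for seq in itertools.product(vocab, repeat=n):
            s = " ".join(seq)
            if bool(PLUS.fullmatch(s)) != bool(REWRITE.fullmatch(s)):
                return False
    return True


print(same_language())  # True
```

The optional wrapper around `<number>*` is redundant (a star already allows zero repetitions), so `<number> <number>*` would describe the same language.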

I wrote the changes referred to above. I've been using them for years and haven't encountered the problem you describe. I'm trying to reproduce your issue with the latest code and haven't seen it yet.

I'm happy to try to track it down, bisecting the repo if necessary, but first I need to reproduce it.

ulatekh, Sep 06 '18

Well, generate the FSG from the JSGF and you will see the problem.

G10DRAS, Sep 06 '18

Look at the commit I made in my fork -- let me know if it works for you. If so, I have one more change I want to make (to make right recursion more space-efficient) and then I'll make a pull request.

ulatekh, Sep 11 '18

Is this your change? https://github.com/ulatekh/sphinxbase/commit/2b36dd4726fea2e59260c05378864faa2ade53ce

G10DRAS, Sep 12 '18

Actually, it's ulatekh/sphinxbase@ac8d70d189 now; I made a bug fix.

ulatekh, Sep 12 '18

OK, I will try it and let you know.

G10DRAS, Sep 12 '18

It's been five months. Have you had a chance to look at this? It's been working fine for me.

I have a bunch of other changes I'd like to submit, but they're being held up by this.

ulatekh, Feb 15 '19

Sorry, not yet. Could you generate and compare the FSGs from both versions of PS, and then go ahead?

G10DRAS, Feb 15 '19
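Because state numbering can differ between two FSGs built for the same grammar, comparing them line by line is unreliable; comparing the word sequences each one accepts is more robust. The sketch below parses the FSG text format used in this thread, simulates it with epsilon (word-less) transitions, and lists every sequence up to a fixed length that exactly one of the two FSGs accepts. Both FSGs are hypothetical and were not produced by either sphinxbase version: `BAD` sends the `+` loop back to the shared start state, which is one way an FSG could come to accept sentences like "ONE TWO WHAT IS YOUR NAME".

```python
import itertools
from collections import defaultdict


def parse_fsg(text):
    """Parse FSG text into (start, finals, arcs); a TRANSITION without a word is epsilon."""
    start, finals, arcs = None, set(), defaultdict(list)
    for fields in (line.split() for line in text.splitlines()):
        if not fields:
            continue
        if fields[0] == "START_STATE":
            start = int(fields[1])
        elif fields[0] == "FINAL_STATE":
            finals.add(int(fields[1]))
        elif fields[0] == "TRANSITION":
            word = fields[4] if len(fields) > 4 else None  # probability fields[3] ignored
            arcs[int(fields[1])].append((word, int(fields[2])))
    return start, finals, arcs


def closure(states, arcs):
    """States reachable from `states` through epsilon arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        for word, dst in arcs[stack.pop()]:
            if word is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(fsg, words):
    start, finals, arcs = fsg
    cur = closure({start}, arcs)
    for w in words:
        cur = closure({d for s in cur for word, d in arcs[s] if word == w}, arcs)
    return bool(cur & finals)


def language_diff(a, b, vocab, max_len=3):
    """Word sequences of up to max_len words accepted by exactly one FSG."""
    diff = []
    for n in range(max_len + 1):
        for seq in itertools.product(vocab, repeat=n):
            if accepts(a, seq) != accepts(b, seq):
                diff.append(" ".join(seq))
    return diff


# Hypothetical FSG for "NAME | (ONE | TWO)+ METER" with a private loop state 4.
GOOD = """START_STATE 0
FINAL_STATE 3
TRANSITION 0 3 1.0 NAME
TRANSITION 0 4 1.0
TRANSITION 4 1 1.0 ONE
TRANSITION 4 1 1.0 TWO
TRANSITION 1 4 1.0
TRANSITION 1 3 1.0 METER
"""
# Same FSG, but the "+" loop jumps back to the shared start state 0.
BAD = GOOD.replace("TRANSITION 1 4 1.0\n", "TRANSITION 1 0 1.0\n")

print(language_diff(parse_fsg(GOOD), parse_fsg(BAD), ["ONE", "TWO", "METER", "NAME"]))
# ['ONE NAME', 'TWO NAME', 'ONE ONE NAME', 'ONE TWO NAME', 'TWO ONE NAME', 'TWO TWO NAME']
```

An empty list would mean the two FSGs agree on every sequence up to `max_len` words; this is a bounded check, not a proof of equivalence.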

Unfortunately, the fix in that PR still creates incorrect grammars for several of the test cases. For instance, the JSGF for test.rightRecursion is:

<action> = stop | start;

but with the patch above it produces this FSG, which will fail to accept <action> on its own, though that is obviously accepted by the grammar:

NUM_STATES 4
START_STATE 0
FINAL_STATE 3
TRANSITION 0 1 1.000000 stop
TRANSITION 0 1 1.000000 start
TRANSITION 0 3 1.000000 
TRANSITION 1 2 1.000000 and
TRANSITION 2 0 1.000000 
FSG_END
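To check the claim about this FSG, here is a minimal sketch that parses it and simulates it, treating a `TRANSITION` with no word as an epsilon (null) transition and ignoring the probabilities. It is an illustration written for this thread, not sphinxbase's own FSG code.

```python
from collections import defaultdict

# The FSG quoted above, verbatim apart from whitespace.
FSG_TEXT = """\
NUM_STATES 4
START_STATE 0
FINAL_STATE 3
TRANSITION 0 1 1.000000 stop
TRANSITION 0 1 1.000000 start
TRANSITION 0 3 1.000000
TRANSITION 1 2 1.000000 and
TRANSITION 2 0 1.000000
FSG_END
"""


def parse_fsg(text):
    """Return (start, final, arcs) where arcs maps state -> [(word or None, dst)]."""
    start, final = None, None
    arcs = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "START_STATE":
            start = int(fields[1])
        elif fields[0] == "FINAL_STATE":
            final = int(fields[1])
        elif fields[0] == "TRANSITION":
            word = fields[4] if len(fields) > 4 else None  # None = epsilon arc
            arcs[int(fields[1])].append((word, int(fields[2])))
    return start, final, arcs


def eps_closure(states, arcs):
    """All states reachable from `states` via epsilon arcs, including themselves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for word, dst in arcs[s]:
            if word is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(fsg, words):
    start, final, arcs = fsg
    current = eps_closure({start}, arcs)
    for w in words:
        nxt = {dst for s in current for word, dst in arcs[s] if word == w}
        current = eps_closure(nxt, arcs)
    return final in current


fsg = parse_fsg(FSG_TEXT)
for sentence in ["stop", "start", "stop and", "stop and start and"]:
    print(repr(sentence), accepts(fsg, sentence.split()))
```

A lone "stop" ends in state 1, which is not final, while "stop and" reaches the final state 3 through the epsilon arcs 2 → 0 → 3, so only sequences ending in "and" (plus the empty string) are accepted.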

My original JSGF implementation was certainly quite inefficient, but it was correct. Because I don't understand the optimization or the fix, I am going to revert the original change. The correct (though perhaps not super-efficient) way to do it is to apply the standard epsilon-removal algorithm (https://www.openfst.org/twiki/bin/view/FST/RmEpsilonDoc) to the generated FSG.

dhdaines, Sep 28 '22
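For reference, a minimal sketch of unweighted epsilon removal: for each state, compute its epsilon closure, copy every non-epsilon arc that leaves the closure onto that state, and mark the state final if its closure contains a final state. Weights are ignored here (OpenFST's RmEpsilon also handles them), and the automaton is a hypothetical naive construction of `(ONE | TWO)+ METER`, not output from sphinxbase. Epsilon removal preserves the accepted language, so it has to run on an FSG that is already correct, however inefficient.

```python
from collections import defaultdict


def eps_closure(state, arcs):
    """All states reachable from `state` via epsilon (None) arcs, including itself."""
    stack, seen = [state], {state}
    while stack:
        s = stack.pop()
        for word, dst in arcs.get(s, []):
            if word is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def remove_epsilons(arcs, finals):
    """Return (new_arcs, new_finals) with no epsilon arcs and the same language."""
    states = set(arcs) | {d for lst in arcs.values() for _, d in lst} | set(finals)
    new_arcs = defaultdict(set)
    new_finals = set()
    for p in states:
        closure = eps_closure(p, arcs)
        if closure & finals:
            new_finals.add(p)
        for q in closure:
            for word, dst in arcs.get(q, []):
                if word is not None:
                    new_arcs[p].add((word, dst))
    return dict(new_arcs), new_finals


def accepts(arcs, start, finals, words):
    """Simulate an epsilon-free automaton on a word sequence."""
    current = {start}
    for w in words:
        current = {dst for s in current for word, dst in arcs.get(s, ()) if word == w}
    return bool(current & finals)


# Hypothetical naive construction with epsilon arcs:
# 0 -eps-> 1, 1 -ONE/TWO-> 2, 2 -eps-> 1 (the "+" loop), 2 -eps-> 3, 3 -METER-> 4.
ARCS = {
    0: [(None, 1)],
    1: [("ONE", 2), ("TWO", 2)],
    2: [(None, 1), (None, 3)],
    3: [("METER", 4)],
}
FINALS = {4}

new_arcs, new_finals = remove_epsilons(ARCS, FINALS)
for sentence in ["ONE METER", "ONE TWO ONE METER", "METER", "ONE"]:
    print(sentence, accepts(new_arcs, 0, new_finals, sentence.split()))
```

After removal, states 1 and 3 are no longer reachable from the start state; a full implementation would also trim such states.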