Case: fstalign ignores symbols table and does not find alignment in a simple transcript
Hi,
fstalign 1.6.1 does not load fst symbol tables properly and modifies the hypothesis FST so it's completely borked:
Here is the output of the command `/fstalign/bin/fstalign wer --ref /data/customer/ref.txt --hyp /data/customer/hyp.nlp --symbols /data/customer/hyp.sym --output-sbs /data/customer/res.sbs --log /data/customer/res.log` in the current Docker image. It happens both with a txt file (one gold transcript word per line) and with a ctm file containing the time-aligned gold transcript.
[2023-02-16 10:51:17.854] [console] [info] loggers initialized
[+++] [10:51:17] [console] fstalign version is 1.6.1
[+++] [10:51:17] [console] reading reference plain text from /data/customer/ref.txt
[+++] [10:51:17] [console] reading hypothesis fst from /data/customer/hyp.fst
[+++] [10:51:17] [fstalign] starting conversion to int vector
[+++] [10:51:17] [fstalign] converting ref to int vector
[+++] [10:51:17] [OneBestFstLoader] creating std::vector<int> for OneBestFstLoader for 27 tokens
[+++] [10:51:17] [fstalign] converting hyp to int vector
[+++] [10:51:17] [FstFileLoader] convertToIntVector isn't implemented for FST inputs
[+++] [10:51:17] [fstalign] Either ref or hyp is really small, skipping over the levenstein distance, ref size: 27, hyp size: 0
[+++] [10:51:17] [FstFileLoader] Total FST has 27 states.
[+++] [10:51:17] [fstalign] generating ref synonyms from symbol table
[+++] [10:51:17] [fstalign] applying ref synonyms on ref fst
[+++] [10:51:17] [SynonymEngine] we have 0 registered first word rules label id
[+++] [10:51:17] [fstalign] printing ref fst
[+++] [10:51:17] [fstalign] 0 1 8/hello 8/hello 0.0
[+++] [10:51:17] [fstalign] 1 2 9/i'm 9/i'm 0.0
[+++] [10:51:17] [fstalign] 2 3 10/fine 10/fine 0.0
[+++] [10:51:17] [fstalign] 3 4 11/suzana 11/suzana 0.0
[+++] [10:51:17] [fstalign] 4 5 12/how 12/how 0.0
[+++] [10:51:17] [fstalign] 5 6 13/are 13/are 0.0
[+++] [10:51:17] [fstalign] 6 7 14/you 14/you 0.0
[+++] [10:51:17] [fstalign] 7 8 15/mhm 15/mhm 0.0
[+++] [10:51:17] [fstalign] 8 9 16/sure 16/sure 0.0
[+++] [10:51:17] [fstalign] 9 10 17/yes 17/yes 0.0
[+++] [10:51:17] [fstalign] 10 11 18/okay 18/okay 0.0
[+++] [10:51:17] [fstalign] 11 12 19/ah 19/ah 0.0
[+++] [10:51:17] [fstalign] 12 13 20/just 20/just 0.0
[+++] [10:51:17] [fstalign] 13 14 21/a 21/a 0.0
[+++] [10:51:17] [fstalign] 14 15 22/couple 22/couple 0.0
[+++] [10:51:17] [fstalign] 15 16 23/of 23/of 0.0
[+++] [10:51:17] [fstalign] 16 17 24/minutes 24/minutes 0.0
[+++] [10:51:17] [fstalign] 17 18 9/i'm 9/i'm 0.0
[+++] [10:51:17] [fstalign] 18 19 25/on 25/on 0.0
[+++] [10:51:17] [fstalign] 19 20 26/my 26/my 0.0
[+++] [10:51:17] [fstalign] 20 21 27/way 27/way 0.0
[+++] [10:51:17] [fstalign] 21 22 28/into 28/into 0.0
[+++] [10:51:17] [fstalign] 22 23 21/a 21/a 0.0
[+++] [10:51:17] [fstalign] 23 24 29/doctor's 29/doctor's 0.0
[+++] [10:51:17] [fstalign] 24 25 30/appointment 30/appointment 0.0
[+++] [10:51:17] [fstalign] 25 26 18/okay 18/okay 0.0
[+++] [10:51:17] [fstalign] 26 27 18/okay 18/okay 0.0
[+++] [10:51:17] [fstalign] 27 28 0/<eps> 0/<eps> 0.0
[+++] [10:51:17] [fstalign] printing hyp fst
[+++] [10:51:17] [fstalign] 0 1 23/of 23/of 0.99158907
[+++] [10:51:17] [fstalign] 0 1 24/minutes 24/minutes 0.008410932
[+++] [10:51:17] [fstalign] 1 2 25/on 25/on 0.8511446
[+++] [10:51:17] [fstalign] 1 2 2/<ins> 2/<ins> 0.05130972
[+++] [10:51:17] [fstalign] 1 2 1/<oov> 1/<oov> 0.047072377
[+++] [10:51:17] [fstalign] 1 2 26/my 26/my 0.0155186895
[+++] [10:51:17] [fstalign] 1 2 27/way 27/way 0.005192776
[+++] [10:51:17] [fstalign] 1 2 28/into 28/into 0.0033090003
[+++] [10:51:17] [fstalign] 1 2 23/of 23/of 0.0015563301
[+++] [10:51:17] [fstalign] 2 3 29/doctor's 29/doctor's 0.3886394
[+++] [10:51:17] [fstalign] 2 3 30/appointment 30/appointment 0.19079825
[+++] [10:51:17] [fstalign] 2 3 31/ 31/ 0.10623746
[+++] [10:51:17] [fstalign] 2 3 3/<del> 3/<del> 0.04679449
[+++] [10:51:17] [fstalign] 2 3 2/<ins> 2/<ins> 0.034456342
[+++] [10:51:17] [fstalign] 2 3 25/on 25/on 0.0081548225
[+++] [10:51:17] [fstalign] 2 3 32/ 32/ 0.004374613
[+++] [10:51:17] [fstalign] 2 3 26/my 26/my 0.002221351
[+++] [10:51:17] [fstalign] 2 3 1/<oov> 1/<oov> 0.002075816
[+++] [10:51:17] [fstalign] 2 3 33/ 33/ 0.0019667444
[+++] [10:51:17] [fstalign] 2 3 34/ 34/ 0.0005650157
[+++] [10:51:17] [fstalign] 3 4 4/<sub> 4/<sub> 0.69604653
[+++] [10:51:17] [fstalign] 3 4 3/<del> 3/<del> 0.15710989
[+++] [10:51:17] [fstalign] 3 4 30/appointment 30/appointment 0.014679594
[+++] [10:51:17] [fstalign] 3 4 35/ 35/ 0.011887497
[+++] [10:51:17] [fstalign] 3 4 36/ 36/ 0.0065641715
[+++] [10:51:17] [fstalign] 3 4 2/<ins> 2/<ins> 0.0056938045
[+++] [10:51:17] [fstalign] 3 4 25/on 25/on 0.0021069725
[+++] [10:51:17] [fstalign] 3 4 26/my 26/my 0.001466121
[+++] [10:51:17] [fstalign] 3 4 29/doctor's 29/doctor's 0.0013138258
[+++] [10:51:17] [fstalign] 3 4 1/<oov> 1/<oov> 0.0012702389
[+++] [10:51:17] [fstalign] 3 4 17/yes 17/yes 0.0009687771
[+++] [10:51:17] [fstalign] 4 5 5/<inaudible> 5/<inaudible> 0.5213346
[+++] [10:51:17] [fstalign] 4 5 37/ 37/ 0.17481348
[+++] [10:51:17] [fstalign] 4 5 38/ 38/ 0.14042015
[+++] [10:51:17] [fstalign] 4 5 36/ 36/ 0.053299483
[+++] [10:51:17] [fstalign] 4 5 3/<del> 3/<del> 0.042188246
[+++] [10:51:17] [fstalign] 4 5 39/ 39/ 0.011979131
[+++] [10:51:17] [fstalign] 4 5 2/<ins> 2/<ins> 0.0036785
[+++] [10:51:17] [fstalign] 4 5 35/ 35/ 0.002629472
[+++] [10:51:17] [fstalign] 4 5 30/appointment 30/appointment 0.0022271618
[+++] [10:51:17] [fstalign] 4 5 40/ 40/ 0.002156982
[+++] [10:51:17] [fstalign] 4 5 41/ 41/ 0.0016741576
[+++] [10:51:17] [fstalign] 5 6 6/<silence> 6/<silence> 0.6163309
[+++] [10:51:17] [fstalign] 5 6 42/ 42/ 0.18521181
[+++] [10:51:17] [fstalign] 5 6 36/ 36/ 0.056179322
[+++] [10:51:17] [fstalign] 5 6 43/ 43/ 0.038890716
[+++] [10:51:17] [fstalign] 5 6 44/ 44/ 0.0326784
[+++] [10:51:17] [fstalign] 5 6 3/<del> 3/<del> 0.026641503
[+++] [10:51:17] [fstalign] 5 6 45/ 45/ 0.020304155
[+++] [10:51:17] [fstalign] 5 6 46/ 46/ 0.006060596
[+++] [10:51:17] [fstalign] 5 6 5/<inaudible> 5/<inaudible> 0.0041220332
[+++] [10:51:17] [fstalign] 5 6 30/appointment 30/appointment 0.0033203475
[+++] [10:51:17] [fstalign] 5 6 47/ 47/ 0.0029174143
[+++] [10:51:17] [fstalign] 6 7 48/ 48/ 0.34014535
[+++] [10:51:17] [fstalign] 6 7 24/minutes 24/minutes 0.2984986
[+++] [10:51:17] [fstalign] 6 7 49/ 49/ 0.19404508
[+++] [10:51:17] [fstalign] 6 7 10/fine 10/fine 0.016649699
[+++] [10:51:17] [fstalign] 6 7 0/<eps> 0/<eps> 0.013604832
[+++] [10:51:17] [fstalign] 6 7 50/ 50/ 0.0062978663
[+++] [10:51:17] [fstalign] 6 7 51/ 51/ 0.0049225087
[+++] [10:51:17] [fstalign] 6 7 52/ 52/ 0.0039882683
[+++] [10:51:17] [fstalign] 6 7 23/of 23/of 0.0033028126
[+++] [10:51:17] [fstalign] 6 7 53/ 53/ 0.0029480883
[+++] [10:51:17] [fstalign] 6 7 54/ 54/ 0.002575561
[+++] [10:51:17] [fstalign] 7 8 55/ 55/ 0.43735883
[+++] [10:51:17] [fstalign] 7 8 8/hello 8/hello 0.40650827
[+++] [10:51:17] [fstalign] 7 8 56/ 56/ 0.038571022
[+++] [10:51:17] [fstalign] 7 8 57/ 57/ 0.010218942
[+++] [10:51:17] [fstalign] 7 8 58/ 58/ 0.009362684
[+++] [10:51:17] [fstalign] 8 9 59/ 59/ 0.81295073
[+++] [10:51:17] [fstalign] 8 9 60/ 60/ 0.05420902
[+++] [10:51:17] [fstalign] 8 9 61/ 61/ 0.02045335
[+++] [10:51:17] [fstalign] 8 9 24/minutes 24/minutes 0.018062603
[+++] [10:51:17] [fstalign] 8 9 54/ 54/ 0.012383968
[+++] [10:51:17] [fstalign] 8 9 62/ 62/ 0.007653652
[+++] [10:51:17] [fstalign] 9 10 10/fine 10/fine 1.0
[+++] [10:51:17] [fstalign] 10 11 11/suzana 11/suzana 0.7825733
[+++] [10:51:17] [fstalign] 10 11 35/ 35/ 0.10785312
[+++] [10:51:17] [fstalign] 10 11 63/ 63/ 0.09928422
[+++] [10:51:17] [fstalign] 10 11 13/are 13/are 0.010289361
[+++] [10:51:17] [fstalign] 11 12 12/how 12/how 1.0
[+++] [10:51:17] [fstalign] 12 13 13/are 13/are 1.0
[+++] [10:51:17] [fstalign] 13 14 14/you 14/you 1.0
[+++] [10:51:17] [fstalign] 14 15 15/mhm 15/mhm 1.0
[+++] [10:51:17] [fstalign] 15 16 16/sure 16/sure 1.0
[+++] [10:51:17] [fstalign] 16 17 1/<oov> 1/<oov> 0.9930773
[+++] [10:51:17] [fstalign] 16 17 35/ 35/ 0.006035227
[+++] [10:51:17] [fstalign] 16 17 64/ 64/ 0.0008874871
[+++] [10:51:17] [fstalign] 17 18 17/yes 17/yes 0.99649245
[+++] [10:51:17] [fstalign] 17 18 65/ 65/ 0.003507566
[+++] [10:51:17] [fstalign] 18 19 18/okay 18/okay 1.0
[+++] [10:51:17] [fstalign] 19 20 19/ah 19/ah 1.0
[+++] [10:51:17] [fstalign] 20 21 20/just 20/just 0.8747955
[+++] [10:51:17] [fstalign] 20 21 66/ 66/ 0.08273692
[+++] [10:51:17] [fstalign] 21 22 13/are 13/are 0.879016
[+++] [10:51:17] [fstalign] 21 22 39/ 39/ 0.052434582
[+++] [10:51:17] [fstalign] 21 22 66/ 66/ 0.029785942
[+++] [10:51:17] [fstalign] 21 22 67/ 67/ 0.0126816565
[+++] [10:51:17] [fstalign] 21 22 68/ 68/ 0.00951749
[+++] [10:51:17] [fstalign] 21 22 69/ 69/ 0.007649917
[+++] [10:51:17] [fstalign] 21 22 35/ 35/ 0.0052877315
[+++] [10:51:17] [fstalign] 21 22 70/ 70/ 0.0021538541
[+++] [10:51:17] [fstalign] 21 22 71/ 71/ 0.0014728603
[+++] [10:51:17] [fstalign] 22 23 21/a 21/a 0.9477601
[+++] [10:51:17] [fstalign] 22 23 72/ 72/ 0.052239873
[+++] [10:51:17] [fstalign] 23 24 22/couple 22/couple 1.0
[+++] [10:51:17] [fstalign] 24 25 10/fine 10/fine 0.9680188
[+++] [10:51:17] [fstalign] 24 25 54/ 54/ 0.02020042
[+++] [10:51:17] [fstalign] 24 25 56/ 56/ 0.011780825
[+++] [10:51:17] [fstalign] 25 26 10/fine 10/fine 1.0
[+++] [10:51:17] [walker] starting a walk in the park
[+++] [10:51:17] [walker] we have 0 candidates after 28 loops
[+++] [10:51:17] [fstalign] done walking the graph
terminate called after throwing an instance of 'std::runtime_error'
what(): no alignment produced
Aborted (core dumped)
The proper FST is however:
0 1 0 0 0.991589
0 1 1 1 0.00841093
1 2 2 2 0.851145
1 2 3 3 0.0513097
1 2 4 4 0.0470724
1 2 5 5 0.0155187
2 3 6 6 0.388639
2 3 7 7 0.190798
2 3 8 8 0.106237
2 3 9 9 0.0467945
3 4 10 10 0.696047
3 4 9 9 0.15711
3 4 7 7 0.0146796
3 4 11 11 0.0118875
4 5 12 12 0.521335
4 5 13 13 0.174813
4 5 14 14 0.14042
4 5 15 15 0.0532995
5 6 16 16 0.616331
5 6 17 17 0.185212
5 6 15 15 0.0561793
5 6 18 18 0.0388907
6 7 19 19 0.340145
6 7 1 1 0.298499
6 7 20 20 0.194045
6 7 21 21 0.0166497
7 8 22 22 0.437359
7 8 23 23 0.406508
7 8 24 24 0.038571
7 8 25 25 0.0102189
8 9 26 26 0.812951
8 9 27 27 0.054209
8 9 28 28 0.0204534
8 9 1 1 0.0180626
9 10 21 21 1
10 11 29 29 0.782573
10 11 11 11 0.107853
10 11 30 30 0.0992842
10 11 31 31 0.0102894
11 12 32 32 1
12 13 31 31 1
13 14 33 33 1
14 15 34 34 1
15 16 35 35 1
16 17 4 4 0.993077
16 17 11 11 0.00603523
16 17 36 36 0.000887487
17 18 37 37 0.996492
17 18 38 38 0.00350757
18 19 39 39 1
19 20 40 40 1
20 21 41 41 0.874795
20 21 42 42 0.0827369
21 22 31 31 0.879016
21 22 43 43 0.0524346
21 22 42 42 0.0297859
21 22 44 44 0.0126817
22 23 45 45 0.94776
22 23 46 46 0.0522399
23 24 47 47 1
24 25 21 21 0.968019
24 25 48 48 0.0202004
24 25 24 24 0.0117808
25 26 21 21 1
26
with a symbol table:
hello 0
i'm 1
fine 2
suzana 3
how 4
are 5
you 6
mhm 7
sure 8
yes 9
okay 10
ah 11
just 12
a 13
couple 14
of 15
minutes 16
on 17
my 18
way 19
into 20
doctor's 21
appointment 22
oh 23
ooh 24
foreign 25
i 26
foreigners 27
foreigner 28
shawna 29
sean 30
shaun 31
sharon 32
showing 33
show 34
or 35
howard 36
it 37
how're 38
our 39
hard 40
hour 41
is 42
here 43
there 44
ya 45
today 46
avenue 47
hum 48
huh 49
hm 50
wow 51
yeah 52
hey 53
right 54
sir 55
sorry 56
share 57
star 58
no 59
nope 60
know 61
most 62
uh 63
um 64
more 65
enjoy 66
enjoyed 67
er 68
your 69
we're 70
her 71
doctors 72
The bug is thus:
- the log reports `hyp size: 0` when it is not 0
- the symbol table is ignored and the symbols in the fst are completely botched; the first two lines should be:
0 1 23/oh 23/oh 0.99158907
0 1 24/ooh 24/ooh 0.008410932
but were
[+++] [23:45:44] [fstalign] 0 1 23/of 23/of 0.99158907
[+++] [23:45:44] [fstalign] 0 1 24/minutes 24/minutes 0.008410932
i.e. the ids were mistakenly shifted by -8 when mapped to symbols.
- strange elements in the hyp FST after loading (they never occur in the original fst)? Actually, the loaded fst looks quite different from the original one!
- arcs that are not in the hyp fst, like `[+++] [23:45:44] [fstalign] 5 6 6/<silence> 6/<silence> 0.6163309`
- no alignment
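To make the -8 shift from the list above concrete, here is a small Python sketch. It assumes (this is a reading of the log, not something confirmed from fstalign's source) that when the symbols file is not picked up, labels are resolved against a table with eight control symbols at ids 0 to 7 followed by the reference words. The names `hyp_sym` and `used` are made up for illustration.

```python
# Reference words, in the order they appear in the symbol table above (ids 0..22).
REF_WORDS = [
    "hello", "i'm", "fine", "suzana", "how", "are", "you", "mhm", "sure",
    "yes", "okay", "ah", "just", "a", "couple", "of", "minutes", "on", "my",
    "way", "into", "doctor's", "appointment",
]

# Control symbols that show up at the low ids in the log (assumed to fill 0..7).
CONTROL = ["<eps>", "<oov>", "<ins>", "<del>", "<sub>", "<inaudible>", "<silence>", "<unk>"]


def build_table(symbols):
    """Map integer label -> symbol, numbering from 0 in list order."""
    return dict(enumerate(symbols))


# What hyp.sym says: reference words first, then the extra hypothesis words.
hyp_sym = build_table(REF_WORDS + ["oh", "ooh"])
# Assumption: the table that was actually used to print the hyp fst.
used = build_table(CONTROL + REF_WORDS)

for label in (23, 24, 31):
    print(f"{label}: hyp.sym={hyp_sym.get(label)!r} used={used.get(label)!r}")
```

Under that assumption, label 23 resolves to `of` instead of `oh`, label 24 to `minutes` instead of `ooh`, and label 31 has no entry at all, which would also explain the empty names such as `31/` in the printed hyp fst.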
Some progress:
I've started playing with the symbol loading outputs, and it turns out the symbols file wasn't being loaded at all (the file wasn't there, but fstalign didn't fail). After fixing that and adding the ASR control symbols, I have:
symbol table:
<eps> 0
<oov> 1
<ins> 2
<del> 3
<sub> 4
<inaudible> 5
<silence> 6
<unk> 7
hello 8
i'm 9
fine 10
suzana 11
how 12
are 13
you 14
mhm 15
sure 16
yes 17
okay 18
ah 19
just 20
a 21
couple 22
of 23
minutes 24
on 25
my 26
way 27
into 28
doctor's 29
appointment 30
oh 31
ooh 32
foreign 33
i 34
foreigners 35
foreigner 36
shawna 37
sean 38
shaun 39
sharon 40
showing 41
show 42
or 43
howard 44
it 45
how're 46
our 47
hard 48
hour 49
is 50
here 51
there 52
ya 53
today 54
avenue 55
hum 56
huh 57
hm 58
wow 59
yeah 60
hey 61
right 62
sir 63
sorry 64
share 65
star 66
no 67
nope 68
know 69
most 70
uh 71
um 72
more 73
enjoy 74
enjoyed 75
er 76
your 77
we're 78
her 79
doctors 80
and fstalign correctly prints out the fst:
[2023-02-21 10:40:35.807] [console] [info] loggers initialized
[+++] [10:40:35] [console] fstalign version is 1.6.1
[+++] [10:40:35] [console] reading reference plain text from /data/ctm-fst-align/22-08E6AADCCBB0305AFB_customer/ref.txt
[2023-02-21 10:41:09.182] [console] [info] loggers initialized
[+++] [10:41:09] [console] fstalign version is 1.6.1
[+++] [10:41:09] [console] reading reference ctm from /data/ctm-fst-align/22-08E6AADCCBB0305AFB_customer/ref.ctm
[+++] [10:41:09] [console] reading hypothesis fst from /data/ctm-fst-align/22-08E6AADCCBB0305AFB_customer/hyp.fst
[+++] [10:41:09] [fstalign] starting conversion to int vector
[+++] [10:41:09] [fstalign] converting ref to int vector
[+++] [10:41:09] [ctmloader] creating std::vector<int> for CTM for 27 tokens
[+++] [10:41:09] [fstalign] converting hyp to int vector
[+++] [10:41:09] [FstFileLoader] convertToIntVector isn't implemented for FST inputs
[+++] [10:41:09] [fstalign] Either ref or hyp is really small, skipping over the levenstein distance, ref size: 27, hyp size: 0
[+++] [10:41:09] [FstFileLoader] Total FST has 27 states.
[+++] [10:41:09] [fstalign] generating ref synonyms from symbol table
[+++] [10:41:09] [fstalign] applying ref synonyms on ref fst
[+++] [10:41:09] [SynonymEngine] we have 0 registered first word rules label id
[+++] [10:41:09] [fstalign] printing ref fst
[+++] [10:41:09] [fstalign] 0 1 0/hello 0/hello 0.0
[+++] [10:41:09] [fstalign] 1 2 1/i'm 1/i'm 0.0
[+++] [10:41:09] [fstalign] 2 3 2/fine 2/fine 0.0
[+++] [10:41:09] [fstalign] 3 4 3/suzana 3/suzana 0.0
[+++] [10:41:09] [fstalign] 4 5 4/how 4/how 0.0
[+++] [10:41:09] [fstalign] 5 6 5/are 5/are 0.0
[+++] [10:41:09] [fstalign] 6 7 6/you 6/you 0.0
[+++] [10:41:09] [fstalign] 7 8 7/mhm 7/mhm 0.0
[+++] [10:41:09] [fstalign] 8 9 8/sure 8/sure 0.0
[+++] [10:41:09] [fstalign] 9 10 9/yes 9/yes 0.0
[+++] [10:41:09] [fstalign] 10 11 10/okay 10/okay 0.0
[+++] [10:41:09] [fstalign] 11 12 11/ah 11/ah 0.0
[+++] [10:41:09] [fstalign] 12 13 12/just 12/just 0.0
[+++] [10:41:09] [fstalign] 13 14 13/a 13/a 0.0
[+++] [10:41:09] [fstalign] 14 15 14/couple 14/couple 0.0
[+++] [10:41:09] [fstalign] 15 16 15/of 15/of 0.0
[+++] [10:41:09] [fstalign] 16 17 16/minutes 16/minutes 0.0
[+++] [10:41:09] [fstalign] 17 18 1/i'm 1/i'm 0.0
[+++] [10:41:09] [fstalign] 18 19 17/on 17/on 0.0
[+++] [10:41:09] [fstalign] 19 20 18/my 18/my 0.0
[+++] [10:41:09] [fstalign] 20 21 19/way 19/way 0.0
[+++] [10:41:09] [fstalign] 21 22 20/into 20/into 0.0
[+++] [10:41:09] [fstalign] 22 23 13/a 13/a 0.0
[+++] [10:41:09] [fstalign] 23 24 21/doctor's 21/doctor's 0.0
[+++] [10:41:09] [fstalign] 24 25 22/appointment 22/appointment 0.0
[+++] [10:41:09] [fstalign] 25 26 10/okay 10/okay 0.0
[+++] [10:41:09] [fstalign] 26 27 10/okay 10/okay 0.0
[+++] [10:41:09] [fstalign] printing hyp fst
[+++] [10:41:09] [fstalign] 0 1 23/oh 23/oh 0.99158907
[+++] [10:41:09] [fstalign] 0 1 24/ooh 24/ooh 0.008410932
[+++] [10:41:09] [fstalign] 1 2 25/foreign 25/foreign 0.8511446
[+++] [10:41:09] [fstalign] 1 2 2/fine 2/fine 0.05130972
[+++] [10:41:09] [fstalign] 1 2 1/i'm 1/i'm 0.047072377
[+++] [10:41:09] [fstalign] 1 2 26/i 26/i 0.0155186895
[+++] [10:41:09] [fstalign] 1 2 27/foreigners 27/foreigners 0.005192776
[+++] [10:41:09] [fstalign] 1 2 28/foreigner 28/foreigner 0.0033090003
[+++] [10:41:09] [fstalign] 1 2 23/oh 23/oh 0.0015563301
[+++] [10:41:09] [fstalign] 2 3 29/shawna 29/shawna 0.3886394
[+++] [10:41:09] [fstalign] 2 3 30/sean 30/sean 0.19079825
[+++] [10:41:09] [fstalign] 2 3 31/shaun 31/shaun 0.10623746
[+++] [10:41:09] [fstalign] 2 3 3/suzana 3/suzana 0.04679449
[+++] [10:41:09] [fstalign] 2 3 2/fine 2/fine 0.034456342
[+++] [10:41:09] [fstalign] 2 3 25/foreign 25/foreign 0.0081548225
[+++] [10:41:09] [fstalign] 2 3 32/sharon 32/sharon 0.004374613
[+++] [10:41:09] [fstalign] 2 3 26/i 26/i 0.002221351
[+++] [10:41:09] [fstalign] 2 3 1/i'm 1/i'm 0.002075816
[+++] [10:41:09] [fstalign] 2 3 33/showing 33/showing 0.0019667444
[+++] [10:41:09] [fstalign] 2 3 34/show 34/show 0.0005650157
[+++] [10:41:09] [fstalign] 3 4 4/how 4/how 0.69604653
[+++] [10:41:09] [fstalign] 3 4 3/suzana 3/suzana 0.15710989
[+++] [10:41:09] [fstalign] 3 4 30/sean 30/sean 0.014679594
[+++] [10:41:09] [fstalign] 3 4 35/or 35/or 0.011887497
[+++] [10:41:09] [fstalign] 3 4 36/howard 36/howard 0.0065641715
[+++] [10:41:09] [fstalign] 3 4 2/fine 2/fine 0.0056938045
[+++] [10:41:09] [fstalign] 3 4 25/foreign 25/foreign 0.0021069725
[+++] [10:41:09] [fstalign] 3 4 26/i 26/i 0.001466121
[+++] [10:41:09] [fstalign] 3 4 29/shawna 29/shawna 0.0013138258
[+++] [10:41:09] [fstalign] 3 4 1/i'm 1/i'm 0.0012702389
[+++] [10:41:09] [fstalign] 3 4 17/on 17/on 0.0009687771
[+++] [10:41:09] [fstalign] 4 5 5/are 5/are 0.5213346
[+++] [10:41:09] [fstalign] 4 5 37/it 37/it 0.17481348
[+++] [10:41:09] [fstalign] 4 5 38/how're 38/how're 0.14042015
[+++] [10:41:09] [fstalign] 4 5 36/howard 36/howard 0.053299483
[+++] [10:41:09] [fstalign] 4 5 3/suzana 3/suzana 0.042188246
[+++] [10:41:09] [fstalign] 4 5 39/our 39/our 0.011979131
[+++] [10:41:09] [fstalign] 4 5 2/fine 2/fine 0.0036785
[+++] [10:41:09] [fstalign] 4 5 35/or 35/or 0.002629472
[+++] [10:41:09] [fstalign] 4 5 30/sean 30/sean 0.0022271618
[+++] [10:41:09] [fstalign] 4 5 40/hard 40/hard 0.002156982
[+++] [10:41:09] [fstalign] 4 5 41/hour 41/hour 0.0016741576
[+++] [10:41:09] [fstalign] 5 6 6/you 6/you 0.6163309
[+++] [10:41:09] [fstalign] 5 6 42/is 42/is 0.18521181
[+++] [10:41:09] [fstalign] 5 6 36/howard 36/howard 0.056179322
[+++] [10:41:09] [fstalign] 5 6 43/here 43/here 0.038890716
[+++] [10:41:09] [fstalign] 5 6 44/there 44/there 0.0326784
[+++] [10:41:09] [fstalign] 5 6 3/suzana 3/suzana 0.026641503
[+++] [10:41:09] [fstalign] 5 6 45/ya 45/ya 0.020304155
[+++] [10:41:09] [fstalign] 5 6 46/today 46/today 0.006060596
[+++] [10:41:09] [fstalign] 5 6 5/are 5/are 0.0041220332
[+++] [10:41:09] [fstalign] 5 6 30/sean 30/sean 0.0033203475
[+++] [10:41:09] [fstalign] 5 6 47/avenue 47/avenue 0.0029174143
[+++] [10:41:09] [fstalign] 6 7 48/hum 48/hum 0.34014535
[+++] [10:41:09] [fstalign] 6 7 24/ooh 24/ooh 0.2984986
[+++] [10:41:09] [fstalign] 6 7 49/huh 49/huh 0.19404508
[+++] [10:41:09] [fstalign] 6 7 10/okay 10/okay 0.016649699
[+++] [10:41:09] [fstalign] 6 7 0/hello 0/hello 0.013604832
[+++] [10:41:09] [fstalign] 6 7 50/hm 50/hm 0.0062978663
[+++] [10:41:09] [fstalign] 6 7 51/wow 51/wow 0.0049225087
[+++] [10:41:09] [fstalign] 6 7 52/yeah 52/yeah 0.0039882683
[+++] [10:41:09] [fstalign] 6 7 23/oh 23/oh 0.0033028126
[+++] [10:41:09] [fstalign] 6 7 53/hey 53/hey 0.0029480883
[+++] [10:41:09] [fstalign] 6 7 54/right 54/right 0.002575561
[+++] [10:41:09] [fstalign] 7 8 55/sir 55/sir 0.43735883
[+++] [10:41:09] [fstalign] 7 8 8/sure 8/sure 0.40650827
[+++] [10:41:09] [fstalign] 7 8 56/sorry 56/sorry 0.038571022
[+++] [10:41:09] [fstalign] 7 8 57/share 57/share 0.010218942
[+++] [10:41:09] [fstalign] 7 8 58/star 58/star 0.009362684
[+++] [10:41:09] [fstalign] 8 9 59/no 59/no 0.81295073
[+++] [10:41:09] [fstalign] 8 9 60/nope 60/nope 0.05420902
[+++] [10:41:09] [fstalign] 8 9 61/know 61/know 0.02045335
[+++] [10:41:09] [fstalign] 8 9 24/ooh 24/ooh 0.018062603
[+++] [10:41:09] [fstalign] 8 9 54/right 54/right 0.012383968
[+++] [10:41:09] [fstalign] 8 9 62/most 62/most 0.007653652
[+++] [10:41:09] [fstalign] 9 10 10/okay 10/okay 1.0
[+++] [10:41:09] [fstalign] 10 11 11/ah 11/ah 0.7825733
[+++] [10:41:09] [fstalign] 10 11 35/or 35/or 0.10785312
[+++] [10:41:09] [fstalign] 10 11 63/uh 63/uh 0.09928422
[+++] [10:41:09] [fstalign] 10 11 13/a 13/a 0.010289361
[+++] [10:41:09] [fstalign] 11 12 12/just 12/just 1.0
[+++] [10:41:09] [fstalign] 12 13 13/a 13/a 1.0
[+++] [10:41:09] [fstalign] 13 14 14/couple 14/couple 1.0
[+++] [10:41:09] [fstalign] 14 15 15/of 15/of 1.0
[+++] [10:41:09] [fstalign] 15 16 16/minutes 16/minutes 1.0
[+++] [10:41:09] [fstalign] 16 17 1/i'm 1/i'm 0.9930773
[+++] [10:41:09] [fstalign] 16 17 35/or 35/or 0.006035227
[+++] [10:41:09] [fstalign] 16 17 64/um 64/um 0.0008874871
[+++] [10:41:09] [fstalign] 17 18 17/on 17/on 0.99649245
[+++] [10:41:09] [fstalign] 17 18 65/more 65/more 0.003507566
[+++] [10:41:09] [fstalign] 18 19 18/my 18/my 1.0
[+++] [10:41:09] [fstalign] 19 20 19/way 19/way 1.0
[+++] [10:41:09] [fstalign] 20 21 20/into 20/into 0.8747955
[+++] [10:41:09] [fstalign] 20 21 66/enjoy 66/enjoy 0.08273692
[+++] [10:41:09] [fstalign] 21 22 13/a 13/a 0.879016
[+++] [10:41:09] [fstalign] 21 22 39/our 39/our 0.052434582
[+++] [10:41:09] [fstalign] 21 22 66/enjoy 66/enjoy 0.029785942
[+++] [10:41:09] [fstalign] 21 22 67/enjoyed 67/enjoyed 0.0126816565
[+++] [10:41:09] [fstalign] 21 22 68/er 68/er 0.00951749
[+++] [10:41:09] [fstalign] 21 22 69/your 69/your 0.007649917
[+++] [10:41:09] [fstalign] 21 22 35/or 35/or 0.0052877315
[+++] [10:41:09] [fstalign] 21 22 70/we're 70/we're 0.0021538541
[+++] [10:41:09] [fstalign] 21 22 71/her 71/her 0.0014728603
[+++] [10:41:09] [fstalign] 22 23 21/doctor's 21/doctor's 0.9477601
[+++] [10:41:09] [fstalign] 22 23 72/doctors 72/doctors 0.052239873
[+++] [10:41:09] [fstalign] 23 24 22/appointment 22/appointment 1.0
[+++] [10:41:09] [fstalign] 24 25 10/okay 10/okay 0.9680188
[+++] [10:41:09] [fstalign] 24 25 54/right 54/right 0.02020042
[+++] [10:41:09] [fstalign] 24 25 56/sorry 56/sorry 0.011780825
[+++] [10:41:09] [fstalign] 25 26 10/okay 10/okay 1.0
[+++] [10:41:09] [walker] starting a walk in the park
[+++] [10:41:09] [walker] we have 0 candidates after 27 loops
[+++] [10:41:09] [fstalign] done walking the graph
terminate called after throwing an instance of 'std::runtime_error'
what(): no alignment produced
Aborted (core dumped)
So the problem persists: with a correct fst and a transcript that is easy to align by hand, the graph walker still fails to align the transcripts.
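For anyone who wants to check the "easy to align by hand" part on their own data, a rough way is to pull the most likely word out of each slot of the hyp fst and compare it with the reference by eye. Below is a minimal Python sketch. It assumes the fst is in the text form shown earlier (`src dst ilabel olabel weight`), that it is a linear lattice where each state has arcs to the next state only, and that the weights are probabilities (higher is better) rather than costs. `load_symbols` and `one_best` are hypothetical helpers, not part of fstalign.

```python
def load_symbols(text):
    """Parse 'symbol id' lines into {id: symbol}."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[int(parts[1])] = parts[0]
    return table


def one_best(fst_text, symbols):
    """Return the highest-weight output symbol leaving each state, in state order."""
    best = {}  # source state -> (weight, olabel)
    for line in fst_text.splitlines():
        parts = line.split()
        if len(parts) != 5:  # skip final-state lines such as '26'
            continue
        src, weight, olabel = int(parts[0]), float(parts[4]), int(parts[3])
        if src not in best or weight > best[src][0]:
            best[src] = (weight, olabel)
    # Unknown labels are rendered as <id> so they stay visible.
    return [symbols.get(best[s][1], f"<{best[s][1]}>") for s in sorted(best)]


# A few arcs and symbols copied from the logs above.
sample_fst = """\
9 10 10 10 1.0
10 11 11 11 0.7825733
10 11 35 35 0.10785312
10 11 63 63 0.09928422
11 12 12 12 1.0
12
"""
sample_sym = "okay 10\nah 11\njust 12\nor 35\nuh 63\n"

words = one_best(sample_fst, load_symbols(sample_sym))
print(words)  # ['okay', 'ah', 'just']
```

If the weights in your fst are negative log costs instead, the comparison has to be flipped to pick the smallest weight.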
Changing the composition approach to standard fixed the issue, I think.
Hi, we don't support FST input yet for the composition we made default in https://github.com/revdotcom/fstalign/releases/tag/1.2.0, so you would have to use the standard composition approach.
```
[+++] [10:51:17] [fstalign] converting hyp to int vector
[+++] [10:51:17] [FstFileLoader] convertToIntVector isn't implemented for FST inputs
```
We will make this more clear in our docs, thank you for the detailed issue!
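For readers who land here with the same problem, the two lessons of this thread (a missing symbols file does not fail loudly, and FST hypotheses need the standard composition) can be folded into a small wrapper. This is only a sketch in Python: `build_fstalign_command` is a hypothetical helper, and the `--composition-approach standard` flag name and value are an assumption about the CLI that should be checked against `fstalign wer --help` for your version.

```python
from pathlib import Path


def build_fstalign_command(ref, hyp, symbols, binary="/fstalign/bin/fstalign"):
    """Return the argv for a WER run, refusing to continue if an input is missing."""
    # The symbols file in this thread was silently absent, so check up front.
    missing = [str(p) for p in (ref, hyp, symbols) if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("missing input file(s): " + ", ".join(missing))
    return [
        binary, "wer",
        "--ref", str(ref),
        "--hyp", str(hyp),
        "--symbols", str(symbols),
        # Assumed flag name and value; verify with `fstalign wer --help`.
        "--composition-approach", "standard",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, so a missing file stops the run before fstalign is started at all.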
Thank you, happy I could perhaps help someone looking for this in the future! Your library is amazing!