espnet TCPGen implementation

This is the implementation of the tree-constrained pointer generator for contextual biasing in ESPNet2 using LibriSpeech train-clean-100 data.

Jun 26 '23 22:06 BriansIDP

This is a very cool function, so we can add more detailed documents. Maybe, we can add a usage in egs2/librispeech_100/asr1_biasing/README.md

We should also add your paper and ask people to cite it ;)

Jun 27 '23 15:06 sw005320

Codecov Report

Attention: Patch coverage is 0% with 620 lines in your changes missing coverage. Please review.

Project coverage is 24.05%. Comparing base (b1335e7) to head (dd43019). Report is 1601 commits behind head on master.

Files	Patch %	Lines
espnet2/asr/tcpgen_biasing_espnet_model.py	0.00%	248 Missing :warning:
espnet2/asr/transducer/beam_search_transducer.py	0.00%	148 Missing :warning:
espnet2/asr/layers/gnn.py	0.00%	103 Missing :warning:
espnet2/text/Butils.py	0.00%	69 Missing :warning:
espnet2/bin/asr_inference.py	0.00%	29 Missing :warning:
espnet2/asr_transducer/joint_network.py	0.00%	11 Missing :warning:
espnet2/train/trainer.py	0.00%	6 Missing :warning:
espnet2/tasks/asr.py	0.00%	4 Missing :warning:
espnet2/asr_transducer/decoder/rnn_decoder.py	0.00%	1 Missing :warning:
espnet2/tasks/abs_task.py	0.00%	1 Missing :warning:

:exclamation: There is a different number of reports uploaded between BASE (b1335e7) and HEAD (dd43019). Click for more details.

HEAD has 10 uploads less than BASE

Flag BASE (b1335e7) HEAD (dd43019)

test_python_espnet2 4 0

test_utils 4 0

test_configuration_espnet2 2 0

Flag	BASE (b1335e7)	HEAD (dd43019)
test_python_espnet2	4	0
test_utils	4	0
test_configuration_espnet2	2	0

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #5261       +/-   ##
===========================================
- Coverage   70.92%   24.05%   -46.87%     
===========================================
  Files         709      714        +5     
  Lines       65361    66369     +1008     
===========================================
- Hits        46355    15967    -30388     
- Misses      19006    50402    +31396

Flag	Coverage Δ
test_configuration_espnet2	`?`
test_integration_espnet1	`65.67% <ø> (ø)`
test_python_espnet1	`18.85% <0.00%> (-0.30%)`	:arrow_down:
test_python_espnet2	`?`
test_utils	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jul 07 '23 01:07 codecov[bot]

Hi @sw005320 @simpleoier @emonosuke , I think I have addressed most of the comments except for the unit-test. I will upload the model to huggingface now and will go through your comments again this week. Thank you!

Jul 10 '23 00:07 BriansIDP

Hi @sw005320 @simpleoier @emonosuke , I think I have addressed most of the comments except for the unit-test. I will upload the model to huggingface now and will go through your comments again this week. Thank you!

I have uploaded the model now named tcpgen_biasing under espnet. As this is my first time pushing a model to huggingface, I am not sure how I could verify "using the inference demo". Any help would be appreciated! Thank you!

Jul 10 '23 23:07 BriansIDP

@BriansIDP You need to follow stage 14 & 16 in asr.sh.

Jul 11 '23 00:07 simpleoier

Thanks for the pointer. I have done following the instructions in stage 14 & 16 and the model now is at: tcpgen_biasing.

Jul 11 '23 00:07 BriansIDP

@BriansIDP Another suggestion: it may be better to update the huggingface repo name. If you notice other pretrained checkpoints, you can at least find the name follows the pattern: contributer_dataset_expname.

Jul 11 '23 01:07 simpleoier

@BriansIDP Another suggestion: it may be better to update the huggingface repo name. If you notice other pretrained checkpoints, you can at least find the name follows the pattern: contributer_dataset_expname.

Thanks. I got that done now

Jul 11 '23 13:07 BriansIDP

This pull request is now in conflict :(

Jul 20 '23 16:07 mergify[bot]

Another comment is that the file list is too large to be maintained as a text file in Git. We may change it to the compressed (zip) file, but the current way is fine if this is complicated.

Jul 21 '23 13:07 sw005320

Another comment is that the file list is too large to be maintained as a text file in Git. We may change it to the compressed (zip) file, but the current way is fine if this is complicated.

Hi Shinji. Thanks for the comments. I am trying to address those comments you highlighted. I have already uploaded my model to Huggingface. The only concern is that making biasing a new model class would influence my uploaded model. I can do the training etc again and upload a new model, but that might take a bit long since I am not in the UK and have limited access to the computing resources there. Thank you very much!

Jul 21 '23 14:07 BriansIDP

We could upload a model later. A valid source is more important.

Jul 21 '23 16:07 sw005320

Hi Shinji @sw005320 . Sorry for the late reply! I have modified the model file and the beam search so that they are now separate files. I have also trained a model which yielded similar results. I will upload the model soon but the code review could continue on the newly added files. Thank you!

Aug 13 '23 09:08 BriansIDP

Thanks! There are several conflicts. So, can you fix them? Then, we'll have another round of reviews.

I really want to merge this PR. So, let's work together to refine it.

Aug 14 '23 11:08 sw005320

Thanks! There are several conflicts. So, can you fix them? Then, we'll have another round of reviews.

I really want to merge this PR. So, let's work together to refine it.

Thank you. I have now resolved the conflicts.

Aug 15 '23 00:08 BriansIDP

This pull request is now in conflict :(

Oct 25 '23 11:10 mergify[bot]

Thanks! There are several conflicts. So, can you fix them? Then, we'll have another round of reviews.

I really want to merge this PR. So, let's work together to refine it.

Hi Shinji. I wonder if there will be another round of review and what the further steps for this could be. It would be greatly appreciated if this could be integrated into ESPNet. Thank you!

Oct 29 '23 08:10 BriansIDP

Please fix the CI error https://github.com/espnet/espnet/actions/runs/6710579965/job/18236089339?pr=5261#step:8:259

Oct 31 '23 23:10 sw005320

This PR requires a unit test. Please check https://github.com/espnet/espnet/tree/master/test/espnet2. You can find a lot of examples.

Oct 31 '23 23:10 sw005320

Hi Shinji @sw005320. I have written some unittest for my newly added code, but they failed because "warp-rnnt" is not installed. I noticed that the current version of ESPNet does not support this library. I wonder what the reason is, and what I should do. Thank you!

Nov 03 '23 19:11 BriansIDP

We are using https://github.com/espnet/espnet/blob/master/tools/installers/install_warp-transducer.sh#L29, which will be installed in our normal CI configuration in https://github.com/espnet/espnet/blob/master/ci/install.sh#L24 Can. you double-check it?

Nov 05 '23 20:11 sw005320

We are using https://github.com/espnet/espnet/blob/master/tools/installers/install_warp-transducer.sh#L29, which will be installed in our normal CI configuration in https://github.com/espnet/espnet/blob/master/ci/install.sh#L24 Can. you double-check it?

Hi Shinji. Yes, I understand that warp-transducer is used. However, warp-transducer does not support log probabilities as input whereas the output of TCPGen performs distribution-level interpolation and hence requires log probabilities to be sent to the transducer loss. Therefore I used warp-rnnt instead. I remember that in the older versions of ESPNet, warp-rnnt was supported. I wonder if we could make use of that installer. Thank you!

Nov 05 '23 20:11 BriansIDP

I see. We removed warp-rnnt in https://github.com/espnet/espnet/pull/3217. @b-flo, do you think we can update the interface to support it in warp-transducer?

Nov 05 '23 20:11 sw005320

Hi,

@b-flo, do you think we can update the interface to support it in warp-transducer?

Sure, let me take a look later this week. For the GPU version the log softmax is computed inside the implementation, I could add an option to make it optional I guess.

That being said, it seems the implementation is for both ASRTask with transducer mode and ASRTransducerTask, is it intended? For the later, k2 transducer loss can be used instead (with the proper k2_pruned_loss_args)

Nov 06 '23 08:11 b-flo

Hi,

@b-flo, do you think we can update the interface to support it in warp-transducer?

Sure, let me take a look later this week. For the GPU version the log softmax is computed inside the implementation, I could add an option to make it optional I guess.

That being said, it seems the implementation is for both ASRTask with transducer mode and ASRTransducerTask, is it intended? For the later, k2 transducer loss can be used instead (with the proper k2_pruned_loss_args)

Hi Florian. Thank you for the reply! Actually the implementation currently is only for ASRTask with transducer model.

Nov 06 '23 09:11 BriansIDP

Actually the implementation currently is only for ASRTask with transducer model.

Got it sorry, the decoder and joint network are shared across version. I thought I made the task/models fully separated in the past. Hum, we should maybe define another JointNetwork for ASRTask to avoid dead codes for each version. Let me check alongside the other problem.

Nov 06 '23 10:11 b-flo

Actually the implementation currently is only for ASRTask with transducer model.

Got it sorry, the decoder and joint network are shared across version. I thought I made the task/models fully separated in the past. Hum, we should maybe define another JointNetwork for ASRTask to avoid dead codes for each version. Let me check alongside the other problem.

Hi @b-flo, I wonder if there is any update on this problem. Thank you!

Nov 10 '23 17:11 BriansIDP

This pull request is now in conflict :(

Feb 03 '24 13:02 mergify[bot]

This pull request is now in conflict :(

Feb 06 '24 02:02 mergify[bot]