Fix resizing of absolute positional embeddings
What does this PR do?
Fixes the ASR transformer embedding so it can adapt to sequences longer than `max_sequence_length`.
Previously, the transformer module assumed that absolute positional embeddings were resized when `seq_length > max_sequence_length`. This was never actually implemented and had no safeguards, leading to indexing errors in edge cases during Canary decoding.
The module now resizes the embeddings automatically and logs a warning to alert the user, as sketched below.
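For illustration, here is a minimal PyTorch sketch of the intended resize-on-demand behavior. The class and method names (`FixedPositionalEncoding`, `_build_pos_enc`) echo the changelog below but are hypothetical and may not match NeMo's actual implementation:

```python
import logging
import math

import torch
from torch import nn


class FixedPositionalEncoding(nn.Module):
    """Sinusoidal absolute positional encoding that grows on demand.

    Hypothetical sketch; assumes an even hidden_size.
    """

    def __init__(self, hidden_size: int, max_sequence_length: int = 512):
        super().__init__()
        self._hidden_size = hidden_size
        self._max_sequence_length = max_sequence_length
        self._build_pos_enc(max_sequence_length)

    def _build_pos_enc(self, max_sequence_length: int, device: torch.device = None):
        """Standalone builder so the table can be regenerated at a new size."""
        pos_enc = torch.zeros(max_sequence_length, self._hidden_size, device=device)
        position = torch.arange(0.0, max_sequence_length, device=device).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0.0, self._hidden_size, 2.0, device=device)
            * -(math.log(10000.0) / self._hidden_size)
        )
        pos_enc[:, 0::2] = torch.sin(position * div_term)
        pos_enc[:, 1::2] = torch.cos(position * div_term)
        # Re-registering overwrites the previous buffer of the same name.
        self.register_buffer('pos_enc', pos_enc)

    def forward(self, position_ids: torch.Tensor) -> torch.Tensor:
        max_pos_id = int(position_ids.max())
        if max_pos_id >= self._max_sequence_length:
            # Previously this path hit an out-of-range index; now the table
            # is rebuilt and the user is alerted.
            logging.warning(
                'Position ids reach %d but max_sequence_length is %d; '
                'rebuilding positional encodings.',
                max_pos_id, self._max_sequence_length,
            )
            self._max_sequence_length = max_pos_id + 1
            self._build_pos_enc(self._max_sequence_length, device=position_ids.device)
        return self.pos_enc[position_ids]
```

With this in place, decoding a sequence longer than `max_sequence_length` degrades to a logged warning plus an on-the-fly resize rather than a hard indexing failure.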
Collection: asr
Changelog
- Refactored `transformer_modules.FixedPositionalEncoding` so the absolute positional encoding computation is a standalone method
- Added a check to `FixedPositionalEncoding` that resizes the encoding table when `position_ids` extend past `max_sequence_length`
- Added warnings to alert the user when `seq_length > max_sequence_length` in the `TransformerEmbedding` module (see the sketch after this list)
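A sketch of the kind of safeguard added to the embedding module, reusing the `FixedPositionalEncoding` sketch above (again hypothetical; the real `TransformerEmbedding` signature in NeMo may differ):

```python
import logging

import torch
from torch import nn


class TransformerEmbedding(nn.Module):
    """Hypothetical embedding wrapper; only the new safeguard is shown."""

    def __init__(self, vocab_size: int, hidden_size: int, max_sequence_length: int = 512):
        super().__init__()
        self._max_sequence_length = max_sequence_length
        self.token_embedding = nn.Embedding(vocab_size, hidden_size)
        self.position_embedding = FixedPositionalEncoding(hidden_size, max_sequence_length)

    def forward(self, input_ids: torch.Tensor, start_pos: int = 0) -> torch.Tensor:
        seq_length = input_ids.size(1)
        if start_pos + seq_length > self._max_sequence_length:
            # New safeguard: warn instead of silently producing an indexing error.
            logging.warning(
                'Sequence length %d (with offset %d) exceeds max_sequence_length %d; '
                'positional encodings will be resized on the fly.',
                seq_length, start_pos, self._max_sequence_length,
            )
        position_ids = torch.arange(
            start_pos, start_pos + seq_length, dtype=torch.long, device=input_ids.device
        ).unsqueeze(0).expand_as(input_ids)
        return self.token_embedding(input_ids) + self.position_embedding(position_ids)
```

The warning is deliberately emitted in the embedding module rather than only inside the positional encoding, so the user sees the full sequence-length context (including any decoding offset) before the resize happens.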
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR. To re-run CI, remove and re-add the label. To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
- [x] Make sure you read and followed Contributor guidelines
- [ ] Did you write any new necessary tests?
- [ ] Did you add or update any necessary documentation?
- [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc.)
- [ ] Reviewer: Does the PR have correct import guards for all optional libraries?
PR Type:
- [x] New Feature
- [x] Bugfix
- [ ] Documentation