[WIP] TTS tokenizers moved to collections.common.tokenizers

Open AlexGrinch opened this issue 3 years ago • 3 comments

What does this PR do ?

Move TTS tokenizers to collections.common.tokenizers so they can be used in cross-collections pipelines.

To implement ASR/ST model training that generates samples on the fly with TTS, we need to use TTS tokenizers. Moving them to collections.common.tokenizers makes it possible to write a wrapper that exposes them directly to ASR and NLP datasets. The new location also seems more logical because:

  1. TTS tokenizers are the only tokenizers located outside of collections.common.tokenizers
  2. TTS tokenizers are currently located in collections.tts.torch, but the only torch functionality they use is torch.distributed

Ideally, I would recommend refactoring them around TokenizerSpec in the future so that all tokenizers share a unified implementation.
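
As a rough illustration of the wrapper idea above, here is a minimal, self-contained sketch. It does not use NeMo's actual classes: `CharTokenizerLike` is a hypothetical stand-in for a TTS character tokenizer, and `TTSTokenizerWrapper` with its `text_to_ids` / `ids_to_text` / `vocab_size` members is an assumed interface chosen for illustration, not the API this PR introduces.

```python
from typing import Dict, List


class CharTokenizerLike:
    """Hypothetical stand-in for a TTS character tokenizer (not NeMo's class)."""

    def __init__(self, chars: str, pad: str = "<pad>", oov: str = "<oov>"):
        # Reserve id 0 for padding and id 1 for out-of-vocabulary symbols.
        self.tokens: List[str] = [pad, oov] + list(chars)
        self._token2id: Dict[str, int] = {t: i for i, t in enumerate(self.tokens)}
        self.pad_id, self.oov_id = 0, 1

    def encode(self, text: str) -> List[int]:
        # Map each lower-cased character to its id, falling back to OOV.
        return [self._token2id.get(ch, self.oov_id) for ch in text.lower()]

    def decode(self, ids: List[int]) -> str:
        # Drop padding and OOV ids when reconstructing the text.
        return "".join(
            self.tokens[i] for i in ids if i not in (self.pad_id, self.oov_id)
        )


class TTSTokenizerWrapper:
    """Assumed adapter: exposes an encode/decode tokenizer through
    text_to_ids / ids_to_text so dataset code can treat it like other tokenizers."""

    def __init__(self, tts_tokenizer: CharTokenizerLike):
        self._tok = tts_tokenizer

    @property
    def vocab_size(self) -> int:
        return len(self._tok.tokens)

    def text_to_ids(self, text: str) -> List[int]:
        return self._tok.encode(text)

    def ids_to_text(self, ids: List[int]) -> str:
        return self._tok.decode(ids)


wrapper = TTSTokenizerWrapper(CharTokenizerLike("abc "))
ids = wrapper.text_to_ids("a cab")
print(ids, wrapper.ids_to_text(ids))
```

With an adapter of this shape, an ASR or NLP dataset would only depend on the wrapper's methods rather than on anything under collections.tts.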

Collection: collections.tts and collections.common.tokenizers

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • [ ] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [ ] Did you add or update any necessary documentation?
  • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [ ] New Feature
  • [ ] Bugfix
  • [ ] Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed. The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

AlexGrinch avatar Aug 05 '22 23:08 AlexGrinch

This pull request introduces 2 alerts when merging 59daff26b9ce351b137a894405a66c9fb4407134 into 987674e29ea90f9a2f663bf95d74bd947d76bbc0 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 08 '22 23:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 2f65e763e18fd260968bea2acc0219c0d1d830d7 into 987674e29ea90f9a2f663bf95d74bd947d76bbc0 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 08 '22 23:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 049b9fc9231463dd5fa93578ed9d8896436e4ce2 into f921ebe0436e55f7547b183ca83a623f6678422d - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 09 '22 18:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging d937b25db57bcafce40f8dd5214ab102df303250 into da9f4138137565fe048a3f99dac343dcfd40aee4 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 12 '22 19:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging b2d947481d4303c68ce4cdd2b189a36a86b0e10c into 4bf54b715b1ba8832ec3beedcb6983acf55ff096 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 17 '22 20:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 5fc8ff2c59c24470a73da2ada1a991d9fc35d77f into 8845addc6562f2df3740c95ee496a26849b526d0 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 17 '22 21:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 676ec997bd6a7b26c021db9e5cb3acf34f3c2d45 into 8845addc6562f2df3740c95ee496a26849b526d0 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 18 '22 19:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 03b90e0f0bf7cf92548efea1aef766d5adfde5fb into 6abfbbfda654f44313068b950edb0f70b01449b1 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 18 '22 20:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 99a108a7aeebef8865775fe7ba2d78baf8459587 into 8e73224d9ee173226533061b28f5783d6705160c - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 19 '22 00:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 1cc127ffb5b0738fe1fe54762be20cc4ee6af78f into 28524d6accb64f4ff4cf60fe3e86532d1e6f6738 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 19 '22 18:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 6990780067d2e8f3cf6a020736d0c83ca318eb48 into 6127d79dfa42ba8a2f0f70dab42372d20d353b33 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 19 '22 22:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging f123e7b521e2437d59c8029dc0024f523a57be48 into 6127d79dfa42ba8a2f0f70dab42372d20d353b33 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 19 '22 23:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 58d14913ce7ebbaeefc44d498643a9e987fdd71b into 3ba267290927b1a2e78d82c7afbc538874c01bcc - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 20 '22 15:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging d622b59bd14a4ed8f9bc1cfe7fd7ac65f51647c8 into c0bfa6f07f766a3abd1804f5b666474887e0a1e4 - view on LGTM.com

new alerts:

  • 2 for Unused import

lgtm-com[bot] avatar Aug 23 '22 19:08 lgtm-com[bot]