wyh2000

Results 9 issues of wyh2000

## What? We add a voice conversion task to ESPnet2 that supports both parallel and non-parallel data. ## Why? To support more voice conversion models and datasets. ## See also...

New Features
ESPnet2
README
VC

Hi, thanks for sharing this nice work. Could you share some example code showing how to reconstruct images with DiffAE when only z_{sem} is encoded from the original images but x_T...

We add a HiFi-GAN-based vocoder that decodes HuBERT tokens to waveforms. It currently supports LJSpeech and is aligned with the discrete TTS configurations in [ESPnet](https://github.com/espnet/espnet/pull/5626).

## What? Add evaluation scripts for long speech ASR. ## Why? For long speech (e.g. longer than 20 seconds), we should first split it into shorter segments and then evaluate...

Stale
ESPnet2
conflicts
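The split-then-evaluate idea in the long speech ASR item above can be illustrated with a minimal sketch. It assumes plain fixed-length chunking on sample indices; the function name `split_into_segments` and its parameters are hypothetical, and the actual PR may segment differently (for example, at silences or with VAD).

```python
def split_into_segments(num_samples, sample_rate, max_seconds=20.0):
    """Return (start, end) sample index pairs that cover the utterance in order.

    Assumption: fixed-length chunks of at most `max_seconds`; this is an
    illustration only, not the segmentation used by the ESPnet scripts.
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    max_len = int(sample_rate * max_seconds)
    if max_len < 1:
        raise ValueError("segment length is shorter than one sample")
    # Step through the signal in max_len strides; the last chunk may be shorter.
    return [
        (start, min(start + max_len, num_samples))
        for start in range(0, num_samples, max_len)
    ]


# A 50-second utterance at 16 kHz becomes two 20-second chunks and one 10-second chunk.
segments = split_into_segments(16000 * 50, 16000, max_seconds=20.0)
```

Each segment would then be decoded separately and the hypotheses concatenated before scoring against the full reference.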

## What? Add a recipe for training SpeechComposer on voice conversion and speech enhancement. ## Why? To allow training language models on new tasks and to add the corresponding recipe. ## See also This...

Stale
ESPnet2
README
conflicts

## What? This is an implementation of EVA: Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. It supports audiovisual ASR for unconstrained videos. The EVA implementation is based on OWSM v3.1, with...

ESPnet2
README

We want to extend ESPnet2 to the voice conversion task. We aim to develop voice conversion systems that support both parallel and non-parallel data. # Progress & Plan...

Roadmap
New Features
VC

Hi, I'd like to know whether DPOTrainer supports more than one rejected sample. As the original DPO paper mentions, the objective can be extended to a Plackett-Luce model over a ranked set of possible responses.

enhancement
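For reference, the Plackett-Luce extension asked about above can be sketched as a standalone loss for a single prompt. This is a minimal sketch, not part of the `DPOTrainer` API: the function name `plackett_luce_dpo_loss`, its arguments, and the default `beta` are hypothetical, and the inputs are assumed to be summed sequence log-probabilities of K responses ordered from most to least preferred.

```python
import math


def plackett_luce_dpo_loss(policy_logps, ref_logps, beta=0.1):
    """Negative log-likelihood of a ranking under the Plackett-Luce model.

    Assumption: policy_logps[k] and ref_logps[k] are log pi(y_k | x) for the
    k-th ranked response (index 0 = most preferred) under the policy and the
    reference model respectively.
    """
    if len(policy_logps) != len(ref_logps):
        raise ValueError("policy_logps and ref_logps must have the same length")
    # Implicit DPO reward for each response: beta * log(pi_theta / pi_ref).
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    loss = 0.0
    # The last position contributes log(1) = 0, so it is skipped.
    for k in range(len(rewards) - 1):
        tail = rewards[k:]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(r - m) for r in tail))
        loss -= rewards[k] - log_z
    return loss


# With K = 2 this reduces to the usual pairwise DPO loss, -log(sigmoid(r_w - r_l)).
pairwise = plackett_luce_dpo_loss([-1.0, -2.0], [-1.5, -1.5], beta=1.0)
```

With one chosen and several rejected responses, the rejected ones would still need an order among themselves for this likelihood to apply; how to obtain that order is left open here.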

Hi, thanks for your great work! Do you have any plans to release the training code for Video-Salmonn?