wyh2000
## What? We add a voice conversion task to ESPnet2, supporting both parallel and non-parallel data. ## Why? To support more voice conversion models and datasets. ## See also...
Hi, thanks for sharing this nice work. Could you share some example code for how to reconstruct images with DiffAE when only z_{sem} is encoded from the original images but x_T...
We add a HiFi-GAN-based vocoder that can decode HuBERT tokens to waveforms. It currently supports LJSpeech. It is aligned with the configurations used in discrete TTS from [ESPnet](https://github.com/espnet/espnet/pull/5626).
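For readers unfamiliar with the setup, here is a minimal sketch of the token-to-waveform idea, not the actual ESPnet module: discrete HuBERT token IDs are embedded and then upsampled to raw audio by a stack of transposed convolutions, as in HiFi-GAN. The class name, hyperparameters, and upsampling factors below are all hypothetical.

```python
# Hypothetical sketch only -- not the ESPnet vocoder implementation.
import torch
import torch.nn as nn


class TokenVocoderSketch(nn.Module):
    def __init__(self, n_tokens=500, emb_dim=256, upsample_factors=(8, 8, 5)):
        # 8 * 8 * 5 = 320 samples per token, matching 50 Hz HuBERT frames at 16 kHz.
        super().__init__()
        self.embed = nn.Embedding(n_tokens, emb_dim)  # token ID -> latent frame
        layers, ch = [], emb_dim
        for f in upsample_factors:  # stack of transposed convs, as in HiFi-GAN
            layers += [
                nn.ConvTranspose1d(ch, ch // 2, kernel_size=2 * f, stride=f, padding=f // 2),
                nn.LeakyReLU(0.1),
            ]
            ch //= 2
        layers += [nn.Conv1d(ch, 1, kernel_size=7, padding=3), nn.Tanh()]
        self.generator = nn.Sequential(*layers)

    def forward(self, tokens):  # tokens: (batch, n_frames) of int64 token IDs
        x = self.embed(tokens).transpose(1, 2)  # (batch, emb_dim, n_frames)
        return self.generator(x).squeeze(1)     # (batch, n_samples)


tokens = torch.randint(0, 500, (1, 100))   # a dummy sequence of 100 token IDs
waveform = TokenVocoderSketch()(tokens)    # (1, n_samples), roughly 320x the token length
```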
## What? Add evaluation scripts for long-speech ASR. ## Why? For long speech (e.g., longer than 20 seconds), we should first split it into shorter segments and then evaluate...
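The PR contains the actual scripts; as a rough illustration of the splitting step, here is a hedged sketch that cuts a long recording into fixed-length windows with a small overlap before scoring. The function name and parameters are hypothetical, and the real scripts may instead segment by VAD or by reference timestamps.

```python
# Illustrative splitter only, not the evaluation scripts from the PR.
import numpy as np


def split_long_audio(wav: np.ndarray, sr: int = 16000,
                     max_sec: float = 20.0, overlap_sec: float = 1.0):
    """Yield (start_sample, segment) pairs, each segment at most max_sec seconds."""
    max_len = int(max_sec * sr)
    hop = max_len - int(overlap_sec * sr)   # small overlap to avoid cutting words
    for start in range(0, len(wav), hop):
        yield start, wav[start:start + max_len]
        if start + max_len >= len(wav):
            break


wav = np.zeros(16000 * 65, dtype=np.float32)   # dummy 65-second recording at 16 kHz
segments = list(split_long_audio(wav))         # -> 4 segments of <= 20 s each
```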
## What? Add a recipe for training SpeechComposer on voice conversion and speech enhancement. ## Why? To allow training language models on new tasks and to add the corresponding recipe. ## See also This...
## What? This is an implementation of EVA: Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. It supports audiovisual ASR for unconstrained videos. The EVA implementation is based on OWSM v3.1, with...
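EVA's actual layers live in the PR; purely as a generic illustration of the mixture-of-experts idea it builds on, here is a minimal top-1-routed MoE feed-forward block in PyTorch. All names and sizes are hypothetical and this is not EVA's code.

```python
# Generic mixture-of-experts illustration, not the EVA implementation.
# A router scores each token, and each token is processed by its top-1 expert.
import torch
import torch.nn as nn


class Top1MoESketch(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                             # x: (batch, time, d_model)
        gate = self.router(x).softmax(dim=-1)         # (batch, time, n_experts)
        top1 = gate.argmax(dim=-1)                    # hard top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i                          # tokens routed to expert i
            if mask.any():
                out[mask] = expert(x[mask]) * gate[..., i][mask].unsqueeze(-1)
        return out


x = torch.randn(2, 50, 256)       # dummy per-frame audiovisual features
y = Top1MoESketch()(x)            # same shape as the input
```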
We want to develop ESPnet2 support for the voice conversion task. We aim to build voice conversion systems that support both parallel and non-parallel data. # Progress & Plan...
Hi, I'd like to know if DPOTrainer supports more than one rejected sample. As the original DPO paper mentions, the approach can be extended with a Plackett-Luce model over a set of possible responses.
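For context, the Plackett-Luce generalization in the DPO paper ranks the $K$ responses with a permutation $\tau$ instead of using a single chosen/rejected pair; as I understand it, the corresponding objective looks roughly like this (my own transcription, so please double-check against the paper):

$$
\mathcal{L}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{x,\, y_1,\dots,y_K,\, \tau}\left[\log \prod_{k=1}^{K} \frac{\exp\!\left(\beta \log \frac{\pi_\theta(y_{\tau(k)} \mid x)}{\pi_{\mathrm{ref}}(y_{\tau(k)} \mid x)}\right)}{\sum_{j=k}^{K} \exp\!\left(\beta \log \frac{\pi_\theta(y_{\tau(j)} \mid x)}{\pi_{\mathrm{ref}}(y_{\tau(j)} \mid x)}\right)}\right]
$$

which should reduce to the standard DPO loss when $K = 2$.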
Hi, thanks for your great work! Do you have any plans to release the training code of Video-SALMONN?