Wangyou Zhang issues

Results: 10 issues of Wangyou Zhang

Add support for spatial feature extraction on multi-microphone data

### 🚀 The feature

In addition to the readily available spectral features (https://pytorch.org/audio/stable/transforms.html#feature-extractions), I would like to request support for extracting spatial features from multi-microphone (multi-channel) speech data....

contributions welcome

New interface for MVDR beamforming

### 🚀 The feature

It would be very helpful to provide the following interface for the beamforming module ([torchaudio.transforms.MVDR](https://pytorch.org/audio/master/transforms.html#torchaudio.transforms.MVDR.forward)):

```python
forward(specgram: torch.Tensor, psd_s: torch.Tensor, psd_n: torch.Tensor) → torch.Tensor
```

and...
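To make the role of `psd_s` and `psd_n` concrete, here is a minimal sketch of how a beamformer could be derived directly from speech and noise PSD matrices. It is not torchaudio's implementation: it assumes two channels, handles a single time-frequency bin, uses one common closed form of the MVDR weights (inverse noise PSD times speech PSD, normalized by its trace, with a one-hot reference channel), and all helper names (`inv2`, `matmul2`, `mvdr_weights`, `apply_beamformer`) are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mvdr_weights(psd_s, psd_n, ref_channel=0):
    """Weights w = (psd_n^-1 psd_s) u / trace(psd_n^-1 psd_s).

    u is a one-hot vector selecting the reference channel, so the product
    with u simply picks one column of psd_n^-1 psd_s.
    """
    g = matmul2(inv2(psd_n), psd_s)
    trace = g[0][0] + g[1][1]
    return [g[i][ref_channel] / trace for i in range(2)]


def apply_beamformer(weights, frame):
    """Beamformed output w^H x for one time-frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, frame))


# Toy data (assumed for illustration): steering vector d, rank-1 speech PSD
# d d^H, and a Hermitian noise PSD.
d = [1 + 0j, 0.5 - 0.5j]
psd_s = [[d[i] * d[j].conjugate() for j in range(2)] for i in range(2)]
psd_n = [[1.0 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 1.5 + 0j]]

w = mvdr_weights(psd_s, psd_n)
clean = 2.0 - 1.0j
# Noise-free observation: the clean value scaled by the steering vector.
y = apply_beamformer(w, [clean * d[0], clean * d[1]])
```

With a noise-free input, `y` recovers the reference-channel signal up to rounding, which is the distortionless property that gives MVDR its name. A real implementation would operate on whole spectrograms and batch the matrix operations over frequency bins.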

Bugs in the simulation code

I notice that the simulation code is not compatible with the current [pyrirgen](https://github.com/phecda-xu/RIR-Generator).

> In [tencent_challenge_rirgenerator.py#L75](https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/simulation/tencent_challenge_rirgenerator.py#L75), it calls the function `pyrirgen.generateRir`, but this API has been refactored to `pyrirgen.rir_generator` since...
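One way a caller could tolerate such a rename is to look the function up under both names. The sketch below is only an illustration: `resolve_rir_function` is a hypothetical helper, the two stand-in objects are not the real pyrirgen module, and it does not address any change in argument names or order between the two versions, which would need to be checked separately.

```python
from types import SimpleNamespace


def resolve_rir_function(module):
    """Return the RIR generation callable under whichever name the module exposes.

    The newer name is tried first, then the older one.
    """
    for name in ("rir_generator", "generateRir"):
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise AttributeError("no known RIR generation function found on module")


# Stand-ins for the two API generations (hypothetical, not the real pyrirgen).
old_api = SimpleNamespace(generateRir=lambda *args, **kwargs: "old")
new_api = SimpleNamespace(rir_generator=lambda *args, **kwargs: "new")

old_result = resolve_rir_function(old_api)()
new_result = resolve_rir_function(new_api)()
```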

Missing data in Audioset

Hello, I was trying to run the simulation with the given selected_list, but I found that some of the IDs for Audioset are no longer accessible. Below I list part of...

Use torchaudio functions for beamforming related operations in torch 1.12.1+

This PR deprecates the ComplexTensor-based operations for torch 1.12.1+. Instead, PyTorch's built-in complex tensors are used, together with [the beamforming-related functions provided in torchaudio](https://pytorch.org/audio/0.12.1/transforms.html#multi-channel). CC. @nateanl

ESPnet2

FFT frequency bins obtained by `torch.linspace` in `torchaudio.functional.melscale_fbanks`

### 🐛 Describe the bug

I notice that in https://github.com/pytorch/audio/blob/main/src/torchaudio/functional/functional.py#L561-L568,

```python
if norm is not None and norm != "slaney":
    raise ValueError('norm must be one of None or "slaney"')
#...
```
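For context on the title, the sketch below compares two ways of producing one-sided FFT bin frequencies: the direct definition `k * sample_rate / n_fft`, and an evenly spaced grid from 0 to the Nyquist frequency. It does not reproduce the torchaudio code or the reported bug. The `linspace` function is a pure-Python stand-in for `torch.linspace` so that the example runs without PyTorch, `fft_bin_frequencies` is a hypothetical helper, and an even `n_fft` is assumed so that the last bin falls exactly on the Nyquist frequency.

```python
def fft_bin_frequencies(sample_rate, n_fft):
    """Frequency in Hz of each one-sided FFT bin: k * sample_rate / n_fft."""
    n_freqs = n_fft // 2 + 1
    return [k * sample_rate / n_fft for k in range(n_freqs)]


def linspace(start, stop, num):
    """Stand-in for torch.linspace: num evenly spaced points, endpoints included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


sample_rate, n_fft = 16000, 512  # example values, assumed for illustration
n_freqs = n_fft // 2 + 1

exact = fft_bin_frequencies(sample_rate, n_fft)
spaced = linspace(0, sample_rate / 2, n_freqs)

# Largest disagreement between the two grids.
max_diff = max(abs(a - b) for a, b in zip(exact, spaced))
```

For these values the two grids coincide, so `max_diff` is zero up to rounding. The grids agree only when the upper endpoint passed to `linspace` is exactly half the sample rate and `n_freqs` equals `n_fft // 2 + 1`.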

Add TF-GridNet and TD-SpeakerBeam to table.csv

[Question] Did you evaluate the performance gain of using the improved Transformer layer instead of the standard Transformer layer?

Hi, I am curious about the importance of the proposed improved Transformer layer compared to the standard one (w/o the positional encoding). But I couldn't find the related information [in...

Add implementations of USES2 speech enhancement models

## What?

This PR adds the implementations of the USES2-Swin and USES2-Comp speech enhancement models proposed in the ICASSP 2024 paper "[Improving Design of Input Condition Invariant Speech Enhancement](https://arxiv.org/abs/2401.14271)".

##...

Recipe

Installation

ESPnet2

README

Update of SE functions

## What?

This PR updates several speech enhancement related functions in ESPnet:

* Update BSRNN to support speech separation (num_spk > 1)
* Add an argument `--output_format` to specify the...

Recipe

ESPnet2