Wangyou Zhang

Results 10 issues of Wangyou Zhang

### 🚀 The feature
In addition to the readily available spectral features (https://pytorch.org/audio/stable/transforms.html#feature-extractions), I would like to request support for extracting spatial features from multi-microphone (multi-channel) speech data...

contributions welcome

### 🚀 The feature
It would be very helpful to provide the following interface for the beamforming module ([torchaudio.transforms.MVDR](https://pytorch.org/audio/master/transforms.html#torchaudio.transforms.MVDR.forward)):
```python
forward(specgram: torch.Tensor, psd_s: torch.Tensor, psd_n: torch.Tensor) → torch.Tensor
```
and...

I notice that the simulation code is not compatible with the current [pyrirgen](https://github.com/phecda-xu/RIR-Generator).
> In [tencent_challenge_rirgenerator.py#L75](https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/simulation/tencent_challenge_rirgenerator.py#L75), it calls the function `pyrirgen.generateRir`, but this API has been refactored to `pyrirgen.rir_generator` since...

Hello, I was trying to run the simulation with the given selected_list, but I found that some of the AudioSet IDs are no longer accessible. Below I list part of...

This PR deprecates the ComplexTensor-based operations for torch 1.12.1+. Instead, the builtin complex tensors in PyTorch are used, together with [the beamforming-related functions provided in torchaudio](https://pytorch.org/audio/0.12.1/transforms.html#multi-channel). CC. @nateanl

ESPnet2
SE

### 🐛 Describe the bug
I notice that in https://github.com/pytorch/audio/blob/main/src/torchaudio/functional/functional.py#L561-L568,
```python
if norm is not None and norm != "slaney":
    raise ValueError('norm must be one of None or "slaney"')
#...
```

Hi, I am curious about the importance of the proposed improved Transformer layer compared to the standard one (w/o the positional encoding). But I couldn't find the related information [in...

## What?
This PR adds the implementations of the USES2-Swin and USES2-Comp speech enhancement models proposed in the ICASSP 2024 paper "[Improving Design of Input Condition Invariant Speech Enhancement](https://arxiv.org/abs/2401.14271)".
##...

Recipe
Installation
ESPnet2
README
SE

## What?
This PR updates several speech enhancement related functions in ESPnet:
* Update BSRNN to support speech separation (num_spk > 1)
* Add an argument `--output_format` to specify the...

Recipe
ESPnet2
SE