Random cropping for variable length sequences

Open ATriantafyllopoulos opened this issue 2 years ago • 4 comments

🚀 The feature

I propose adding a torch.nn.Module transform that automatically crops or pads signals to a fixed length, with different padding options (e.g. constant or mirroring). I already have a local implementation, so I could push it myself if that is all right.

The interface would look as follows:

import torch

class RandomCrop(torch.nn.Module):
    def __init__(
        self,
        output_size,  # number of samples to be enforced on the output signal
        axis=-1,  # axis over which to crop
        pad="silence",  # string controlling the padding behavior (constant vs. reflection)
    ):
        super().__init__()
        self.output_size = output_size
        self.axis = axis
        self.pad = pad

    def forward(self, signal):  # signal of arbitrary size
        signal = ...  # crop or pad to `output_size` along `axis`
        return signal  # signal now has a fixed size of `output_size` at `axis`
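To make the intended behavior concrete, here is a minimal sketch of how such a transform might work. It is not the reference implementation: the class name `RandomCropSketch`, the `"reflect"` option name, and the choice to pad only at the end of the signal are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


class RandomCropSketch(torch.nn.Module):
    """Illustrative only: crop or pad `signal` to `output_size` along `axis`."""

    def __init__(self, output_size, axis=-1, pad="silence"):
        super().__init__()
        # "reflect" is an assumed name for the mirroring option.
        if pad not in ("silence", "reflect"):
            raise ValueError(f"unsupported pad mode: {pad!r}")
        self.output_size = output_size
        self.axis = axis
        self.pad = pad

    def forward(self, signal):
        length = signal.shape[self.axis]
        if length > self.output_size:
            # Too long: keep a randomly placed window of `output_size` samples.
            high = length - self.output_size + 1
            start = int(torch.randint(0, high, (1,)))
            return signal.narrow(self.axis, start, self.output_size)
        if length < self.output_size:
            if self.pad == "silence":
                # Too short: append zeros at the end (assumed padding side).
                x = signal.movedim(self.axis, -1)
                x = F.pad(x, (0, self.output_size - length), value=0.0)
                return x.movedim(-1, self.axis)
            # Mirror the signal back and forth without repeating edge samples.
            idx = torch.arange(self.output_size, device=signal.device)
            if length > 1:
                period = 2 * (length - 1)
                idx = idx % period
                idx = torch.where(idx >= length, period - idx, idx)
            else:
                idx = torch.zeros_like(idx)
            return signal.index_select(self.axis, idx)
        return signal


crop = RandomCropSketch(output_size=16000)
print(crop(torch.randn(1, 48000)).shape)  # cropped along the last axis
print(crop(torch.randn(1, 8000)).shape)   # zero-padded along the last axis
```

Because the module holds no learnable state, it could sit in a transform pipeline or inside a model; whether padding should instead be split between both ends is a design choice left open here.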

I am looking for feedback to see if this is also needed/desired by others and whether I should open a PR to add it.

Motivation, pitch

This feature is needed for datasets with variable-length signals (a common occurrence for audio). Currently, this length mismatch has to be handled in the dataloader's collate function.

With the proposed transform, the user can add it directly to their transform pipeline and/or make it part of their model if they so wish. Moreover, they could simply utilize it in their collate_fn if they want to crop based on the particular batch statistics (e.g. crop/pad to the shortest/longest sample in the batch).
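The batch-dependent case mentioned above (crop to the shortest or pad to the longest sample) could look roughly like the following. This is a sketch with hypothetical function names, assuming every item in the batch is a tensor whose last axis is time and whose other dimensions match.

```python
import torch
import torch.nn.functional as F


def collate_pad_to_longest(batch):
    """Zero-pad every signal at the end to the longest length in the batch."""
    target = max(x.shape[-1] for x in batch)
    padded = [F.pad(x, (0, target - x.shape[-1])) for x in batch]
    return torch.stack(padded)


def collate_crop_to_shortest(batch):
    """Randomly crop every signal to the shortest length in the batch."""
    target = min(x.shape[-1] for x in batch)
    cropped = []
    for x in batch:
        # Random start so that the window of `target` samples fits in x.
        start = int(torch.randint(0, x.shape[-1] - target + 1, (1,)))
        cropped.append(x[..., start:start + target])
    return torch.stack(cropped)


batch = [torch.randn(1, 12000), torch.randn(1, 16000), torch.randn(1, 9000)]
print(collate_pad_to_longest(batch).shape)    # torch.Size([3, 1, 16000])
print(collate_crop_to_shortest(batch).shape)  # torch.Size([3, 1, 9000])
```

Either function can be passed as `collate_fn` to `torch.utils.data.DataLoader` when the dataset yields bare tensors; datasets that yield (signal, label) pairs would need the labels handled separately.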

Alternatives

No response

Additional context

A reference implementation and interface can be seen here. As it is implemented with NumPy, I would port it to torch.

ATriantafyllopoulos, Nov 17 '23 10:11

Hi

Thanks for the proposal. However, this project no longer has an active maintainer.

mthrok, Dec 01 '23 01:12

> Thanks for the proposal. However, this project no longer has an active maintainer.

Wow, I missed that. Was this discussed somewhere (Discord/forum)? And is there a way to kickstart this project again, or was it migrated anywhere else (e.g. to PyTorch core)?

ATriantafyllopoulos, Dec 01 '23 10:12

It was not planned or discussed, but it just happened. Sorry.

mthrok, Dec 01 '23 17:12

Very sad that torchaudio is no longer actively maintained. Many pieces are useful, well-designed, and most importantly, highly performant (e.g. STFT). The audio world doesn't have as many good libraries as its vision counterpart :(

@ATriantafyllopoulos I also keep rolling my own random audio/spectrogram crop across my audio projects. It would be nice if torchaudio had it.

gau-nernst, May 23 '24 06:05