EfficientAT

Feat: Frame-level Extraction and PyTorch API Updates

Open TioSisai opened this issue 7 months ago • 0 comments

This pull request makes two sets of changes: it adds frame-level embedding extraction, and it replaces deprecated APIs so that the code stays compatible with current PyTorch versions.


New Features:

Frame-level Feature Extraction:

  • Added a frame: bool parameter to the forward methods in both MobileNet (MN) and DyMN models.
  • When frame=True, the model preserves the temporal dimension during the final pooling stage, allowing for the extraction of frame-wise embeddings.
  • This enables finer-grained temporal analysis. Clip-level feature extraction remains the default, so existing callers are unaffected.
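
The difference between the two pooling modes can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `pool_features`, the `(batch, channels, freq, time)` input layout, and the `(batch, time, channels)` output layout for `frame=True` are all assumptions made for the example.

```python
import torch


def pool_features(feature_map: torch.Tensor, frame: bool = False) -> torch.Tensor:
    """Pool a (batch, channels, freq, time) feature map (hypothetical helper).

    frame=False: average over freq and time -> (batch, channels)
    frame=True:  average over freq only     -> (batch, time, channels)
    """
    if frame:
        # Collapse only the frequency axis so each time step keeps its own embedding.
        per_frame = feature_map.mean(dim=2)   # (batch, channels, time)
        return per_frame.transpose(1, 2)      # (batch, time, channels), assumed layout
    # Default clip-level behaviour: one embedding per clip.
    return feature_map.mean(dim=(2, 3))       # (batch, channels)


features = torch.randn(2, 8, 4, 10)           # toy sizes, chosen for illustration
clip_embedding = pool_features(features)
frame_embeddings = pool_features(features, frame=True)
```

Averaging the frame-wise embeddings over time reproduces the clip-level embedding in this sketch, which is one way to check that the two modes agree.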

Fixes & Maintenance:

PyTorch API Modernization:

  • Replaced the deprecated ConvNormActivation with the current Conv2dNormActivation.
  • Updated the torch.stft call to pass return_complex=True and computed the power spectrogram as torch.square(torch.abs(x)), in line with how current PyTorch handles complex tensors.
  • Replaced torch.cuda.amp.autocast with the device-agnostic torch.amp.autocast, which takes the device type as an explicit argument.
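
The STFT and autocast changes can be sketched as below. This is an illustration under stated assumptions rather than the code from the PR: the helper names `power_spectrogram` and `run_model`, the STFT parameters (`n_fft=1024`, `hop_length=320`, `win_length=800`), and the toy linear model are all chosen for the example.

```python
import torch


def power_spectrogram(waveform: torch.Tensor, n_fft: int = 1024,
                      hop_length: int = 320, win_length: int = 800) -> torch.Tensor:
    """Power spectrogram of a (batch, samples) waveform (hypothetical helper)."""
    window = torch.hann_window(win_length, periodic=False, device=waveform.device)
    # return_complex=True yields a complex tensor of shape (batch, n_fft // 2 + 1, frames).
    spec = torch.stft(
        waveform, n_fft=n_fft, hop_length=hop_length, win_length=win_length,
        window=window, center=True, return_complex=True,
    )
    # |X|^2: same value as summing the squared real and imaginary parts.
    return torch.square(torch.abs(spec))


def run_model(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Run inference under torch.amp.autocast (hypothetical helper)."""
    device_type = inputs.device.type
    # Assumption for this sketch: mixed precision is only switched on for CUDA.
    use_amp = device_type == "cuda"
    with torch.no_grad(), torch.amp.autocast(device_type=device_type, enabled=use_amp):
        return model(inputs)


waveform = torch.randn(2, 32000)                   # toy input: 2 clips of 32000 samples
power = power_spectrogram(waveform)                # (2, 513, 101)
toy_model = torch.nn.Linear(513, 4)                # stand-in model, not MN or DyMN
logits = run_model(toy_model, power.transpose(1, 2))
```

Passing the device type explicitly is what lets the same context manager cover both CPU and CUDA code paths.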

TioSisai • Jul 15 '25 14:07