EfficientAT
Feat: Frame-level Extraction and PyTorch API Updates
This pull request introduces two main sets of changes: a new feature for frame-level embedding extraction and several updates to ensure compatibility with modern PyTorch versions by replacing deprecated APIs.
New Features:
Frame-level Feature Extraction:
- Added a frame: bool parameter to the forward methods in both MobileNet (MN) and DyMN models.
- When frame=True, the model preserves the temporal dimension during the final pooling stage, allowing for the extraction of frame-wise embeddings.
- This enables finer-grained temporal analysis. The default clip-level feature extraction is unchanged, so existing callers remain backward compatible.
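The difference between the two pooling modes can be sketched as follows. This is a minimal illustration, not the code from this pull request: the helper name pool_features, the (batch, channels, freq, time) layout, and the use of a plain mean are all assumptions, and the pooling in the MN and DyMN models may differ in detail.

```python
import torch


def pool_features(x: torch.Tensor, frame: bool = False) -> torch.Tensor:
    """Hypothetical helper: pool a feature map to clip- or frame-level embeddings.

    Assumes x has shape (batch, channels, freq, time).
    """
    if frame:
        # Average over frequency only, keeping the time axis:
        # result has shape (batch, channels, time).
        return x.mean(dim=2)
    # Average over both frequency and time (clip-level):
    # result has shape (batch, channels).
    return x.mean(dim=(2, 3))


# Illustrative feature map; the sizes are arbitrary.
feats = torch.randn(2, 960, 4, 31)
clip_emb = pool_features(feats)               # one embedding per clip
frame_emb = pool_features(feats, frame=True)  # one embedding per time frame
```

Under these assumptions, averaging the frame-level embeddings over time recovers the clip-level embedding, which is one way to sanity-check that the default path is unaffected.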
Fixes & Maintenance:
PyTorch API Modernization:
- Replaced the deprecated ConvNormActivation with the current Conv2dNormActivation.
- Updated the torch.stft call to pass return_complex=True and to compute the power spectrogram as torch.square(torch.abs(x)), in line with modern complex tensor handling.
- Replaced torch.cuda.amp.autocast with the more general torch.amp.autocast.
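The STFT and autocast changes can be illustrated with a short sketch. The function name power_spectrogram, the window choice, and the n_fft, hop_length, and win_length values below are assumptions made for illustration and are not taken from this pull request.

```python
import torch


def power_spectrogram(wave: torch.Tensor, n_fft: int = 1024,
                      hop_length: int = 320, win_length: int = 800) -> torch.Tensor:
    """Hypothetical helper: power spectrogram from a complex STFT."""
    window = torch.hann_window(win_length, periodic=False, device=wave.device)
    spec = torch.stft(wave, n_fft=n_fft, hop_length=hop_length,
                      win_length=win_length, window=window,
                      center=True, return_complex=True)
    # |X|^2 of the complex STFT output.
    return torch.square(torch.abs(spec))


wave = torch.randn(2, 32000)    # two random "clips" of 32000 samples each
power = power_spectrogram(wave)  # shape: (batch, n_fft // 2 + 1, frames)

# Device-agnostic autocast: torch.amp.autocast takes the device type explicitly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = torch.nn.Linear(8, 4).to(device)
with torch.amp.autocast(device_type=device.type):
    out = layer(torch.randn(2, 8, device=device))
```

The squared magnitude of the complex output equals the sum of squared real and imaginary parts, which is what code based on the older real-valued STFT output would typically compute, so the spectrogram values should be unchanged.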