
Feat: Frame-level Extraction and PyTorch API Updates #41


Open: wants to merge 2 commits into base: main


TioSisai

This pull request introduces two main sets of changes: a new feature for frame-level embedding extraction and several updates to ensure compatibility with modern PyTorch versions by replacing deprecated APIs.


New Features:

Frame-level Feature Extraction:

  • Added a frame: bool parameter to the forward methods in both MobileNet (MN) and DyMN models.
  • When frame=True, the model preserves the temporal dimension during the final pooling stage, allowing for the extraction of frame-wise embeddings.
  • This enables finer-grained temporal analysis while remaining backward compatible with the default clip-level feature extraction.
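A minimal sketch of the pooling change, assuming the final feature map has the layout (batch, channels, frequency, time) and that frame embeddings are returned as (batch, time, channels). The helper name pool_features is hypothetical and not taken from the MN/DyMN code. Passing None in the output size tells adaptive pooling to keep that axis at its input length:

```python
import torch
import torch.nn.functional as F


def pool_features(x: torch.Tensor, frame: bool = False) -> torch.Tensor:
    """Pool a (batch, channels, freq, time) feature map.

    Hypothetical helper for illustration; the real MN/DyMN pooling may differ.
    frame=False: average over frequency and time -> (batch, channels)
    frame=True:  average over frequency only     -> (batch, time, channels)
    """
    if frame:
        # Collapse frequency to 1; None keeps the time axis at its input length.
        x = F.adaptive_avg_pool2d(x, (1, None))  # (B, C, 1, T)
        return x.squeeze(2).transpose(1, 2)      # (B, T, C)
    x = F.adaptive_avg_pool2d(x, (1, 1))         # (B, C, 1, 1)
    return x.flatten(1)                          # (B, C)


# Illustrative sizes only: batch=2, channels=960, freq=4, time=32.
feats = torch.randn(2, 960, 4, 32)
clip = pool_features(feats)                # clip-level embedding
frames = pool_features(feats, frame=True)  # frame-level embeddings
print(clip.shape)    # torch.Size([2, 960])
print(frames.shape)  # torch.Size([2, 32, 960])
```

Because every frame embedding averages the same number of frequency bins, averaging the frame embeddings over time reproduces the clip embedding, which is one way to check that the clip-level path is unchanged.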

Fixes & Maintenance:

PyTorch API Modernization:

  • Replaced the deprecated ConvNormActivation with the current Conv2dNormActivation.
  • Updated torch.stft to use return_complex=True and computed the power spectrogram as torch.square(torch.abs(x)), i.e. the squared magnitude of the complex STFT, in line with modern complex-tensor handling.
  • Replaced torch.cuda.amp.autocast with the device-agnostic torch.amp.autocast, which takes the device type as an explicit argument.
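A sketch of the updated spectrogram computation. The STFT parameters and the 32 kHz input are illustrative assumptions, not the repository's preprocessing settings. The comparison against the real-valued (re, im) layout, which torch.view_as_real recovers from the complex result, only checks that the squared magnitude is the same quantity:

```python
import torch

# Illustrative parameters (assumed); the repository's settings may differ.
n_fft, hop_length = 1024, 320
waveform = torch.randn(2, 32000)  # (batch, samples), e.g. 1 s at an assumed 32 kHz
window = torch.hann_window(n_fft)

# return_complex=True yields a complex tensor of shape (batch, n_fft // 2 + 1, frames).
spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop_length,
                  window=window, return_complex=True)

# Power spectrogram: squared magnitude of the complex STFT.
power = torch.square(torch.abs(spec))

# For comparison: re^2 + im^2 computed from the real-valued (..., 2) view.
re_im_power = torch.view_as_real(spec).pow(2).sum(-1)

print(spec.dtype, tuple(power.shape))  # torch.complex64 (2, 513, 101)
```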

TioSisai added 2 commits July 15, 2025 16:14
- Replace deprecated ConvNormActivation with Conv2dNormActivation
- Update torch.stft to use return_complex=True for complex tensor handling, and compute the power spectrogram from the complex-valued STFT with torch.square(torch.abs(x))
- Replace torch.cuda.amp.autocast with torch.amp.autocast for better device compatibility

These changes ensure compatibility with newer PyTorch versions while maintaining
backward compatibility and fixing deprecation warnings.
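The autocast migration in this commit can be sketched as below; the model and tensors are placeholders. Unlike torch.cuda.amp.autocast, torch.amp.autocast needs an explicit device_type, so the same code path can fall back to CPU autocast (here with bfloat16, an assumption of this sketch) when no GPU is present:

```python
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 10).to(device_type)  # placeholder model
x = torch.randn(4, 128, device=device_type)

# Old, CUDA-only spelling (emits a deprecation warning in recent PyTorch):
#     with torch.cuda.amp.autocast():
#         y = model(x)

# Device-agnostic replacement: device_type is a required argument.
dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
with torch.amp.autocast(device_type=device_type, dtype=dtype):
    y = model(x)  # linear layers run in the lower-precision dtype

print(y.dtype)
```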
…odels

- Add 'frame' parameter to forward methods in MN and DyMN classes
- Modify _clf_forward and _forward_impl methods to support frame-level feature extraction
- Update adaptive pooling logic to preserve temporal dimension when frame=True
- Maintain backward compatibility with existing clip-level feature extraction
- Return frame-wise embeddings alongside the classification output

This enhancement allows the models to extract features at the frame level (preserving the temporal dimension)
in addition to the existing clip-level aggregation, enabling finer-grained temporal analysis.
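To illustrate how a frame flag can sit on forward without breaking existing callers, here is a hypothetical toy model (ToyMN is not the repository's MN or DyMN class). It assumes forward returns a (logits, embedding) tuple and, as a further assumption, applies the classifier per frame when frame=True:

```python
import torch
import torch.nn as nn


class ToyMN(nn.Module):
    """Hypothetical stand-in for MN/DyMN; only mirrors the `frame` flag idea."""

    def __init__(self, channels: int = 64, num_classes: int = 10):
        super().__init__()
        self.features = nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x: torch.Tensor, frame: bool = False):
        x = self.features(x)  # (B, C, F', T')
        if frame:
            # Average over frequency only and keep time -> (B, T', C).
            emb = x.mean(dim=2).transpose(1, 2)
        else:
            # Average over frequency and time -> (B, C), the clip embedding.
            emb = x.mean(dim=(2, 3))
        # nn.Linear acts on the last dim: clip logits (B, K) or,
        # under this sketch's assumption, per-frame logits (B, T', K).
        logits = self.classifier(emb)
        return logits, emb


model = ToyMN()
mel = torch.randn(2, 1, 128, 100)  # (batch, 1, mel bins, time frames)
clip_logits, clip_emb = model(mel)                # default: clip-level outputs
frame_logits, frame_emb = model(mel, frame=True)  # frame-level outputs
print(clip_logits.shape, frame_emb.shape)  # torch.Size([2, 10]) torch.Size([2, 50, 64])
```

Since frame defaults to False, existing calls such as model(mel) keep returning clip-level outputs.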