+1
I was about to open an issue about it as well, a moment before I came across this one...
I don't see how it would work in any ASR scenario, where temporal information has to be preserved, without any positional encoding (see the quick check below).
Also note that it was already asked here: https://discuss.pytorch.org/t/conformer-has-no-positional-encoding/207137
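For what it's worth, here is a quick illustrative toy check (not from torchaudio, just a minimal sketch) showing that a bare torch.nn.MultiheadAttention is permutation-equivariant, i.e. it has no notion of temporal order on its own:

```python
import torch

# Toy check: without any positional encoding, permuting the time axis of the
# input just permutes the output in the same way, so the attention module
# itself cannot distinguish one temporal ordering from another.
torch.manual_seed(0)
mha = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
mha.eval()

x = torch.randn(1, 10, 16)   # (batch, time, feature)
perm = torch.randperm(10)

with torch.no_grad():
    out, _ = mha(x, x, x)
    out_perm, _ = mha(x[:, perm], x[:, perm], x[:, perm])

print(torch.allclose(out[:, perm], out_perm, atol=1e-6))  # True
```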
Missing Relative Positional Encoding in Conformer Implementation
Issue Description
The current Conformer implementation in Torchaudio is missing the relative sinusoidal positional encoding scheme that is a key component of the original Conformer architecture as described in the paper "Conformer: Convolution-augmented Transformer for Speech Recognition".
Details
In the original paper, section 2.1 "Multi-Headed Self-Attention Module" specifically states:
"We employ multi-headed self-attention (MHSA) while integrating an important technique from Transformer-XL [20], the relative sinusoidal positional encoding scheme. The relative positional encoding allows the self-attention module to generalize better on different input length and the resulting encoder is more robust to the variance of the utterance length."
However, the current implementation in conformer.py uses the standard PyTorch MultiheadAttention without implementing the relative positional encoding (a quick check is sketched below).
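One way to see this from the public API (a sketch only; the constructor arguments below follow the documented torchaudio.models.Conformer signature and may differ between versions) is that every attention module inside the model is a plain torch.nn.MultiheadAttention, with no relative-position term:

```python
import torch
from torchaudio.models import Conformer

# Sketch: instantiate a small Conformer and list the attention module types.
# Constructor arguments are assumptions based on the documented signature.
model = Conformer(
    input_dim=80,
    num_heads=4,
    ffn_dim=128,
    num_layers=2,
    depthwise_conv_kernel_size=31,
)
attn_types = {type(m).__name__ for m in model.modules()
              if isinstance(m, torch.nn.MultiheadAttention)}
print(attn_types)  # expected: {'MultiheadAttention'}
```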
Reference Implementation
For reference, NVIDIA's NeMo library does properly implement the relative positional encoding in its Conformer implementation: https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/modules/conformer_encoder.py
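For concreteness, below is a minimal sketch of Transformer-XL-style multi-head self-attention with relative sinusoidal positional encoding. It is not the NeMo or torchaudio code, the class and parameter names are made up for illustration, and it uses a naive gather over all relative offsets instead of the efficient "shift" trick used by Transformer-XL and NeMo; it is only meant to show the extra position-dependent term added to the attention scores:

```python
import math
import torch


class RelPositionalEncoding(torch.nn.Module):
    """Sinusoidal embeddings for the relative offsets T-1, ..., 0, ..., -(T-1)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model

    def forward(self, length: int) -> torch.Tensor:
        pos = torch.arange(length - 1, -length, -1.0).unsqueeze(1)      # (2T-1, 1)
        div = torch.exp(torch.arange(0, self.d_model, 2.0) * (-math.log(10000.0) / self.d_model))
        pe = torch.zeros(2 * length - 1, self.d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe                                                       # (2T-1, d_model)


class RelPositionMultiheadAttention(torch.nn.Module):
    """Transformer-XL-style MHSA: content score plus a relative-position score."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d = num_heads, d_model // num_heads
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.k_proj = torch.nn.Linear(d_model, d_model)
        self.v_proj = torch.nn.Linear(d_model, d_model)
        self.pos_proj = torch.nn.Linear(d_model, d_model, bias=False)
        self.out_proj = torch.nn.Linear(d_model, d_model)
        # Learned global content/position biases (u and v in Transformer-XL).
        self.u = torch.nn.Parameter(torch.zeros(self.h, self.d))
        self.v = torch.nn.Parameter(torch.zeros(self.h, self.d))
        self.pos_enc = RelPositionalEncoding(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:                # x: (B, T, d_model)
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.h, self.d)
        k = self.k_proj(x).view(B, T, self.h, self.d)
        v = self.v_proj(x).view(B, T, self.h, self.d)
        p = self.pos_proj(self.pos_enc(T).to(x)).view(2 * T - 1, self.h, self.d)

        # Content term (q + u) . k and position term (q + v) . R_{i-j}.
        ac = torch.einsum("bihd,bjhd->bhij", q + self.u, k)            # (B, h, T, T)
        bd = torch.einsum("bihd,rhd->bhir", q + self.v, p)             # (B, h, T, 2T-1)

        # Pick, for each query i and key j, the row of p holding offset i - j.
        idx = torch.arange(T, device=x.device)
        r = (T - 1) - (idx.unsqueeze(1) - idx.unsqueeze(0))            # (T, T)
        bd = bd.gather(-1, r.view(1, 1, T, T).expand(B, self.h, T, T))

        attn = torch.softmax((ac + bd) / math.sqrt(self.d), dim=-1)
        out = torch.einsum("bhij,bjhd->bihd", attn, v).reshape(B, T, self.h * self.d)
        return self.out_proj(out)


# Shape check on a (batch, time, feature) tensor.
x = torch.randn(2, 50, 144)
print(RelPositionMultiheadAttention(d_model=144, num_heads=4)(x).shape)  # (2, 50, 144)
```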