
Add Positional Encoding in Conformer Implementation #3887

Open
Deep-unlearning opened this issue Feb 25, 2025 · 1 comment

@Deep-unlearning

Missing Relative Positional Encoding in Conformer Implementation

Issue Description

The current Conformer implementation in Torchaudio is missing the relative sinusoidal positional encoding scheme that is a key component of the original Conformer architecture as described in the paper "Conformer: Convolution-augmented Transformer for Speech Recognition".

Details

In the original paper, section 2.1 "Multi-Headed Self-Attention Module" specifically states:

"We employ multi-headed self-attention (MHSA) while integrating an important technique from Transformer-XL [20], the relative sinusoidal positional encoding scheme. The relative positional encoding allows the self-attention module to generalize better on different input length and the resulting encoder is more robust to the variance of the utterance length."

However, the current implementation in conformer.py uses standard PyTorch MultiheadAttention without implementing the relative positional encoding:

self.self_attn = torch.nn.MultiheadAttention(input_dim, num_attention_heads, dropout=dropout)

Reference Implementation

For reference, NVIDIA's NeMo library does properly implement the positional encoding in their Conformer implementation: https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/modules/conformer_encoder.py
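For illustration only, a minimal sketch of a Transformer-XL-style relative sinusoidal positional encoding module is shown below. The class name, shapes, and interface are assumptions for this sketch, not part of the torchaudio API; a relative-position-aware attention module would still be needed to consume these embeddings alongside the query/key projections.

import math
import torch

class RelPositionalEncoding(torch.nn.Module):
    # Sketch (assumed names/shapes): produces sinusoidal embeddings for
    # relative offsets from (T - 1) down to -(T - 1), as used by
    # Transformer-XL-style relative attention.
    def __init__(self, d_model: int) -> None:
        super().__init__()
        self.d_model = d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) -> returns (1, 2 * time - 1, d_model)
        t = x.size(1)
        # Relative positions: t - 1, t - 2, ..., 0, ..., -(t - 1)
        positions = torch.arange(t - 1, -t, -1.0, device=x.device).unsqueeze(1)
        inv_freq = torch.exp(
            torch.arange(0, self.d_model, 2, device=x.device, dtype=torch.float32)
            * (-math.log(10000.0) / self.d_model)
        )
        pe = torch.zeros(2 * t - 1, self.d_model, device=x.device)
        pe[:, 0::2] = torch.sin(positions * inv_freq)
        pe[:, 1::2] = torch.cos(positions * inv_freq)
        return pe.unsqueeze(0)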

@Dannynis

+1
I was about to open an issue about this as well, a moment before I came across this one.
I don't see how it would work in any ASR scenario where temporal information should be preserved without the positional encoding.
Also note that this was already raised here:
https://discuss.pytorch.org/t/conformer-has-no-positional-encoding/207137
