This is the CMU Speech Recognition course project.
This repository contains the official implementation of Mamba in E-Branchformer. The code builds on https://github.com/Tonyyouyou/Mamba-in-Speech, which accompanies the paper "Mamba in Speech: Towards an Alternative to Self-Attention". Thanks also to ESPnet for the code framework.
To build the ConBiMamba model, you first need to install Mamba. The causal-conv1d component is essential for Mamba and requires CUDA 11.8 or higher. Install both packages as follows (a quick sanity check follows the steps):
- Install the `causal-conv1d` package: `pip install "causal-conv1d>=1.2.0"`
- Install the `mamba-ssm` package: `pip install mamba-ssm`
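Once both packages are installed, a minimal sanity check like the following should run (the hyperparameter values are arbitrary, for illustration only; Mamba's kernels require a CUDA GPU):

```python
# Minimal sanity check that mamba-ssm is usable.
import torch
from mamba_ssm import Mamba

layer = Mamba(d_model=256, d_state=16, d_conv=4, expand=2).cuda()
x = torch.randn(2, 100, 256, device="cuda")  # (batch, time, d_model)
y = layer(x)                                 # output has the same shape as the input
print(y.shape)                               # torch.Size([2, 100, 256])
```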
The following files implement the ASR task in ESPnet.
This file contains the bidirectional outer Mamba module. It should be placed in `mamba_ssm/modules` (inside the installed `mamba_ssm` package in your environment).
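For reference, here is a minimal sketch of what an external (outer) bidirectional Mamba layer looks like, following the description in Figure 1. The class name `OuterBiMamba`, the sum-based combination of the two directions, and the placement of the activation are illustrative assumptions, not the repository's exact code:

```python
# Illustrative sketch of an outer bidirectional Mamba layer (not the repo's exact code).
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class OuterBiMamba(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = Mamba(d_model=d_model)  # processes the sequence left-to-right
        self.bwd = Mamba(d_model=d_model)  # processes the time-reversed sequence
        self.act = nn.SiLU()               # the "Act" block in Figure 1 (placement assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        y_fwd = self.fwd(x)
        y_bwd = torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])  # re-align to forward time
        return self.act(y_fwd + y_bwd)  # combining by sum is an assumption
```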
`e_branchformer_encoder_mamba.py`: the E-Branchformer encoder with self-attention replaced by a Mamba layer (this version uses only `outer_bimamba`). It should be placed in `espnet/espnet2/asr/encoder`; a conceptual sketch of the layer follows.
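Conceptually, each encoder layer swaps the global self-attention branch for the bidirectional Mamba sketched above. A hedged sketch under that assumption (the real layer also includes macaron feed-forward modules and a depthwise-conv merge; `MambaBranchLayer` and the simplified local branch below are illustrative stand-ins):

```python
import torch
import torch.nn as nn

class MambaBranchLayer(nn.Module):
    """Illustrative two-branch layer: the global branch is OuterBiMamba
    (sketched above) instead of multi-head self-attention."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.norm_g = nn.LayerNorm(d_model)
        self.global_branch = OuterBiMamba(d_model)      # replaces self-attention
        self.norm_l = nn.LayerNorm(d_model)
        self.local_branch = nn.Sequential(              # stand-in for cgMLP
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.merge = nn.Linear(2 * d_model, d_model)    # concat-and-project merge

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_branch(self.norm_g(x))          # global branch
        l = self.local_branch(self.norm_l(x))           # local branch
        return x + self.merge(torch.cat([g, l], dim=-1))  # residual connection
```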
`e_branchformer_encoder_mamba_parallel.py`: the E-Branchformer encoder with an additional BiMamba branch in parallel (this version uses only `outer_bimamba`). It should be placed in `espnet/espnet2/asr/encoder`; a sketch of the parallel layer follows.
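In the parallel variant (Figure 2), self-attention is kept and a BiMamba branch is added alongside it, with the branch outputs merged. A minimal sketch, assuming a concat-and-project merge and the illustrative names below:

```python
import torch
import torch.nn as nn

class ParallelMambaLayer(nn.Module):
    """Illustrative three-branch layer: self-attention, a cgMLP stand-in,
    and the added OuterBiMamba branch, merged by concatenation."""
    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.norm_a = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_l = nn.LayerNorm(d_model)
        self.local = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                                   nn.Linear(d_ff, d_model))
        self.norm_m = nn.LayerNorm(d_model)
        self.bimamba = OuterBiMamba(d_model)          # the added parallel branch
        self.merge = nn.Linear(3 * d_model, d_model)  # merge the three branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xa = self.norm_a(x)
        a, _ = self.attn(xa, xa, xa)                  # global self-attention branch
        l = self.local(self.norm_l(x))                # local (cgMLP-like) branch
        m = self.bimamba(self.norm_m(x))              # BiMamba branch
        return x + self.merge(torch.cat([a, l, m], dim=-1))
```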
`asr.py`: the ASR task definition, which should be placed in `espnet/espnet2/tasks`; this is where the new encoders are registered (see the sketch below).
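A hedged sketch of the change `asr.py` makes: registering the new encoders in ESPnet's encoder `ClassChoices`. The registered keys and class names below are assumptions; check the shipped `asr.py` for the exact identifiers:

```python
from espnet2.train.class_choices import ClassChoices
from espnet2.asr.encoder.abs_encoder import AbsEncoder
# Class names below are assumed for illustration:
from espnet2.asr.encoder.e_branchformer_encoder_mamba import EBranchformerEncoderMamba
from espnet2.asr.encoder.e_branchformer_encoder_mamba_parallel import EBranchformerEncoderMambaParallel

encoder_choices = ClassChoices(
    name="encoder",
    classes=dict(
        e_branchformer_mamba=EBranchformerEncoderMamba,                    # assumed key
        e_branchformer_mamba_parallel=EBranchformerEncoderMambaParallel,  # assumed key
        # ... plus ESPnet's built-in encoders
    ),
    type_check=AbsEncoder,
    default="rnn",
)
```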
This file is the example training config showing the required parameters for training the Mamba-based E-Branchformer architecture.
This file is the example training config showing the required parameters for training the E-Branchformer Mamba parallel architecture.
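For orientation, the encoder section of such a config looks roughly like the following, shown here as a Python dict mirroring the YAML keys. The registered encoder name and every parameter name and value are assumptions; the shipped config files are the authoritative reference:

```python
# Rough shape of the encoder section of a training config (all values assumed).
train_config = {
    "encoder": "e_branchformer_mamba",  # assumed key registered in asr.py
    "encoder_conf": {
        "output_size": 256,             # encoder width (assumed)
        "num_blocks": 12,               # number of encoder layers (assumed)
        "cgmlp_linear_units": 1024,     # cgMLP hidden size (assumed)
        # Mamba-specific knobs (names assumed):
        "mamba_d_state": 16,
        "mamba_d_conv": 4,
    },
}
```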
Figure 1: External bidirectional Mamba layer. Two standard Mamba layers process the forward and time-reversed input; "Act" denotes the SiLU activation.

Figure 2: E-Branchformer Mamba parallel structure.

For detailed analysis and results, please see our final project report.