NN Speaker ID (NNID) is a speaker identification/verification model based on recurrent neural networks (RNNs).
```
nnid/                 # root
    evb/              # for EVB deployment
        build/        # bin files
        includes/     # required includes
        libs/         # required libs
        make/         # make.mk
        pack/
        src/          # C source code
        Makefile
        autogen.mk
    ns-nnsp/          # C code to build the NNSP library (used only when re-building the library)
    python/           # for NN training
    README.md         # this readme
```
To work on Apollo4, you need:
- Arm GNU Toolchain 11.3
- Segger J-Link v7.56+
This speaker identification model is based on a 16 kHz sampling rate. The model size is about 110 kB. There is also one extra VAD (voice activity detection) model to extract the voice data. The extracted data is then sent to the speaker identification model for verification.
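The two-stage flow (VAD gating followed by speaker verification) can be sketched in Python. This is an illustrative sketch only: the `vad` and `embed` callables, the cosine score, and the 0.7 threshold are assumptions for illustration, not the firmware's actual functions or values.

```python
import numpy as np

def cosine_score(a, b):
    # Cosine similarity between two speaker embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_utterance(frames, vad, embed, enrolled, threshold=0.7):
    """Keep only the frames the VAD flags as speech, embed the voiced
    audio, and compare against the enrolled speaker embedding."""
    voiced = [f for f in frames if vad(f)]
    if not voiced:
        return False  # no speech detected, nothing to verify
    emb = embed(np.concatenate(voiced))
    return cosine_score(emb, enrolled) >= threshold
```

The point of the VAD stage is that silence and background noise never reach the speaker-ID model, which both saves cycles and keeps the embedding clean.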
The NNID model is trained on several audio datasets, including human speech and noise. Before you use this repo, please read their license agreements carefully here.
From the `nnid/evb/` directory:
```bash
make clean
make
make deploy
```
- Prepare two USB cables. Ensure your board is connected via both the `JLINK USB port` and the `audio USB port`. Then power on the EVB.
- Plug a mic into the 3.5mm port, and push BTN0 to initiate voice recording.
- `make view` will provide SWO output as the device is running.
- On your cmd, type
  ```bash
  $ python ../python/tools/audioview_nnid.py --tty=/dev/tty.usbmodem1234561
  ```
  You should see a GUI pop up. You might need to change the `--tty` option depending on your OS.
- On your GUI, press `record` to start recording. This leads you into the `enrollment phase`.
- The GUI will show that you have `0/4` utterances in enrollment, as shown in Fig. 1.1. `0/4` means a total of 4 utterances will be recorded, and 0 utterances have been recorded so far.
- You can start to say something. Try to make your utterance last around 2 seconds. If your speech is detected, the GUI will show `1/4` utterances in enrollment, as shown in Fig. 1.2. This means you have successfully enrolled the first utterance. Keep speaking and repeat the process until all 4 utterances are enrolled.
- After all 4 utterances are enrolled, the GUI will show that you are in the `testing phase` (Fig. 1.3).
- In the `testing phase`, try to say something, again making your utterance last around 2 seconds. If your voice is verified, the GUI will show `Yes, verified` (Fig. 1.4). Conversely, if your voice is not verified, the GUI will show `No, not verified` at the top of the GUI.
- You can repeat testing (try to say something again to see whether your voice is verified).
- If you want to stop the program, just press the `stop` button. Then check the two recording files under `nnid/evb/audio_result/`:
  - `audio_raw.wav`: the raw PCM data from your mic.
  - `audio_debug.wav`: the debug information.
- You can restart the program by pressing the `record` button. You will enter the `enrollment phase` again.
Fig. 1.1: GUI shows the enrollment phase, with `0` utterances enrolled.
Fig. 1.2: GUI shows the enrollment phase, with `1` utterance enrolled. There are 4 utterances in total to enroll.
Fig. 1.3: GUI shows that you are entering the testing phase.
Fig. 1.4: In the testing phase, try to say something lasting around 2 seconds. If your voice is verified, the GUI will show `Yes, verified`.
Our approach to training the model can be found in README.md. The trained model is saved in `evb/src/def_nn4_nnid.c` and `evb/src/def_nn4_nnid.h`.
The library neuralspot NNSP, `ns-nnsp.a`, is a C library that builds a pipeline, including feature extraction and a neural network, to run on Apollo4. The source code is under the folder `ns-nnsp/`. You can modify or rebuild it via NeuralSPOT, Ambiq's AI enablement library.
In brief, there are two basic building blocks inside `ns-nnsp.a`: feature extraction and the neural network. In `ns-nnsp.a`, we call them `FeatureClass`, defined in `feature_module.h`, and `NeuralNetClass`, defined in `neural_nets.h`, respectively. Furthermore, `NNSPClass` in `nn_speech.h` encapsulates them to form a concrete instance.
We illustrate this in Fig. 2.
Fig. 2: Illustration of `ns-nnsp`
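To make the composition concrete, here is a Python analogue of the three C building blocks. The class bodies are toy stand-ins (an energy "feature", an affine "network"), assumptions made purely to show how `NNSPClass` chains `FeatureClass` and `NeuralNetClass`; the real implementations live in the C headers named above.

```python
class FeatureClass:
    """Python stand-in for the C FeatureClass (feature_module.h)."""
    def extract(self, pcm_frame):
        # Toy "feature": frame energy. The real module computes Mel features.
        return sum(x * x for x in pcm_frame) / len(pcm_frame)

class NeuralNetClass:
    """Python stand-in for the C NeuralNetClass (neural_nets.h)."""
    def forward(self, feature):
        # Toy "network": a fixed affine map standing in for the RNN.
        return 2.0 * feature + 1.0

class NNSPClass:
    """Mirrors the role of NNSPClass (nn_speech.h): owns a feature
    extractor and a network, and chains them per audio frame."""
    def __init__(self, feat, net):
        self.feat = feat
        self.net = net

    def process_frame(self, pcm_frame):
        # Feature extraction first, then the neural network.
        return self.net.forward(self.feat.extract(pcm_frame))
```

The design point is the same as in the C code: the pipeline object composes the two blocks so callers feed raw PCM frames and get network outputs, without touching either stage directly.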
Also, in our specific s2i NN case, `def_nn0_s2i.c` has two purposes:
- For feature extraction, we use a Mel spectrogram with 40 Mel-scale bins. Standardizing the features as in the training dataset requires the statistical mean and standard deviation, which are defined in `def_nn0_s2i.c`.
- For the neural network, it points to the trained weight table, which is defined in `def_nn0_s2i.c` as well.
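The standardization step described above amounts to the usual per-bin normalization. A minimal sketch, where the `standardize` helper and the toy statistics are assumptions for illustration; the real mean/std arrays are the ones compiled into `def_nn0_s2i.c`:

```python
import numpy as np

def standardize(mel_features, mean, std):
    """Normalize a [num_frames x 40] Mel-feature matrix with the
    per-bin training-set statistics (as stored in def_nn0_s2i.c)."""
    return (mel_features - mean) / std
```

At inference time the exact same statistics must be applied as during training, which is why they are baked into the C table alongside the network weights.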
If you want to modify or re-build the `ns-nnsp.a` library, you can follow the steps below.
- Download NeuralSPOT:
  ```bash
  $ git clone https://github.com/AmbiqAI/neuralSPOT.git ../neuralSPOT
  ```
- Copy the source code of NS-NNSP to NeuralSPOT, then go to the neuralSPOT folder:
  ```bash
  $ cp -a ns-nnsp ../neuralSPOT/neuralspot; cd ../neuralSPOT
  ```
- Open `neuralSPOT/Makefile` and append `ns-nnsp` to the library modules as below:
  ```make
  # NeuralSPOT Library Modules
  modules := neuralspot/ns-harness
  modules += neuralspot/ns-peripherals
  modules += neuralspot/ns-ipc
  modules += neuralspot/ns-audio
  modules += neuralspot/ns-usb
  modules += neuralspot/ns-utils
  modules += neuralspot/ns-rpc
  modules += neuralspot/ns-i2c
  modules += neuralspot/ns-nnsp # <---add this line

  # External Component Modules
  modules += extern/AmbiqSuite/$(AS_VERSION)
  modules += extern/tensorflow/$(TF_VERSION)
  modules += extern/SEGGER_RTT/$(SR_VERSION)
  modules += extern/erpc/$(ERPC_VERSION)
  ```
- Compile:
  ```bash
  $ make clean; make; make nestall
  ```
- Copy the necessary folders back to the `nnid` folder:
  ```bash
  $ cd nest; cp -a pack includes libs ../nnid/evb
  ```