
Enhance Audio Processing Pipeline with Parameterization and RMS Calculation #62


Open · wants to merge 2 commits into main

Conversation

Carpediem324

This pull request introduces two key improvements:

  1. streams.py Update:

    • Added channel and audio rate parameters to support flexible audio input configurations.
    • These changes resolve errors caused by mismatches between the user's microphone settings and the engine's requirements.
    • Implemented channel mixing and downsampling to convert the incoming audio to a single channel at 16,000 Hz.
  2. engine.py Update:

    • Enhanced the scoreFrame() function to compute the RMS value of the input audio frame.
    • The calculated RMS value is now included in the detection result, providing an additional diagnostic metric for audio signal intensity.

These updates improve the robustness and compatibility of the audio processing pipeline, ensuring that audio from various sources can be seamlessly integrated with the engine's processing requirements.
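The channel mixing and downsampling described above could be sketched roughly as follows. This is illustrative only: the function and parameter names are not the PR's actual streams.py code, and a production resampler would low-pass filter before decimating rather than simply dropping samples.

```python
def to_mono_16k(samples, channels, rate, target_rate=16000):
    """Average interleaved channels to mono, then decimate to target_rate.

    `samples` is a list of int PCM values, interleaved by channel.
    Assumes `rate` is an integer multiple of `target_rate` (naive
    decimation; a real pipeline would low-pass filter first).
    """
    # Mix channels: average each frame of `channels` interleaved samples.
    mono = [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ]
    # Downsample by keeping every `step`-th sample.
    step = rate // target_rate
    return mono[::step]
```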

Calculate the RMS value of the input audio frame and include it in the scoreFrame() result.
Accept channel and rate parameters in streams.py, and convert the audio to mono 16000Hz to match engine requirements.
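The RMS computation added to scoreFrame() is the square root of the mean of the squared samples. A minimal sketch, with the caveat that the actual engine's scoring logic is not shown here and the returned dict is illustrative:

```python
import math

def score_frame(frame):
    """Return detection info for one audio frame, including its RMS.

    `frame` is a sequence of PCM sample values. The real scoreFrame()
    also computes a keyword confidence score, which is stubbed out here.
    """
    rms = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
    return {"rms": rms}  # the real result would also carry the keyword score
```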
@TheSeriousProgrammer
Member

Looks good, will review it once I reach home.

@TheSeriousProgrammer
Member

My bad, I thought the resampling and channel averaging were taken care of automatically by PyAudio. The mic I test with worked seamlessly; after a quick search I realized that its audio driver supports this on its own, which may not be the case for all devices.

@Carpediem324
Author

While testing, I discovered that mismatched sample rates caused errors, so I updated the code accordingly. I also incorporated RMS into this revision, based on the idea that it helps decide which device should respond when several devices detect the same keyword. Thank you.

@TheSeriousProgrammer
Member

Also, can you give an example of how you use the RMS value?

@Carpediem324
Author

Using RMS (Root Mean Square) in keyword detection provides two main benefits:

First, in a multi-device environment, RMS helps select which device should respond when multiple devices detect the same keyword simultaneously. The device with the highest RMS is typically closest to the user.

Second, even in single-device scenarios, RMS helps filter out false detections. If the RMS value of detected audio is too low, it likely indicates background noise or distant sounds, allowing developers to avoid unintended activations.
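Both uses can be sketched with one hypothetical helper. The function name, the device IDs, and the 500.0 threshold are all illustrative assumptions, not code from this PR:

```python
def pick_responding_device(detections, min_rms=500.0):
    """Choose which device should respond to a detected keyword.

    `detections` maps device_id -> RMS of the frame that triggered the
    detection. Returns the device with the highest RMS (typically the
    one closest to the speaker), or None if every detection falls below
    `min_rms` and is likely background noise or a distant sound.
    """
    loud = {dev: rms for dev, rms in detections.items() if rms >= min_rms}
    if not loud:
        return None  # filter out the false detection entirely
    return max(loud, key=loud.get)
```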
