feat: OPTIC-2123: Audio spectrograms #7400
base: develop
This commit adds spectrogram visualization capabilities to the audio editor through a new optional 'spectrogram' property in the AudioPlus component.

Example usage:
<AudioPlus name="audio" value="$audio" height="240" hotkey="space" defaultscale="1" defaultzoom="2" zoom="true" spectrogram="true" sync="group_a" />

Key changes:
- Add new 'spectrogram' boolean property to AudioPlus component
- Extract window functions into a dedicated WindowFunctions module
- Create a new ColorMapper module for spectrogram coloring
- Refactor Visualizer class to use the new modules
- Add support for different window functions and color schemes
- Improve type safety and code organization

The spectrogram visualization allows users to:
- Toggle spectrogram view using the 'spectrogram' property
- View frequency content over time alongside waveform
- Switch between different color schemes
- Configure window functions for FFT analysis
- Adjust visualization parameters (FFT size, dB range)

Configuration:
- spectrogram: boolean (optional). When set to true, enables spectrogram visualization alongside the waveform

Labels: audio, editor, feature, community:feature-request, community:reviewed

Closes HumanSignal#384
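The WindowFunctions module mentioned above is not shown in this thread. As a rough illustration of what such a module typically contains, here is a minimal sketch of two standard windows applied to a frame before the FFT. The names `WindowType`, `createWindow`, and `applyWindow` are hypothetical and are not taken from the PR.

```typescript
// Hypothetical names for illustration only; not the PR's actual exports.
type WindowType = "hann" | "hamming" | "rectangular";

// Build symmetric window coefficients for a frame of `size` samples.
function createWindow(type: WindowType, size: number): Float32Array {
  const w = new Float32Array(size);
  if (size === 1) {
    w[0] = 1; // a single-sample window is conventionally 1
    return w;
  }
  for (let n = 0; n < size; n++) {
    const phase = (2 * Math.PI * n) / (size - 1);
    if (type === "hann") {
      w[n] = 0.5 - 0.5 * Math.cos(phase);
    } else if (type === "hamming") {
      w[n] = 0.54 - 0.46 * Math.cos(phase);
    } else {
      w[n] = 1; // rectangular: leave the frame unchanged
    }
  }
  return w;
}

// Multiply a frame by the window, sample by sample, before the FFT.
function applyWindow(frame: Float32Array, win: Float32Array): Float32Array {
  const out = new Float32Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    out[i] = frame[i] * win[i];
  }
  return out;
}
```

Tapering windows such as Hann reduce spectral leakage at the cost of a slightly wider main lobe, which is why exposing the choice to the user can be useful.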
Add spectrogram visualization capabilities to the audio editor component with configurable settings and improved UI controls.

Key changes:
- Extract window functions into separate WindowFunctions module for better code organization
- Create new ColorMapper module for handling spectrogram color schemes
- Add spectrogram property to AudioPlus component (optional boolean to enable/disable)
- Implement FFT-based spectrogram rendering with configurable parameters
- Add UI controls for spectrogram settings (FFT size, color scheme, dB range)
- Fix CSS styling issues in the configuration modal
- Improve section header positioning and spacing

Features:
- Real-time spectrogram visualization
- Configurable FFT window size and type
- Multiple color scheme options
- Adjustable dB range for visualization
- Mel-scale frequency mapping support
- Responsive rendering with performance optimizations

Labels:
- audio
- community:feature-request
- community:reviewed
- editor
- feature

Closes HumanSignal#384
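The dB range and Mel-scale items above can be illustrated with a small sketch. It assumes the widely used `2595 * log10(1 + f / 700)` Mel formula and a plain grayscale mapping; the PR's ColorMapper and its actual color schemes are not shown in this thread, so every function name below is hypothetical.

```typescript
// All names here are illustrative assumptions, not the PR's API.

// Hz <-> mel, using the common 2595 * log10(1 + f / 700) form.
function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}

function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Convert a linear magnitude to decibels; the floor avoids log10(0).
function magnitudeToDb(magnitude: number): number {
  return 20 * Math.log10(Math.max(magnitude, 1e-12));
}

// Map a dB value into [0, 1] within the user-selected [minDb, maxDb] range.
function normalizeDb(db: number, minDb: number, maxDb: number): number {
  if (maxDb <= minDb) return 0;
  const t = (db - minDb) / (maxDb - minDb);
  return Math.max(0, Math.min(1, t));
}

// Simplest possible color scheme: normalized intensity to a gray RGB triple.
function grayscale(t: number): [number, number, number] {
  const v = Math.round(t * 255);
  return [v, v, v];
}
```

Narrowing the dB range stretches the same color ramp over fewer decibels, which raises contrast for quiet content.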
To help visualize the new spectrogram functionality implemented in this PR (#7400), I've recorded a short video demonstration:
Video Demonstration: Spectrogram Feature
What the video shows: The video walks through the spectrogram feature within the Label Studio interface, highlighting:
Hope this provides a helpful overview of the user experience!
Great PR! How well will it work with long audio files around 1-2 hours?
Hey @makseq, TL;DR: Yes, it handles long files (1-2 hours) efficiently. The core strategies implemented are:
This approach balances performance, memory, and visual overview. As you zoom in, the detail naturally increases because fewer samples are represented per pixel.

Separately, the chosen FFT window size affects the computation time per slice (larger FFTs give more frequency detail but render each slice more slowly). This characteristic is independent of total file length. For the most fluid feel, 512 is often a good balance.

To demonstrate this with varied audio content, the video uses a 1-hour file created by concatenating samples from the ESC-50 dataset (https://github.com/karolpiczak/ESC-50). This dataset contains 2000 short environmental sound recordings across 50 categories (such as dogs barking, rain, and helicopters), ensuring the test file has diverse spectral characteristics.

Video Demo: Spectrogram Performance & FFT Size Impact (1hr ESC-50 file). The video shows loading and panning the long, varied file and the visible speed difference when switching FFT sizes.
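To put rough numbers on the two effects described above, here is a small back-of-the-envelope sketch. The helper names are hypothetical, the 44.1 kHz sample rate and 1000 px canvas width are assumptions for the example, and the cost figure is only the textbook N·log2(N) estimate for a radix-2 FFT, not a measurement of this PR.

```typescript
// Illustrative helpers; none of these names come from the PR.

// Width of one FFT bin in Hz: larger FFTs resolve frequency more finely.
function frequencyResolutionHz(sampleRate: number, fftSize: number): number {
  return sampleRate / fftSize;
}

// How many audio samples one horizontal pixel has to represent.
function samplesPerPixel(totalSamples: number, widthPx: number, zoom: number): number {
  return totalSamples / (widthPx * zoom);
}

// Textbook relative cost of one radix-2 FFT of the given size.
function relativeFftCost(fftSize: number): number {
  return fftSize * Math.log2(fftSize);
}

// Assumed example: a 1-hour mono file at 44.1 kHz on a 1000 px wide canvas.
const sampleRate = 44100;
const totalSamples = sampleRate * 3600;

console.log(samplesPerPixel(totalSamples, 1000, 1)); // samples per pixel fully zoomed out
console.log(samplesPerPixel(totalSamples, 1000, 2)); // halves when zoom doubles
console.log(frequencyResolutionHz(sampleRate, 512)); // Hz per bin at FFT size 512
console.log(frequencyResolutionHz(sampleRate, 2048)); // finer bins at FFT size 2048
console.log(relativeFftCost(2048) / relativeFftCost(512)); // per-slice cost ratio
```

Under these assumptions, going from 512 to 2048 gives four times finer frequency bins for a little under five times the per-slice work, regardless of how long the file is.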
@cloudmark please rebase your branch on the latest changes from the repo to include this commit 9b0487f. It will fix the failing checks.
Thank you @makseq for the heads up. I think internally they should resolve to the same component so there are no further updates needed (I believe).
Can we extract the logic introduced here into a separate set of component files, and hooks/utilities, so we don't end up with a 700+ line file in ConfigControl.tsx? We have Cursor rules in this project that outline the best practices: aim for one component per file where possible, and extract hooks and utils similarly.
import { createPortal } from "react-dom";
import {Toggle, Tooltip} from "@humansignal/ui";
import {Block, Elem} from "../../../utils/bem";
import {Slider as AntSlider, Select} from "antd";

- import {Slider as AntSlider, Select} from "antd";
+ import { Slider as AntSlider } from "antd";

I noticed additional usage of AntD components. We are actively removing our dependence on these in the project, so we can use the Select component available from our internal UI lib. See the above suggestion.
import {Slider as AntSlider, Select} from "antd";
import {Range} from "../../../common/Range/Range";

import {IconConfig, IconInfoConfig} from "@humansignal/ui";

- import {IconConfig, IconInfoConfig} from "@humansignal/ui";
+ import { IconConfig, IconInfoConfig } from "@humansignal/icons";
import { Slider } from "./Slider";
import {type FC, type MouseEvent, useContext, useEffect, useMemo, useRef, useState} from "react";
import { createPortal } from "react-dom";
import {Toggle, Tooltip} from "@humansignal/ui";

- import {Toggle, Tooltip} from "@humansignal/ui";
+ import { Select, Toggle, Tooltip } from "@humansignal/ui";
Hey @cloudmark 👋, can you please run the linter/formatter over your changes, as well as rebase? There are currently conflicts with a few files. To lint/fix, from the LabelStudio project root:
// Update Windowing Function
if (params.windowingFunction && params.windowingFunction !== this.spectrogramWindowingFunction) {
  console.log(`Visualizer: Updating Windowing Function to ${params.windowingFunction}`);

- console.log(`Visualizer: Updating Windowing Function to ${params.windowingFunction}`);

// Update Color Scheme
if (params.colorScheme && params.colorScheme !== this.spectrogramColorScheme) {
  console.log(`Visualizer: Updating Colors Scheme Function to ${params.colorScheme}`);

- console.log(`Visualizer: Updating Colors Scheme Function to ${params.colorScheme}`);

// Update dB Range
if ((params.minDb !== undefined && params.minDb !== this.spectrogramMinDb) ||
    (params.maxDb !== undefined && params.maxDb !== this.spectrogramMaxDb)) {
  console.log(`Visualizer: Updating dB Range to ${params.minDb} - ${params.maxDb}`);

- console.log(`Visualizer: Updating dB Range to ${params.minDb} - ${params.maxDb}`);
console.log('handleDbRangeChange received:', values);

if (!Array.isArray(values) || values.length !== 2) {
  console.log('Invalid values array');

- console.log('Invalid values array');

console.log('Current state before update:', { displayMinDb, displayMaxDb });
console.log('New values to set:', { newMinDb, newMaxDb });

- console.log('Current state before update:', { displayMinDb, displayMaxDb });
- console.log('New values to set:', { newMinDb, newMaxDb });

// Basic validation
if (isNaN(newMinDb) || isNaN(newMaxDb) || newMinDb >= newMaxDb) {
  console.log('Values invalid or crossed:', { newMinDb, newMaxDb });

- console.log('Values invalid or crossed:', { newMinDb, newMaxDb });

if (lastUpdate && currentTime - lastUpdate.time < 100) {
  // If we're getting a quick update that would change max when we're moving min
  if (lastUpdate.min === newMinDb && lastUpdate.max !== newMaxDb && newMaxDb !== displayMaxDb) {
    console.log('Preventing unstable max update');

- console.log('Preventing unstable max update');

  }
  // If we're getting a quick update that would change min when we're moving max
  if (lastUpdate.max === newMaxDb && lastUpdate.min !== newMinDb && newMinDb !== displayMinDb) {
    console.log('Preventing unstable min update');

- console.log('Preventing unstable min update');

// Update local state
setDisplayMinDb(newMinDb);
setDisplayMaxDb(newMaxDb);
console.log('State updated to:', { newMinDb, newMaxDb });

- console.log('State updated to:', { newMinDb, newMaxDb });
console.log('Range onChange raw value:', valueArray);
console.log('Current display values:', { displayMinDb, displayMaxDb });

- console.log('Range onChange raw value:', valueArray);
- console.log('Current display values:', { displayMinDb, displayMaxDb });

if (!Array.isArray(valueArray) || valueArray.length !== 2) return;

let [newMin, newMax] = valueArray.map(Math.round);
console.log('After rounding:', { newMin, newMax });

- console.log('After rounding:', { newMin, newMax });

// Ensure values stay within bounds
newMin = Math.max(-120, Math.min(0, newMin));
newMax = Math.max(-120, Math.min(0, newMax));
console.log('After bounds check:', { newMin, newMax });

- console.log('After bounds check:', { newMin, newMax });

// Ensure min is always less than max
if (newMin >= newMax) {
  console.log('Values crossed or equal, adjusting...');

- console.log('Values crossed or equal, adjusting...');

  console.log('Values crossed or equal, adjusting...');
  if (isMinMoving) {
    newMin = Math.min(newMin, newMax - 1);
    console.log('Adjusted min:', newMin);

- console.log('Adjusted min:', newMin);

    console.log('Adjusted min:', newMin);
  } else {
    newMax = Math.max(newMax, newMin + 1);
    console.log('Adjusted max:', newMax);

- console.log('Adjusted max:', newMax);

  }
}

console.log('Final values before handleDbRangeChange:', { newMin, newMax });

- console.log('Final values before handleDbRangeChange:', { newMin, newMax });
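In the spirit of the earlier request to extract hooks and utilities, the rounding, clamping, and ordering steps quoted in the hunks above could live in one pure function that is easy to unit test without any logging. This is only a sketch: `normalizeDbRange`, `DB_FLOOR`, and `DB_CEIL` are hypothetical names, and the final re-clamp is an addition that the quoted code does not contain.

```typescript
// Hypothetical utility; the bounds mirror the -120..0 values in the hunks above.
const DB_FLOOR = -120;
const DB_CEIL = 0;

const clampDb = (v: number): number => Math.max(DB_FLOOR, Math.min(DB_CEIL, v));

// Returns a valid [min, max] pair, or null when the input is unusable.
function normalizeDbRange(
  values: readonly number[],
  isMinMoving: boolean,
): [number, number] | null {
  if (values.length !== 2 || values.some((v) => Number.isNaN(v))) return null;

  // Round, then keep both handles inside the allowed range.
  let newMin = clampDb(Math.round(values[0]));
  let newMax = clampDb(Math.round(values[1]));

  // Keep min strictly below max by adjusting whichever handle is being dragged.
  if (newMin >= newMax) {
    if (isMinMoving) {
      newMin = Math.min(newMin, newMax - 1);
    } else {
      newMax = Math.max(newMax, newMin + 1);
    }
  }

  // Addition not present in the quoted code: separating the handles at an
  // edge can push one out of bounds, so shift the pair back inside.
  if (newMin < DB_FLOOR) {
    newMin = DB_FLOOR;
    newMax = DB_FLOOR + 1;
  }
  if (newMax > DB_CEIL) {
    newMax = DB_CEIL;
    newMin = DB_CEIL - 1;
  }
  return [newMin, newMax];
}
```

A component could then call this from its change handler and bail out on `null`, which keeps the handler short and removes the need for step-by-step debug output.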
So far other than some linting/code organization changes, this looks amazing 🔥.
Hey @bmartel, I'm going to push an update shortly that speeds up the computations and makes the FFT computations async. LS should be way more responsive than the sync version I presented here. The render pipeline has also been completely overhauled to optimise rendering.
Here's a short Loom video showing it in action: https://www.loom.com/share/b2aa14fe5f7e4411811a3437bdc3b729 Label Studio (LS) should feel much snappier since I moved the FFT computations to be async and overhauled the rendering pipeline.
@cloudmark Heads up, I resolved the conflicts on this branch and pushed up the changes, as most of it had to do with the recent release of Dark Mode and Design Tokens.
…ent with configurable settings and improved UI controls.

### Key changes:
- Extracted window functions into a separate `WindowFunctions` module for better code organization.
- Created new `ColorMapper` module for handling spectrogram color schemes.
- Added `spectrogram` property to the `AudioPlus` component (optional boolean to enable/disable).
- Implemented FFT-based spectrogram rendering with configurable parameters.
- Added UI controls for spectrogram settings (FFT size, color scheme, dB range).
- Fixed CSS styling issues in the configuration modal.
- Improved section header positioning and spacing.

### Features:
- Real-time spectrogram visualization.
- Configurable FFT window size and type.
- Multiple color scheme options.
- Adjustable dB range for visualization.
- Mel-scale frequency mapping support.
- Responsive rendering with performance optimizations.

### Labels:
- `audio`
- `community:feature-request`
- `community:reviewed`
- `editor`
- `feature`
@bmartel branch updated to include all the updates.
/git merge develop
/fm sync
- Fix spectrogram visibility by including scrollbar height in rendering pipeline
- Update color scheme to reflect changes immediately
- Optimize progress bar:
  - Reduce size for better UI integration
  - Add auto-hide behavior when no progress is present
- Improve frequency grid layout with optimal Hz label placement

These changes improve the audio player's visual feedback and user experience by ensuring proper rendering of the spectrogram, making the progress indicator more subtle and responsive, and enhancing the readability of frequency labels.
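The commit does not say how the Hz labels are placed. One common way to get readable grid labels is to snap the spacing to a 1, 2, 5 progression, sketched below. This is an assumption about a reasonable approach, not the PR's algorithm, and `niceStep` and `frequencyTicks` are hypothetical names.

```typescript
// Illustrative only; the PR's actual label placement logic is not shown here.

// Pick a step of the form 1, 2, or 5 times a power of ten that yields
// roughly `targetIntervals` divisions over `range`.
function niceStep(range: number, targetIntervals: number): number {
  const rough = range / Math.max(1, targetIntervals);
  const pow = Math.pow(10, Math.floor(Math.log10(rough)));
  const frac = rough / pow;
  const nice = frac <= 1 ? 1 : frac <= 2 ? 2 : frac <= 5 ? 5 : 10;
  return nice * pow;
}

// Label positions in Hz from 0 up to maxHz, spaced by the nice step.
function frequencyTicks(maxHz: number, targetIntervals: number): number[] {
  if (!(maxHz > 0)) return [0];
  const step = niceStep(maxHz, targetIntervals);
  const ticks: number[] = [];
  for (let i = 0; i * step <= maxHz; i++) {
    ticks.push(i * step);
  }
  return ticks;
}

// Assumed example: Nyquist frequency of 44.1 kHz audio, about 5 divisions.
console.log(frequencyTicks(22050, 5));
```

Round label values are easier to read than evenly divided ones such as 4410 Hz, at the cost of the top label not always landing exactly on the maximum frequency.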
Spectrogram visualization to Audio Component
Reason for change
This PR adds spectrogram visualization support to the audio editor, enabling users to visualize frequency content over time in audio recordings. This feature enhances audio annotation capabilities by providing visual frequency analysis tools, particularly useful for tasks like speech analysis, music transcription, and sound event detection.
The implementation includes:
Screenshots
Shows the labeling interface configuration with the new spectrogram="true" property in the XML configuration, demonstrating how the feature can be enabled through the labeling interface.
Demonstrates the color scheme selection interface with:
Shows interactive features:
Comprehensive control panel featuring:
Detailed configuration options:
Rollout strategy
The feature is implemented with a progressive enhancement approach:
Testing
Comprehensive testing strategy:
Risks
Reviewer notes
Key areas to review:
- Visualizer.ts: Spectrogram rendering logic
- WindowFunctions.ts: Audio processing utilities
- ColorMapper.ts: Color scheme management
General notes
The spectrogram visualization feature provides: