feat: OPTIC-2123: Audio spectrograms #7400

Open
cloudmark wants to merge 10 commits into base: develop

cloudmark
Contributor

@cloudmark cloudmark commented Apr 20, 2025

Add spectrogram visualization to the Audio component

Reason for change

This PR adds spectrogram visualization support to the audio editor, enabling users to visualize frequency content over time in audio recordings. This feature enhances audio annotation capabilities by providing visual frequency analysis tools, particularly useful for tasks like speech analysis, music transcription, and sound event detection.

The implementation includes:

  • FFT-based spectrogram rendering with configurable parameters
  • Multiple color scheme options with live preview
  • Mel-scale frequency mapping support
  • Real-time parameter adjustment
  • Performance optimizations for smooth rendering
  • Modular code organization for maintainability
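
For readers unfamiliar with the mel option listed above, here is a minimal sketch of one common Hz-to-mel mapping (the HTK-style formula). Whether this PR uses this exact formula is an assumption, and the names `hzToMel`, `melToHz`, and `melBandEdges` are hypothetical, not taken from the PR's code.

```typescript
// One common (HTK-style) mel formula; other variants exist.
function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}

// Inverse of hzToMel.
function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Band edges in Hz, evenly spaced on the mel axis.
// Returns bandCount + 2 values: the lower edge, the band centers, the upper edge.
function melBandEdges(bandCount: number, minHz: number, maxHz: number): number[] {
  const lo = hzToMel(minHz);
  const hi = hzToMel(maxHz);
  const edges: number[] = [];
  for (let i = 0; i < bandCount + 2; i++) {
    edges.push(melToHz(lo + ((hi - lo) * i) / (bandCount + 1)));
  }
  return edges;
}
```

Spacing the edges evenly in mel units packs more bands into the low frequencies, which is why a mel view tends to show more detail in the speech range than a linear frequency axis does.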

Screenshots

  1. Configuration Interface
Screenshot 2025-04-20 at 19 05 32

Shows the labeling configuration with the new spectrogram="true" attribute in the XML, which is how the feature is enabled.

  2. Color Scheme Selection
Screenshot 2025-04-20 at 19 05 01

Demonstrates the color scheme selection interface with:

  • Multiple predefined schemes (Autumn, Bathymetry, Blackbody, etc.)
  • Visual preview for each scheme
  • Real-time application of color changes
  • Smooth transition between schemes

  3. Time Display on Hover
Screenshot 2025-04-20 at 19 04 50

Shows interactive features:

  • Precise time indicator (00:00:00.447) on hover
  • Clear visualization of frequency content
  • Smooth rendering of the spectrogram
  • Integration with the waveform display

  4. Playback and Spectrogram Settings
Screenshot 2025-04-20 at 19 04 47

Comprehensive control panel featuring:

  • Playback speed adjustment
  • Audio zoom y-axis control
  • FFT Samples slider (64-2048)
  • Loop Regions toggle
  • Auto-play New Regions toggle
  • Integrated spectrogram controls

  5. Advanced Spectrogram Controls
Screenshot 2025-04-20 at 19 04 43

Detailed configuration options:

  • FFT Samples selection (512)
  • Mel Bands adjustment (64)
  • Spectrogram dB range (-80 to -10)
  • Windowing Function selection (Blackman)
  • Color Scheme selection (Viridis)
  • View toggles for timeline, audio wave, and spectrogram
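
To make the dB range control concrete, here is a minimal sketch of how a magnitude could be converted to decibels and mapped onto a color table using the -80 to -10 range shown in the screenshot. This illustrates the general technique only; `magnitudeToDb`, `normalizeDb`, and `colorIndex` are hypothetical names, and the -120 dB floor is an assumption.

```typescript
// Convert a linear magnitude to decibels, with a floor to avoid log10(0).
function magnitudeToDb(magnitude: number, floorDb = -120): number {
  if (magnitude <= 0) return floorDb;
  return Math.max(floorDb, 20 * Math.log10(magnitude));
}

// Map a dB value into [0, 1] for the displayed range, clamping values outside it.
function normalizeDb(db: number, minDb = -80, maxDb = -10): number {
  if (minDb >= maxDb) throw new RangeError("minDb must be less than maxDb");
  const t = (db - minDb) / (maxDb - minDb);
  return Math.min(1, Math.max(0, t));
}

// Pick an index into a color table with `size` entries.
function colorIndex(db: number, size: number, minDb = -80, maxDb = -10): number {
  return Math.round(normalizeDb(db, minDb, maxDb) * (size - 1));
}
```

Narrowing the range stretches the same color table over fewer decibels, which increases contrast for the levels that remain inside it.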

Rollout strategy

The feature is implemented with a progressive enhancement approach:

  1. Feature Flag:
<AudioPlus name="audio"
          value="$audio"
          hotkey="space"
          sync="group_a"
          defaultscale="1"
          defaultzoom="2"
          zoom="true"
          spectrogram="true"
/>
  2. Backward Compatibility:
  • Existing audio components continue to work without changes
  • Spectrogram can be enabled/disabled per instance
  • All new parameters have sensible defaults
  3. Performance Considerations:
  • Lazy loading of FFT computation code
  • Progressive rendering for large files
  • Configurable quality settings

Testing

Comprehensive testing strategy:

  1. Unit Tests:
  • WindowFunctions.ts: Window function calculations
  • ColorMapper.ts: Color scheme management
  • FFT computation accuracy
  • Parameter validation
  2. Integration Tests:
  • Audio loading and visualization
  • Real-time parameter updates
  • Color scheme switching
  • Performance benchmarks
  3. Manual Testing Scenarios:
  • Various audio formats (WAV, MP3, OGG)
  • Different file lengths (short clips to long recordings)
  • Multiple sample rates and bit depths
  • Browser compatibility (Chrome, Firefox, Safari)
  • Performance with large files

Risks

  1. Performance:
  • FFT computation is CPU-intensive
  • Mitigated through:
    • Chunked rendering
    • Yield scheduling
    • Canvas optimization
    • Caching mechanisms
  2. Memory Usage:
  • Large audio files require more memory for FFT
  • Mitigated through:
    • Buffer management
    • Cleanup of unused resources
    • Progressive loading
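
The chunked rendering and yield scheduling mitigations can be sketched roughly as follows. This is an illustrative pattern under assumptions, not the PR's code: `renderColumns` and `runChunked` are hypothetical names, the 16 ms budget is a placeholder for roughly one frame, and the real implementation may schedule differently.

```typescript
// A generator draws one column per step and yields whenever its time budget
// is spent, so the caller can hand control back to the event loop.
function* renderColumns(
  columnCount: number,
  drawColumn: (x: number) => void,
  budgetMs = 16,
  now: () => number = Date.now,
): Generator<number, void, void> {
  let sliceStart = now();
  for (let x = 0; x < columnCount; x++) {
    drawColumn(x);
    if (now() - sliceStart >= budgetMs) {
      yield x + 1; // number of columns completed so far
      sliceStart = now();
    }
  }
}

// Drive the generator, letting other work run between slices.
function runChunked(
  task: Generator<number, void, void>,
  schedule: (resume: () => void) => void = (resume) => {
    setTimeout(resume, 0);
  },
): Promise<void> {
  return new Promise((resolve) => {
    const step = () => {
      const result = task.next();
      if (result.done) resolve();
      else schedule(step);
    };
    step();
  });
}
```

In a browser, `schedule` could be swapped for `requestAnimationFrame` so that each slice lines up with a paint; abandoning a stale render is then just a matter of not resuming the generator.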

Reviewer notes

Key areas to review:

  1. Visualizer.ts: Spectrogram rendering logic
  2. WindowFunctions.ts: Audio processing utilities
  3. ColorMapper.ts: Color scheme management
  4. Performance optimizations in rendering loops
  5. Error handling and edge cases
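
As context for reviewing WindowFunctions.ts, the three window types named in this PR (Blackman, Hann, Hamming) have standard textbook definitions. The sketch below uses those definitions; it is not the PR's code, and `makeWindow` and `applyWindow` are hypothetical names.

```typescript
type WindowName = "hann" | "hamming" | "blackman";

// Build a symmetric window of `size` coefficients from the textbook formulas.
function makeWindow(name: WindowName, size: number): Float32Array {
  if (!Number.isInteger(size) || size < 2) {
    throw new RangeError("size must be an integer >= 2");
  }
  const coefficients = new Float32Array(size);
  for (let n = 0; n < size; n++) {
    const phase = (2 * Math.PI * n) / (size - 1);
    if (name === "hann") {
      coefficients[n] = 0.5 - 0.5 * Math.cos(phase);
    } else if (name === "hamming") {
      coefficients[n] = 0.54 - 0.46 * Math.cos(phase);
    } else {
      coefficients[n] = 0.42 - 0.5 * Math.cos(phase) + 0.08 * Math.cos(2 * phase);
    }
  }
  return coefficients;
}

// Multiply a frame by a window of the same length before running the FFT.
function applyWindow(frame: Float32Array, win: Float32Array): Float32Array {
  if (frame.length !== win.length) {
    throw new RangeError("frame and window must have the same length");
  }
  const out = new Float32Array(frame.length);
  for (let i = 0; i < frame.length; i++) out[i] = frame[i] * win[i];
  return out;
}
```

All three windows taper the frame toward its edges to reduce spectral leakage; they differ in how they trade main-lobe width against side-lobe suppression, which is what users see when switching the Windowing Function setting.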

General notes

The spectrogram visualization feature provides:

  • Real-time frequency analysis
  • Multiple color schemes for different use cases
  • Configurable parameters for detailed analysis
  • Smooth integration with existing audio tools
  • Optimized performance for large files

robot-ci-heartex and others added 2 commits April 20, 2025 18:36
This commit adds spectrogram visualization capabilities to the audio editor
through a new optional 'spectrogram' property in the AudioPlus component.

Example usage:
<AudioPlus
  name="audio"
  value="$audio"
  height="240"
  hotkey="space"
  defaultscale="1"
  defaultzoom="2"
  zoom="true"
  spectrogram="true"
  sync="group_a"
/>

Key changes:
- Add new 'spectrogram' boolean property to AudioPlus component
- Extract window functions into a dedicated WindowFunctions module
- Create a new ColorMapper module for spectrogram coloring
- Refactor Visualizer class to use the new modules
- Add support for different window functions and color schemes
- Improve type safety and code organization

The spectrogram visualization allows users to:
- Toggle spectrogram view using the 'spectrogram' property
- View frequency content over time alongside waveform
- Switch between different color schemes
- Configure window functions for FFT analysis
- Adjust visualization parameters (FFT size, dB range)

Configuration:
- spectrogram: boolean (optional) - When set to true, enables
  spectrogram visualization alongside the waveform

Labels: audio, editor, feature, community:feature-request, community:reviewed

Closes HumanSignal#384
Add spectrogram visualization capabilities to the audio editor component with configurable settings and improved UI controls.

Key changes:
- Extract window functions into separate WindowFunctions module for better code organization
- Create new ColorMapper module for handling spectrogram color schemes
- Add spectrogram property to AudioPlus component (optional boolean to enable/disable)
- Implement FFT-based spectrogram rendering with configurable parameters
- Add UI controls for spectrogram settings (FFT size, color scheme, dB range)
- Fix CSS styling issues in the configuration modal
- Improve section header positioning and spacing

Features:
- Real-time spectrogram visualization
- Configurable FFT window size and type
- Multiple color scheme options
- Adjustable dB range for visualization
- Mel-scale frequency mapping support
- Responsive rendering with performance optimizations

Labels:
- audio
- community:feature-request
- community:reviewed
- editor
- feature

Closes HumanSignal#384
netlify bot commented Apr 20, 2025

👷 Deploy request for heartex-docs pending review.

🔨 Latest commit ef8d83b

netlify bot commented Apr 20, 2025

👷 Deploy request for label-studio-docs-new-theme pending review.

🔨 Latest commit ef8d83b

netlify bot commented Apr 20, 2025

Deploy Preview for label-studio-storybook ready!

🔨 Latest commit ef8d83b
🔍 Latest deploy log https://app.netlify.com/sites/label-studio-storybook/deploys/6823bf509f348700084ae73f
😎 Deploy Preview https://deploy-preview-7400--label-studio-storybook.netlify.app

@cloudmark cloudmark changed the title from "Feature/spectrogram analyser" to "Spectrogram Analysis in Audio component" Apr 20, 2025
@cloudmark
Contributor Author

To help visualize the new spectrogram functionality implemented in this PR (#7400), I've recorded a short video demonstration:

Video Demonstration: Spectrogram Feature

What the video shows:

The video walks through the spectrogram feature within the Label Studio interface, highlighting:

  • The spectrogram display integrated below the audio waveform.
  • Synchronized playback tracking across both the waveform and spectrogram.
  • How zooming affects both views simultaneously.
  • Real-time updates as various configuration options are adjusted in the settings panel:
    • Changing FFT window sizes.
    • Applying different color schemes.
    • Selecting various windowing functions (Blackman, Hann, Hamming).
    • Toggling the Mel frequency scale.
    • Adjusting the amplitude (dB) range.

Hope this provides a helpful overview of the user experience!

@cloudmark cloudmark changed the title from "Spectrogram Analysis in Audio component" to "feat: Implement spectrogram visualization for AudioPlus" Apr 20, 2025
@makseq
Member

makseq commented Apr 20, 2025

Great PR! How well will it work with long audio files around 1-2 hours?

@cloudmark
Contributor Author

cloudmark commented Apr 21, 2025

Hey @makseq,

TL;DR: Yes, it handles long files (1-2 hours) efficiently.

The core strategies implemented are:

  1. On-Demand Processing: It only analyzes the audio needed for the currently visible portion.
  2. Efficient Sampling (Zoomed Out): When zoomed out (e.g., 1 hour on a 1024px view), each pixel represents a time slice (~3.5s). For each pixel, I take one representative FFT (e.g., 512 samples, covering ~11.6ms at a 44.1kHz sample rate) within that slice, avoiding processing every single audio sample.
  3. Non-Blocking Rendering: A generator function (renderSpectrogramSlice) renders pixel by pixel, yielding frequently (~16ms) to keep the UI responsive during interactions like scrolling/zooming.

This approach balances performance, memory, and visual overview. As you zoom in, the detail naturally increases as fewer samples are represented per pixel.
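
The arithmetic behind the zoomed-out case can be sketched as follows. This is a reading of the strategy described above, not the PR's code: `planColumns` is a hypothetical name, and centering the representative frame inside each slice is an assumption (the comment only says one FFT is taken "within that slice").

```typescript
interface ColumnPlan {
  secondsPerPixel: number;
  frameStarts: number[]; // start sample of the single FFT frame used for each pixel column
}

// Plan one representative FFT frame per pixel column.
function planColumns(
  durationSec: number,
  sampleRate: number,
  widthPx: number,
  fftSize: number,
): ColumnPlan {
  const totalSamples = Math.floor(durationSec * sampleRate);
  const samplesPerPixel = totalSamples / widthPx;
  const lastValidStart = Math.max(0, totalSamples - fftSize);
  const frameStarts: number[] = [];
  for (let x = 0; x < widthPx; x++) {
    // Assumption: center the frame in the slice, then clamp it into the buffer.
    const center = (x + 0.5) * samplesPerPixel;
    const start = Math.round(center - fftSize / 2);
    frameStarts.push(Math.min(lastValidStart, Math.max(0, start)));
  }
  return { secondsPerPixel: durationSec / widthPx, frameStarts };
}
```

For a 1-hour file at 44.1kHz on a 1024px view with a 512-sample FFT, this plans 1024 frames, so only 1024 × 512 = 524,288 of the 158,760,000 samples (about 0.3%) are transformed per full redraw.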

Separately, the chosen FFT window size affects the computation time per slice (larger FFTs = finer frequency detail but slower slice rendering). This characteristic is independent of total file length. For the most fluid feel, 512 is often a good balance.
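
To put rough numbers on that trade-off, the sketch below computes frequency resolution, frame duration, and a relative cost estimate for a given FFT size. The N·log2(N) cost model is the textbook estimate for a radix-2 FFT, not a measurement of this PR, and `describeFftSize` is a hypothetical name.

```typescript
interface FftTradeoff {
  binCount: number;     // frequency bins below Nyquist
  hzPerBin: number;     // width of each bin in Hz
  frameMs: number;      // audio covered by one frame, in milliseconds
  relativeCost: number; // estimated cost relative to the baseline size
}

// Summarize what changing the FFT size does to resolution and cost.
function describeFftSize(fftSize: number, sampleRate: number, baselineSize = 512): FftTradeoff {
  if (fftSize < 2 || (fftSize & (fftSize - 1)) !== 0) {
    throw new RangeError("fftSize must be a power of two >= 2");
  }
  const cost = (n: number) => n * Math.log2(n);
  return {
    binCount: fftSize / 2,
    hzPerBin: sampleRate / fftSize,
    frameMs: (1000 * fftSize) / sampleRate,
    relativeCost: cost(fftSize) / cost(baselineSize),
  };
}
```

At 44.1kHz, a 512-sample FFT gives bins about 86 Hz wide over roughly 11.6 ms of audio, while 2048 samples gives bins about 21.5 Hz wide over roughly 46 ms, at an estimated 4.9 times the per-frame cost.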

To demonstrate this with varied audio content, the video uses a 1-hour file created by concatenating samples from the ESC-50 dataset (https://github.com/karolpiczak/ESC-50). This dataset contains 2000 short environmental sound recordings across 50 categories (like dogs barking, rain, helicopters, etc.), ensuring the test file has diverse spectral characteristics.

Video Demo: Spectrogram Performance & FFT Size Impact (1hr ESC-50 file)

(Video shows loading/panning the long, varied file & the visible speed difference when switching FFT sizes).

@farioas
Member

farioas commented Apr 21, 2025

@cloudmark please rebase your branch on the latest changes from the repo to include commit 9b0487f. It will fix the failing checks.

@cloudmark
Contributor Author

Done, @farioas. I also updated #7376 to pull from this upstream.

@makseq
Member

makseq commented Apr 25, 2025

<AudioPlus> was deprecated, could you please use <Audio> instead?

@cloudmark
Contributor Author

Thank you @makseq for the heads up. I think internally they should resolve to the same component so there are no further updates needed (I believe).

@makseq makseq changed the title from "feat: Implement spectrogram visualization for AudioPlus" to "feat: OPTIC-2123: Audio spectrograms" Apr 30, 2025
Contributor

Can we extract the logic introduced here into a separate set of component files, hooks, and utilities, so we don't end up with a 700+ line ConfigControl.tsx? We have Cursor rules in this project that outline the best practices: aim for one component per file where possible, and extract hooks and utils similarly.
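
As one illustration of the kind of extraction requested, the dB range handling that appears in the ConfigControl.tsx hunks reviewed further down (rounding, clamping to -120..0, and keeping min below max) could live in a pure utility. This is a sketch only: `clampDbRange` and the constants are hypothetical names, and the final re-clamp step is an addition that is not visible in the reviewed hunks.

```typescript
const DB_FLOOR = -120;
const DB_CEIL = 0;

// Round and bound a [min, max] dB pair, keeping min strictly below max.
// `isMinMoving` says which slider handle the user is dragging.
function clampDbRange(values: [number, number], isMinMoving: boolean): [number, number] {
  let min = Math.max(DB_FLOOR, Math.min(DB_CEIL, Math.round(values[0])));
  let max = Math.max(DB_FLOOR, Math.min(DB_CEIL, Math.round(values[1])));
  if (min >= max) {
    // Adjust only the handle being moved so the other one stays put.
    if (isMinMoving) min = max - 1;
    else max = min + 1;
  }
  // Assumption: if the adjustment pushed past a bound, shift the pair back inside.
  if (min < DB_FLOOR) {
    min = DB_FLOOR;
    max = DB_FLOOR + 1;
  }
  if (max > DB_CEIL) {
    max = DB_CEIL;
    min = DB_CEIL - 1;
  }
  return [min, max];
}
```

A pure function like this can be unit tested without rendering the modal, and the component's change handler shrinks to a call plus two state updates.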

import { createPortal } from "react-dom";
import {Toggle, Tooltip} from "@humansignal/ui";
import {Block, Elem} from "../../../utils/bem";
import {Slider as AntSlider, Select} from "antd";
Contributor

@bmartel bmartel May 5, 2025

Suggested change
- import {Slider as AntSlider, Select} from "antd";
+ import { Slider as AntSlider } from "antd";

I noticed additional usage of AntD components. We are actively removing our dependence on these in the project, so we can use the Select component available from our internal UI lib instead. See the suggestion above.

import {Slider as AntSlider, Select} from "antd";
import {Range} from "../../../common/Range/Range";

import {IconConfig, IconInfoConfig} from "@humansignal/ui";
Contributor

@bmartel bmartel May 5, 2025

Suggested change
- import {IconConfig, IconInfoConfig} from "@humansignal/ui";
+ import { IconConfig, IconInfoConfig } from "@humansignal/icons";

import { Slider } from "./Slider";
import {type FC, type MouseEvent, useContext, useEffect, useMemo, useRef, useState} from "react";
import { createPortal } from "react-dom";
import {Toggle, Tooltip} from "@humansignal/ui";
Contributor

Suggested change
- import {Toggle, Tooltip} from "@humansignal/ui";
+ import { Select, Toggle, Tooltip } from "@humansignal/ui";

@bmartel
Contributor

bmartel commented May 5, 2025

Hey @cloudmark 👋, can you please run the linter/formatter over your changes and rebase? There are currently conflicts in a few files.

To lint/fix, from the LabelStudio project root: make fmt-all


// Update Windowing Function
if (params.windowingFunction && params.windowingFunction !== this.spectrogramWindowingFunction) {
console.log(`Visualizer: Updating Windowing Function to ${params.windowingFunction}`);
Contributor

Suggested change
- console.log(`Visualizer: Updating Windowing Function to ${params.windowingFunction}`);


// Update Color Scheme
if (params.colorScheme && params.colorScheme !== this.spectrogramColorScheme) {
console.log(`Visualizer: Updating Colors Scheme Function to ${params.colorScheme}`);
Contributor

Suggested change
- console.log(`Visualizer: Updating Colors Scheme Function to ${params.colorScheme}`);

// Update dB Range
if ((params.minDb !== undefined && params.minDb !== this.spectrogramMinDb) ||
(params.maxDb !== undefined && params.maxDb !== this.spectrogramMaxDb)) {
console.log(`Visualizer: Updating dB Range to ${params.minDb} - ${params.maxDb}`);
Contributor

Suggested change
- console.log(`Visualizer: Updating dB Range to ${params.minDb} - ${params.maxDb}`);

console.log('handleDbRangeChange received:', values);

if (!Array.isArray(values) || values.length !== 2) {
console.log('Invalid values array');
Contributor

Suggested change
- console.log('Invalid values array');

Comment on lines 469 to 470
console.log('Current state before update:', { displayMinDb, displayMaxDb });
console.log('New values to set:', { newMinDb, newMaxDb });
Contributor

Suggested change
- console.log('Current state before update:', { displayMinDb, displayMaxDb });
- console.log('New values to set:', { newMinDb, newMaxDb });


// Basic validation
if (isNaN(newMinDb) || isNaN(newMaxDb) || newMinDb >= newMaxDb) {
console.log('Values invalid or crossed:', { newMinDb, newMaxDb });
Contributor

Suggested change
- console.log('Values invalid or crossed:', { newMinDb, newMaxDb });

if (lastUpdate && currentTime - lastUpdate.time < 100) {
// If we're getting a quick update that would change max when we're moving min
if (lastUpdate.min === newMinDb && lastUpdate.max !== newMaxDb && newMaxDb !== displayMaxDb) {
console.log('Preventing unstable max update');
Contributor

Suggested change
- console.log('Preventing unstable max update');

}
// If we're getting a quick update that would change min when we're moving max
if (lastUpdate.max === newMaxDb && lastUpdate.min !== newMinDb && newMinDb !== displayMinDb) {
console.log('Preventing unstable min update');
Contributor

Suggested change
- console.log('Preventing unstable min update');

// Update local state
setDisplayMinDb(newMinDb);
setDisplayMaxDb(newMaxDb);
console.log('State updated to:', { newMinDb, newMaxDb });
Contributor

Suggested change
- console.log('State updated to:', { newMinDb, newMaxDb });

Comment on lines 513 to 514
console.log('Range onChange raw value:', valueArray);
console.log('Current display values:', { displayMinDb, displayMaxDb });
Contributor

Suggested change
- console.log('Range onChange raw value:', valueArray);
- console.log('Current display values:', { displayMinDb, displayMaxDb });

if (!Array.isArray(valueArray) || valueArray.length !== 2) return;

let [newMin, newMax] = valueArray.map(Math.round);
console.log('After rounding:', { newMin, newMax });
Contributor

Suggested change
- console.log('After rounding:', { newMin, newMax });

// Ensure values stay within bounds
newMin = Math.max(-120, Math.min(0, newMin));
newMax = Math.max(-120, Math.min(0, newMax));
console.log('After bounds check:', { newMin, newMax });
Contributor

Suggested change
- console.log('After bounds check:', { newMin, newMax });


// Ensure min is always less than max
if (newMin >= newMax) {
console.log('Values crossed or equal, adjusting...');
Contributor

Suggested change
- console.log('Values crossed or equal, adjusting...');

console.log('Values crossed or equal, adjusting...');
if (isMinMoving) {
newMin = Math.min(newMin, newMax - 1);
console.log('Adjusted min:', newMin);
Contributor

Suggested change
- console.log('Adjusted min:', newMin);

console.log('Adjusted min:', newMin);
} else {
newMax = Math.max(newMax, newMin + 1);
console.log('Adjusted max:', newMax);
Contributor

Suggested change
- console.log('Adjusted max:', newMax);

}
}

console.log('Final values before handleDbRangeChange:', { newMin, newMax });
Contributor

Suggested change
- console.log('Final values before handleDbRangeChange:', { newMin, newMax });

Contributor

@bmartel bmartel left a comment

So far, other than some linting/code organization changes, this looks amazing 🔥.

@cloudmark
Contributor Author

Hey @bmartel, I'm going to push an update shortly that speeds up the computations and makes the FFT computations async. LS should be far more responsive than the sync version I presented here. The render pipeline has also been completely overhauled to optimise rendering.

@cloudmark
Contributor Author

Here's a short Loom video showing it in action: https://www.loom.com/share/b2aa14fe5f7e4411811a3437bdc3b729

Label Studio (LS) should feel much snappier since I moved the FFT computations to be async and overhauled the rendering pipeline.

@bmartel
Contributor

bmartel commented May 6, 2025

@cloudmark Heads up, I resolved the conflicts on this branch and pushed up the changes, as most of it had to do with the recent release of Dark Mode and Design Tokens.

…ent with configurable settings and improved UI controls.

### Key changes:
- Extracted window functions into a separate `WindowFunctions` module for better code organization.
- Created new `ColorMapper` module for handling spectrogram color schemes.
- Added `spectrogram` property to the `AudioPlus` component (optional boolean to enable/disable).
- Implemented FFT-based spectrogram rendering with configurable parameters.
- Added UI controls for spectrogram settings (FFT size, color scheme, dB range).
- Fixed CSS styling issues in the configuration modal.
- Improved section header positioning and spacing.

### Features:
- Real-time spectrogram visualization.
- Configurable FFT window size and type.
- Multiple color scheme options.
- Adjustable dB range for visualization.
- Mel-scale frequency mapping support.
- Responsive rendering with performance optimizations.

### Labels:
- `audio`
- `community:feature-request`
- `community:reviewed`
- `editor`
- `feature`
@cloudmark
Contributor Author

@bmartel branch updated to include all the updates.

@cloudmark cloudmark requested a review from bmartel May 9, 2025 00:14
@makseq
Member

makseq commented May 9, 2025

/git merge develop

Workflow run
Successfully merged: Already up to date.

@makseq
Member

makseq commented May 9, 2025

/fm sync

Workflow run

@cloudmark cloudmark requested review from a team, hlomzik, Gondragos and nick-skriabin as code owners May 13, 2025 19:58
- Fix spectrogram visibility by including scrollbar height in rendering pipeline
- Update color scheme to reflect changes immediately
- Optimize progress bar:
  - Reduce size for better UI integration
  - Add auto-hide behavior when no progress is present
- Improve frequency grid layout with optimal Hz label placement

These changes improve the audio player's visual feedback and user experience
by ensuring proper rendering of the spectrogram, making the progress
indicator more subtle and responsive, and enhancing the readability of
frequency labels.
6 participants