Toggle Spectrogram Preview for Audio #384

Open
Path-A opened this issue Aug 3, 2020 · 36 comments
Labels: audio, community:feature-request, community:reviewed, editor, feature, often asked

Comments

@Path-A

Path-A commented Aug 3, 2020

Is your feature request related to a problem? Please describe.
Classifying or segmenting audio with only a waveform preview can be time-consuming or difficult, especially with noisy audio data. Some data is more easily segmented by looking at frequency content over time.

Describe the solution you'd like
Include a toggle to preview a spectrogram representation of an audio clip. Common Python libraries for generating spectrograms include librosa and scipy.signal.
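For readers unfamiliar with what such a preview involves, here is a minimal sketch of a spectrogram with magnitudes in dB. It uses only NumPy so the steps stay visible; librosa or scipy.signal would do the same job with less code. The function name, the FFT size of 512, the hop of 256, and the 1 kHz test tone are illustrative assumptions, not anything Label Studio ships.

```python
import numpy as np

def log_spectrogram(samples, n_fft=512, hop=256):
    """Return a (frequency_bins, frames) array of magnitudes in dB.

    Hypothetical helper for illustration only.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack(
        [samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # A small floor avoids taking log10 of zero.
    return 20.0 * np.log10(np.maximum(magnitude, 1e-10)).T

# One second of a 1 kHz tone sampled at 16 kHz (assumed test signal).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec_db = log_spectrogram(tone)
peak_bin = int(np.argmax(spec_db.mean(axis=1)))
peak_hz = peak_bin * sample_rate / 512
```

The strongest row of `spec_db` should sit at the bin closest to the tone's frequency, which is the kind of structure that is hard to see in a waveform alone.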

Describe alternatives you've considered
I've manually generated the spectrograms and saved them as images to be used within the image classification labeling tool. The downsides of this are threefold.

  1. Labeling audio this way does not allow for temporal segmentation. The user must classify the entire spectrogram, not just a vertical slice of it. A user could, in theory, use the image annotation tool, but it would be tedious, and the user would need to convert each bounding box to its corresponding time range in the audio clip.
  2. The user can no longer listen to the audio clip while viewing the spectrogram image.
  3. The user-generated spectrograms require additional temporary storage.
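The bounding-box conversion mentioned in point 1 can be sketched as follows. This assumes the spectrogram image spans the whole clip from left edge to right edge with no axis margins; the function and parameter names are hypothetical.

```python
def box_to_time_range(x_px, width_px, image_width_px, clip_duration_s):
    """Map a box's horizontal extent in pixels to (start, end) in seconds.

    Assumes the image covers the full clip with no padding or axis margins.
    """
    start_s = x_px / image_width_px * clip_duration_s
    end_s = (x_px + width_px) / image_width_px * clip_duration_s
    return start_s, end_s

# A box starting at pixel 256, 128 pixels wide, on a 1024-pixel-wide
# image of an 8-second clip (assumed example values).
start_s, end_s = box_to_time_range(
    x_px=256, width_px=128, image_width_px=1024, clip_duration_s=8.0
)
```

Doing this by hand for every box is the tedious part; a native spectrogram view would make the time axis shared with the audio player and remove the conversion entirely.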

Additional context
Each user's spectrogram needs may differ, such as their sound of interest being within the low or high frequency areas of the spectrogram. To keep implementation simple, use default spectrogram parameters that generalize well and potentially allow users to zoom in on this general spectrogram. A more robust solution would allow the user to specify a few parameters to generate the spectrogram that they would want. Lastly, I include an example of a log-scaled spectrogram with its accompanying waveform.
Example

@makseq
Member

makseq commented Aug 3, 2020

@Path-A Thank you for your issue. Are you familiar with React?

@Path-A
Author

Path-A commented Aug 4, 2020

@makseq Unfortunately not, although I've been wanting to learn.

niklub added the feature label Mar 29, 2021
@feddybear

feddybear commented Apr 26, 2021

For future reference, I think the main challenge here is understanding wavesurfer's multicanvas code.
I was able to use the spectrogram functions successfully, but could only implement them on the older single-canvas implementation. Unfortunately, this is impractical for long audio files because the spectra must be recalculated and redrawn when zooming, etc. If pre-segmentation is done, it becomes more practical. But segmentation of audio (especially for speech technology applications) isn't perfect.
Here's a sample demonstration as reference: LINK
It takes about 12-14 seconds per zoom value on a 3-minute file (using an FFT size of N=512 samples).
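To get a feel for the amount of work behind each redraw, one can count the FFT frames in a file of that length. The sample rate (44.1 kHz) and the non-overlapping hop (equal to the FFT size) below are assumptions; the comment above does not state either.

```python
def stft_frame_count(duration_s, sample_rate, n_fft, hop):
    """Number of complete FFT frames that fit in the signal."""
    n_samples = int(duration_s * sample_rate)
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop

# 3-minute file, N=512 as in the comment; sample rate and hop are assumed.
frames = stft_frame_count(duration_s=180, sample_rate=44100, n_fft=512, hop=512)
```

Under these assumptions that is on the order of fifteen thousand 512-point FFTs per full pass, and an overlapping hop multiplies the count accordingly, so recomputing everything at each zoom level adds up quickly.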

@makseq
Member

makseq commented Apr 26, 2021

@feddybear Wow! It's very impressive! Do you have an account on our slack? https://label-studio.slack.com/

@feddybear

> @feddybear Wow! It's very impressive! Do you have an account on our slack? https://label-studio.slack.com/

Hi @makseq, yeah, I also mentioned this in one of the spectrogram inquiries there. But I'm leaving it to someone more capable, especially at reading the wavesurfer multicanvas code. Hopefully it's also someone who knows signal processing, as the older implementation of spectrogram drawing in wavesurfer had some really weird canvas settings that didn't make sense (e.g. the height of the spectrogram).

@makseq
Member

makseq commented Apr 26, 2021

@feddybear Let's move to slack. I know DSP, also we'll include our frontend team there.

@makseq
Member

makseq commented Apr 26, 2021

@feddybear please, tag me again (@makseq). I can't find my mention there.

@Tom-Lu

Tom-Lu commented May 21, 2021

I'm also looking for a similar feature. Any progress so far?

@makseq
Member

makseq commented May 22, 2021

@Tom-Lu
We have some news from our contributor:
https://github.com/feddybear/label-studio-frontend

I hope we can see this work through to the end.

@tpeet

tpeet commented Jul 8, 2021

I would also be very interested in this feature, as it is currently hard to select regions in audio with low SNR.

makseq added this to the Label Studio 1.3 milestone Jul 9, 2021
niklub removed this from the Label Studio 1.3 milestone Aug 30, 2021
@Path-A
Author

Path-A commented Sep 30, 2021

Has there been any progress regarding development of this feature?

@makseq
Member

makseq commented Sep 30, 2021

Only if @feddybear has any news.
We are currently focusing on the image / html tagging. Audio updates are planned for next year.

@feddybear

feddybear commented Oct 1, 2021

Sorry, I have yet to integrate the spectrogram-related edits from the previous version to the latest one. Also, kinda occupied with other stuff outside of annotation.

makseq added this to the Future milestone Oct 4, 2021
makseq added the often asked, audio, and editor labels Oct 5, 2021
@Selimonder

Hello,

Has there been any progress regarding the development of this feature?

@mikolajpabiszczak

I know it's irritating when people ask again and again, but it would be really useful to have this. Any news?

@makseq
Member

makseq commented May 19, 2022

Thank you for asking. We prioritize features based on your activity, so it isn't irritating :-)
@nicholasrq made some progress on the Audio Plus Engine, but I heard that we still haven't implemented spectrograms :-( I will draw our team's attention to this feature request.

@faroit

faroit commented May 24, 2022

@makseq @feddybear also 👍 for spectrograms!

@mcgee0916

Hello,
Has there been any progress on this feature?

@cspindler

👍 Yes, spectrogram annotation would be fantastic. But it would be just the start - with the spectrogram view available, the following features would be super useful:

  • RectangleLabels in the spectrogram (time and frequency range)
  • Audio playback speed control (with and without time-stretching), especially slowing down to 0.1 of original speed.
  • Choices of frequency scaling (lin, log, mel, bark)
  • Zoom (time and frequency)
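As an illustration of what a frequency-scaling choice means in practice, here is a small sketch of the Hz-to-mel mapping. It uses the common HTK-style formula, which is one of several mel variants; which variant a given tool uses is not stated in this thread, so treat the constants as an assumption.

```python
import math

def hz_to_mel(freq_hz):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min_hz, f_max_hz, n_bands):
    """Band edges equally spaced in mel, returned in Hz (n_bands + 1 values)."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]

# Eight bands between 0 Hz and 8 kHz (assumed example range).
edges = mel_band_edges(0.0, 8000.0, 8)
```

Bands that are equally wide in mel get wider in Hz as frequency rises, so a mel axis gives low frequencies more vertical room than a linear axis does. For ultrasonic material such as bat calls, a linear or log axis may be the better fit, which is why offering the choice matters.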

I'm not familiar with React at all, but that could change.

We're building a pipeline to annotate bats in high-frequency audio recordings, based on batdetect2. They also have a labelling UI that checks many of the boxes UI-wise, except for handling large numbers of tasks, users, storage backends, etc., all the golden labelstud.io features.

@paulpeyret-biophonia

Hello,

I would love to see this feature for spectrogram annotation with sound playback. There is huge demand from the whole bioacoustics and eco-acoustics community (who are still working in desktop apps like Audacity and Raven for annotation).
@cspindler I think you described the need well.

I understand that spectrogram calculation speed is a bottleneck here for the zooming feature inside the spectrogram.
Maybe these libs can help get decent speeds.
libAudioFlux/audioFlux#22

Hoping this feature will come soon.
Cheers

@samvelkoch

Up for specs in audio labeling

sajarin added the community:reviewed label May 1, 2024
sajarin added the community:feature-request label May 1, 2024
@sajarin
Contributor

sajarin commented May 1, 2024

/jira create

Workflow run
Jira issue TRIAG-527 is created

@DK2895

DK2895 commented Jan 20, 2025

Hi guys (@sajarin @makseq @feddybear),

Has there been any further progress on this feature? Thanks

@isaac-jordan

+1 on wanting this spectrogram tooling. The lack of it is forcing us to consider using other tools for audio labelling.

@l4j3b

l4j3b commented Feb 7, 2025

@isaac-jordan Would you mind sharing the other tools you are considering please? I am in the same situation.

@isaac-jordan

> @isaac-jordan Would you mind sharing the other tools you are considering please? I am in the same situation.

Two front-runners for us are https://github.com/mbsantiago/whombat and https://www.wildlifeacoustics.com/products/kaleidoscope

@shemerey

By the way, the repo even contains web/node_modules/wavesurfer.js/src/plugin/spectrogram/fft.js. I thought we could enable the plugin and use it, but unfortunately it looks like wavesurfer.js is no longer used.

@xixinzhang

It's helpful but seems hard

@DK2895

DK2895 commented Mar 4, 2025

Hi there - is there any news on this or whether it is being worked on for a future release?

@JulienBrn

JulienBrn commented Mar 6, 2025

A spectrogram view for audio labeling would be very helpful. We are currently using Label Studio for keypoint labeling, but we now need to annotate audio, and this cannot be done without a spectrogram view. We are considering other options, but we would prefer to stick with Label Studio.

@xvbai0317

I seem to have accomplished this, though I'm not sure whether this is the kind of mel spectrogram people want.

Image

@jonnor

jonnor commented Mar 24, 2025

@xvbai0317 looks good! Are you able to share the code or a branch with your changes, so that others can try it out?

@DK2895

DK2895 commented Mar 24, 2025

Nice work @xvbai0317, looks awesome! For features, basing them on Audacity's built-in capabilities and parameters would be useful.

@xvbai0317

It is my pleasure to help you with my work, and I will submit the finished code to a branch later. @jonnor @DK2895

@cloudmark
Contributor

Hey @xvbai0317, I’ve been working on the spectrogram feature and came across this issue. It looks like we're tackling the same thing. Are you still planning to share a branch with your work? Happy to help out if there's anything I can do; maybe we can get it over the line together.

cloudmark pushed a commit to cloudmark/label-studio that referenced this issue Apr 20, 2025
This commit adds spectrogram visualization capabilities to the audio editor
through a new optional 'spectrogram' property in the AudioPlus component.

Example usage:
<AudioPlus
  name="audio"
  value="$audio"
  height="240"
  hotkey="space"
  defaultscale="1"
  defaultzoom="2"
  zoom="true"
  spectrogram="true"
  sync="group_a"
/>

Key changes:
- Add new 'spectrogram' boolean property to AudioPlus component
- Extract window functions into a dedicated WindowFunctions module
- Create a new ColorMapper module for spectrogram coloring
- Refactor Visualizer class to use the new modules
- Add support for different window functions and color schemes
- Improve type safety and code organization

The spectrogram visualization allows users to:
- Toggle spectrogram view using the 'spectrogram' property
- View frequency content over time alongside waveform
- Switch between different color schemes
- Configure window functions for FFT analysis
- Adjust visualization parameters (FFT size, dB range)

Configuration:
- spectrogram: boolean (optional) - When set to true, enables
  spectrogram visualization alongside the waveform

Labels: audio, editor, feature, community:feature-request, community:reviewed

Closes HumanSignal#384
cloudmark added a commit to cloudmark/label-studio that referenced this issue Apr 20, 2025
Add spectrogram visualization capabilities to the audio editor component with configurable settings and improved UI controls.

Key changes:
- Extract window functions into separate WindowFunctions module for better code organization
- Create new ColorMapper module for handling spectrogram color schemes
- Add spectrogram property to AudioPlus component (optional boolean to enable/disable)
- Implement FFT-based spectrogram rendering with configurable parameters
- Add UI controls for spectrogram settings (FFT size, color scheme, dB range)
- Fix CSS styling issues in the configuration modal
- Improve section header positioning and spacing

Features:
- Real-time spectrogram visualization
- Configurable FFT window size and type
- Multiple color scheme options
- Adjustable dB range for visualization
- Mel-scale frequency mapping support
- Responsive rendering with performance optimizations

Labels:
- audio
- community:feature-request
- community:reviewed
- editor
- feature

Closes HumanSignal#384
@cloudmark
Contributor

I'm happy to announce that the spectrogram visualization feature has been implemented and is now available in PR #7400.

The implementation includes everything from the original feature request plus additional capabilities:

  • Real-time spectrogram visualization
  • Multiple color schemes with live preview
  • Configurable FFT parameters (64-2048 samples)
  • Mel-scale frequency mapping
  • Adjustable dB range (-80 to -10 dB)
  • Different windowing functions (Blackman, Hann, Hamming)
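To make the windowing and dB-range options concrete, here is a rough sketch of how one audio frame could be turned into display intensities. This is not the code from the PR, which is implemented in the Label Studio frontend; it is a NumPy illustration, and the function name, the normalization by frame length, and the test tone are assumptions.

```python
import numpy as np

# NumPy's built-in window generators for the three window types listed above.
WINDOWS = {"hann": np.hanning, "hamming": np.hamming, "blackman": np.blackman}

def frame_to_intensity(frame, window="hann", db_min=-80.0, db_max=-10.0):
    """Map one audio frame to per-bin intensities in [0, 1].

    Hypothetical helper: window the frame, take the FFT magnitude,
    convert to dB, then clip to the chosen dB range and rescale.
    """
    windowed = frame * WINDOWS[window](len(frame))
    magnitude = np.abs(np.fft.rfft(windowed)) / len(frame)
    db = 20.0 * np.log10(np.maximum(magnitude, 1e-12))  # floor avoids log10(0)
    clipped = np.clip(db, db_min, db_max)
    return (clipped - db_min) / (db_max - db_min)

# A 512-sample frame holding exactly 32 cycles of a sine (assumed test input).
n_fft = 512
frame = np.sin(2 * np.pi * 32 * np.arange(n_fft) / n_fft)
intensity = frame_to_intensity(frame, window="blackman")
```

The resulting values in [0, 1] are what a color scheme would then map to pixels. Narrowing the dB range raises contrast for loud content, while widening it reveals quieter detail at the cost of more visible noise.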

I've recorded a short video demonstrating these features in action:
Video Demonstration

What the video shows:
The video demonstrates the new spectrogram feature within the Label Studio interface. You can see:

  • The spectrogram appearing below the audio waveform.
  • Synchronized playback, with the cursor moving across both the waveform and the spectrogram.
  • How zooming interacts with both views.
  • Live updates as various configuration options are changed through the settings panel, including:
    • FFT window size adjustments
    • Different color schemes being applied
    • Switching between windowing functions
    • Toggling the Mel scale on/off
    • Modifying the dB range for amplitude scaling

You can try it by adding spectrogram="true" to your AudioPlus component:

<AudioPlus
  name="audio"
  value="$audio"
  hotkey="space"
  sync="group_a"
  defaultscale="1"
  defaultzoom="2"
  zoom="true"
  spectrogram="true"
/>

cloudmark added a commit to cloudmark/label-studio that referenced this issue May 6, 2025
This commit adds spectrogram visualization capabilities to the audio editor
through a new optional 'spectrogram' property in the AudioPlus component.

Example usage:
<Audio
  name="audio"
  value="$audio"
  height="240"
  hotkey="space"
  defaultscale="1"
  defaultzoom="2"
  zoom="true"
  spectrogram="true"
  sync="group_a"
/>

Key changes:
- Add new 'spectrogram' boolean property to AudioPlus component
- Extract window functions into a dedicated WindowFunctions module
- Create a new ColorMapper module for spectrogram coloring
- Refactor Visualizer class to use the new modules
- Add support for different window functions and color schemes
- Improve type safety and code organization

The spectrogram visualization allows users to:
- Toggle spectrogram view using the 'spectrogram' property
- View frequency content over time alongside waveform
- Switch between different color schemes
- Configure window functions for FFT analysis
- Adjust visualization parameters (FFT size, dB range)

Configuration:
- spectrogram: boolean (optional) - When set to true, enables
  spectrogram visualization alongside the waveform

Labels: audio, editor, feature, community:feature-request, community:reviewed

Closes HumanSignal#384