Many questions about the DynamicsCompressor spec

I have been working on implementing my own dynamic range compressor and decided to base my initial implementation off the WebAudio spec, and the more I look into this the more questions I have. Note that I am very new to audio processing, so feel free to just say I'm wrong about any of this. First off, some quick issues I found:

> Let _releasing_ be `true` if _attenuation_ is greater than _compressor gain_, `false` otherwise.

Since _releasing_ is used to decide how to mix _compressor gain_ and _detector average_, I assume this is supposed to be "if _detector average_ is greater than _compressor gain_"?

> The envelope rate MUST be the calculated from the ratio of the _compressor gain_ and the _detector average_.
> > **NOTE:** When attacking, this number less than or equal to 1, when releasing, this number is strictly greater than 1.

I think this needs better wording overall. First off, "MUST be ~~the~~ calculated from". Secondly, I read the "ratio" as being _compressor gain_ divided by _detector average_. This doesn't match the note, however, since when attacking (_detector average_ is less than _compressor gain_) the ratio would be greater than 1 (e.g. 0.8 / 0.5 = 1.6), and vice versa. On later reads I realised that "this number" more likely refers to the envelope rate, but in that case why have the note at all? The later two points already specify the ranges of the envelope rate when attacking or releasing.

Anyway, on to the broader issues with the spec. Notably, the specification was based on the [Chromium implementation](https://github.com/chromium/chromium/blob/main/third_party/blink/renderer/platform/audio/dynamics_compressor.cc) (see #10), which all browsers use verbatim. The spec leaves things out to stay vague, but is still roughly modelled after that implementation in a way that's confusing and somewhat inflexible. The original code it was based on has seemingly been mostly unchanged since [its first commit](https://github.com/chromium/chromium/commit/ca9fefcbc79e32b51c588467d08d4c98acb296fe), which doesn't explain a lot of the components used other than a [few graphs linked in a ticket](https://bugs.webkit.org/show_bug.cgi?id=60682). I'll also be referencing [this paper (Wayback Machine)](https://web.archive.org/web/20151117110244/https://www.eecs.qmul.ac.uk/~josh/documents/GiannoulisMassbergReiss-dynamicrangecompression-JAES2012.pdf), which was [mentioned when designing the spec](https://github.com/WebAudio/web-audio-api/issues/10#issuecomment-253041536).

The **compression curve** in the spec is the part that makes the most sense. It seems to be correctly split up into 3 portions. The WebKit/Blink implementation seems to use a different function than the one I derived, which basically matched the one described in the paper, but that is both outside the scope of this repo and likely not an issue (I haven't dived into Chromium's implementation too much, but it probably produces a similar function shape). It makes sense for this to stay vague since there is not much variability given the outlined requirements of the function.

The **envelope rate** and how it is applied to the compressor gain is where things start to fall apart. If we have a simple `envelope_rate = releasing ? 1 - release_coeff : 1 - attack_coeff`, then steps 7 and 8 of the EnvelopeFollower processing:
> 7. If _releasing_ is `true`, set _compressor gain_ to be the product of _compressor gain_ and _envelope rate_, clamped to a maximum of 1.0.
> 8. Else, if _releasing_ is `false`, let _gain increment_ to be _detector average_ minus _compressor gain_. Multiply _gain increment_ by _envelope rate_, and add the result to _compressor gain_.

...do closely match the branching detector shown in the paper:
<img width="431" height="88" alt="Image" src="https://github.com/user-attachments/assets/7a9ce317-a009-49cb-bcfc-4a59571f3475" />
Except that this detector is designed to go to zero (silence) during release, while the WebAudio release goes to one (100% gain). We could adjust the release rate to be greater than one as required by the spec (e.g. `2.0 - release_coeff` or `1.0 / release_coeff`), but this still causes the compressor gain to start its release slowly and speed up as it gets closer to 100%, rather than starting fast and slowing down (which a release to zero would cause). WebKit/Blink gets around this by having an [adaptive release curve](https://github.com/chromium/chromium/blob/88547700c7111c05932d805609d48be9e92a4f87/third_party/blink/renderer/platform/audio/dynamics_compressor.cc#L255) which it recalculates every 32 samples to slow down the release time the further it is from the target, but I don't think this would be needed if the release rate were applied more correctly and did this already. E.g. `compressor_gain += (1 - compressor_gain) * envelope_rate`, or just use the same function as the attack side, which would match the smoothed branching detector:
<img width="435" height="62" alt="Image" src="https://github.com/user-attachments/assets/541278da-b110-41c9-895b-d3297b6581b7" />
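
To illustrate the difference in release shape, here is a minimal Python sketch. The function names, the starting gain of 0.25 and the rate values (1.1 and 0.1) are made up for illustration and are not taken from the spec or from Blink. It only compares the multiplicative release from step 7 against the "move toward 1.0" form suggested above.

```python
def release_multiplicative(gain: float, rate: float) -> float:
    """Spec step 7 as I read it: multiply by a rate > 1, clamped to 1.0."""
    return min(gain * rate, 1.0)


def release_toward_unity(gain: float, coeff: float) -> float:
    """Suggested alternative: cover a fixed fraction of the remaining distance to 1.0."""
    return gain + (1.0 - gain) * coeff


def simulate(step, gain: float, param: float, n: int) -> list:
    """Apply `step` n times and return the gain trace, including the start value."""
    trace = [gain]
    for _ in range(n):
        gain = step(gain, param)
        trace.append(gain)
    return trace


# Hypothetical values, chosen only to make the shapes visible.
mult = simulate(release_multiplicative, 0.25, 1.1, 20)
unity = simulate(release_toward_unity, 0.25, 0.1, 20)

for i in range(0, 21, 5):
    print(f"step {i:2d}: multiplicative={mult[i]:.3f}  toward_unity={unity[i]:.3f}")
```

With these values the multiplicative version takes larger and larger steps until it hits the clamp, while the other version takes its largest step first and then slows down as it approaches 1.0.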

(Also, side note: I have no clue what Blink's [attack curve](https://github.com/chromium/chromium/blob/88547700c7111c05932d805609d48be9e92a4f87/third_party/blink/renderer/platform/audio/dynamics_compressor.cc#L277) is supposed to be. It doesn't match anything I've read online and, again, seems to be unchanged since the initial commit, which explains nothing about this attack curve. This is again outside the scope of this repo, but if anyone knows where they got it from please let me know, I am very curious.)

Finally, the **detector curve**. This is the most vague part of the spec. It says it allows implementing "adaptive release" or "to have curves for attack and release that are not of the same shape". But then how does it differ from the envelope rate? All of this can be done with that already (and as mentioned above, WebKit/Blink does its adaptive release as part of its envelope rate). The fact that it applies to _detector average_ seems to suggest it is used for computing an average level, and the [comment that linked the paper](https://github.com/WebAudio/web-audio-api/issues/10#issuecomment-253041536) additionally mentions that Chromium uses an RMS detector. However, this is incorrect: WebKit/Blink does not have any sort of rolling average window that would be needed for an RMS detector. What they do seems to be a lot closer to a decoupled peak detector (which is also what I ended up effectively using the detector curve for):
<img width="412" height="84" alt="Image" src="https://github.com/user-attachments/assets/abedb14c-b27f-416d-afe3-115877fa3adc" />
But using it like this makes the _detector average_ and _detector rate_ terminology more confusing, because this step only ends up being half of the "detector" and does not "average" anything.
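
For reference, a decoupled peak detector can be sketched roughly like this in Python. This assumes the usual two-stage form (a peak hold with instant attack and exponential release, followed by a one-pole attack smoother); the names and coefficient values are hypothetical, and this is not a transcription of Blink's code.

```python
def decoupled_peak_detector(levels, attack_coeff: float, release_coeff: float) -> list:
    """Two-stage detector over a sequence of non-negative input levels.

    Stage 1 jumps up to the input instantly and decays by release_coeff.
    Stage 2 smooths stage 1 with a one-pole filter using attack_coeff.
    Both coefficients are assumed to lie in (0, 1); closer to 1 means slower.
    """
    peak_hold = 0.0
    smoothed = 0.0
    out = []
    for x in levels:
        # Stage 1: instant attack, exponential release.
        peak_hold = max(x, release_coeff * peak_hold)
        # Stage 2: smooth the held peak so the attack is not instantaneous.
        smoothed = attack_coeff * smoothed + (1.0 - attack_coeff) * peak_hold
        out.append(smoothed)
    return out


# Hypothetical test signal: a burst at full level followed by silence.
burst = [1.0] * 50 + [0.0] * 50
detected = decoupled_peak_detector(burst, attack_coeff=0.5, release_coeff=0.9)
print(f"end of burst: {detected[49]:.4f}, end of silence: {detected[-1]:.4f}")
```

Note that neither stage keeps a window of past samples, which is why calling the result an "average" reads oddly to me.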

I think if the spec wanted to stay vague, it would make a lot more sense to get rid of the separate envelope rate and detector curve functions and just have some sort of "envelope function" which takes an _attenuation_ and a _compressor gain_ and returns a new _compressor gain_. This would allow a lot of flexibility in the type of curve that is used, although it would make the spec even more vague and implementation-defined than it currently is.
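
As a rough sketch of what such an "envelope function" could look like, here is one possible shape in Python. Everything here (`EnvelopeFunction`, `one_pole_envelope`, `follow`, and the coefficient values) is hypothetical naming of mine and not anything the spec defines; the one-pole implementation is just one example that could be plugged in.

```python
from typing import Callable, Iterable, List

# (attenuation, compressor_gain) -> new compressor_gain
EnvelopeFunction = Callable[[float, float], float]


def one_pole_envelope(attack_coeff: float, release_coeff: float) -> EnvelopeFunction:
    """Example envelope: move toward the target attenuation in both directions."""
    def step(attenuation: float, compressor_gain: float) -> float:
        # Releasing means the target gain is above the current gain.
        releasing = attenuation > compressor_gain
        coeff = release_coeff if releasing else attack_coeff
        return compressor_gain + (attenuation - compressor_gain) * coeff
    return step


def follow(attenuations: Iterable[float], envelope: EnvelopeFunction,
           initial_gain: float = 1.0) -> List[float]:
    """Run any envelope function over a sequence of target attenuations."""
    gain = initial_gain
    out = []
    for attenuation in attenuations:
        gain = envelope(attenuation, gain)
        out.append(gain)
    return out


# Hypothetical input: compress to half gain for a while, then let go.
targets = [0.5] * 100 + [1.0] * 100
gains = follow(targets, one_pole_envelope(attack_coeff=0.5, release_coeff=0.05))
print(f"after attack: {gains[99]:.4f}, after release: {gains[-1]:.4f}")
```

An adaptive release, or a different curve shape entirely, would then just be another function with the same signature.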

That being said, every browser (afaik) uses the exact same implementation anyway. It was implemented 15 years ago and, from my (albeit very limited) understanding, makes some questionable choices of functions for which I cannot find any explanation of where they came from or why they were used. So it might make more sense to review this common implementation and question whether something better could be used, and regardless, document it more specifically in the WebAudio spec with explanations of the choice of functions.
