I have been working on implementing my own dynamic range compressor and decided to base my initial implementation off the WebAudio spec, and the more I look into this the more questions I have. Note that I am very new to audio processing so feel free to just say I'm wrong about any of this. First off some quick issues I found:
Let releasing be true if attenuation is greater than compressor gain, false otherwise.
Since releasing is used to decide how to mix compressor gain and detector average, I assume this is supposed to be "if detector average is greater than compressor gain?"
The envelope rate MUST be the calculated from the ratio of the compressor gain and the detector average.
NOTE: When attacking, this number less than or equal to 1, when releasing, this number is strictly greater than 1.
I think this needs better wording overall. First, "MUST be the calculated from" is ungrammatical. Second, I read the "ratio" as compressor gain divided by detector average. That doesn't match the note, however: when attacking (detector average is less than compressor gain) the ratio would be greater than 1, and vice versa. On later reads I realised that "this number" more likely refers to the envelope rate, but in that case why have the note at all? The two points that follow it already specify the ranges of the envelope rate when attacking or releasing.
Anyway, on to more general issues with the spec. Notably, the specification was based on the Chromium implementation (see #10), which all browsers use verbatim, while the spec leaves things out to stay vague but is still roughly modelled after that implementation, in a way that is confusing and somewhat inflexible. The original code it was based on has seemingly been mostly unchanged since its first commit, which doesn't explain many of the components used beyond a few graphs linked in a ticket. I'll also be referencing this paper (Wayback Machine), which was mentioned when designing the spec.
The compression curve in the spec is the part that makes the most sense. It seems to be correctly split up into 3 portions. The WebKit/Blink implementation seems to use a different function than the one I derived, which basically matched the one described in the paper, but that is both outside the scope of this repo and likely not an issue (I haven't dived into Chromium's implementation too much, but it probably produces a similar function shape). It makes sense for this to stay vague, since there is not much variability given the outlined requirements of the function.
The envelope rate and how it is applied to the compression gain is where things start to fall apart. If we have a simple envelope_rate = releasing ? 1 - release_coeff : 1 - attack_coeff then steps 7 and 8 of the EnvelopeFollower processing:
- If releasing is true, set compressor gain to be the product of compressor gain and envelope rate, clamped to a maximum of 1.0.
- Else, if releasing is false, let gain increment to be detector average minus compressor gain. Multiply gain increment by envelope rate, and add the result to compressor gain.
.. does closely match the branching detector shown in the paper:
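As a rough Python sketch of steps 7 and 8 as quoted above: the function and parameter names are hypothetical, the per-branch envelope rates are passed in directly rather than derived from any coefficient, and releasing uses the "detector average is greater than compressor gain" reading assumed earlier.

```python
def envelope_step(compressor_gain, detector_average, attack_rate, release_rate):
    """One update of compressor gain following steps 7 and 8 as quoted.

    attack_rate and release_rate are hypothetical stand-ins for the envelope
    rate in each branch (the spec wants <= 1 when attacking, > 1 when releasing).
    """
    # Assumed reading of "releasing": the detector average sits above the gain.
    releasing = detector_average > compressor_gain

    if releasing:
        # Step 7: multiply by the envelope rate, clamped to a maximum of 1.0.
        return min(compressor_gain * release_rate, 1.0)

    # Step 8: move towards the detector average by a fraction of the difference.
    gain_increment = detector_average - compressor_gain
    return compressor_gain + gain_increment * attack_rate


# Attack: the gain is pulled down towards a lower detector average.
attacked = envelope_step(1.0, 0.5, attack_rate=0.1, release_rate=1.01)

# Release: the gain is scaled up, and never exceeds 1.0.
released = envelope_step(0.5, 1.0, attack_rate=0.1, release_rate=1.01)
clamped = envelope_step(0.995, 1.0, attack_rate=0.1, release_rate=1.01)
```

Note that only the attack branch ever looks at the detector average as a target; the release branch ignores it and always heads for 1.0.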

The exception is that the paper's detector is designed to go to zero (silence) during release, while the WebAudio release goes to one (100% gain). We could adjust the release rate to be greater than one as required by the spec (e.g. 2.0 - release_coeff or 1.0 / release_coeff), but this still causes the compressor gain to start its release slowly and speed up as it gets closer to 100%, rather than starting fast and slowing down (which a release to zero would do). WebKit/Blink gets around this with an adaptive release curve, which it recalculates every 32 samples to slow down the release time the further it is from the target, but I don't think this would be needed if the release rate were applied in a way that already behaved like this. E.g. compressor_gain += (1 - compressor_gain) * envelope_rate, or just use the same function as the attack side, which would match the smoothed branching detector:
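To make the difference in release shape concrete, here is a minimal Python sketch comparing the per-step change in linear gain for the two approaches. The function names and the rate values (1.05 and 0.05) are made up for illustration and are not taken from the spec or from any browser.

```python
def release_multiplicative(gain, rate):
    """Spec-style release: multiply by a rate > 1 and clamp at 1.0."""
    return min(gain * rate, 1.0)


def release_toward_unity(gain, rate):
    """Suggested alternative: cover a fixed fraction of the remaining distance to 1.0."""
    return gain + (1.0 - gain) * rate


def step_sizes(release, gain, rate, steps):
    """Return how much the gain changes on each of `steps` successive updates."""
    sizes = []
    for _ in range(steps):
        new_gain = release(gain, rate)
        sizes.append(new_gain - gain)
        gain = new_gain
    return sizes


# Both releases start from the same heavily attenuated gain of 0.25.
multiplicative = step_sizes(release_multiplicative, 0.25, 1.05, 5)
toward_unity = step_sizes(release_toward_unity, 0.25, 0.05, 5)

print("multiplicative:", multiplicative)  # steps grow as the gain rises
print("toward unity:  ", toward_unity)    # steps shrink as the gain nears 1.0
```

In linear gain, the multiplicative form takes larger and larger steps until it hits the clamp, while the second form takes its largest step first and eases in towards 1.0.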

(Side note: I have no clue what Blink's attack curve is supposed to be. It doesn't match anything I've read online, and again it seems to be unchanged since the initial commit, which explains nothing about this attack curve. This is also outside the scope of this repo, but if anyone knows where they got it from please let me know, I am very curious.)
Finally, the detector curve. This is the vaguest part of the spec. It says it allows implementing "adaptive release" or "to have curves for attack and release that are not of the same shape". But then how does it differ from the envelope rate? All of this can already be done with that (and as mentioned above, WebKit/Blink does its adaptive release as part of its envelope rate). The fact that it applies to detector average seems to suggest it is used for computing an average level, and the comment that linked the paper additionally mentions that Chromium uses an RMS detector. However, this is incorrect: WebKit/Blink does not have any sort of rolling average window that would be needed for an RMS detector. What they do seems to be a lot closer to a decoupled peak detector (which is also what I ended up effectively using the detector curve for):
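For readers unfamiliar with the term, a minimal Python sketch of a decoupled peak detector follows. It assumes the usual two-stage form (a peak hold that decays with the release coefficient, followed by one-pole smoothing with the attack coefficient); it is not taken from the WebKit/Blink source, and the names and coefficient values are hypothetical.

```python
def decoupled_peak_detector(levels, attack_coeff, release_coeff):
    """Run a two-stage peak detector over a sequence of non-negative levels.

    Both coefficients are assumed to lie in [0, 1); values closer to 1 mean
    slower attack or release.
    """
    held = 0.0      # stage 1 state: instantaneous attack, exponential release
    smoothed = 0.0  # stage 2 state: one-pole smoothing of the held peak
    output = []
    for level in levels:
        # Stage 1: jump up to any new peak, otherwise decay the old one.
        held = max(level, release_coeff * held)
        # Stage 2: smooth the held peak so the attack is not instantaneous.
        smoothed = attack_coeff * smoothed + (1.0 - attack_coeff) * held
        output.append(smoothed)
    return output


# A single impulse followed by silence.
response = decoupled_peak_detector([1.0, 0.0, 0.0, 0.0], 0.5, 0.9)
print(response)
```

There is no averaging window anywhere in this structure, only the two pieces of state, which is why the "detector average" name fits it poorly.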

But using it like this makes the detector average and detector rate terminology more confusing, because this step only ends up being half of the "detector" and does not "average" anything.
I think if the spec wanted to stay vague, it would make a lot more sense to get rid of the separate envelope rate and detector curve functions and just have some sort of "envelope function" which takes an attenuation and a compressor gain and returns a new compressor gain. This would allow a lot of flexibility in the type of curve that is used, although it would make the spec even more vague and implementation-defined than it currently is.
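A minimal Python sketch of what such an interface could look like is below. `EnvelopeFunction`, `one_pole_envelope` and `follow` are hypothetical names, and the one-pole example is just one possible curve, not a proposal for what the spec should mandate.

```python
from typing import Callable, Iterable, List

# (attenuation, compressor_gain) -> new compressor_gain
EnvelopeFunction = Callable[[float, float], float]


def one_pole_envelope(attack_rate: float, release_rate: float) -> EnvelopeFunction:
    """Build an envelope function that uses the same curve shape on both sides."""

    def envelope(attenuation: float, compressor_gain: float) -> float:
        # Pick the rate by direction, then move a fraction of the way to the target.
        rate = release_rate if attenuation > compressor_gain else attack_rate
        return compressor_gain + (attenuation - compressor_gain) * rate

    return envelope


def follow(envelope: EnvelopeFunction, attenuations: Iterable[float],
           compressor_gain: float = 1.0) -> List[float]:
    """Apply an envelope function across a sequence of target attenuations."""
    gains = []
    for attenuation in attenuations:
        compressor_gain = envelope(attenuation, compressor_gain)
        gains.append(compressor_gain)
    return gains


gains = follow(one_pole_envelope(attack_rate=0.5, release_rate=0.1), [0.5, 0.5, 1.0])
print(gains)
```

An adaptive release or a decoupled detector would then simply be a different `EnvelopeFunction` that keeps its own internal state.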
That being said, every browser (afaik) uses the exact same implementation anyway. It was written 15 years ago and, from my (albeit very limited) understanding, makes some questionable choices of functions; I cannot find any explanation of where they came from or why they were used. So it might make more sense to review this common implementation and ask whether something better could be used, and in either case to document it more specifically in the WebAudio spec, with explanations of the choice of functions.