Multi channel audio merging #320
Merged
Previous versions of WhisperKit used only the first audio channel, regardless of how many channels the file contained. This update lets developers combine all available channels, or choose specific channels to combine, before the audio is fed into transcription. To set this configuration when loading WhisperKit, you can pass it through to WhisperKitConfig similar to this:
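A minimal sketch of such a configuration. The `audioInputConfig` parameter, the `AudioInputConfig` type, and the `.sumChannels` case are written as I understand the WhisperKit API and should be treated as assumptions to check against the source:

```swift
import WhisperKit

// Assumed names: `audioInputConfig`, `AudioInputConfig`, `.sumChannels`.
// Channel indices are zero-based, so [1, 3, 5] selects channels 2, 4, and 6.
let config = WhisperKitConfig(
    audioInputConfig: AudioInputConfig(channelMode: .sumChannels([1, 3, 5]))
)
```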
This will sum the 2nd, 4th, and 6th channels (channel indices are zero-based, so the first channel is index 0).
Another way to pass the channelMode is to feed it directly into the loadAudio function, like this:
This option has been added to all audio processing APIs that read audio.
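A hedged sketch of the direct call. The `AudioProcessor.loadAudio(fromPath:channelMode:)` signature is an assumption based on my reading of the WhisperKit API, and the file path is a placeholder:

```swift
import WhisperKit

// Assumed signature; "path/to/audio.wav" is a placeholder path.
// Sums channels 2, 4, and 6 (zero-based indices 1, 3, 5) into one mono buffer.
let audioBuffer = try AudioProcessor.loadAudio(
    fromPath: "path/to/audio.wav",
    channelMode: .sumChannels([1, 3, 5])
)
```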
The audio merging algorithm works like this:
We find the peak of each channel and the peak of the mono (summed) signal. If the mono peak is higher than the peak of the loudest channel, we multiply the whole mono track by a constant so that its peak matches the peak of the loudest channel.
E.g., top: mono (merged) buffer; bottom: individual channels (pre-merge)
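The peak-matching step described above can be sketched in plain Swift. This is an illustration under stated assumptions, not the code in this PR: `mergeChannels` is a hypothetical helper, channels are plain `[Float]` arrays rather than audio buffers, and out-of-range indices are simply ignored.

```swift
/// Hypothetical helper (not part of WhisperKit): sums the selected channels
/// into one mono track, then scales it so its peak does not exceed the peak
/// of the loudest selected channel.
func mergeChannels(_ channels: [[Float]], indices: [Int]? = nil) -> [Float] {
    // Pick the requested channels (zero-based); nil means "all channels".
    let selected = (indices ?? Array(channels.indices))
        .filter { channels.indices.contains($0) }
        .map { channels[$0] }
    guard let frameCount = selected.map(\.count).min(), frameCount > 0 else {
        return []
    }

    var channelPeak: Float = 0
    var mono = [Float](repeating: 0, count: frameCount)
    for channel in selected {
        for i in 0..<frameCount {
            // Track the loudest sample seen in any single channel.
            channelPeak = max(channelPeak, abs(channel[i]))
            // Accumulate the plain sum of all selected channels.
            mono[i] += channel[i]
        }
    }

    // Rescale only when summing made the mono peak exceed the loudest channel.
    let monoPeak = mono.map { abs($0) }.max() ?? 0
    if monoPeak > channelPeak {
        let scale = channelPeak / monoPeak
        mono = mono.map { $0 * scale }
    }
    return mono
}

// Two channels peaking at 0.5 sum to a peak of 1.0, so the result is halved.
let merged = mergeChannels([[0.5, -0.5, 0.25], [0.5, -0.25, 0.25]])
```

Scaling by a single constant keeps the shape of the summed waveform intact, which is why the merged track can match the loudness of the original channels without clipping.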

Here you can see how the merged audio maintains the same loudness as the original multi-channel audio file, and you can see the total merged waveform of all the channels.
Resolves these two issues:
#134
#313