On-device Web Speech API #1038

evanbliu · 2025-01-09T22:49:18Z

Hello, TAG!

I'm requesting a TAG review of on-device support for the Web Speech API.

This feature adds on-device speech recognition support to the Web Speech API, allowing websites to ensure that neither audio nor transcribed speech is sent to a third-party service for processing. Websites can query the availability of on-device speech recognition for specific languages, prompt users to install the necessary resources for on-device speech recognition, and choose between on-device and cloud-based speech recognition as needed.
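
As a rough illustration of the query / install / choose flow described above, here is a minimal sketch of the decision a website might make once it knows the on-device status for a language. The status values (`available`, `installable`, `unavailable`), the `Decision` names, and `decideRecognitionPath` are all hypothetical and are not taken from the specification; the actual API surface may differ.

```typescript
// Hypothetical status a site might derive from an availability query.
// These names are assumptions for illustration, not spec-defined values.
type OnDeviceStatus = "available" | "installable" | "unavailable";

// Hypothetical outcomes for the site's own logic.
type Decision = "use-on-device" | "prompt-install" | "use-cloud" | "refuse";

// Decide how to proceed for one language.
// requireOnDevice models a site with strict privacy requirements that must
// never send audio or transcripts to a third-party service.
function decideRecognitionPath(
  status: OnDeviceStatus,
  requireOnDevice: boolean
): Decision {
  if (status === "available") {
    // On-device recognition is ready, so it is always the preferred path here.
    return "use-on-device";
  }
  if (status === "installable") {
    // Strict sites ask the user to install; others fall back to the cloud.
    return requireOnDevice ? "prompt-install" : "use-cloud";
  }
  // Nothing can be installed: strict sites decline rather than use the cloud.
  return requireOnDevice ? "refuse" : "use-cloud";
}
```

Falling back to the cloud when a pack is merely installable is one possible policy; a site could equally prompt for installation in every case.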

Explainer¹: Add on-device speech recognition support to the Web Speech API WebAudio/web-speech-api#122
Specification: https://webaudio.github.io/web-speech-api/
WPT Tests: https://github.com/web-platform-tests/wpt/tree/master/speech-api
User research: N/A
Security and Privacy self-review²:
Relevant survey questions:
2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
This feature would expose whether on-device speech recognition is available for a specific language. Websites need this information to know whether they can rely on on-device speech recognition.

2.2. Do features in your specification expose the minimum amount of information necessary to enable their intended uses?
Yes. Some websites have strict privacy requirements that call for on-device speech recognition, so they must know whether it is available in order to ensure that neither audio nor captions are sent to a third-party service for processing.

2.6. Do the features in your specification expose information about the underlying platform to origins?
While this feature does not directly expose information about the underlying platform, websites may potentially use performance metrics for on-device speech recognition to gauge general hardware capability.

2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
Yes, the spec contains a section on how to reduce the risk of fingerprinting. Websites need explicit user permission to install on-device speech recognition language packs that do not match the user's preferred language, or when the user is not on Ethernet or Wi-Fi.

GitHub repo: https://github.com/WebAudio/web-speech-api
Primary contacts: [email protected]
Organization/project driving the specification: Google
Multi-stakeholder support³:
- Chromium comments: https://chromestatus.com/feature/6090916291674112
- Mozilla comments: On-device Web Speech API mozilla/standards-positions#1157
- WebKit comments: On-device Web Speech API WebKit/standards-positions#443
  Commonly requested feature. Examples:
  https://webwewant.fyi/wants/55/
  Offline/on-device speech recognition WebAudio/web-speech-api#108
  https://stackoverflow.com/questions/49473369/offline-speech-recognition-in-browser
  https://www.reddit.com/r/html5/comments/8jtv3u/offline_voice_recognition_without_the_webspeech/

Further details:

I have reviewed the TAG's Web Platform Design Principles
The group where the work on this specification is currently being done: Audio Community Group
The group where standardization of this work is intended to be done (if different from the current group): Audio Working Group
This work is being funded by: Google

You should also know that...
The primary risk of this new functionality is the potential for fingerprinting. To mitigate this risk, the Chrome Trust & Safety team proposes requiring explicit user consent to install language packs that do not match one of the user's preferred languages, or when the user is not on an Ethernet or Wi-Fi network.
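
The proposed consent rule can be sketched as a small predicate. This is only an illustration of the two conditions stated above: the `InstallRequest` shape and `needsExplicitConsent` are hypothetical names, and the exact, case-insensitive language-tag comparison is an assumption, since how a pack "matches" a preferred language is not defined here.

```typescript
// Hypothetical input describing one language pack install request.
interface InstallRequest {
  lang: string;              // requested pack, e.g. "fr-FR"
  preferredLangs: string[];  // the user's preferred languages
  onEthernetOrWifi: boolean; // true when the user is on Ethernet or Wi-Fi
}

// Returns true when explicit user consent would be required under the
// proposed mitigation: the pack does not match a preferred language,
// or the user is not on an Ethernet/Wi-Fi network.
function needsExplicitConsent(req: InstallRequest): boolean {
  const requested = req.lang.toLowerCase();
  // Assumption: a "match" is an exact, case-insensitive tag comparison.
  const matchesPreferred = req.preferredLangs.some(
    (preferred) => preferred.toLowerCase() === requested
  );
  return !matchesPreferred || !req.onEthernetOrWifi;
}
```

Under this reading, only a pack in one of the user's preferred languages, requested over Ethernet or Wi-Fi, could be installed without an explicit prompt.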

The existing Web Speech API has an outdated callback design that must be maintained for backwards compatibility and interoperability. While Firefox doesn't officially support the speech recognition section of the Web Speech API, it has an unprefixed implementation behind a flag. Most guides on how to use the Web Speech API do something like window.SpeechRecognition || window.webkitSpeechRecognition; (examples from developer.mozilla.org, codeburst.io, dev.to), and there are 17.8K instances of this kind of usage on GitHub alone. The Audio Working Group is looking into potentially replacing this API with a new, modernized version under a different name. A separate TAG design review request will be sent for that if the group decides to proceed with the new API.
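
For reference, the feature-detection pattern mentioned above looks roughly like the following. The sketch takes the global object as a parameter so it can be exercised outside a browser; `SpeechGlobals`, `RecognitionCtor`, and `resolveSpeechRecognition` are illustrative names, not part of any API.

```typescript
// Minimal constructor type; the real interface has many more members.
type RecognitionCtor = new () => unknown;

// The two global names that existing code commonly checks.
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Prefer the unprefixed constructor and fall back to the webkit-prefixed one.
// Returns undefined when neither is exposed.
function resolveSpeechRecognition(
  globals: SpeechGlobals
): RecognitionCtor | undefined {
  return globals.SpeechRecognition || globals.webkitSpeechRecognition;
}
```

In a page, this would typically be called as `resolveSpeechRecognition(window as unknown as SpeechGlobals)`. Because so much deployed code depends on both names, changing the shape of the existing interface risks breaking it.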

evanbliu added the Progress: untriaged label Jan 9, 2025

jyasskin self-assigned this Jan 9, 2025
