On-device Web Speech API #1038

Open
1 task done
evanbliu opened this issue Jan 9, 2025 · 0 comments
evanbliu commented Jan 9, 2025

Hello, TAG!

I'm requesting a TAG review of on-device support for the Web Speech API.

This feature adds on-device speech recognition support to the Web Speech API, allowing websites to ensure that neither audio nor transcribed speech is sent to a third-party service for processing. Websites can query the availability of on-device speech recognition for specific languages, prompt users to install the necessary resources for on-device speech recognition, and choose between on-device and cloud-based speech recognition as needed.
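
The flow described above (query availability, optionally install, then pick a mode) might be sketched as follows. This is an illustration under stated assumptions, not the specification's API: the `backend` object and its `isAvailableOnDevice` and `installOnDevice` methods are hypothetical stand-ins for whatever entry points the spec defines, and the `requireOnDevice` option is likewise invented for the example.

```javascript
// Sketch only: `backend` is a hypothetical adapter, not the spec's API surface.
//   backend.isAvailableOnDevice(lang) -> Promise<boolean>
//   backend.installOnDevice(lang)     -> Promise<boolean> (true if the install succeeded)
async function chooseRecognitionMode(lang, backend, { requireOnDevice = false } = {}) {
  // 1. Query whether on-device recognition is already usable for this language.
  if (await backend.isAvailableOnDevice(lang)) {
    return "on-device";
  }
  // 2. Otherwise ask for the language resources to be installed.
  if (await backend.installOnDevice(lang)) {
    return "on-device";
  }
  // 3. Fall back to cloud-based recognition only if the site allows it.
  return requireOnDevice ? "unavailable" : "cloud";
}
```

A site with strict privacy requirements would pass `{ requireOnDevice: true }` and treat `"unavailable"` as a reason to disable speech input rather than send audio off the device.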

2.2. Do features in your specification expose the minimum amount of information necessary to enable their intended uses?
Yes. Some websites have strict privacy requirements that mandate on-device speech recognition, so they must be able to determine whether it is possible to ensure that neither audio nor captions are sent to a third-party service for processing.

2.6. Do the features in your specification expose information about the underlying platform to origins?
While this feature does not directly expose information about the underlying platform, websites may potentially use performance metrics for on-device speech recognition to gauge general hardware capability.

2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
Yes, the spec contains a section on how to reduce the risk of fingerprinting. Websites need explicit user permission to install on-device speech recognition language packs that do not match the user's preferred language, or when the user is not on Ethernet or Wi-Fi.

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the work on this specification is currently being done: Audio Community Group
  • The group where standardization of this work is intended to be done (if different from the current group): Audio Working Group
  • This work is being funded by: Google

You should also know that...
The primary risk of this new functionality is the potential for fingerprinting. To mitigate this risk, the Chrome Trust & Safety team proposes requiring explicit user consent to install language packs that do not match one of the user's preferred languages, or when the user is not on an Ethernet/Wi-Fi network.
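
The proposed consent rule can be read as a simple predicate. The sketch below is one possible reading, not the team's implementation: the function name, the connection-type labels, and the matching rule (a case-insensitive comparison of whole language tags) are all assumptions made for illustration.

```javascript
// Sketch only: the labels and the matching rule here are assumptions.
const NO_CONSENT_CONNECTIONS = new Set(["ethernet", "wifi"]);

// Returns true when installing the language pack for `lang` would need
// explicit user consent under the rule described above.
function needsExplicitConsent(lang, preferredLanguages, connectionType) {
  const wanted = lang.toLowerCase();
  // Assumed matching rule: whole-tag, case-insensitive equality.
  const matchesPreferred = preferredLanguages.some(
    (preferred) => preferred.toLowerCase() === wanted
  );
  const onAllowedConnection = NO_CONSENT_CONNECTIONS.has(connectionType);
  // Consent is needed if either condition fails.
  return !matchesPreferred || !onAllowedConnection;
}
```

Under these assumptions, a request for `"ja-JP"` from a user whose preferred languages are `["en-US"]` needs consent on any network, and a request for `"en-US"` needs consent only off Ethernet/Wi-Fi.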

The existing Web Speech API has an outdated callback design that must be maintained due to backwards compatibility/interoperability issues. While Firefox doesn't officially support the speech recognition section of the Web Speech API, it has an unprefixed implementation behind a flag. Most of the guides on how to use the Web Speech API do something like window.SpeechRecognition || window.webkitSpeechRecognition; (Examples from developer.mozilla.org, codeburst.io, dev.to), and there are 17.8K instances of this kind of usage on GitHub alone. The Audio Working Group is looking into potentially replacing this API with a new, modernized version under a different name. A separate TAG design review will be sent for that if the group decides to proceed with the new API.
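
For reference, the feature-detection idiom mentioned above is usually written along these lines. The helper name `getSpeechRecognitionConstructor` is hypothetical, and taking the global object as a parameter is a choice made here so the sketch can run outside a browser.

```javascript
// Returns the unprefixed constructor if present, else the webkit-prefixed
// one, else null when speech recognition is not exposed at all.
function getSpeechRecognitionConstructor(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}
```

In a page this would be called as `getSpeechRecognitionConstructor(window)`. Because so much deployed code depends on this fallback, both names have to keep working.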
