WebXR Hand Input API Specification #568

Closed
fordacious opened this issue Nov 12, 2020 · 23 comments

@fordacious

HIQaH! QaH! TAG!

I'm requesting a TAG review of WebXR Hand Input.

The WebXR Hand Input module expands the WebXR Device API with the functionality to track articulated hand poses.
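For context, a minimal sketch of the per-frame joint query this module enables, written against the constant-based API shape discussed later in this thread (it assumes an active session with a referenceSpace already obtained):

function onXRFrame(time, frame) {
  for (const inputSource of frame.session.inputSources) {
    if (!inputSource.hand) continue; // this input source has no hand tracking
    const indexTip = inputSource.hand[XRHand.INDEX_PHALANX_TIP];
    const pose = frame.getJointPose(indexTip, referenceSpace);
    if (pose) {
      // pose.transform gives the joint's position and orientation;
      // pose.radius approximates the joint's size.
    }
  }
  frame.session.requestAnimationFrame(onXRFrame);
}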

Further details:

  • I have reviewed the TAG's API Design Principles
  • Relevant time constraints or deadlines:
  • The group where the work on this specification is currently being done: Immersive Web Working Group
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Immersive Web Working Group
  • Major unresolved issues with or opposition to this specification:
  • This work is being funded by:

We'd prefer the TAG provide feedback as:

☂️ open a single issue in our GitHub repo for the entire review

@alice

alice commented Nov 24, 2020

One quick question: can you explain why the API is named around medical terms for bones, rather than something more straightforward?

@asankah

asankah commented Nov 24, 2020

The security/privacy self review states:

Data returned from this API MUST NOT be so specific that one can detect individual users. If the underlying hardware returns data that is too precise, the User Agent MUST anonymize this data (i.e. by adding noise or rounding) before revealing it through the WebXR Hand Input API.

Could you elaborate a bit more on how an implementation should evaluate a noising or rounding strategy? I.e. how should an implementation evaluate anonymity?

Would there be recommendations around minimum fidelity for sensor readings?
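As a purely illustrative sketch of the "rounding" mentioned in the quoted text (the 5 mm step here is an invented figure, not a recommendation):

// Snap a measured joint radius to a coarse grid before exposing it,
// so per-user hand dimensions can't be recovered at full sensor precision.
function quantizeRadius(radiusMeters, stepMeters = 0.005) {
  return Math.round(radiusMeters / stepMeters) * stepMeters;
}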

@Manishearth

Manishearth commented Nov 24, 2020

@alice

One quick question: can you explain why the API is named around medical terms for bones, rather than something more straightforward?

This is what every other XR platform does for hand input, and we wanted to be consistent with those expectations. There aren't any good names to use otherwise; at best you could number things, but that still gets confusing: there is a joint before each knuckle that needs to be included (the thumb one is the most important), and intuitively the finger ends at the knuckle.
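For reference, a sketch of the named joints along a single finger, base to tip, using the constant naming scheme from the explainer examples quoted below (the metacarpal entry is the "joint before the knuckle" mentioned above):

[ XRHand.INDEX_METACARPAL,           // the joint before the knuckle
  XRHand.INDEX_PHALANX_PROXIMAL,     // the knuckle itself
  XRHand.INDEX_PHALANX_INTERMEDIATE,
  XRHand.INDEX_PHALANX_DISTAL,
  XRHand.INDEX_PHALANX_TIP ]         // the fingertip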

@Manishearth

@asankah

Could you elaborate a bit more on how an implementation should evaluate a noising or rounding strategy? I.e. how should an implementation evaluate anonymity?

At the moment, we don't have a clear idea of this: @fordacious / @thetuvix / @cabanier might though. This is one of the bits of privacy work I'd like to see as we move forward (since I consider the API surface mostly "done").

It also might be worth downgrading this to a SHOULD, since a valid choice for an implementation to make is to expose precise data but be clear about fingerprinting risks in the initial permissions prompt.

@cabanier

The Oculus browser implementation exposes a hand model that is the same for everyone.
The underlying implementation has additional information to make the model better match the user's hands but we decided not to apply that to preserve privacy.

@alice

alice commented Dec 1, 2020

Regarding naming: I see that the Unity hand tracking API, for example, doesn't use the medical names for bones. They use a number to indicate the joint number.

The TAG design principles doc notes:

API naming must be done in easily readable US English. Keep in mind that most web developers aren’t native English speakers. Whenever possible, names should be chosen that use common vocabulary a majority of English speakers are likely to understand when first encountering the name.

and also:

You will probably not be able to directly translate an API available to native applications to be a web API.

Instead, consider the functionality available from the native API, and the user needs it addresses, and design an API which meets those user needs, even if the implementation depends on the existing native API.

I don't think using the bone name reduces the ambiguity either, since you're referring to a joint rather than the bone in any case.

Each of your examples sets up a structure like:

[ [XRHand.INDEX_PHALANX_TIP, XRHand.INDEX_METACARPAL],
  [XRHand.MIDDLE_PHALANX_TIP, XRHand.MIDDLE_METACARPAL],
  [XRHand.RING_PHALANX_TIP, XRHand.RING_METACARPAL],
  [XRHand.LITTLE_PHALANX_TIP, XRHand.LITTLE_METACARPAL] ]

... would it work to have the API expose the hand data via a richer data structure?

Regarding privacy: the strategy @cabanier mentions seems like a good start.

I'm also trying to understand the relationship between hand as a member of XRInputSource, and the primary action concept. Does hand input provide a way of generating a primary action?

Also, can you give some background on how hand tracking works for people who are missing or unable to use one or more fingers on the hand(s) being used for hand tracking - how does this affect the data which is provided to the application?

Finally, maybe a silly question - if an application wanted to track both hands, would that be two separate InputSources?

@domenic

domenic commented Dec 1, 2020

An issue that may be of interest to the TAG, as it concerns design guidelines for modern APIs: immersive-web/webxr-hand-input#70

@cabanier

cabanier commented Dec 1, 2020

Regarding naming: I see that the Unity hand tracking API, for example, doesn't use the medical names for bones. They use a number to indicate the joint number.

I agree that the current names are not easily understood, even by native English speakers.
Should we rename the joints with simpler names, much like the Unity example you listed?

Each of your examples sets up a structure like:
...
... would it work to have the API expose the hand data via a richer data structure?

Actual code wouldn't use those structures. I think @Manishearth provided that to clarify how the mapping is done.

I'm also trying to understand the relationship between hand as a member of XRInputSource, and the primary action concept. Does hand input provide a way of generating a primary action?

The hands API is not involved in this. Each hand will also be a "controller" which has actions associated with it.

Also, can you give some background on how hand tracking works for people who are missing or unable to use one or more fingers on the hand(s) being used for hand tracking - how does this affect the data which is provided to the application?

There is an issue on this. The spec currently defines that the hand will always return all the joints.

Finally, maybe a silly question - if an application wanted to track both hands, would that be two separate InputSources?

Yes :-)

@alice

alice commented Dec 1, 2020

Should we rename the joints with simpler names, much like the unity example you listed?

I think that would be helpful to at least consider - naming the joints by number also makes it easier to understand the ordering without memorising or looking up the names of the bones each time (and also remembering that the joint comes before the named bone).

Actual code wouldn't use those structures. I think @Manishearth provided that to clarify how the mapping is done.

I see. Could we see some more realistic code examples somewhere?

The hands API is not involved in this. Each hand will also be a "controller" which has actions associated with it.

Can you expand on this? How would someone using hand input access the default action?

There is an issue on this. The spec currently defines that the hand will always return all the joints.

Great issue, thank you!

@Manishearth

Regarding naming: I see that the Unity hand tracking API, for example, doesn't use the medical names for bones. They use a number to indicate the joint number.

A problem with this is that it's not extensible: we're not exposing all of the hand joints that exist, only the ones typically used in VR hand tracking.

I find numbering to be more confusing because different platforms may choose to index differently: e.g. the indexing changes based on which carpals and metacarpals you include. For example, on Oculus/Unity only the thumb and pinky fingers have metacarpals, and the thumb also has a trapezium carpal bone. On the other hand (hah), OpenXR provides a metacarpal bone for all fingers, but no trapezium bone. So numbers don't really carry a useful cross-platform meaning.

If you just want to iterate over all of the joints, you can do that without knowing the names; but if you're going to be thinking about detecting gestures, I find names and a diagram far easier than plain numbers. Most humans have more than these 25 bones (+ tip "bones") in each hand; "index joint 0" doesn't tell me anything unless you show me a diagram.
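For instance, iterating every joint without naming any of them (a sketch assuming the iterable hand shape mentioned below, plus an active frame and referenceSpace):

for (const jointSpace of inputSource.hand) {
  const pose = frame.getJointPose(jointSpace, referenceSpace);
  // Use pose.transform / pose.radius without caring which joint this is.
}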

I don't think using the bone name reduces the ambiguity either, since you're referring to a joint rather than the bone in any case.

It's both, really, since the orientation of that space is aligned with the named bone.

I'm also trying to understand the relationship between hand as a member of XRInputSource, and the primary action concept. Does hand input provide a way of generating a primary action?

Yes, but that's not under the purview of this spec at all. Oculus Browser and HoloLens use pinch/grab gestures for the primary action. The precise gesture chosen for the primary action is up to platform defaults.

Actual code wouldn't use those structures. I think @Manishearth provided that to clarify how the mapping is done.

The first example does this because it is outdated: the hand is iterable now, so you don't need that array. The second example does need this.

I considered a structured approach in the past, but there are many different ways to slice this data depending on the gesture you need, so it made more sense to surface it as an indexable iterator and let people slice it themselves. Also, starting with a structured approach now may lock us out of being able to handle hands with more or fewer than five fingers in the future.

I can update the explainer to use the iterator where possible!

Also, can you give some background on how hand tracking works for people who are missing or unable to use one or more fingers on the hand(s) being used for hand tracking - how does this affect the data which is provided to the application?

As Rik said, immersive-web/webxr-hand-input#11 covers this. At the moment this is entirely based on platform defaults: some platforms may emulate a finger, others may not detect it as a hand (unfortunate, but not something we can control here).

Currently all of the hand tracking platforms out there are all-or-nothing, AIUI, which means that they will always report all joints, and if some joints don't exist they'll either emulate them or refuse to surface a hand.

I want to make progress here, but I fear that doing so without having platforms that support it is putting the cart before the horse. A likely solution would be an XR feature descriptor that lets you opt in to joints being missing, as an indicator of "I can handle whatever configuration you throw at me". Polydactyl hands will also need a similar approach.
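A sketch of what that opt-in could look like (the second descriptor name is invented here for illustration; 'hand-tracking' is the module's actual feature descriptor):

const session = await navigator.xr.requestSession('immersive-vr', {
  optionalFeatures: [
    'hand-tracking',
    'hand-tracking-partial-joints' // hypothetical "I can handle missing joints" opt-in
  ]
});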

@Manishearth

Can you expand on this? How would someone using hand input access the default action?

It depends on the platform; it's typically some kind of pinching gesture: whatever people use for "select" when using hands on the rest of the platform, outside of the web. This is how the WebXR API treats the primary action for physical controllers as well: it's whatever button people will be using on the rest of the platform (usually a trigger).

Each XRHand is owned by an XRInputSource, and the actions are tied to that input source, as defined in the core spec. The XRHand surfaces additional articulated joint information about physical hand input sources, but it was already spec-compliant for an XR device to use hands as input without opting in to the WebXR Hand Input specification.
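Concretely, a page consumes the primary action through the core WebXR select events, whether or not the input source is a hand (a sketch assuming an active session):

session.addEventListener('select', (event) => {
  if (event.inputSource.hand) {
    // The user performed the platform's hand gesture for "select"
    // (e.g. a pinch on Oculus Browser or HoloLens).
  }
});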

@Manishearth

I can update the explainer to use the iterator where possible!

Oh, actually, both examples need it to be explicit. A structured API around this might be useful, and I'm open to adding one, but I'm wary of locking out accessibility in the future as platforms start exposing more info about hands with more or fewer than five fingers.

@Manishearth

Note: We're probably changing the constants to enums.
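A sketch of what enum-keyed access could look like (roughly the map-like, string-keyed shape the module later adopted; frame and referenceSpace assumed as before):

const indexTip = inputSource.hand.get('index-finger-tip');
const pose = frame.getJointPose(indexTip, referenceSpace);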

@alice

alice commented Dec 8, 2020

Thanks for making the change to enums!

Thanks also for the more in-depth explanation of why the anatomical terms make the most sense - the extensibility argument in particular is very reasonable.

Regarding a structured API - could you expand on the implication for accessibility?

@torgo torgo removed this from the 2020-12-07-week milestone Dec 8, 2020
@torgo torgo added this to the 2021-01-11-week milestone Dec 8, 2020
@cabanier

cabanier commented Dec 8, 2020

Regarding a structured API - could you expand on the implication for accessibility?

Can you elaborate on what you mean by this?

@Manishearth

Regarding a structured API - could you expand on the implication for accessibility?

As mentioned earlier, I'm wary of designing anything to handle users with uncommon hand configurations (e.g. polydactyl users) until we have device APIs that such a design can be built on and experimented with. It's reasonably easy to keep the unstructured API from closing the door to future improvements, but the more structure we introduce, the more assumptions about the hand we introduce. Ideally, a structured API would handle changes in hand structure. I would rather not close these doors, which is why I'd like to start with the unstructured API.

I'm not fully against adding a structured API -- I think it would be pretty nice to have -- but I'm mostly comfortable letting frameworks handle this right now.

@alice

alice commented Dec 9, 2020

I guess I'm still not quite getting how a set of enums is more flexible than a fully structured API, since the naming of the enums already implies a certain hand structure.

@Manishearth

I guess I'm still not quite getting how a set of enums is more flexible than a fully structured API, since the naming of the enums already implies a certain hand structure.

I think it's more that the enums are not necessarily super flexible; they're also not the right approach for uncommon hand structures, which will likely need a level 2 spec and a structured API. But I don't want to design that structured API until we better understand how uncommon hand structures will work at the device level. The alternative is designing a structured API now, but then having to design a second one when we get devices that can handle uncommon hand structures and a better understanding of how this API should work.
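To make the contrast concrete, a sketch of the two shapes (the structured form is hypothetical, invented here for illustration):

// Flat, enum-keyed access: assumes only the joint names themselves.
const tip = inputSource.hand[XRHand.INDEX_PHALANX_TIP];

// Hypothetical structured access: bakes a five-finger hand into the API shape.
// const tip = inputSource.hand.fingers.index.tip;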

@alice

alice commented Dec 9, 2020

Thanks for the explanation.

This does raise some questions about where the responsibility lies for designing a more inclusive API - if manufacturers are not being inclusive, do we just wait for them to get around to it? Do we spend some effort imagining what a more inclusive system might look like in the meantime?

I don't have answers for these questions, personally, but I think they're worth thinking about (obviously they don't just apply to this API, but it's a good example to consider).

@Manishearth

Do we spend some effort imagining what a more inclusive system might look like in the meantime?

I have been spending some effort on this, and I plan to do more! I have ideas on how this could work well. I'm just wary of including it in the spec, given that making it actually work well requires a decent amount of buy-in from device manufacturers, and I don't perceive the existence of the API alone to be sufficient pressure for that.

I'm hoping to spend some of the WG's time on this issue (after all, many device manufacturers are part of the WG!) after having more conversations with potentially affected users, but I don't have the time to start that just yet.

@fordacious

fordacious commented Jan 4, 2021

When is a TAG review officially completed? What are the next steps?

@cabanier

cabanier commented Jan 5, 2021

When is a TAG review officially completed? What are the next steps?

@domenic the TAG filed two issues. Was that the extent of the review, or do we have to wait for an official blessing?

@alice

alice commented Jan 12, 2021

We just discussed this in our breakout meeting.

Thank you so much for your patience and responsiveness through this process! We're happy with how this is progressing, so I'm proposing to close this (and it will likely be closed at our plenary tomorrow).

@alice alice added the "Progress: propose closing" label and removed the "Progress: in progress" label Jan 12, 2021
@alice alice closed this as completed Jan 13, 2021