WebXR Hand Input API Specification #568

Closed
fordacious opened this issue Nov 12, 2020 · 23 comments

@fordacious

HIQaH! QaH! TAG!

I'm requesting a TAG review of WebXR Hand Input.

The WebXR Hand Input module expands the WebXR Device API with the functionality to track articulated hand poses.
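For context, a minimal sketch of the per-frame joint query this module enables, written against the constant-based API shape discussed later in this thread (it assumes an active session with a referenceSpace already obtained):

function onXRFrame(time, frame) {
  for (const inputSource of frame.session.inputSources) {
    if (!inputSource.hand) continue; // this input source has no hand tracking
    const indexTip = inputSource.hand[XRHand.INDEX_PHALANX_TIP];
    const pose = frame.getJointPose(indexTip, referenceSpace);
    if (pose) {
      // pose.transform gives the joint's position and orientation;
      // pose.radius approximates the joint's size.
    }
  }
  frame.session.requestAnimationFrame(onXRFrame);
}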

Further details:

  • I have reviewed the TAG's API Design Principles
  • Relevant time constraints or deadlines:
  • The group where the work on this specification is currently being done: Immersive Web Working Group
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Immersive Web Working Group
  • Major unresolved issues with or opposition to this specification:
  • This work is being funded by:

We'd prefer the TAG provide feedback as:

☂️ open a single issue in our GitHub repo for the entire review

@alice

alice commented Nov 24, 2020

One quick question: can you explain why the API is named around medical terms for bones, rather than something more straightforward?

@asankah

asankah commented Nov 24, 2020

The security/privacy self review states:

Data returned from this API MUST NOT be so specific that one can detect individual users. If the underlying hardware returns data that is too precise, the User Agent MUST anonymize this data (i.e. by adding noise or rounding) before revealing it through the WebXR Hand Input API.

Could you elaborate a bit more on how an implementation should evaluate a noising or rounding strategy? I.e. how should an implementation evaluate anonymity?

Would there be recommendations around minimum fidelity for sensor readings?
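As a purely illustrative sketch of the "rounding" mentioned in the quoted text (the 5 mm step here is an invented figure, not a recommendation):

// Snap a measured joint radius to a coarse grid before exposing it,
// so per-user hand dimensions can't be recovered at full sensor precision.
function quantizeRadius(radiusMeters, stepMeters = 0.005) {
  return Math.round(radiusMeters / stepMeters) * stepMeters;
}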

@Manishearth

Manishearth commented Nov 24, 2020

@alice

One quick question: can you explain why the API is named around medical terms for bones, rather than something more straightforward?

This is what every other XR platform does for hand input, and we wanted to be consistent with those expectations. There aren't any good names to use otherwise; at best you could number things, but that still gets confusing: there is a joint before each knuckle that needs to be included (the thumb one is the most important), and intuitively the finger ends at the knuckle.
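For reference, a sketch of the named joints along a single finger, base to tip, using the constant naming scheme from the explainer examples quoted below (the metacarpal entry is the "joint before the knuckle" mentioned above):

[ XRHand.INDEX_METACARPAL,           // the joint before the knuckle
  XRHand.INDEX_PHALANX_PROXIMAL,     // the knuckle itself
  XRHand.INDEX_PHALANX_INTERMEDIATE,
  XRHand.INDEX_PHALANX_DISTAL,
  XRHand.INDEX_PHALANX_TIP ]         // the fingertip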

@Manishearth

@asankah

Could you elaborate a bit more on how an implementation should evaluate a noising or rounding strategy? I.e. how should an implementation evaluate anonymity?

At the moment, we don't have a clear idea of this: @fordacious / @thetuvix / @cabanier might though. This is one of the bits of privacy work I'd like to see as we move forward (since I consider the API surface mostly "done").

It also might be worth downgrading this to a SHOULD, since a valid choice for an implementation to make is to expose precise data but be clear about fingerprinting risks in the initial permissions prompt.

@cabanier

The Oculus browser implementation exposes a hand model that is the same for everyone.
The underlying implementation has additional information to make the model better match the user's hands but we decided not to apply that to preserve privacy.

@alice

alice commented Dec 1, 2020

Regarding naming: I see that the Unity hand tracking API, for example, doesn't use the medical names for bones. They use a number to indicate the joint number.

The TAG design principles doc notes:

API naming must be done in easily readable US English. Keep in mind that most web developers aren’t native English speakers. Whenever possible, names should be chosen that use common vocabulary a majority of English speakers are likely to understand when first encountering the name.

and also:

You will probably not be able to directly translate an API available to native applications to be a web API.

Instead, consider the functionality available from the native API, and the user needs it addresses, and design an API which meets those user needs, even if the implementation depends on the existing native API.

I don't think using the bone name reduces the ambiguity either, since you're referring to a joint rather than the bone in any case.

Each of your examples sets up a structure like:

[ [XRHand.INDEX_PHALANX_TIP, XRHand.INDEX_METACARPAL],
  [XRHand.MIDDLE_PHALANX_TIP, XRHand.MIDDLE_METACARPAL],
  [XRHand.RING_PHALANX_TIP, XRHand.RING_METACARPAL],
  [XRHand.LITTLE_PHALANX_TIP, XRHand.LITTLE_METACARPAL] ]

... would it work to have the API expose the hand data via a richer data structure?

Regarding privacy: the strategy @cabanier mentions seems like a good start.

I'm also trying to understand the relationship between hand as a member of XRInputSource, and the primary action concept. Does hand input provide a way of generating a primary action?

Also, can you give some background on how hand tracking works for people who are missing or unable to use one or more fingers on the hand(s) being used for hand tracking - how does this affect the data which is provided to the application?

Finally, maybe a silly question - if an application wanted to track both hands, would that be two separate InputSources?

@domenic

domenic commented Dec 1, 2020

An issue that may be of interest to the TAG, as it concerns design guidelines for modern APIs: immersive-web/webxr-hand-input#70

@cabanier

cabanier commented Dec 1, 2020

Regarding naming: I see that the Unity hand tracking API, for example, doesn't use the medical names for bones. They use a number to indicate the joint number.

I agree that the current names are not easily understood, even by native English speakers.
Should we rename the joints with simpler names, much like the Unity example you listed?

Each of your examples sets up a structure like:
...
... would it work to have the API expose the hand data via a richer data structure?

Actual code wouldn't use those structures. I think @Manishearth provided that to clarify how the mapping is done.

I'm also trying to understand the relationship between hand as a member of XRInputSource, and the primary action concept. Does hand input provide a way of generating a primary action?

The hands API is not involved in this. Each hand will also be a "controller" which has actions associated with it.

Also, can you give some background on how hand tracking works for people who are missing or unable to use one or more fingers on the hand(s) being used for hand tracking - how does this affect the data which is provided to the application?

There is an issue on this. The spec currently defines that the hand will always return all the joints.

Finally, maybe a silly question - if an application wanted to track both hands, would that be two separate InputSources?

Yes :-)

@alice

alice commented Dec 1, 2020

Should we rename the joints with simpler names, much like the unity example you listed?

I think that would be helpful to at least consider - naming the joints by number also makes it easier to understand the ordering without memorising or looking up the names of the bones each time (and also remembering that the joint comes before the named bone).

Actual code wouldn't use those structures. I think @Manishearth provided that to clarify how the mapping is done.

I see. Could we see some more realistic code examples somewhere?

The hands API is not involved in this. Each hand will also be a "controller" which has actions associated with it.

Can you expand on this? How would someone using hand input access the default action?

There is an issue on this. The spec currently defines that the hand will always return all the joints.

Great issue, thank you!

@Manishearth

Regarding naming: I see that the Unity hand tracking API, for example, doesn't use the medical names for bones. They use a number to indicate the joint number.

A problem with this is that it's not extensible: we're not exposing all of the hand joints that exist, only the ones typically used in VR hand tracking.

I find numbering to be more confusing because different platforms may choose to index differently: e.g. the indexing changes based on which carpals and metacarpals you include. For example, on Oculus/Unity only the thumb and pinky fingers have metacarpals, and the thumb also has a trapezium carpal bone. On the other hand (hah), OpenXR provides a metacarpal bone for all fingers, but no trapezium bone. So numbers don't really carry a useful cross-platform meaning.

If you just want to iterate over all of the joints, you can do that without knowing the names; but if you're going to be thinking about detecting gestures, I find names and a diagram far easier than plain numbers. Most humans have more than these 25 bones (+ tip "bones") in each hand; "index joint 0" doesn't tell me anything unless you show me a diagram.
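For instance, iterating every joint without naming any of them (a sketch assuming the iterable hand shape mentioned below, plus an active frame and referenceSpace):

for (const jointSpace of inputSource.hand) {
  const pose = frame.getJointPose(jointSpace, referenceSpace);
  // Use pose.transform / pose.radius without caring which joint this is.
}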

I don't think using the bone name reduces the ambiguity either, since you're referring to a joint rather than the bone in any case.

It's both, really, since the orientation of that space is aligned with the named bone.

I'm also trying to understand the relationship between hand as a member of XRInputSource, and the primary action concept. Does hand input provide a way of generating a primary action?

Yes, but that's not under the purview of this spec at all. Oculus Browser and HoloLens use pinch/grab gestures for the primary action. The precise gesture chosen for the primary action is up to platform defaults.

Actual code wouldn't use those structures. I think @Manishearth provided that to clarify how the mapping is done.

The first example does this because it is outdated: the hand is iterable now, so you don't need that array. The second example does need this.

I considered a structured approach in the past, but there are many different ways to slice this data depending on the gesture you need, so it made more sense to surface it as an indexable iterator and let people slice it themselves. Also, starting with a structured approach now may lock us out of being able to handle hands with more or fewer than five fingers in the future.

I can update the explainer to use the iterator where possible!

Also, can you give some background on how hand tracking works for people who are missing or unable to use one or more fingers on the hand(s) being used for hand tracking - how does this affect the data which is provided to the application?

As Rik said, immersive-web/webxr-hand-input#11 covers this. At the moment this is entirely based on platform defaults: some platforms may emulate a finger, others may not detect it as a hand (unfortunate, but not something we can control here).

Currently all of the hand tracking platforms out there are all-or-nothing, AIUI, which means that they will always report all joints, and if some joints don't exist they'll either emulate them or refuse to surface a hand.

I want to make progress here, but I fear that doing so without having platforms that support it is putting the cart before the horse. A likely solution would be an XR feature descriptor that lets you opt in to joints being missing, as an indicator of "I can handle whatever configuration you throw at me". Polydactyl hands will also need a similar approach.
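A sketch of what that opt-in could look like (the second descriptor name is invented here for illustration; 'hand-tracking' is the module's actual feature descriptor):

const session = await navigator.xr.requestSession('immersive-vr', {
  optionalFeatures: [
    'hand-tracking',
    'hand-tracking-partial-joints' // hypothetical "I can handle missing joints" opt-in
  ]
});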

@Manishearth

Can you expand on this? How would someone using hand input access the default action?

It depends on the platform; it's typically some kind of pinching gesture: whatever people use for "select" when using hands on the rest of the platform, outside of the web. This is how the WebXR API treats the primary action for physical controllers as well: it's whatever button people will be using on the rest of the platform (usually a trigger).

Each XRHand is owned by an XRInputSource, and the actions are tied to that input source, as defined in the core spec. The XRHand surfaces additional articulated joint information about physical hand input sources, but it was already spec-compliant for an XR device to use hands as input without opting in to the WebXR Hand Input specification.
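Concretely, a page consumes the primary action through the core WebXR select events, whether or not the input source is a hand (a sketch assuming an active session):

session.addEventListener('select', (event) => {
  if (event.inputSource.hand) {
    // The user performed the platform's hand gesture for "select"
    // (e.g. a pinch on Oculus Browser or HoloLens).
  }
});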

@Manishearth

I can update the explainer to use the iterator where possible!

Oh, actually, both examples need it to be explicit. A structured API around this might be useful, and I'm open to adding one, but I'm wary of locking out accessibility in the future as platforms start exposing more info about hands with more or fewer than five fingers.

@Manishearth

Note: We're probably changing the constants to enums.
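A sketch of what enum-keyed access could look like (roughly the map-like, string-keyed shape the module later adopted; frame and referenceSpace assumed as before):

const indexTip = inputSource.hand.get('index-finger-tip');
const pose = frame.getJointPose(indexTip, referenceSpace);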

@alice

alice commented Dec 8, 2020

Thanks for making the change to enums!

Thanks also for the more in-depth explanation of why the anatomical terms make the most sense - the extensibility argument in particular is very reasonable.

Regarding a structured API - could you expand on the implication for accessibility?

@torgo torgo removed this from the 2020-12-07-week milestone Dec 8, 2020
@torgo torgo added this to the 2021-01-11-week milestone Dec 8, 2020
@cabanier

cabanier commented Dec 8, 2020

Regarding a structured API - could you expand on the implication for accessibility?

Can you elaborate on what you mean by this?

@Manishearth

Regarding a structured API - could you expand on the implication for accessibility?

As mentioned earlier, I'm wary of designing anything to handle users with uncommon hand configurations (e.g. polydactyl users) until we have device APIs that such a design can be built on and experimented with. It's reasonably easy to keep the unstructured API from closing the door to future improvements, but the more structure we introduce, the more assumptions about the hand we introduce. Ideally, a structured API would handle changes in hand structure. I would rather not close these doors, which is why I'd like to start with the unstructured API.

I'm not fully against adding a structured API -- I think it would be pretty nice to have -- but I'm mostly comfortable letting frameworks handle this right now.

@alice

alice commented Dec 9, 2020

I guess I'm still not quite getting how a set of enums is more flexible than a fully structured API, since the naming of the enums already implies a certain hand structure.

@Manishearth

I guess I'm still not quite getting how a set of enums is more flexible than a fully structured API, since the naming of the enums already implies a certain hand structure.

I think it's more that the enums are not necessarily super flexible; they're also not the right approach for uncommon hand structures, which will likely need a level 2 spec and a structured API. But I don't want to design that structured API until we better understand how uncommon hand structures will work at the device level. The alternative is designing a structured API now, but then having to design a second one when we get devices that can handle uncommon hand structures and a better understanding of how this API should work.
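To make the contrast concrete, a sketch of the two shapes (the structured form is hypothetical, invented here for illustration):

// Flat, enum-keyed access: assumes only the joint names themselves.
const tip = inputSource.hand[XRHand.INDEX_PHALANX_TIP];

// Hypothetical structured access: bakes a five-finger hand into the API shape.
// const tip = inputSource.hand.fingers.index.tip;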

@alice

alice commented Dec 9, 2020

Thanks for the explanation.

This does raise some questions about where the responsibility lies for designing a more inclusive API - if manufacturers are not being inclusive, do we just wait for them to get around to it? Do we spend some effort imagining what a more inclusive system might look like in the meantime?

I don't have answers for these questions, personally, but I think they're worth thinking about (obviously they don't just apply to this API, but it's a good example to consider).

@Manishearth

Do we spend some effort imagining what a more inclusive system might look like in the meantime?

I have been spending some effort on this, and I plan to do more! I have ideas on how this could work well. I'm just wary of including it in the spec, given that making it actually work well requires a decent amount of buy-in from device manufacturers, and I don't perceive the existence of the API alone to be sufficient pressure for that.

I'm hoping to spend some of the WG's time on this issue (after all, many device manufacturers are part of the WG!) after having more conversations with potentially affected users, but I don't have the time to start that just yet.

@fordacious

fordacious commented Jan 4, 2021

When is a TAG review officially completed? What are the next steps?

@cabanier

cabanier commented Jan 5, 2021

When is a TAG review officially completed? What are the next steps?

@domenic the TAG filed two issues. Was that the extent of the review, or do we have to wait for an official blessing?

@alice

alice commented Jan 12, 2021

We just discussed this in our breakout meeting.

Thank you so much for your patience and responsiveness through this process! We're happy with how this is progressing, so I'm proposing to close this (and it will likely be closed at our plenary tomorrow).

@alice alice added the "Progress: propose closing" label and removed the "Progress: in progress" label Jan 12, 2021
@alice alice closed this as completed Jan 13, 2021