WebXR Hand Input API Specification #568
Comments
One quick question: can you explain why the API is named around medical terms for bones, rather than something more straightforward?
The security/privacy self review states:
Could you elaborate a bit more on how an implementation should evaluate a noising or rounding strategy? I.e., how should an implementation evaluate anonymity? Would there be recommendations around minimum fidelity for sensor readings?
This is what every other XR platform does for hand input, and we wanted to be consistent with expectations. There aren't any good alternative names; at best you could number the joints, but that still gets confusing: there is a joint before each knuckle that needs to be included (the thumb's is the most important one), and intuitively the finger ends at the knuckle.
At the moment we don't have a clear idea of this; @fordacious / @thetuvix / @cabanier might, though. This is one of the bits of privacy work I'd like to see as we move forward (since I consider the API surface mostly "done"). It also might be worth downgrading this to a SHOULD, since a valid choice for an implementation is to expose precise data but be clear about fingerprinting risks in the initial permissions prompt.
The Oculus Browser implementation exposes a hand model that is the same for everyone.
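To make the earlier question about rounding concrete, here is a minimal illustrative sketch of what a quantization mitigation could mean. This is not from the spec and not a recommendation; a real mitigation would live inside the user agent rather than in page script, and the 5 mm step size is an arbitrary assumption.

```js
// Hypothetical sketch: quantize reported joint positions to a coarse grid so
// exact hand geometry is harder to fingerprint. The step size is an arbitrary choice.
function quantize(value, stepMeters = 0.005) {
  return Math.round(value / stepMeters) * stepMeters;
}

// Given an XRJointPose-like object, return a rounded copy of its position.
function roundedPosition(jointPose) {
  const p = jointPose.transform.position;
  return { x: quantize(p.x), y: quantize(p.y), z: quantize(p.z) };
}
```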
Regarding naming: I see that the Unity hand tracking API, for example, doesn't use the medical names for bones; it uses numbers to identify the joints. The TAG design principles doc notes:
and also:
I don't think using the bone name reduces the ambiguity either, since you're referring to a joint rather than the bone in any case. Each of your examples sets up a structure like:

```js
[ [XRHand.INDEX_PHALANX_TIP, XRHand.INDEX_METACARPAL],
  [XRHand.MIDDLE_PHALANX_TIP, XRHand.MIDDLE_METACARPAL],
  [XRHand.RING_PHALANX_TIP, XRHand.RING_METACARPAL],
  [XRHand.LITTLE_PHALANX_TIP, XRHand.LITTLE_METACARPAL] ]
```

...would it work to have the API expose the hand data via a richer data structure?

Regarding privacy: the strategy @cabanier mentions seems like a good start. I'm also trying to understand the relationship between

Also, can you give some background on how hand tracking works for people who are missing or unable to use one or more fingers on the hand(s) being used for hand tracking - how does this affect the data which is provided to the application?

Finally, maybe a silly question - if an application wanted to track both hands, would that be two separate
An issue that may be of interest to the TAG, as it concerns design guidelines for modern APIs: immersive-web/webxr-hand-input#70
I agree that the current names are not easily understood, even by native English speakers.
Actual code wouldn't use those structures. I think @Manishearth provided that to clarify how the mapping is done.
The hands API is not involved in this. Each hand will also be a "controller" which has actions associated with it.
There is an issue on this. The spec currently defines that the hand will always return all the joints.
Yes :-)
I think that would be helpful to at least consider - naming the joints by number also makes it easier to understand the ordering without memorising or looking up the names of the bones each time (and also remembering that the joint comes before the named bone).
I see. Could we see some more realistic code examples somewhere?
Can you expand on this? How would someone using hand input access the default action?
Great issue, thank you!
A problem with this is that it's not extensible: we're not exposing all of the hand joints that exist, only the joints that are typically used in VR hand tracking. I find numbering to be more confusing because different platforms may choose to index differently: e.g. the indexing changes based on which carpals and metacarpals you include. For example, on Oculus/Unity only the thumb and pinky fingers have metacarpals, and the thumb also has a trapezium carpal bone. On the other hand (hah), OpenXR provides a metacarpal bone for all fingers, but no trapezium bone. So numbers don't really carry a cross-platform meaning. If you just want to iterate over all of the joints, you can do that without knowing the names, but if you're going to be detecting gestures, I find names and a diagram far easier than plain numbers. Most humans have more than these 25 bones (+ tip "bones") in each hand; "index joint 0" doesn't tell me anything unless you show me a diagram.
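For illustration, here is a rough sketch of the kind of gesture code where named joints help readability. It is not taken from the explainer; the exact indexing syntax and the 2 cm threshold are assumptions.

```js
// Rough sketch: detect a pinch by comparing the thumb tip and index tip poses.
// Assumes `hand` exposes joint spaces via the named constants discussed above,
// and that `frame` and `refSpace` come from the usual requestAnimationFrame flow.
function isPinching(hand, frame, refSpace) {
  const thumbTip = frame.getJointPose(hand[XRHand.THUMB_PHALANX_TIP], refSpace);
  const indexTip = frame.getJointPose(hand[XRHand.INDEX_PHALANX_TIP], refSpace);
  if (!thumbTip || !indexTip) return false;
  const a = thumbTip.transform.position;
  const b = indexTip.transform.position;
  // Treat fingertips within ~2 cm of each other as a pinch (arbitrary threshold).
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z) < 0.02;
}
```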
It's both, really, since the orientation of that space is aligned with the named bone.
Yes, but that's not under the purview of this spec at all. Oculus Browser and HoloLens use pinch/grab actions for the primary action. The precise gesture used for the primary action is up to the platform defaults.
The first example does this because it is outdated: the hand is iterable now, so you don't need that array. The second example does need this. I considered a structured approach in the past, but there are many different ways to slice this data depending on the gesture you need, so it made more sense to surface it as an indexable iterator and let people slice it themselves. Also, starting with a structured approach now may lock us out of being able to handle hands with more or fewer than five fingers in the future. I can update the explainer to use the iterator where possible!
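As a sketch of the iterator style (the exact iteration shape has varied across explainer revisions, so treat this as illustrative; `drawDebugSphere` is an app-defined placeholder):

```js
// Illustrative: walk every joint without knowing its name, e.g. for debug rendering.
for (const jointSpace of inputSource.hand) {
  const pose = frame.getJointPose(jointSpace, refSpace);
  if (pose) {
    // XRJointPose carries a per-joint radius in addition to the transform.
    drawDebugSphere(pose.transform.position, pose.radius);
  }
}
```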
As Rik said, immersive-web/webxr-hand-input#11 covers this. At the moment this is entirely based on platform defaults: some platforms may emulate a finger, others may not detect it as a hand at all (unfortunate, but not something we can control here). Currently all of the hand tracking platforms out there are all-or-nothing, AIUI, which means that they will always report all joints, and if some joints don't exist they'll either emulate them or refuse to surface a hand. I want to make progress here, but I fear that doing so without having platforms that support it is putting the cart before the horse. A likely solution would be one where you can use an XR feature descriptor to opt in to joints being missing, as a way of saying "I can handle whatever configuration you throw at me". Polydactyl hands will also need a similar approach.
It depends on the platform; it's typically some kind of pinching gesture. It's whatever people use for "select" when using hands on the rest of the platform, outside of the web. This is how the WebXR API treats the primary action for physical controllers as well: it's whatever button people will be using on the rest of the platform (usually a trigger). Each XRHand is owned by an XRInputSource, which represents an input source, and the actions are tied to that, as defined in the core spec. The XRHand surfaces additional articulated joint information about physical hand input sources, but it was already spec-compliant for an XR device to use hands as input without needing to opt in to the XR Hand Input specification.
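In code, this means hands flow through the ordinary WebXR input event model; a minimal sketch (with `handlePrimaryAction` as an app-defined placeholder):

```js
// The primary action arrives as a normal WebXR 'select' event; the hand object
// on the input source just adds articulated joint detail.
session.addEventListener('select', (event) => {
  if (event.inputSource.hand) {
    handlePrimaryAction(event.inputSource); // e.g. triggered by the platform's pinch gesture
  }
});
```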
Oh, actually, both examples need it to be explicit. A structured API around this might be useful, and I'm open to adding one, but I'm wary of locking out accessibility in the future as platforms start exposing more info about hands with more or fewer than five fingers.
Note: We're probably changing the constants to enums.
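For readers following along, a sketch of what enum-style lookup could look like, assuming string-valued joint names roughly along the lines the spec later settled on; the exact names are not guaranteed by this thread.

```js
// Sketch: look up a joint space by an enum-style string name instead of a numeric constant.
const indexTip = inputSource.hand.get('index-finger-tip');
const pose = frame.getJointPose(indexTip, refSpace);
if (pose) {
  console.log('Index fingertip at', pose.transform.position);
}
```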
Thanks for making the change to enums! Thanks also for the more in-depth explanation of why the anatomical terms make the most sense - the extensibility argument in particular is very reasonable. Regarding a structured API - could you expand on the implications for accessibility?
Can you elaborate on what you mean by this?
As mentioned earlier, I'm wary of designing anything intended to handle users with uncommon hand configurations (e.g. polydactyl users) until we have accessible device APIs that this can be built on and experimented with. For the unstructured API it's reasonably easy to design things without closing the door to future improvements, but the more structure we introduce, the more assumptions about the hand we introduce. Ideally, such a structured API would handle changes in hand structure. I would rather not close these doors, which is why I'd like to start with the unstructured API. I'm not fully against adding a structured API -- I think it would be pretty nice to have -- but I'm mostly comfortable letting frameworks handle this right now.
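To illustrate the "let frameworks handle it" point, here is a hypothetical sketch of how a library could layer a per-finger grouping on top of the flat joint data. The string splitting assumes enum-style names like "index-finger-tip"; nothing like this is proposed in the spec.

```js
// Hypothetical framework-level helper: group flat joint spaces by finger.
// Assumes map-like iteration yielding [name, jointSpace] pairs with names such
// as "wrist", "thumb-tip", "index-finger-phalanx-proximal", etc.
function groupByFinger(hand) {
  const grouped = { wrist: [], thumb: [], index: [], middle: [], ring: [], little: [] };
  for (const [name, jointSpace] of hand.entries()) {
    const finger = name.split('-')[0]; // "index-finger-tip" -> "index", "wrist" -> "wrist"
    (grouped[finger] || grouped.wrist).push(jointSpace);
  }
  return grouped;
}
```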
I guess I'm still not quite getting how a set of enums is more flexible than a fully structured API, since the naming of the enums already implies a certain hand structure.
It's more that the enums aren't necessarily super flexible either, but they're also not the right approach for uncommon hand structures, which will likely need a level 2 of the spec and a structured API; I don't want to design that structured API until we better understand how uncommon hand structures will work at the device level. The alternative is designing a structured API now, but then having to design a second one once we get more devices that can handle uncommon hand structures and have a better understanding of how this API should work.
Thanks for the explanation. This does raise some questions about where the responsibility lies for designing a more inclusive API - if manufacturers are not being inclusive, do we just wait for them to get around to it? Do we spend some effort imagining what a more inclusive system might look like in the meantime? I don't have answers for these questions, personally, but I think they're worth thinking about (obviously they don't just apply to this API, but it's a good example to consider).
I have been spending some effort on this, and I plan to do more as well! I have ideas on how this could work well. I'm just wary of including this in the spec, given that making it work well requires a decent amount of buy-in from device manufacturers, and I don't perceive the existence of the API to be sufficient pressure for that. I'm hoping to spend some of the WG's time on this issue (after all, many device manufacturers are part of the WG!) after having more conversations with potentially affected users, but I don't have the time to start that just yet.
When is a TAG review officially completed? What are the next steps?
@domenic the TAG filed 2 issues. Was that the extent of the review, or do we have to wait for an official blessing?
We just discussed this in our breakout meeting. Thank you so much for your patience and responsiveness through this process! We're happy with how this is progressing, so I'm proposing to close this (and it will likely be closed at our plenary tomorrow).
HIQaH! QaH! TAG!
I'm requesting a TAG review of WebXR Hand Input.
The WebXR Hand Input module expands the WebXR Device API with the functionality to track articulated hand poses.
Further details:
We'd prefer the TAG provide feedback as (please delete all but the desired option):
☂️ open a single issue in our GitHub repo for the entire review