Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safer (checked) QCellOwnerSeq::new #50

Open
CAD97 opened this issue May 5, 2024 · 1 comment
Open

Safer (checked) QCellOwnerSeq::new #50

CAD97 opened this issue May 5, 2024 · 1 comment

Comments

@CAD97
Copy link

CAD97 commented May 5, 2024

std's Rc, Arc, and ThreadId have essentially the same pathological edge case as QCellOwnerSeq — incrementing the reference count (owner id sequence) in a loop without ever decreasing it (i.e. by forgetting the cloned Rc/Arc, or always for the monotonic ThreadId and QCellOwnerSeq counts) could overflow the counter and lead to UB. std declares this as pathological unsupported behavior — when would you ever legitimately need even 231 different owning references or threads, let alone 2631 — and aborts if this overflow would happen2.

At least on 64-bit targets (where exhausting the ID space is fundamentally impractical3), it'd be nice to have QCellOwnerSeq::new be made safe. I'd recommend doing a cmpxchg loop (fetch_update) like ThreadId, since creating new owners doesn't seem like it'd ever be a contended operation.

Doing this would entail one of:

  • making QCellOwnerSeq::new safe (in theory not API breaking, but can cause downstream unused_unsafe warnings and makes the non-panicking path now potentially panicking); or
  • changing QCellOwnerSeq::new's unsafe requirements (from "don't misuse colliding IDs" to "don't exhaust the ID space") and making a separate constructor4 that does checked sequence based owner IDs.

Since qcell doesn't expose a way to get the numeric value of a QCellOwnerID (and I think this is a good decision), there's no way to implement this downstream, even partially.

Footnotes

  1. For some context, with 128 threads all cooperatively incrementing the same shared counter at a rate of 6 GHz, it will still take 139 days at that rate to increment the the counter 263 times. It's effectively never going to happen accidentally.

  2. Rc uses cell.set(cell.get() + 1) and aborts when overflow happens. Arc uses atom.fetch_add(1, Relaxed) and aborts at isize::MAX, relying on the fact that having isize::MAX threads cloning Arc concurrently is impossible (at least two bytes of address space are consumed per thread). ThreadId uses a compare_exchange loop, presumably because creating a new thread will never be performance critical like cloning Arc can be.

  3. Counting to 263 is never going to happen in a reasonable timeframe (see previous footnote), but counting to 232 is entirely practical. OTOH, 32-bit targets with 64-bit atomic support could still use 64-bit owner IDs, zero-extending the address-based owner IDs.

  4. In both cases QCellOwner could also switch to using checked sequence-based IDs to avoid the alloc requirement, but do note that this would result in the ID space being consumed without reuse by QCellOwner as well as QCellOwnerSeq, which it isn't currently.

@uazu
Copy link
Owner

uazu commented May 11, 2024

Thanks for the comments, and sorry for the delay in replying.

I guess this must be for a no_std situation where you can't allocate? Otherwise QCellOwner::new is much easier. For no_std everyone is working at a much lower level, and making guarantees with unsafe is more often necessary, so I thought that it would be acceptable.

Also, from the reaction in the forums, it seems like any unsoundness at all, even once every 139 days with a determined effort, is regarded as a demonstrated weakness and therefore a bug. So to be conservative, this is marked as unsafe, even if for all practical non-malicious purposes it is totally sound. Originally I intended the number of bits in the ID to "tend to zero" like in maths once testing was complete, but after the reaction in the forums I realized that would never fly. For practical purposes, the ID could just be a wrapping 8 bits and still provide enough of a check to catch legitimate coding errors. So the unsafe just acts as a flag to the reviewer to make sure that the code isn't making a crazy attempt to wrap the ID and get a collision.

Since the ID goes on every single QCell, it was originally 32-bit everywhere. However when it was proposed to me to use addresses as a basis, I felt that a 64-bit ID would be acceptable on 64-bit because alignment might cause it to take up 64 bits anyway. 32 bits on 32-bit continues to make sense to not bloat all the cells.

I feel we got the balance about right on this one. Personally I prefer not to have long-term panics hanging over me, so using addresses or a wrapping ID means they will never run out.

Considering the option of adding both panicking and non-panicking versions of QCellOwner::new, they would have to be allocated separate ranges, or else use of the non-panicking one could wrap the value meaning that the panicking one wouldn't notice the wrap. So each would get 2^62 (or 2^30) values.

So I think I can do what you're asking, by adding a new panicking call with its own ID range. Does this sound okay?

I do wonder what the situation is where someone would prefer a hidden panic rather than no panic and an explicit "Safety: ..." notice in the code. But really I don't need to understand that. If you want this call it is not a big deal for me to add it.

So if you agree, I have two actions from this:

  • Add a panicking version of QCellOwnerSeq::new with its own ID range
  • Consider using fetch_update instead of fetch_add as an optimisation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants