Bug in lock-free sofi #2160

Open
elfenpiff opened this issue Jan 15, 2024 · 4 comments · May be fixed by #2214
Labels
bug Something isn't working

Comments

@elfenpiff
Contributor

Required information

Assume the following scenario: A new subscriber subscribes to a service with a history size of N. The subscriber's buffer has a capacity of N as well, and the history is full.

  1. RouDi calls SoFi::push until sofi is full and the next push call would cause an overflow. The memory is correctly synchronized via m_write_pos.store(std::memory_order_release).
  2. The subscriber does not consume the history or anything in the buffer, so sofi remains full.
  3. The publisher in another process, and therefore also in another thread, publishes a new sample. Since sofi is in overflow, the oldest value must be copied out and returned to the user. But there is no corresponding m_write_pos.load(std::memory_order_acquire) call to sync the m_data memory from the RouDi thread to the publisher thread. The publisher may return an unsynced garbage value.

See:

```cpp
inline bool SpscSofi<ValueType, CapacityValue>::push(const ValueType& valueIn, ValueType& valueOut) noexcept
```
So, whenever the push method of sofi is called from multiple threads, even non-concurrently, those threads must synchronize m_data for the overflow case (see the sketch below).
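To make the race concrete, here is a heavily simplified sketch of a SoFi-like push with an overflow path. It is not the actual SpscSofi implementation; the member names (m_data, m_readPosition, m_writePosition) and the index handling are assumptions made only for illustration.

```cpp
#include <atomic>
#include <cstdint>

template <typename ValueType, uint64_t Capacity>
class SketchSofi
{
  public:
    // Returns true if an overflow happened; valueOut then holds the evicted value.
    bool push(const ValueType& valueIn, ValueType& valueOut) noexcept
    {
        const uint64_t writePos = m_writePosition.load(std::memory_order_relaxed);
        const uint64_t readPos = m_readPosition.load(std::memory_order_relaxed);

        const bool overflow = (writePos - readPos) >= Capacity;
        if (overflow)
        {
            // The oldest value is evicted and handed back to the caller.
            // If a different thread (e.g. RouDi pushing the history) produced
            // this slot, reading m_data here is not synchronized with that
            // thread's release store of m_writePosition; an acquire load of
            // m_writePosition would be required to make m_data visible.
            valueOut = m_data[readPos % Capacity];
            m_readPosition.store(readPos + 1, std::memory_order_relaxed);
        }

        m_data[writePos % Capacity] = valueIn;
        // Publish the new slot to the consumer side.
        m_writePosition.store(writePos + 1, std::memory_order_release);
        return overflow;
    }

  private:
    ValueType m_data[Capacity];
    std::atomic<uint64_t> m_readPosition{0};
    std::atomic<uint64_t> m_writePosition{0};
};
```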

@budrus
Contributor

budrus commented Jan 15, 2024

@elfenpiff This also assumes that there are multiple pushers. SOFI was designed for SPSC. All we said was that, by accident, it could also be SPMC, but we never had MPMC in mind.

@elBoberido
Member

Yes, this is not an issue for iceoryx1 since there is a mutex in the ChunkDistributor which takes care of the synchronization. The queue is used as expected and guarded by a mutex in the non-SPSC case.

I would not complicate the code just to catch a misuse.
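For reference, a minimal sketch of that guarding pattern, assuming a SpscSofi-like queue; this is not the actual ChunkDistributor code and the names are made up:

```cpp
#include <mutex>

// Wrapper that serializes all pushes with a mutex, similar in spirit to how
// the ChunkDistributor guards the queue in the non-SPSC case.
template <typename Queue, typename ValueType>
class MutexGuardedPusher
{
  public:
    bool push(const ValueType& valueIn, ValueType& valueOut)
    {
        // The lock/unlock pair provides the required acquire/release semantics,
        // so the writes to m_data of one pusher thread are visible to the next
        // one even though the queue itself only targets the SPSC case.
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_queue.push(valueIn, valueOut);
    }

  private:
    Queue m_queue;
    std::mutex m_mutex;
};
```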

@elBoberido
Member

Well, there is one valid use case and we need to decide whether we want to support it. If one could guarantee that no concurrent pushes happen but multiple producer threads still access the SOFI, e.g. by exchanging an access token via a relaxed atomic, the synchronization of m_data is required. The question is whether this should be the task of the SOFI or the task of the mechanism which ensures that there is no concurrent access by multiple producers/consumers. See the sketch below.
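A minimal sketch of such a token hand-over between two producer threads that never push concurrently; 'SofiType' stands in for a SpscSofi-like queue and all names are hypothetical. With a relaxed token exchange the second pusher could read stale m_data in the overflow path; the release/acquire pair shown here provides that synchronization outside of the SOFI.

```cpp
#include <atomic>

template <typename SofiType, typename ValueType>
struct TokenHandOver
{
    SofiType& sofi;
    std::atomic<bool> tokenAtSecondThread{false};

    void pushFromFirstThread(const ValueType& value)
    {
        ValueType evicted;
        sofi.push(value, evicted);
        // Hand the token over; the release store also publishes the writes
        // to m_data made by the push above.
        tokenAtSecondThread.store(true, std::memory_order_release);
    }

    void pushFromSecondThread(const ValueType& value)
    {
        // The acquire load pairs with the release store above and makes the
        // first thread's writes to m_data visible before the next push reads
        // them in the overflow case.
        while (!tokenAtSecondThread.load(std::memory_order_acquire))
        {
        }
        ValueType evicted;
        sofi.push(value, evicted);
    }
};
```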

@elBoberido
Member

@elfenpiff @budrus do you agree to close this issue? From my point of view this is not a bug since the SpscSofi is, as the name suggests, a single-producer single-consumer queue. If it is used in a multi-pusher context, it needs to be guarded with a mutex (or a different synchronization mechanism), as is done in the ChunkDistributor where a mutex is used to synchronize the access to the data.

@albtam linked a pull request Mar 5, 2024 that will close this issue