Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix imports creating duplicate Editions #3471

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hughrun
Copy link
Contributor

@hughrun hughrun commented Nov 16, 2024

This fixes bugs at both the sending and receiving instances: As the primary fix is at the sending end, a full fix will need to be collaborative at the network level.

Fixing at the sending end: don't send duplicate GeneratedNotes

The primary fix is to handle_reading_status: it calls create_generated_note which saves and broadcasts the GeneratedNote. Before this fix, handle_reading_status then saved and broadcast the note again, creating two incoming GeneratedNotes for all federated BookWyrm servers, and thus triggering simultaneous or near-simultaneous imports of the associated book within the tag reference. This is then dereferenced, creating two identical Works, the first of which ends up empty and the second of which ends up with two identical Editions.

Fixing at the sending end: delay BookStatus broadcasts

My testing doesn't indicate one way or the other whether this happens, but there's a theoretical race condition for incoming editions that are ListItem or ShelfBook items. The Add activity needs to create the Edition and Work if they are not already on the receiving instance, and so does the Comment or GeneratedNote. The fix in ActivitypubMixin::broadcast sets a delay on these statuses to ensure that the Add activity is processed first, and thus the book will already exist when the status is processed.

Fixing at the receiving end: prevent dereferencing the incoming Edition

The book attached to a ListItem or ShelfBook is an Edition. This means that when a receiving instance dereferences it, we end up with a recursion: The Edition dereferences the parent Work, which dereferences all the child editions in celery tasks. This includes the Edition that triggered the whole process in the first place.

I couldn't create a reproducable test of this, but I've seen evidence that this may cause a race condition where the incoming Edition ends up being duplicated - the task to create child Editions checks for existence before the original triggering Edition has saved, and in combination with the above we can then end up with three duplicates.

The changes to Note, Fields, and ActivityObject::to_model are aimed at preventing this by allowing us to compare related field ids to the original triggering remote_id before running set_related_field and excluding them if there is a match.

Description

What type of Pull Request is this?

  • Bug Fix
  • Enhancement
  • Plumbing / Internals / Dependencies
  • Refactor

Does this PR change settings or dependencies, or break something?

  • This PR changes or adds default settings, configuration, or .env values
  • This PR changes or adds dependencies
  • This PR introduces other breaking changes

Details of breaking or configuration changes (if any of above checked)

Documentation

  • New or amended documentation will be required if this PR is merged
  • I have created a matching pull request in the Documentation repository
  • I intend to create a matching pull request in the Documentation repository after this PR is merged

Tests

  • My changes do not need new tests
  • All tests I have added are passing
  • I have written tests but need help to make them pass
  • I have not written tests and need help to write them

This fixes bugs at both the sending and receiving instances:
As the primary fix is at the sending end, a full fix will need to be collaborative at the network level.

**Fixing at the sending end: don't send duplicate GeneratedNotes**

The primary fix is to `handle_reading_status`: it calls `create_generated_note` which saves and broadcasts the GeneratedNote.
Before this fix, `handle_reading_status` then saved and broadcast the note again, creating two incoming GeneratedNotes
for all federated BookWyrm servers, and thus triggering simultaneous or near-simultaneous imports of the associated book
within the `tag` reference. This is then dereferenced, creating two identical Works, the first of which ends up empty
and the second of which ends up with two identical Editions.

**Fixing at the sending end: delay BookStatus broadcasts**

My testing doesn't indicate one way or the other whether this happens, but there's a theoretical race condition for incoming
editions that are ListItem or ShelfBook items. The Add activity needs to create the Edition and Work if they are not already
on the receiving instance, and so does the Comment or GeneratedNote. The fix in `ActivitypubMixin::broadcast` sets a delay on
these statuses to ensure that the Add activity is processed first, and thus the book will already exist when the status is
processed.

**Fixing at the receiving end: prevent dereferencing the incoming Edition**

The book attached to a ListItem or ShelfBook is an Edition. This means that when a receiving instance dereferences it, we
end up with a recursion: The Edition dereferences the parent Work, which dereferences all the child editions in celery tasks.
This includes the Edition that triggered the whole process in the first place.

I couldn't create a reproducable test of this, but I've seen evidence that this can cause a race condition where the incoming
Edition ends up being duplicated - the task to create child Editions checks for existence before the original triggering
Edition has saved.

The changes to Note, Fields, and `ActivityObject::to_model` are aimed at preventing this by allowing us to compare related
field ids to the original triggering remote_id before running `set_related_field` and excluding them if there is a match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant