Fix imports creating duplicate Editions #3471
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This fixes bugs at both the sending and receiving instances: As the primary fix is at the sending end, a full fix will need to be collaborative at the network level.
Fixing at the sending end: don't send duplicate GeneratedNotes
The primary fix is to
handle_reading_status
: it callscreate_generated_note
which saves and broadcasts the GeneratedNote. Before this fix,handle_reading_status
then saved and broadcast the note again, creating two incoming GeneratedNotes for all federated BookWyrm servers, and thus triggering simultaneous or near-simultaneous imports of the associated book within thetag
reference. This is then dereferenced, creating two identical Works, the first of which ends up empty and the second of which ends up with two identical Editions.Fixing at the sending end: delay BookStatus broadcasts
My testing doesn't indicate one way or the other whether this happens, but there's a theoretical race condition for incoming editions that are ListItem or ShelfBook items. The Add activity needs to create the Edition and Work if they are not already on the receiving instance, and so does the Comment or GeneratedNote. The fix in
ActivitypubMixin::broadcast
sets a delay on these statuses to ensure that the Add activity is processed first, and thus the book will already exist when the status is processed.Fixing at the receiving end: prevent dereferencing the incoming Edition
The book attached to a ListItem or ShelfBook is an Edition. This means that when a receiving instance dereferences it, we end up with a recursion: The Edition dereferences the parent Work, which dereferences all the child editions in celery tasks. This includes the Edition that triggered the whole process in the first place.
I couldn't create a reproducable test of this, but I've seen evidence that this may cause a race condition where the incoming Edition ends up being duplicated - the task to create child Editions checks for existence before the original triggering Edition has saved, and in combination with the above we can then end up with three duplicates.
The changes to Note, Fields, and
ActivityObject::to_model
are aimed at preventing this by allowing us to compare related field ids to the original triggering remote_id before runningset_related_field
and excluding them if there is a match.Description
What type of Pull Request is this?
Does this PR change settings or dependencies, or break something?
Details of breaking or configuration changes (if any of above checked)
Documentation
Tests