Account import fails with error: 'attributedTo' #3381

Open
dnoesgaard opened this issue Jun 9, 2024 · 3 comments · May be fixed by #3431
Labels
bug Something isn't working

Comments

@dnoesgaard

I'm trying to import an account, but the import fails. Only 60 out of ~500 books are imported, and I see this in the logs:

celery_worker-1   | [2024-06-09 11:57:38,222: ERROR/ForkPoolWorker-4] User Import Job 4 Failed with error: 'attributedTo'
celery_worker-1   | Traceback (most recent call last):
celery_worker-1   |   File "/app/bookwyrm/models/bookwyrm_import_job.py", line 63, in start_import_task
celery_worker-1   |     process_books(job, tar)
celery_worker-1   |   File "/app/bookwyrm/models/bookwyrm_import_job.py", line 92, in process_books
celery_worker-1   |     upsert_statuses(
celery_worker-1   |   File "/app/bookwyrm/models/bookwyrm_import_job.py", line 185, in upsert_statuses
celery_worker-1   |     user, status["attributedTo"]
celery_worker-1   |           ~~~~~~^^^^^^^^^^^^^^^^
celery_worker-1   | KeyError: 'attributedTo'

I'm attaching archive.json from the export tarball:
archive.json

Any idea what might be the reason?

@dnoesgaard dnoesgaard added the bug Something isn't working label Jun 9, 2024
@hughrun
Contributor

hughrun commented Jun 26, 2024

oof, that's a bug.

In your export there's a Tombstone comment for the book "Piranesi":

{
    "id": "https://books.theunseen.city/user/daniel/comment/50874",
    "type": "Tombstone",
    "@context": "https://www.w3.org/ns/activitystreams",
    "progress": null,
    "progress_mode": "PG"
}

Because it's a Tombstone, it isn't attributed to any user, so it has no attributedTo key, and the loop in upsert_statuses fails with a KeyError when it reads status["attributedTo"].

There really needs to be more robust exception checking in the user account import process generally, but in this case we just need to skip that check and import the comment as-is.
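A minimal sketch of that approach, assuming each exported status is a parsed dict shaped like the JSON above. The helper name `split_importable_statuses` and its `user_remote_ids` parameter are hypothetical, not BookWyrm's actual code, and the rule "skip anything not attributed to the importing user" is an assumption for illustration. Tombstones bypass the ownership check and pass through as-is, and `attributedTo` is read with `.get()` so a missing key yields `None` instead of raising `KeyError`.

```python
from typing import Iterable


def split_importable_statuses(
    statuses: Iterable[dict], user_remote_ids: set[str]
) -> tuple[list[dict], list[dict]]:
    """Hypothetical helper: return (statuses to import, statuses skipped)."""
    keep: list[dict] = []
    skipped: list[dict] = []
    for status in statuses:
        if status.get("type") == "Tombstone":
            # A deleted status has no author, so there is nothing to check.
            keep.append(status)
            continue
        # .get() returns None rather than raising KeyError when the key is missing.
        if status.get("attributedTo") in user_remote_ids:
            keep.append(status)
        else:
            skipped.append(status)
    return keep, skipped


# Example data: the Tombstone id is from the export above; the other URLs are made up.
statuses = [
    {"id": "https://books.theunseen.city/user/daniel/comment/50874", "type": "Tombstone"},
    {"id": "https://example.com/comment/1", "type": "Comment",
     "attributedTo": "https://example.com/user/alice"},
    {"id": "https://example.com/comment/2", "type": "Comment"},  # no attributedTo
]
keep, skipped = split_importable_statuses(statuses, {"https://example.com/user/alice"})
print([s["id"] for s in keep])     # the Tombstone and comment/1
print([s["id"] for s in skipped])  # comment/2
```

The function never indexes `status["attributedTo"]` directly, so a malformed or deleted status can only end up in `skipped`; it cannot abort the whole import.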

@hughrun
Contributor

hughrun commented Jul 15, 2024

I've made some adjustments to bookwyrm/models/bookwyrm_import_job.py to handle tombstones and missing values better, but I'm now getting a different error when I try to import your file on a dev system. The error itself is fairly clear, but I'm not sure what is causing it. I believe it trips when we add statuses. @mouse-reeve any ideas? Could it be a race condition?

celery_worker-1   | [2024-07-15 07:11:46,546: ERROR/ForkPoolWorker-3] Task bookwyrm.activitypub.base_activity.set_related_field[cda3f85e-9619-45b3-a2ec-919fac4fde04] raised unexpected: IntegrityError('duplicate key value violates unique constraint "bookwyrm_user_username_key"\nDETAIL:  Key (username)=([email protected]) already exists.\n')
celery_worker-1   | Traceback (most recent call last):
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
celery_worker-1   |     return self.cursor.execute(sql, params)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   | psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "bookwyrm_user_username_key"
celery_worker-1   | DETAIL:  Key (username)=([email protected]) already exists.
celery_worker-1   |
celery_worker-1   |
celery_worker-1   | The above exception was the direct cause of the following exception:
celery_worker-1   |
celery_worker-1   | Traceback (most recent call last):
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/celery/app/trace.py", line 477, in trace_task
celery_worker-1   |     R = retval = fun(*args, **kwargs)
celery_worker-1   |                  ^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/celery/app/trace.py", line 760, in __protected_call__
celery_worker-1   |     return self.run(*args, **kwargs)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
celery_worker-1   |     return func(*args, **kwds)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/app/bookwyrm/activitypub/base_activity.py", line 288, in set_related_field
celery_worker-1   |     item = activity.to_model(model=model)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/app/bookwyrm/activitypub/base_activity.py", line 160, in to_model
celery_worker-1   |     changed = field.set_field_from_activity(
celery_worker-1   |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/app/bookwyrm/models/fields.py", line 88, in set_field_from_activity
celery_worker-1   |     formatted = self.field_from_activity(
celery_worker-1   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/app/bookwyrm/models/fields.py", line 174, in field_from_activity
celery_worker-1   |     return activitypub.resolve_remote_id(
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/app/bookwyrm/activitypub/base_activity.py", line 398, in resolve_remote_id
celery_worker-1   |     return item.to_model(model=model, instance=result, save=save)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/app/bookwyrm/activitypub/base_activity.py", line 174, in to_model
celery_worker-1   |     changed = field.set_field_from_activity(
celery_worker-1   |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/app/bookwyrm/models/fields.py", line 466, in set_field_from_activity
celery_worker-1   |     getattr(instance, self.name).save(*formatted, save=save)
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/fields/files.py", line 99, in save
celery_worker-1   |     self.instance.save()
celery_worker-1   |   File "/app/bookwyrm/models/user.py", line 365, in save
celery_worker-1   |     super().save(*args, update_fields=update_fields, **kwargs)
celery_worker-1   |   File "/app/bookwyrm/models/activitypub_mixin.py", line 215, in save
celery_worker-1   |     super().save(*args, **kwargs)
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/contrib/auth/base_user.py", line 76, in save
celery_worker-1   |     super().save(*args, **kwargs)
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 814, in save
celery_worker-1   |     self.save_base(
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/model_utils/tracker.py", line 378, in inner
celery_worker-1   |     return original(instance, *args, **kwargs)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 877, in save_base
celery_worker-1   |     updated = self._save_table(
celery_worker-1   |               ^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 1020, in _save_table
celery_worker-1   |     results = self._do_insert(
celery_worker-1   |               ^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 1061, in _do_insert
celery_worker-1   |     return manager._insert(
celery_worker-1   |            ^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
celery_worker-1   |     return getattr(self.get_queryset(), name)(*args, **kwargs)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1805, in _insert
celery_worker-1   |     return query.get_compiler(using=using).execute_sql(returning_fields)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
celery_worker-1   |     cursor.execute(sql, params)
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 102, in execute
celery_worker-1   |     return super().execute(sql, params)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
celery_worker-1   |     return self._execute_with_wrappers(
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
celery_worker-1   |     return executor(sql, params, many, context)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
celery_worker-1   |     with self.db.wrap_database_errors:
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
celery_worker-1   |     raise dj_exc_value.with_traceback(traceback) from exc_value
celery_worker-1   |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
celery_worker-1   |     return self.cursor.execute(sql, params)
celery_worker-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_worker-1   | django.db.utils.IntegrityError: duplicate key value violates unique constraint "bookwyrm_user_username_key"
celery_worker-1   | DETAIL:  Key (username)=([email protected]) already exists.
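If it is a race (for example, two `set_related_field` tasks resolving the same remote user at about the same time), one common mitigation is to attempt the insert and, on a unique-constraint violation, fetch the row the other writer created. The sketch below shows that pattern with the standard-library `sqlite3` module and a hypothetical `users` table; it is not BookWyrm's code and does not confirm that a race is the cause. Django's `QuerySet.get_or_create()` applies a similar fallback internally, retrying the lookup when `create()` raises `IntegrityError`.

```python
import sqlite3


def insert_or_fetch_user(conn: sqlite3.Connection, username: str) -> int:
    """Return the id for `username`, creating the row only if none exists.

    Trying the INSERT first and falling back on IntegrityError avoids the
    check-then-insert gap in which a concurrent writer can sneak in.
    """
    try:
        with conn:  # commits on success, rolls back if the block raises
            cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # UNIQUE(username) was violated: the row already exists, so reuse it.
        row = conn.execute(
            "SELECT id FROM users WHERE username = ?", (username,)
        ).fetchone()
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL)")
first = insert_or_fetch_user(conn, "alice@example.com")
second = insert_or_fetch_user(conn, "alice@example.com")  # duplicate: no exception
print(first, second)  # 1 1
```

Here the second call hits the same unique constraint as the log above, but it returns the existing row instead of failing the task.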

@hughrun hughrun self-assigned this Jul 21, 2024
@hughrun
Contributor

hughrun commented Aug 12, 2024

I've been looking at this today, and I think I need to refactor bookwyrm_import_job.py to use child jobs, so that any problematic items can be logged and bypassed rather than causing the import to break completely partway through.
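A rough sketch of the "log and bypass" idea, with each item handled as its own unit of work. `ImportReport`, `run_child_jobs`, and `import_status` are hypothetical names; in BookWyrm the child jobs would presumably be separate Celery tasks or database records rather than a plain loop, which this sketch does not attempt to model.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


@dataclass
class ImportReport:
    imported: int = 0
    # (item, repr of the exception) for every item that was skipped
    failed: list[tuple[Any, str]] = field(default_factory=list)


def run_child_jobs(items: Iterable[Any], handler: Callable[[Any], None]) -> ImportReport:
    """Run `handler` on each item; log and record failures instead of aborting."""
    report = ImportReport()
    for item in items:
        try:
            handler(item)
        except Exception as exc:  # deliberately broad: any bad item is bypassed
            logger.warning("Skipping import item %r: %r", item, exc)
            report.failed.append((item, repr(exc)))
        else:
            report.imported += 1
    return report


# Example: a handler that, like the original bug, assumes attributedTo exists.
def import_status(status: dict) -> None:
    _ = status["attributedTo"]  # raises KeyError for the Tombstone below


report = run_child_jobs(
    [{"attributedTo": "a"}, {"type": "Tombstone"}, {"attributedTo": "b"}],
    import_status,
)
print(report.imported, report.failed)
# 2 [({'type': 'Tombstone'}, "KeyError('attributedTo')")]
```

The same `KeyError` that stopped the original import is now recorded against one item, and the items after it still get processed.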

@hughrun hughrun linked a pull request Sep 8, 2024 that will close this issue
2 participants