Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for the Research Organization Registry (ROR) #4483

Open
wants to merge 24 commits into
base: master
Choose a base branch
from
Open

Conversation

joemull
Copy link
Member

@joemull joemull commented Nov 8, 2024

Closes #3168.

Overview

This piece of work adds a ROR data model, including affiliations, organizations, organization names, and locations. It includes an import routine to fetch and process ROR's full database. I've included as much backwards compatibility as possible for importers that do not know about ROR. It also adds a user interface for editing one's own affiliations, and an admin interface for all of the relevant models.

Command line interface

❯ python src/manage.py import_ror_data
DEBUG 2024-12-09 18:41:33,666 connectionpool P:131157 T:135176669522816 Starting new HTTPS connection (1): zenodo.org:443
DEBUG 2024-12-09 18:41:34,697 connectionpool P:131157 T:135176669522816 https://zenodo.org:443 "GET /api/communities/ror-data/records?sort=newest HTTP/1.1" 200 None
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 101.06it/s]
INFO 2024-12-09 18:41:38,658 import_ror_data P:131157 T:135176669522816 successful

User interface

image
image
image
image
image
image
image

Some functionality is still to do:

  • As a system admin, I can import the organizations more quickly (currently the process is slow due to the need to check existing data while creating new records to avoid duplicates)

  • A cron job is created to run ROR imports when I install Janeway in production

  • As the author of an article submission, I can add an affiliation for myself and my coauthors (see Update the author submission screen #4519)

  • As the author of a repository submission, I can add an affiliation for myself and my coauthors

  • When searching for an organization, I can filter by country subdivision or continent (schema v2.1 has this data)

  • As an editor, I can edit the affiliations of frozen author records on articles in the pipeline

  • As a publishing librarian or research impact librarian, I can see ROR data in Crossref deposits, OAI-PMH feeds, and JATS files exposed to web crawlers so that I can track research impact

  • As a publisher importing metadata, I can import RORs using the import / export / update interface

  • As an editor/publisher I can to be able to pull reports using ROR as a filter

  • As a marketer/publisher I can pull reports that compare authors against the consortial billing plugin

  • Write a validator on the Affiliation form to check that only one of author, frozen_author, and preprint_author is present. Here is an earlier version of this that probably needs some tweaking:

    def clean(self):
        authorlike_links = set()
        for field in ['account', 'frozen_author', 'preprint_author']:
            if self.cleaned_data.get(field):
                authorlike_links.add(field)
        if len(authorlike_links) > 1:
            raise ValidationError(
                f'Affiliation can only have one of author, ' \
                f'frozen_author, or preprint_author. ' \
                f'Found multiple: { authorlike_links }'
            )
    
    def test_affiliation_clean(self):
        with self.assertRaises(ValidationError):
            self.affiliation_lecturer.frozen_author = self.kathleen_booth_frozen
            self.affiliation_lecturer.clean()

Some things still need design / feature development:

  • As any user, I can enter location metadata for my account that is separate from an affiliation (do we want this?)
  • As any user, when I enter a custom organization, I can add a custom location to the organization (do we want this?)
  • When I search for and add an author in the author submission interface (Update the author submission screen #4519), the ROR affiliations are pulled in from ORCID API search results
  • As a staff member, I can edit the affiliations of user accounts with a role in my journal

@joemull joemull linked an issue Nov 15, 2024 that may be closed by this pull request
1 task
@joemull joemull force-pushed the 3168-ror branch 2 times, most recently from d314d51 to cc46446 Compare December 9, 2024 18:20
@joemull joemull requested a review from ajrbyers December 9, 2024 18:43
@joemull joemull requested a review from MartinPaulEve December 9, 2024 18:44
@joemull joemull marked this pull request as ready for review December 9, 2024 18:44
@MartinPaulEve
Copy link
Contributor

If you enter an organization name that isn't found (e.g., in test import, "Birkbeck") and then click on one of the pagination items at the bottom, you are redirected to an erroring page eg:

http://127.0.0.1:8000/JRNL/profile/organization/search/?q=birkbeck&page=3&paginate_by=25

This errors because the item isn't found, but the search query is inserted into the pagination link

Copy link
Contributor

@MartinPaulEve MartinPaulEve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks really good. Just a few small things.

  • Pagination in ROR search currently redirects to page not found error
  • List of affiliations should show which is primary?
  • RORImportStatus should not be duplicated?

src/core/logic.py Show resolved Hide resolved
src/submission/tests.py Show resolved Hide resolved
@MartinPaulEve MartinPaulEve removed their assignment Dec 13, 2024
@joemull
Copy link
Member Author

joemull commented Dec 13, 2024

  • Pagination in ROR search currently redirects to page not found error

This turned out to be a Django bug I believe. I think it's obscure because most people don't encounter it, since they submit their search query to change the list of items before selecting a page.

But I've gone ahead and fixed it for this view.

  • List of affiliations should show which is primary?

Yes, good call. I added a label.

I also updated the save method to make the affiliation primary if it's the first one.

image

  • RORImportStatus should not be duplicated?

That's standard, I'm pretty sure. See comment inline.

@joemull joemull assigned MartinPaulEve and unassigned ajrbyers Dec 13, 2024
@joemull
Copy link
Member Author

joemull commented Jan 6, 2025

Martin approved this off GitHub on 13 Dec. Moving it along to @ajrbyers.

@joemull joemull requested review from mauromsl and removed request for MartinPaulEve January 6, 2025 10:53
@joemull joemull assigned ajrbyers and unassigned MartinPaulEve Jan 6, 2025
@joemull joemull removed the request for review from mauromsl January 6, 2025 10:53
Copy link
Member

@ajrbyers ajrbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally very good. The CBVs are growing on me... slowly.

There are some comments inline that should be addressed. Here are some general comments in addition:

  • Institution and Affiliation fields are removed without preservation.
  • Mauro made a suggestion to handle setting inst/dept etc using the Manager, this should be added.
  • When my affiliation was snapshotted it did not include the start/end date or the title.
  • There is a conflict currently listed
Screenshot 2025-01-15 at 10 52 31

def add_arguments(self, parser):
parser.add_argument(
'--test_full_import',
help='By default, the command only runs 100 records when DEBUG=True.'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are too long.

try:
organization = cls.objects.get(labels__value=value)
created = False
except cls.DoesNotExist or cls.MultipleObjectsReturned:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wont catch cls.MultipleObjectsReturned. It should be a tuple except (cls.DoesNotExist, cls.MultipleObjectsReturned):

try:
organization = cls.objects.get(aliases__value=value)
created = False
except cls.DoesNotExist or cls.MultipleObjectsReturned:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here. This should be a tuple.

ror__exact='',
)
created = False
except cls.DoesNotExist or cls.MultipleObjectsReturned:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here. This should be a tuple.

# canonical name data from ROR (e.g. labels)?
try:
organization = cls.objects.get(labels__value=value)
created = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an exception is raised created wont be set. Should it be set before the try block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

@joemull joemull assigned mauromsl and unassigned ajrbyers Jan 15, 2025
@joemull joemull requested a review from mauromsl January 15, 2025 11:18
@joemull joemull requested review from ajrbyers and mauromsl January 27, 2025 14:09
@joemull joemull assigned ajrbyers and unassigned joemull Jan 27, 2025
@mauromsl mauromsl added this to the v1.8.0 Tracker milestone Feb 6, 2025
@ajrbyers
Copy link
Member

ajrbyers commented Feb 6, 2025

@joemull running the migrations on a clean OLH database gives me:

Running migrations:
  Applying utils.0035_rorimport_rorimporterror... OK
  Applying core.0100_location_organization_affiliation... OK
  Applying core.0101_migrate_affiliation_institution...Traceback (most recent call last):
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.StringDataRightTruncation: value too long for type character varying(200)


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ajrbyers/Code/janeway/src/manage.py", line 16, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/core/management/base.py", line 106, in wrapper
    res = handle_func(*args, **kwargs)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/migrations/operations/special.py", line 193, in database_forwards
    self.code(from_state.apps, schema_editor)
  File "/Users/ajrbyers/Code/janeway/src/core/migrations/0101_migrate_affiliation_institution.py", line 71, in migrate_affiliation_institution
    create_affiliation(
  File "/Users/ajrbyers/Code/janeway/src/core/migrations/0101_migrate_affiliation_institution.py", line 47, in create_affiliation
    organization = create_organization(apps, old_institution, old_country)
  File "/Users/ajrbyers/Code/janeway/src/core/migrations/0101_migrate_affiliation_institution.py", line 30, in create_organization
    OrganizationName.objects.create(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/query.py", line 658, in create
    obj.save(force_insert=True, using=self.db)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/base.py", line 877, in save_base
    updated = self._save_table(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/base.py", line 1020, in _save_table
    results = self._do_insert(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/base.py", line 1061, in _do_insert
    return manager._insert(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/backends/utils.py", line 102, in execute
    return super().execute(sql, params)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/Users/ajrbyers/.venvs/janeway/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.DataError: value too long for type character varying(200)

Looks like OrganizationName.value isn't large enough for the old institution field:

janeway/src/core/models.py

Lines 277 to 282 in d9c1ea5

institution = models.CharField(
max_length=1000,
blank=True,
verbose_name=_('Institution'),
validators=[plain_text_validator],
)

Copy link
Member

@ajrbyers ajrbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on migration error.

@ajrbyers ajrbyers assigned joemull and unassigned ajrbyers Feb 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Research Organization Registry (ROR)
4 participants