Vocabularies Sprint Goal (Dec 2020)

The goal of the sprint is to add vocabulary support to InvenioRDM as a thin layer on top of reusable vocabulary support in Invenio.

## Subgoals

Add the following vocabularies to InvenioRDM:

  - License (SPDX vocabulary)
  - Subjects (mixed free text + others)
  - Affiliations (ROR.org source)
  - Languages (ISO-639-3 source)

For each vocabulary, this includes:

- UI: The user interface widget and its integration in the RDM deposit form.
- API: The REST API for the vocabulary itself, as well as its integration in the InvenioRDM record.
- Display: The proper display of localized values from the vocabulary in search results and landing pages.
- Import: The import from the vocabulary data source.
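
As a rough illustration of the import step, the sketch below turns tab-separated rows in the style of the ISO 639-3 code table into generic vocabulary entries. The sample data, the entry shape (`id`, `title`, `props`) and the function name `rows_to_entries` are assumptions made for illustration; they are not the Invenio-Vocabularies data model.

```python
import csv
import io

# Assumed sample in the style of the ISO 639-3 code table (tab-separated).
SAMPLE_TSV = "Id\tPart1\tRef_Name\ndan\tda\tDanish\neng\ten\tEnglish\n"


def rows_to_entries(tsv_text):
    """Convert tab-separated language rows into vocabulary entry dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    entries = []
    for row in reader:
        entries.append(
            {
                "id": row["Id"],
                # Titles are keyed by locale so they can be localized later.
                "title": {"en": row["Ref_Name"]},
                # Extra source columns are kept as free-form properties.
                "props": {"alpha_2": row["Part1"]},
            }
        )
    return entries


entries = rows_to_entries(SAMPLE_TSV)
```

Keeping the title as a locale-keyed mapping from the start means the display step can later add translations without changing the entry shape.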

## What does a successful outcome look like?

### End-user

For an end-user, a successful sprint outcome is visible in the deposit form and in search results/landing pages (e.g. aggregations, labels, etc.).

#### Deposit form

*Affiliation*:

- As an uploader, I want organisation names auto-completed in the affiliation field, so that I can save time.
- As an uploader, I want to be able to specify organisation names not yet in the database, so that I'm not constrained in my input.

*License*:

- As an uploader, I want help to select the license for my upload, so that I know what it means.
- As an uploader, I want to be presented with the best license choice for my upload, so that I don't have to think.
- As an uploader, I want to be able to specify a custom license text not known to the system, so that I can provide the correct license.

*Subjects*:

- As an uploader, I want subjects to be auto-completed, so that I can use the same subjects as other uploaders.
- As an uploader, I want to limit auto-completion to specific vocabularies, so that I don't get irrelevant suggestions.
- As an uploader, I want to provide free text keywords, so that I can include keywords exactly as they are on my journal article.

*Languages*:

- As an uploader, I want to quickly specify multiple languages for my upload, so that I save time.
- As an uploader, I want to see both the full name and the language code.

*Mockup*

![Deposit - Protection v2](https://user-images.githubusercontent.com/1698163/100452881-4be0b500-30ba-11eb-854d-d07391f0b1ae.png)

#### Search results/landing page

In search results, vocabularies are often used in facets or for showing classification labels and similar (see below):

- As a visitor, I want to see human-readable titles in facets in my own language, so that the site looks professional and is understandable.
- As a visitor, I want to see human-readable subjects on both landing pages and search result items, so that the site looks professional and is understandable.

![Screenshot 2020-11-27 at 14 13 07](https://user-images.githubusercontent.com/1698163/100453164-ce697480-30ba-11eb-897c-180a1f589741.png)
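
The "in my own language" requirement above implies some fallback rule for when a translation is missing. A minimal sketch of one possible rule follows, assuming vocabulary entries carry a locale-keyed `title` mapping; the function name `localized_title` and the fallback order are hypothetical, not an existing Invenio API.

```python
def localized_title(entry, locale, default_locale="en"):
    """Return the entry title for `locale`, falling back gracefully."""
    titles = entry.get("title", {})
    # Try the exact locale, then its base language ("da" for "da-DK"),
    # then the site default.
    for candidate in (locale, locale.split("-")[0], default_locale):
        if candidate in titles:
            return titles[candidate]
    # Last resort: show the identifier rather than nothing.
    return entry["id"]


danish = {"id": "dan", "title": {"en": "Danish", "da": "Dansk"}}
label_for_danish_visitor = localized_title(danish, "da-DK")
label_for_french_visitor = localized_title(danish, "fr")
```

Falling back to the identifier keeps facets and labels populated even for entries that were imported without any title.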

## Plan

The work is divided into two parallel tracks:

- **Frontend track**: Building the necessary UI widgets.
- **Backend track**: Building the REST APIs to support the UI widgets and rendering of search results.

### Frontend track

- Analyse, mock up and plan generic reusable widgets and their integration with Formik in the deposit form.
  - The UI widget itself.
  - The state management and data management.
  - Affiliation widget: auto-complete a single value, but allow non-vocabulary items as well.
  - License widget: auto-complete a single value using a modal select box (to be designed).
  - Subjects: auto-complete multiple values, with a possibility to limit to a specific subject scheme, and a possibility to specify subject terms not in the vocabulary.
  - Languages: auto-complete multiple values, restricted to only allowing vocabulary items.
- Build the widgets with mock data.
- Integrate them in the deposit form state and 
- Integrate the widgets with the REST API.
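
The widgets listed above differ mainly in two ways: whether they accept one or several values, and whether values outside the vocabulary are allowed. The sketch below captures that as data and pairs it with a prefix auto-completion over mock data. It is written in Python only to illustrate the logic; the real widgets would live in the front-end code. `WidgetPolicy`, `POLICIES`, `suggest` and `accepts` are hypothetical names, and the license widget is left out because its design is still open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WidgetPolicy:
    """Hypothetical description of how a vocabulary widget accepts input."""

    multiple: bool  # may the user select more than one value?
    allow_custom: bool  # may the user enter values not in the vocabulary?


# Policies as described in the plan above.
POLICIES = {
    "affiliation": WidgetPolicy(multiple=False, allow_custom=True),
    "subjects": WidgetPolicy(multiple=True, allow_custom=True),
    "languages": WidgetPolicy(multiple=True, allow_custom=False),
}


def suggest(mock_titles, prefix, limit=5):
    """Case-insensitive prefix auto-completion over mock data."""
    needle = prefix.lower()
    return [t for t in mock_titles if t.lower().startswith(needle)][:limit]


def accepts(policy, selected, vocabulary):
    """Check a selection against the widget policy."""
    if not policy.multiple and len(selected) > 1:
        return False
    if not policy.allow_custom and any(v not in vocabulary for v in selected):
        return False
    return True


suggestions = suggest(["Danish", "Dutch", "English"], "d")
```

Building against mock data like this lets the widget behaviour be settled before the REST API checkpoint is reached.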

If blocked by others:
- Improve deposit form UX.

### Backend track

- Design: See RFC 40 and 41.
- **Parallel track 1**: Build the first REST API vocabularies endpoint into InvenioRDM (``/api/vocabularies/languages``).
  - Build basic record type factory into Invenio-Records-Resources
  - Build Invenio-Vocabularies module with a generic vocabulary (data, service, presentation layer) using common definitions.
  - Integrate the Invenio-Vocabularies generic module into InvenioRDM.
  - Build a way to import vocabularies into the generic vocabulary.
  - Import licenses and languages vocabulary into the generic model
  - CHECKPOINT: ``/api/vocabularies/languages/`` and ``/api/vocabularies/licenses`` working and delivering data. This allows the frontend track to integrate it into the widgets.
- **Parallel track 2**: Build the machinery for linking records
  - Build basic relations support into Invenio-Records (integrity checking, dereferencing, indexing)
    - Note: Invenio v3.4 sprint team depends on getting a final Invenio-Records release as early as possible.
  - Build basic scan() and reindex() support into Invenio-Records-Resources
  - CHECKPOINT:  ability of data layer to check integrity, dereference a record, and index (allows integrating vocabulary into bibliographic record).
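
To make the second checkpoint concrete, here is a minimal sketch of integrity checking and dereferencing in plain Python. It assumes a record links to vocabulary entries by `id` only and that the index needs the title copied in; `InvalidRelation`, `check_integrity`, `dereference` and the in-memory `LANGUAGES` store are hypothetical and do not represent the Invenio-Records relations API.

```python
class InvalidRelation(Exception):
    """Raised when a record links to a vocabulary entry that does not exist."""


# Stand-in for the vocabulary store; a real system would query a database.
LANGUAGES = {
    "dan": {"id": "dan", "title": {"en": "Danish"}},
    "eng": {"id": "eng", "title": {"en": "English"}},
}


def check_integrity(record, vocabulary):
    """Ensure every linked language id exists in the vocabulary."""
    for link in record.get("languages", []):
        if link["id"] not in vocabulary:
            raise InvalidRelation(link["id"])


def dereference(record, vocabulary, fields=("title",)):
    """Return a copy of the record with linked entries expanded for indexing."""
    expanded = dict(record)
    expanded["languages"] = [
        {"id": link["id"], **{f: vocabulary[link["id"]][f] for f in fields}}
        for link in record.get("languages", [])
    ]
    return expanded


record = {"title": "My upload", "languages": [{"id": "dan"}]}
check_integrity(record, LANGUAGES)
indexed = dereference(record, LANGUAGES)
```

Storing only the `id` in the record and expanding it at indexing time is one reason reindexing support matters: when a vocabulary title changes, the affected records have to be found and indexed again.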

- Integrate languages vocabulary into the RDM bibliographic record in Invenio-RDM-Records
  - Data layer: integrity checking, dereferencing and indexing
  - Service layer: a marshmallow schema
  - Presentation layer: show in
  - CHECKPOINT: The REST API now fully supports linking of languages.
- Build a facet over languages
   - Build programmatic vocabulary API
   - CHECKPOINT: Search results facet over languages is working.
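
A facet over languages returns buckets keyed by vocabulary id, so something has to turn those ids into readable labels. The sketch below shows one way this could work, assuming buckets in the shape of a terms aggregation (`key`, `doc_count`) and a vocabulary keyed by id; `label_buckets` is a hypothetical helper, not the programmatic vocabulary API itself.

```python
def label_buckets(buckets, vocabulary, locale="en"):
    """Attach a human-readable label to each facet bucket."""
    labelled = []
    for bucket in buckets:
        entry = vocabulary.get(bucket["key"], {})
        # Fall back to the raw key if the entry or translation is missing.
        label = entry.get("title", {}).get(locale, bucket["key"])
        labelled.append({**bucket, "label": label})
    return labelled


vocabulary = {"dan": {"id": "dan", "title": {"en": "Danish", "da": "Dansk"}}}
buckets = [{"key": "dan", "doc_count": 3}, {"key": "zzz", "doc_count": 1}]
facet = label_buckets(buckets, vocabulary, locale="da")
```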

CHECKPOINT: One full vocabulary working all the way through the backend.

- Build licenses vocabulary
- Build subjects vocabulary
- Build organisation vocabulary 
- Improve stack and APIs
  - Address performance challenges 
  - Address machinery challenges (data flow, updates, etc.)
- Validate vocabularies on Invenio-App-ILS
- Test migration of resource type vocabulary

## Training/Context

- Ability to install and run InvenioRDM, with assets and module development workflows.
- Training on Invenio-(Drafts|Records)-Resources data flow.
