
Adapt to pending entities with BBIDs / Introducing Kysely #323

Open · wants to merge 13 commits into master
Conversation

@kellnerd (Contributor) commented Nov 4, 2024

Problem

Following the schema change in metabrainz/bookbrainz-site#1136, the ORM has to be adapted to work together with the new schema.

Each pending entity has a pending_import_bbid which changes to a different accepted_entity_bbid after approval. The BBID should stay stable whenever possible and BBIDs should not be wasted for temporary data.

Solution

  • Change a lot of column, property and table names to follow the new schema
  • Adapt approval function to deal with imports which were fetched by the regular entity loader (532da04)
  • Completely rewrite the approval function to preserve the BBID which was already assigned (fb8fd5a)

Rather than collecting all properties of the pending entity data and creating a new entity (with new BBID), we simply create an entity header and a revision which link to the existing data.
This is a complete rewrite using Kysely (see next section) instead of Bookshelf/Knex as the query builder.

Because I can't easily mix Kysely and Bookshelf code in the same DB transaction, I can no longer load and return a Bookshelf model in this utility function.
To be honest, the final loading of the entity model (for search indexing) is out of scope for this function anyway, so it has been moved into bookbrainz-site.

P.S. There are a couple more functions which mix their specific task with the rather unrelated loading of a model for search indexing.
I think we should clearly separate these two concerns, as it is currently very hard to understand the search indexing and we seem to (unintentionally?) load different properties for indexing in different places...

Introducing Kysely

Kysely (pronounced “Key-Seh-Lee”) is a type-safe and autocompletion-friendly TypeScript SQL query builder.

We have already discussed Kysely as a possible replacement for the unmaintained Bookshelf ORM in BB-729.
I've long been annoyed by the lack of type safety of both Bookshelf and Knex, the query builder it uses. Writing queries is often trial and error unless you know our schema by heart or have the SQL open as a reference, and manually typing the results is a pain as well.

So I've finally gone ahead, set up Kysely (using the already installed pg as DB driver) and generated type definitions for our DB schema using kysely-codegen in less than an hour (05446cd).
Kysely is now available as a top-level property of our BB orm object everywhere, so we can migrate from Bookshelf/Knex to Kysely function by function.
In my experience so far, using it is really straightforward as it very closely resembles SQL, and the autocompletion of table and column names alone would already have been worth the switch.

In addition to autocompletion, the results are properly typed based on just a single file with type definitions (src/types/schema.ts) which we can generate with the new kysely-codegen npm task whenever it is necessary.
(We only have to reformat the generated code afterwards to follow our linting rules.)
Since this tool is using DB introspection, we have to specify a DB connection string via the --url param or the DATABASE_URL env var.
There is currently a bug which makes the tool fail when there is no .env file, so using an .env file is the way I would currently recommend 😏
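For reference, regenerating the types looks roughly like this (the connection string is a placeholder for your local setup):

```shell
# Either pass the connection string explicitly …
npx kysely-codegen --url postgres://bookbrainz:password@localhost:5432/bookbrainz --out-file src/types/schema.ts

# … or put DATABASE_URL into an .env file, which also works around
# the current bug where the tool fails when no .env file exists.
echo 'DATABASE_URL=postgres://bookbrainz:password@localhost:5432/bookbrainz' > .env
npx kysely-codegen --out-file src/types/schema.ts
```

Afterwards the generated file still needs to be reformatted to pass our linting rules.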

It would be great to have someone else test the whole type generation.
If you think Kysely could be our way to go, I still have to add documentation somewhere for the code generation and migration plans (e.g. writing utility functions to load related data for entities which would allow us to replace the Bookshelf models one day).

Areas of Impact

Only imported entity models and utility functions are affected.
For Kysely, the top-level ORM object exposes a new kysely property.

Mostly column, property and table names had to be updated to follow the schema. A few other identifiers have been renamed as well for consistency, although this was not strictly necessary.

It is never used and never passed, but causes a TS error.

The error message is already specific enough, no need to make it longer.

In the end we of course want to reuse `bbid` and `data_id` of accepted entities instead of duplicating the data and deleting the original. Also make `ImportMetadataT` more versatile/useful.

It is not needed and it is causing a type error since the last commit.

The rule to disallow trailing commas often forces you to pointlessly touch neighboring lines when you add or remove items.

If we want to enforce consistent dangling commas, we should rather force them to always be present. But since our whole codebase follows this rule, it would be better to have a grace period to gradually add commas.

metabrainz/bookbrainz-site@2fc11a2

Run the kysely-codegen task to regenerate the types. Don't forget to reformat the generated code to follow our linting rules.

Since this is using DB introspection, you have to specify a DB connection string via the --url param or the DATABASE_URL env var: https://github.com/RobinBlomberg/kysely-codegen

Rather than collecting all properties of the pending entity data and creating a new entity (with a new BBID), we simply create an entity header and a revision which link to the existing data. This is a complete rewrite using Kysely instead of Bookshelf/Knex.

The final loading of the entity model (for search indexing) is out of scope for this function and has been moved into bookbrainz-site.

From an older schema, there are still editions without an edition group.

The `pending_entity_bbid` column is no longer set to NULL on approval, so we have to additionally check for NULL in `accepted_entity_bbid` to find pending imports.
```ts
@@ -117,6 +120,15 @@ export default function init(config: Knex.Config) {
	const SeriesData = seriesData(bookshelf);
	const WorkData = workData(bookshelf);

	const kysely = new Kysely<DB>({
		dialect: new PostgresDialect({
			pool: new Pool(config.connection as Knex.ConnectionConfig),
```
@kellnerd (Contributor, Author) commented:
The config object which is loaded from our config.json should always have this shape with a connection property, so we can probably narrow the type which the init function accepts already instead of doing a type assertion here.
Knex would allow a couple of different config formats (including a connection string) which would not be compatible with the type which the pg Pool expects here.
