
Adapt to pending entities with BBIDs / Introducing Kysely #323

Open · wants to merge 13 commits into master
Conversation

@kellnerd (Contributor) commented Nov 4, 2024

Problem

Following the schema change in metabrainz/bookbrainz-site#1136, the ORM has to be adapted to work together with the new schema.

Each pending entity has a pending_import_bbid which changes to a different accepted_entity_bbid after approval. The BBID should stay stable whenever possible and BBIDs should not be wasted for temporary data.

Solution

  • Change a lot of column, property and table names to follow the new schema
  • Adapt approval function to deal with imports which were fetched by the regular entity loader (532da04)
  • Completely rewrite the approval function to preserve the BBID which was already assigned (fb8fd5a)

Rather than collecting all properties of the pending entity data and creating a new entity (with new BBID), we simply create an entity header and a revision which link to the existing data.
This is a complete rewrite using Kysely (see next section) instead of Bookshelf/Knex as the query builder.

Because I can't easily mix Kysely and Bookshelf code in the same DB transaction, I can no longer load and return a Bookshelf model in this utility function.
To be honest, the final loading of the entity model (for search indexing) is out of scope for this function anyway, so it has been moved into bookbrainz-site.

P.S. There are a couple more functions which mix their specific task with the rather unrelated loading of a model for search indexing.
I think we should clearly separate these two concerns, as it is currently very hard to understand the search indexing and we seem to (unintentionally?) load different properties for indexing in different places...

Introducing Kysely

Kysely (pronounced “Key-Seh-Lee”) is a type-safe and autocompletion-friendly TypeScript SQL query builder.

We have already discussed Kysely as a possible replacement for the unmaintained Bookshelf ORM in BB-729.
I've long been annoyed by the lack of type safety of both Bookshelf and Knex, the query builder it uses. Writing queries is often trial and error unless you know our schema by heart or have the SQL open as a reference, and manually typing the results is a pain as well.

So I've finally gone ahead, set up Kysely (using the already installed pg as DB driver) and generated type definitions for our DB schema using kysely-codegen in less than an hour (05446cd).
Kysely is now available as a top-level property of our BB orm object everywhere, so we can migrate from Bookshelf/Knex to Kysely function by function.
In my experience so far, using it is really straightforward as it very closely resembles SQL, and the autocompletion of table and column names alone would already have been worth the switch.

In addition to autocompletion, the results are properly typed based on just a single file with type definitions (src/types/schema.ts) which we can generate with the new kysely-codegen npm task whenever it is necessary.
(We only have to reformat the generated code afterwards to follow our linting rules.)
Since this tool is using DB introspection, we have to specify a DB connection string via the --url param or the DATABASE_URL env var.
There is currently a bug which makes the tool fail when there is no .env file, so using an .env file is the way I would currently recommend 😏
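For reference, regenerating the types looks roughly like this (the connection string is a placeholder for your local setup):

```shell
# Either pass the connection string explicitly …
npx kysely-codegen --url postgres://bookbrainz:password@localhost:5432/bookbrainz --out-file src/types/schema.ts

# … or put DATABASE_URL into an .env file, which also works around
# the current bug where the tool fails when no .env file exists.
echo 'DATABASE_URL=postgres://bookbrainz:password@localhost:5432/bookbrainz' > .env
npx kysely-codegen --out-file src/types/schema.ts
```

Afterwards the generated file still needs to be reformatted to pass our linting rules.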

It would be great to have someone else test the whole type generation.
If you think Kysely could be our way to go, I still have to add documentation somewhere for the code generation and migration plans (e.g. writing utility functions to load related data for entities which would allow us to replace the Bookshelf models one day).

Areas of Impact

Only imported entity models and utility functions are affected.
For Kysely, the top-level ORM object exposes a new kysely property.

Mostly column, property and table names had to be updated to follow the schema. A few other identifiers have been renamed as well for consistency, although this was not strictly necessary.

It is never used and never passed, but causes a TS error.

The error message is already specific enough, no need to make it longer.

In the end we of course want to reuse `bbid` and `data_id` of accepted entities instead of duplicating the data and deleting the original. Also make `ImportMetadataT` more versatile/useful.

It is not needed and it is causing a type error since the last commit.

The rule to disallow trailing commas often forces you to pointlessly touch neighboring lines when you add or remove items.

If we want to enforce consistent dangling commas, we should rather force them to always be present. But since our whole codebase follows this rule, it would be better to have a grace period to gradually add commas.

metabrainz/bookbrainz-site@2fc11a2

Run the kysely-codegen task to regenerate the types. Don't forget to reformat the generated code to follow our linting rules.

Since this is using DB introspection, you have to specify a DB connection string via the --url param or the DATABASE_URL env var: https://github.com/RobinBlomberg/kysely-codegen

Rather than collecting all properties of the pending entity data and creating a new entity (with a new BBID), we simply create an entity header and a revision which link to the existing data. This is a complete rewrite using Kysely instead of Bookshelf/Knex.

The final loading of the entity model (for search indexing) is out of scope for this function and has been moved into bookbrainz-site.

From an older schema, there are still editions without an edition group.

The `pending_entity_bbid` column is no longer set to NULL on approval, so we have to additionally check for NULL in `accepted_entity_bbid` to find pending imports.
```ts
@@ -117,6 +120,15 @@ export default function init(config: Knex.Config) {
	const SeriesData = seriesData(bookshelf);
	const WorkData = workData(bookshelf);

	const kysely = new Kysely<DB>({
		dialect: new PostgresDialect({
			pool: new Pool(config.connection as Knex.ConnectionConfig),
```
@kellnerd (Contributor, Author) commented:
The config object which is loaded from our config.json should always have this shape with a connection property, so we can probably narrow the type which the init function accepts already instead of doing a type assertion here.
Knex would allow a couple of different config formats (including a connection string) which would not be compatible with the type which the pg Pool expects here.
