Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB properties storage refactoring #11177

Merged
merged 372 commits into from May 6, 2024
Merged

Conversation

tiensonqin
Copy link
Contributor

@tiensonqin tiensonqin commented Mar 29, 2024

Context and Problem Statement

  • There're two ways to store block properties for both internal properties and user-defined properties:
    • Store all properties in a map for any block
      • Currently, it's stored in :block/properties, for example: a book that has authors and a published date will be something like:
      • {:block/properties {:block/name "a book"
                            :block/properties {:authors-uuid #{author-1-uuid author-2-uuid}
                                               :published-date-uuid date-page-uuid}}}
    • Use database attributes for properties
      • The above example will be stored as:

        ```clojure
        {:block/name "a book"
         :user.property/authors #{{:db/id 1} {:db/id 2}} ;; id 1 for author 1 and id 2 for author 2
         :user.property/published-date {:db/id 3} ;; id 3 represents the date page
         }
        ```
        
  • There're some issues with Store all properties in a map for any block
    • Slow queries
      • To get all the values for a property, we have to scan all blocks that have defined :block/properties, not just the blocks which have the specific property
    • Painful to modify
      • Needs custom code handling to ensure the db is valid
        • Property keys need to be removed from blocks when deleting a property, for example, the property pair :authors-uuid #{...} needs to be removed from the block as below when deleting the property page authors

          ```clojure
          ;; block
          {:block/properties {:authors-uuid #{...}}}
          ```
          
        • Property UUIDs in property values need to be removed too when deleting a property
        • Changing property value cardinality (one to many or many to one) requires updating all the existing blocks that have this property
      • Updating a property's value for a block requires rebuilding the whole :block/properties
    • Painful to validate

      > I’ve also found the current schema to be painful to query and validate. To validate a property value by its type I had to [[invent a property pair](https://github.com/logseq/logseq/blob/c98db7cd3d19e4ac407073f8a3680a29be033d4d/deps/db/src/logseq/db/frontend/malli_schema.cljs#L59-L68)](https://github.com/logseq/logseq/blob/c98db7cd3d19e4ac407073f8a3680a29be033d4d/deps/db/src/logseq/db/frontend/malli_schema.cljs#L59-L68).  A property pair could be a good abstraction. It’s probably worth doing an ADR or at least a spike or two as I remember [[there were reservations](https://logseq.slack.com/archives/C04ENNDPDFB/p1687945735955859?thread_ts=1687937426.655469&cid=C04ENNDPDFB)](https://logseq.slack.com/archives/C04ENNDPDFB/p1687945735955859?thread_ts=1687937426.655469&cid=C04ENNDPDFB) about doing a property pair approach. If we migrate to another property schema it’s going to be expensive so hopefully that schema will last for more than a year  
      
      By Gabriel  
      
    • Painful to merge property changes for RTC
      • For example, there are two clients, A and B, who modified the same block, A set the background color to red, and B set the heading to level 2. Syncing :block/properties will overwrite one client's change for that block.
        Another example is that A adds a person to the "Authors" property, which can have multiple values, while B deletes an existing value. Syncing will lose either A's or B's operation.

Decision

- Use database attributes for properties
	- Create `db/ident` for both internal and user-defined properties
		- `logseq.x/y` for internal properties, for example:
			- `logseq.property/created-by-block`
			- `logseq.task/status`
		- `user.property/x` for user-defined properties
	- Page name unique index should be deleted, which will be handled by #11196
		- After discussions with [[Zhiyuan]], we agreed that multiple unique attributes including `:block/uuid`, `:block/name` and `:db/ident` could make RTC error-prone and hard to get it right.
		Another reason to remove `:block/name` unique index is how we can enable the same property names for different classes.
		- A property can only be queried with either `:db/ident` or `:db/id`

Positive Consequences

- We got real db refs and all the benefits
- Property queries and transactions are much easier
- One property change doesn't affect the other properties, which makes conflicts handling simple

Notice

  • :block/left has been replaced with :block/order
  • :block/namespace has been removed for the db version
  • :block/name is no longer unique for the db version
  • :block/macros has been removed, we used it for getting all the dsl queries, which can be addressed by using classes.
  • This will affect file-based import since properties are stored as db attributes now @logseq-cldwalker
  • RTC needs to be updated @RCmerci

TODO Address the following issues in other PRs

@tiensonqin tiensonqin changed the title DB properties storage refactor DB properties storage refactoring Apr 1, 2024
@tiensonqin tiensonqin marked this pull request as ready for review April 1, 2024 15:16
@tiensonqin tiensonqin marked this pull request as draft April 2, 2024 07:12
@tiensonqin tiensonqin marked this pull request as ready for review April 3, 2024 09:57
tiensonqin and others added 23 commits April 7, 2024 21:32
1. back to use container id + editing block id instead of block dom
ref to decide which block is in editing, to simplify edit-block! and
reduce browser gc.
2. set next editing block ahead of receiving the data changes from db
worker, so the editor feels a bit faster.
they had the same unique db-ident. Ensure a unique db-ident with
automatic db-ident suffixes. Also add tests for this and add more
sanitization to property db idents
in clobbered db idents. Also rename fn to make its
purpose explicit
logseq-cldwalker and others added 27 commits May 1, 2024 09:07
This allows imported and eventually user classes to have db idents.
Updated schema example graph which now imports classes as
:schema.class/X. Also fix schema properties which were removed from
the debug file and remove unused property uuids
Finally no need to worry about parent-left conflicts and broken chain.
With :block/order, we only need to re-compute new orders for siblings
with same order (it can happens if there're bugs in our code, or
updates from rtc), but it doesn't break UI.

Another huge potential benefit after discussing with Zhiyuan is:
Ee might be able to simplify both RTC and undo/redo, currently, we
need to handle each new op for both of them, with recently
refactorings like properties being db attributes, :block/order
is a string instead of a ref, we can handle most property value
conflicts using last-write-wins, and others (e.g. :block/parent,
property with :default type) specifically.

I haven't fixed the issues of using :block/left in RTC and undo/redo,
because we might change both soon.
Fixes some generation cases like 1 or 2 pages with 1000 blocks.
Larger generation with 10k+ blocks still fail b/c of block/order
generation. Also remove deleted :block-id-fn
- only use it for user properties
- don't use create ident fn to lookup names
- fixed get-area-block-asset-url which didn't work for db graphs without
  the frontend
Any script can pass a :classes config key. schema script no longer needs
to manage db ids or db idents
Copy link
Collaborator

@logseq-cldwalker logseq-cldwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiensonqin Awesome work! ❤️ 🚀

@logseq-cldwalker logseq-cldwalker merged commit 9495177 into feat/db May 6, 2024
2 checks passed
@logseq-cldwalker logseq-cldwalker deleted the refactor/db-properties-schema branch May 6, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants