TypeDB 3.0: struct value types #7024

Open
flyingsilverfin opened this issue Apr 3, 2024 · 3 comments
@flyingsilverfin

flyingsilverfin commented Apr 3, 2024

Problem to Solve

TypeDB does not currently support an expressive set of value types for attributes, which makes complex values (such as coordinates) difficult to define.

Current Workaround

Users can define entities to hold multiple attributes that represent a composite value.

Proposed Solution

We consider a 'strongly typed' implementation of value-only structs. These structs should be treated as compound primitive types.

Defining

define
struct coordinate:
  x-coord value long,
  y-coord value long, // could have annotations: @values(1,2,3)
  z-coord value long?;

Where ? denotes optionality in the struct.
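As a rough illustration of what a strongly typed struct with one optional field means, here is a minimal Python sketch. It is not TypeDB code: the `Coordinate` class and its underscore field names are hypothetical stand-ins for the proposed `coordinate` struct, and the type checks only approximate what a schema would enforce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Coordinate:
    """Hypothetical Python stand-in for the proposed `coordinate` struct."""
    x_coord: int
    y_coord: int
    z_coord: Optional[int] = None  # mirrors `z-coord value long?`

    def __post_init__(self):
        # Required fields must be present and integer-valued.
        for name in ("x_coord", "y_coord"):
            if not isinstance(getattr(self, name), int):
                raise TypeError(f"{name} must be an integer")
        # The optional field may be absent, but if present it is still typed.
        if self.z_coord is not None and not isinstance(self.z_coord, int):
            raise TypeError("z_coord must be an integer or absent")


flat = Coordinate(x_coord=1, y_coord=2)                    # z-coord omitted
elevated = Coordinate(x_coord=1, y_coord=2, z_coord=100)   # z-coord supplied
```

The frozen dataclass also reflects the value-only nature of the proposal: a struct instance is compared by its contents and never mutated in place.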

Using struct definitions to define attribute values:

define
gps_coordinate sub attribute, value coordinate;

Matching
Using struct definitions in data queries:

match
  $b isa city, has gps_coordinate { x-coord: $x, y-coord: $y, z-coord: 100 };
  $gps_coordinate_2 { x-coord: $x_2, y-coord: $y_2 };
  $gps_coordinate_3 == { x-coord: $x_3, y-coord: $y_3 };

This shows how we can bind variables to, and query through, the internals of structs in a form closely analogous to our existing attribute value querying:

match
  $b isa city, has name "London";
  $name2 "London";
  $name3 == "London";

We also note that we can now polymorphically match value types by structure:

define
  struct s1: a value long, b value long;
  struct s2: a value string, b value string;

match
  $s isa attribute;
  $s == { a: $a, b: $b };
get;

Should find all attributes with value instances matching the shape required.
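To make the idea of matching by shape concrete, the following Python sketch treats struct values as plain dicts and a pattern as a dict whose entries are either literals, nested patterns, or variable names written as strings starting with `$`. The `match_shape` function is hypothetical, and it assumes that a pattern may omit fields (as the `z-coord`-free patterns above suggest); the real matching semantics are not specified in this issue.

```python
def match_shape(value, pattern, bindings=None):
    """Return variable bindings if `value` fits `pattern`, otherwise None.

    Assumption: fields present in `value` but absent from `pattern` are ignored.
    Limitation of this sketch: a string literal starting with "$" would be
    mistaken for a variable.
    """
    bindings = dict(bindings or {})
    for field, expected in pattern.items():
        if field not in value:
            return None  # required field missing: shape does not match
        actual = value[field]
        if isinstance(expected, dict):
            # Nested pattern: the field itself must be a struct.
            if not isinstance(actual, dict):
                return None
            bindings = match_shape(actual, expected, bindings)
            if bindings is None:
                return None
        elif isinstance(expected, str) and expected.startswith("$"):
            # Variable: bind it, or check consistency with an earlier binding.
            if expected in bindings and bindings[expected] != actual:
                return None
            bindings[expected] = actual
        elif expected != actual:
            return None  # literal mismatch
    return bindings


s1_value = {"a": 1, "b": 2}        # instance shaped like struct s1
s2_value = {"a": "x", "b": "y"}    # instance shaped like struct s2
other = {"a": 1}                   # lacks field b

pattern = {"a": "$a", "b": "$b"}
results = [match_shape(v, pattern) for v in (s1_value, s2_value, other)]
```

Here both the `s1`-shaped and `s2`-shaped values match regardless of their field types, while the value lacking `b` does not.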

Updating

match 
  $b isa city, has gps_coordinate $gps; $gps == { x-coord: $x, y-coord: $y };
  $new_x == $x + 1;
delete
  $b has $gps;
insert
  $b has gps_coordinate { x-coord: $new_x, y-coord: $y };

This will create a new whole struct with different members. As with all TypeDB attributes, we will preserve uniqueness and shared ownership of struct-valued attributes the same way we do with primitive values.
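A minimal Python sketch of these semantics, assuming struct values are immutable and that ownership is tracked per distinct value. The `AttributeStore` class and the owner names are hypothetical and say nothing about how TypeDB stores attributes internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Coordinate:
    x_coord: int
    y_coord: int
    z_coord: Optional[int] = None


class AttributeStore:
    """Hypothetical store: one attribute instance per distinct struct value."""

    def __init__(self):
        self.owners = {}  # struct value -> set of owner ids

    def put(self, owner, value):
        # Equal values hash alike, so they share a single entry.
        self.owners.setdefault(value, set()).add(owner)

    def remove(self, owner, value):
        self.owners[value].discard(owner)

    def replace(self, owner, old, new):
        # Mirrors the delete/insert pair: the old value is never mutated.
        self.remove(owner, old)
        self.put(owner, new)


store = AttributeStore()
origin = Coordinate(0, 0)
store.put("london", origin)
store.put("paris", Coordinate(0, 0))  # equal value, so the attribute is shared
shared_before = set(store.owners[origin])

# "Update" london's coordinate by building a whole new struct value.
store.replace("london", origin, Coordinate(origin.x_coord + 1, origin.y_coord))
```

After the replacement, `paris` still owns the original value and `london` owns a new one; nothing changed for `paris` because the shared value itself was never modified.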

Additional Information

Note that struct definitions deliberately use a syntax that differs from normal define patterns, which are composable.

Disallowed because we don't want to allow composing structs:
struct coordinate:x-coord value integer;
struct coordinate:y-coord value integer;
struct coordinate:z-coord value integer;

However, we still consistently use a subset of the normal TypeQL syntax:

Allowed     --> A keyword B, keyword C;
Not allowed --> A keyword B; A keyword C;

As a result, we also reject the following syntax, which takes defining new value types too close to defining 'first-class' types (entity, relation, attribute):

coordinate sub value,
  takes x-coord value integer,
  takes y-coord value integer;
@flyingsilverfin flyingsilverfin added this to the 3.0.0 milestone Apr 3, 2024
@flyingsilverfin flyingsilverfin changed the title TypeDB 3.0: Struct value types TypeDB 3.0: struct value types Apr 3, 2024
@cxdorn

cxdorn commented Apr 3, 2024

Nesting structs

Structs can be nested, but not recursively.

define
struct serial_info: 
  number value integer,
  manufacturer value string;

define
struct extended_serial_info:
  base_info value serial_info,
  more_info value string;

This means patterns may nest as well:

match 
  $x has ext_info $s;
  $s == { base_info: { number: 123, manufacturer: $m }, more_info: $o };
get $m;

The following is not allowed:

struct serial_info: 
  info value serial_info;
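One way such a rule could be checked at definition time is a depth-first search over struct definitions. The sketch below is an assumption about how validation might work, not a description of TypeDB: the `find_cycle` function is hypothetical, and definitions are modelled as a dict from struct name to a dict of field name to value type name.

```python
def find_cycle(structs):
    """Return a list of struct names forming a reference cycle, or None.

    `structs` maps struct name -> {field name: value type name}. Value types
    that are not themselves struct names are treated as primitives.
    """
    visiting, done = [], set()

    def visit(name):
        if name in done:
            return None
        if name in visiting:
            # Reached a struct that is still being expanded: a cycle.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for value_type in structs[name].values():
            if value_type in structs:
                cycle = visit(value_type)
                if cycle:
                    return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in structs:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


nested = {
    "serial_info": {"number": "integer", "manufacturer": "string"},
    "extended_serial_info": {"base_info": "serial_info", "more_info": "string"},
}
recursive = {"serial_info": {"info": "serial_info"}}

nested_cycle = find_cycle(nested)
recursive_cycle = find_cycle(recursive)
```

The nested definitions pass the check, while the self-referencing `serial_info` is reported as a cycle. The same search also catches indirect recursion through several structs.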

Question: Storage

By avoiding recursion, every struct can be maximally expanded into a tree of value leaves, with internal nodes labeled with field labels. How do we store this tree for performant access to, and indexing of, fields?
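As one possible starting point for that discussion (an assumption, not a decided design), the tree can be flattened into pairs of field path and leaf value, which could then serve as keys in an ordinary ordered index. The `flatten` function and the sample values below are hypothetical.

```python
def flatten(struct, prefix=()):
    """Yield (path, leaf_value) pairs for a nested dict of struct fields."""
    for field, value in struct.items():
        path = prefix + (field,)
        if isinstance(value, dict):
            # Internal node: descend, extending the field path.
            yield from flatten(value, path)
        else:
            # Leaf: a primitive value addressed by its full path.
            yield path, value


# Hypothetical extended_serial_info instance.
ext = {
    "base_info": {"number": 123, "manufacturer": "Acme"},
    "more_info": "rev B",
}
rows = dict(flatten(ext))
```

Because recursion is disallowed, the set of possible paths for a struct type is finite and known from the schema, so each path could in principle be given a fixed identifier.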

@brettforbes

brettforbes commented Apr 28, 2024

Hmm, you guys are the gurus, but have you considered what is the most valuable struct type in business??

It is the only struct type that entire databases have been developed for, and that no graph can handle properly. But does this proposition handle it?? Obviously, it's time, specifically date-time values!!!

Entire databases have been designed and commercialised that are specialised in handling time, yet if you adopt this approach to storing the time components, then TypeDB could outflank them, since its normalised attributes can index any date time with great granularity, and thus retrieve very quickly.

If you have normalised attributes, composite attributes and fast time indexing, then you will become super attractive for many applications. Fast datetime indexing is a very valuable capability.

Now obviously:

  1. The article is wrong because the old idea of attributes owning attributes may not have worked correctly
  2. The article is written in Graql, which is also now outdated

But the principle remains true. Ideally, one wants to:

  1. Insert a date-time, with a timezone (struct)
  2. Have rules automatically separate that string (and timezone) into a component struct
  3. Have rules that can automatically do date time calculations (e.g. match all invoices this month, week, financial quarter, second etc.)
  4. Have rules that can handle timezones (needs some thought, because there are multiple possible approaches)
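The decomposition step in the list above can be sketched in client-side Python as follows. This is only an illustration: the `decompose` function and its field names are hypothetical, it normalises to UTC as one of the several possible timezone approaches, and it computes calendar quarters rather than financial ones.

```python
from datetime import datetime, timezone


def decompose(stamp):
    """Split a timezone-aware ISO 8601 string into UTC component fields."""
    local = datetime.fromisoformat(stamp)
    if local.tzinfo is None:
        raise ValueError("a timezone offset is required")
    utc = local.astimezone(timezone.utc)
    return {
        "year": utc.year,
        "quarter": (utc.month - 1) // 3 + 1,  # calendar quarter, 1 to 4
        "month": utc.month,
        "day": utc.day,
        "hour": utc.hour,
        "minute": utc.minute,
        "second": utc.second,
        # Keep the original offset so the local time can be reconstructed.
        "offset_minutes": int(local.utcoffset().total_seconds() // 60),
    }


parts = decompose("2024-04-28T23:30:00+02:00")
```

With components stored like this, a query such as "all invoices this quarter" reduces to equality checks on `year` and `quarter` instead of a range comparison on the composite value.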

Back in the day, I developed some brief examples of this process, but using the client software to separate the components of the time, as shown in the attached presentation. However, we quickly realised it would start making Query Language statements too long for interactive use, and we discussed using some kind of moustache approach.

gTime - A New More Powerful Approach.pptx

Perhaps Time could be the perfect application for combining the Function capability you guys have got (Great work by the way), with structs and rules. Even the nanosecond field could be deconstructed with functions to shrink the 10^9 unique values down to a smaller set of composite numbers. I'll leave the base implementation to you guys.

I'm sure the community could come up with a library of rules and functions that could be built around this if a skeleton of the capability is in place.

I say resurrect the old Grakn Warriors program to build libraries of functions and rules for capabilities like Units, GIS, and Time, as there should be one best/recommended way of handling them, and quite a lot of associated functionality.

@brettforbes

brettforbes commented May 7, 2024

Connecting a Struct to its Composite Value

Let's consider the two highest value structs for Vaticle, particularly TIME and GIS.

For these two examples (as shown previously), we can make TypeDB insanely powerful (i.e. fast indexing) by breaking the composite numbers into components, so:

Ideally, we want to represent both of these as component structs for ultra-fast indexing and functions (i.e. find all records within an hour, day, financial quarter, degree of longitude, etc.), but we also want to be able to access each as a composite datetime or GPS value. Ideally, one could quote a datetime value in a query and then find all records within a time interval of that value using the composite indexes.

Thus, the simple structs seem to have two extra requirements to be useful for Time or GPS:

  1. Integrate the Time/GPS struct with its composite value (e.g. datetime value or GPS string) and enable rules/functions to be able to have the input as a composite value but use the struct values for calculations/indexing. Also, we need timezone storage. How?
  2. Enable input or query through any GIS format, and calculations/functions/rules based on automatically derived composite struct. How to handle the variations in formats?

Providing expert guidance on best practice for these two struct types is critical to accelerating TypeDB's market share.

As part of the v3.0 release, please develop and provide a best-practice schema for datetime and location storage, and for function/rule development.
