Fix difficult TypeQL expressions #6986

Open
james-whiteside opened this issue Feb 13, 2024 · 18 comments

Comments

@james-whiteside
Member

Problem to Solve

Some simple queries are difficult to express in TypeQL, and present a significant barrier to the early user journey. This issue will serve as a place to collate them.

Example

For the following schema:

define
person sub entity;
name sub attribute, value string;
person owns name;

It is difficult to write a query that "inserts a person with a given name, only if a person with that name does not already exist".

The following theoretical query would achieve that, but is not currently valid:

match
not { $p isa person, has name "Kevin Morrison"; };
insert
$p isa person, has name "Kevin Morrison";

This is because match clauses cannot have unbound nested patterns, a recurring theme in many difficult-to-express queries. The following is the best current workaround:

match
$t type person;
not { $p isa $t, has name "Kevin Morrison"; };
insert
$p isa $t, has name "Kevin Morrison";

This workaround is particularly unintuitive and basically a hack.

@james-whiteside
Member Author

Given the following query, which is intended to return the name of anything in the UK:

match
$uk isa country, has name "United Kingdom";
(location: $uk, located: $x) isa locating;
fetch
$x: name;

Currently, if there are types that play locating:located and so could be inferred for $x in the match clause, and some of those types do not own name, the query is rejected for an improper projection. This means the query needs to be modified with two additional constraints:

match
$uk isa country, has name "United Kingdom";
(location: $uk, located: $x) isa locating;
$x isa $t;
$t owns name;
fetch
$x: name;

Adding $x has name $n; is not a suitable workaround, as the name should only affect the match optionally. This makes queries of this kind particularly verbose, especially if multiple projections of this kind are used. This also requires the introduction of throwaway variables.

Current behaviour is rooted in pre-Fetch versions of TypeDB, in which attributes had to be bound in the match clause. Since the addition of Fetch queries, it is now most intuitive only to bind attributes in the match clause if used to constrain a value rather than retrieve one. As such, the projection should be sufficient to imply attribute type ownership by the variable's type.

@james-whiteside
Member Author

james-whiteside commented Feb 16, 2024

In many queries we want to retrieve the type of an instance, or the role it plays. This is done with the following patterns:

$x isa $t;
($r: $x) isa my-relation;

These currently retrieve type $t and role $r transitively. If we would like to retrieve only the exact type, we can use:

$x isa! $t;

But if we want to retrieve only the exact role, we need to use:

($r: $x) isa! $t; $t relates $r; $t type my-relation;

or

($r: $x) isa! $t; $t relates $r; $t sub my-relation;

depending on the exact intent. Unlike the type case, the role case is significantly more complex when retrieving the exact role.

This behaviour is also unintuitive, when compared with most programming languages. Typically when we retrieve instances of a type, we also want to retrieve instances of its subtypes. Conversely, when we want to retrieve the type of an instance, we typically only want the exact type. Retrieving all of its types is the edge case rather than the typical case.

A simple solution to this would be to make the use of isa! on a relation type also prevent transitive retrieval of variablized roles. So for instance, the following would only retrieve the exact role for $r:

($r: $x) isa! my-relation;

@izmalk
Member

izmalk commented Feb 27, 2024

Current behaviour is rooted in pre-Fetch versions of TypeDB, in which attributes had to be bound in the match clause. Since the addition of Fetch queries, it is now most intuitive only to bind attributes in the match clause if used to constrain a value rather than retrieve one. As such, the projection should be sufficient to imply attribute type ownership by the variable's type.

I also encountered this unexpected behavior with the following simple query (never mind the use of thing; it could be any of the root types except entity in this particular schema):

match $x isa thing; fetch $x: attribute;

With the following schema:

define
email sub attribute,
    value string;
name sub attribute,
    value string;
tag sub attribute,
    value string;
friendship sub relation,
    relates friend;
user sub entity,
    owns email @key,
    owns name,
    owns tag,
    plays friendship:friend;

And the error reads:

[PRO06] Invalid projection operation: Projection from '$x' to attribute type 'attribute' is illegal, since '$x' could be of type 'tag' which does not own the attribute type or any of its subtypes. Constrain the 'match' clause such that all types can own the attribute type or its subtypes.

@james-whiteside
Member Author

james-whiteside commented Mar 12, 2024

Retrieving multiple aggregates over the same set of values is pretty cumbersome at the moment, as each aggregate subquery can only retrieve one aggregate.

match
$user isa user, has id $id;
fetch
$id;
median-price: {
    match
    $book isa book, has price $price;
    $order isa order;
    ($book, $order) isa order-line;
    ($order, $user) isa action-execution;
    get $book, $price; median $price;
};
min-price: {
    match
    $book isa book, has price $price;
    $order isa order;
    ($book, $order) isa order-line;
    ($order, $user) isa action-execution;
    get; min $price;
};
max-price: {
    match
    $book isa book, has price $price;
    $order isa order;
    ($book, $order) isa order-line;
    ($order, $user) isa action-execution;
    get; max $price;
};

This isn't very DRY. It would be much better if aggregates could be made more of a "first-class citizen" of Fetch queries. The following syntax would be much nicer, for instance.

match
$user isa user, has id $id;
fetch
$id;
price-stats: {
    match
    $book isa book, has price $price;
    $order isa order;
    ($book, $order) isa order-line;
    ($order, $user) isa action-execution;
    fetch
    median-price: filter $book; median $price;
    min-price: min $price;
    max-price: max $price;
};

This change to Fetch queries would also allow aggregates to be retrieved without using subqueries or a Get query, for instance as follows.

match
$user isa user;
fetch
user-count: count;

Of course, there are many practical elements not yet considered. One is how a fetch clause containing both aggregates and non-aggregates should be treated (compare how SQL handles this).

@izmalk
Member

izmalk commented Mar 15, 2024

I was trying to fetch all subtypes of a type in a meaningful way so that I could visualize the hierarchy between them.
So, I've created the following query:

match
$subtype sub subject;
$subtype sub! $supertype;
fetch
$subtype;
$supertype;

But it didn't work, as it produced the following error:

## Error> [CXN05] The transaction is closed because of the error(s):
[QRY12] Invalid Query Pattern: The type variable '$subtype' has multiple 'sub' constraints.

But I think we should be able to do something like this in TypeQL (when we are using different keywords (like sub and sub!) and not contradicting ourselves).

I've come up with a workaround for this query:

match
$subtype sub subject;
$subtype is $subtype2;
$subtype2 sub! $supertype;
fetch
$subtype;
$supertype;

But that feels like hacking our type check.

P.S. I think Christoph had another use case for having multiple isa constraints for the same variable with disjunction branches.

@james-whiteside
Member Author

This issue is primarily for difficult syntax that does not actually affect expressivity, but I'll also point out this issue: vaticle/typeql#325. I've made it a separate issue as it does affect expressivity.

@james-whiteside
Member Author

Currently, TypeDB does not support SQL-style cascading deletes. This makes almost all practical deletions of entities (and relations playing roles) unintuitive, and potentially very difficult, to do correctly. If not done correctly, relations the entity was playing a role in remain in the database, but with one roleplayer less. These relations are not semantically sound in many cases.

The best practice for deleting entities in a generalised manner is detailed in Lesson 4.3 of the new learning course. It requires three queries: one query to retrieve the key attributes of the entities to be deleted (if not already known), another query to delete any relations the entities play roles in, and then a third query to delete the entities themselves. This will guarantee that no dangling relations are left after the entities are deleted, but makes certain assumptions:

  1. The entities have key attributes.
  2. No nested relations depend transitively on the entities via nesting of roleplayers.

If either of these assumptions is not correct, then this strategy will not work. Dangling relations will be left, or a bad delete may occur in which too much or not enough data is deleted. In fact, a generalised strategy for deleting any entity (or relation that plays roles) is currently not possible without making use of IIDs and an additional query per level of relation nesting, requiring an acute understanding of both TypeDB and the specific data model in use to pull off without error.

The current lack of cascading deletes makes correctly deleting data unnecessarily difficult. Even if the above assumptions are satisfied, three queries is exceedingly verbose for ensuring a single entity is correctly deleted.
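
For illustration, the three-query strategy described above might look roughly like the sketch below. It assumes the user/friendship schema posted earlier in this thread, in which email is the key attribute of user. The name filter and the email value are hypothetical placeholders, and the sketch only holds under the two assumptions listed above (it does nothing about nested relations).

```typeql
# Query 1 (read): retrieve the key attributes of the entities to delete.
# The name filter is a hypothetical selection criterion.
match
$user isa user, has name "Kevin Morrison", has email $email;
get $email;

# Query 2 (write): for each key returned, delete every relation
# in which that entity plays a role.
match
$user isa user, has email "kevin.morrison@example.com";
$rel ($user) isa relation;
delete
$rel isa relation;

# Query 3 (write): delete the entity itself.
match
$user isa user, has email "kevin.morrison@example.com";
delete
$user isa user;
```

Queries 2 and 3 have to be repeated (or parameterised) for each key returned by query 1, which is part of why the approach is so verbose.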

@maydanw

maydanw commented Apr 4, 2024

Currently, TypeDB does not support SQL-style cascading deletes. This makes almost all practical deletions of entities (and relations playing roles) unintuitive, and potentially very difficult, to do correctly. If not done correctly, relations the entity was playing a role in remain in the database, but with one roleplayer less. These relations are not semantically sound in many cases.

The best practice for deleting entities in a generalised manner is detailed in Lesson 4.3 of the new learning course. It requires three queries: one query to retrieve the key attributes of the entities to be deleted (if not already known), another query to delete any relations the entities play roles in, and then a third query to delete the entities themselves. This will guarantee that no dangling relations are left after the entities are deleted, but makes certain assumptions:

  1. The entities have key attributes.
  2. No nested relations depend transitively on the entities via nesting of roleplayers.

If either of these assumptions is not correct, then this strategy will not work. Dangling relations will be left, or a bad delete may occur in which too much or not enough data is deleted. In fact, a generalised strategy for deleting any entity (or relation that plays roles) is currently not possible without making use of IIDs and an additional query per level of relation nesting, requiring an acute understanding of both TypeDB and the specific data model in use to pull off without error.

The current lack of cascading deletes makes correctly deleting data unnecessarily difficult. Even if the above assumptions are satisfied, three queries is exceedingly verbose for ensuring a single entity is correctly deleted.

Deletion sometimes has "a ripple effect": one deletes an entity, then its attributes need to be deleted, then its relations, then the attributes on those relations and the relations related to those relations. I think that, because TypeDB is closely related to various programming languages, users bring similar expectations to it, such as GC.

Update is also somewhat unintuitive. After creating a link to a new attribute and removing the previous link, the old attribute may be left with nothing linked to it. This requires an extra query to check whether the last link was removed, and then to clean up.

@maydanw

maydanw commented Apr 4, 2024

Deletion sometimes has "a ripple effect": one deletes an entity, then its attributes need to be deleted, then its relations, then the attributes on those relations and the relations related to those relations. I think that, because TypeDB is closely related to various programming languages, users bring similar expectations to it, such as GC.

Continuing the programming language analogy, I expect the next requests to be for a few basic "data structures" that simplify working with complex sets of objects. One example is an ordered list: not as an attribute type (which could be nice), but as a way to easily collect entities in an ordered list. This is currently possible, but hard and unintuitive.

@maydanw

maydanw commented Apr 4, 2024

I think that checking whether a thing is contained in a set of given things, or conversely whether a thing satisfies all of a set of constraints, is currently possible (through OR) but harder than it should be.

By comparison, the SQL IN operator (https://www.w3schools.com/sql/sql_in.asp) is quite intuitive.
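
As a rough sketch of the disjunction approach mentioned above, a set-membership check can be written with one branch per candidate value. This uses the person/name schema from the first post; the second name is a made-up placeholder.

```typeql
# Match every person whose name is one of a fixed set of values.
# Each value needs its own disjunction branch.
match
$p isa person;
{ $p has name "Kevin Morrison"; } or { $p has name "Jane Example"; };
get $p;
```

The pattern grows by one branch per value, whereas the SQL equivalent would be a single IN list.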

Found this: vaticle/typeql#213 and this #6321

@maydanw

maydanw commented Apr 4, 2024

I am putting this here so that it is documented, but it can be moved or deleted.
It seems that two independent "query paths" are forming (GET and FETCH). Each has its pros, and neither can be discarded.
Yet if both are maintained, they will naturally evolve over time and diverge from one another. That may become a constant requirement for double development, a maintenance time sink, and a source of endless confusion for users.
This should be looked into more carefully, but I will propose a solution on the fly, to avoid leaving "a warning in the air" without also putting forward a way out.

I am not sure what the ideal solution is, but one option is to evolve TypeQL along the single path of MATCH and let the user choose to "flatten" the response to JSON, either as part of the query or on the session connection. This could be done by supplying a default way to "flatten" the result to JSON, and allowing the user either to supply a "flattening mapping schema" along with the query or to activate a predefined "flattening mapping schema" that is part of the schema. Other ideas are welcome.

@maydanw

maydanw commented Apr 4, 2024

Optional retrieval has already been discussed, but I am raising the following option again:

match $x isa person, has email "[email protected]";
get $x.username, $x.email, $x.name;

#6322

The idea is to loosen the strong coupling between the match, which forms the constraints, and the get, which forms the representation. It would also give an easy and intuitive way to write queries for both query flows.
For example, take all the persons with a specific email: over gRPC, a missing attribute would be returned as a predefined MissingAttribute object, but if the result is requested as JSON it would just be null. Maybe like so:

match $x isa person, has email "[email protected]";
get as JSON $x.username, $x.email, $x.name;

Will return

[{username:bob123, email:[email protected], name: null}]

@james-whiteside
Member Author

Allow multiple isa or sub constraints on the same variable and resolve to their intersection:
vaticle/typeql#268

@maydanw

maydanw commented Apr 8, 2024

Create a Cypher to TypeQL converter

Improving TypeQL: Cypher has been used widely for many years and has a long track record. Writing such a converter would surface many of the missing and hard-to-express parts of TypeQL. The assumption here is that, at its core, TypeDB is a superset of Neo4j, and therefore any obstacle to translating a Cypher query to TypeQL is due to a limitation in TypeQL that can be resolved.

Created an issue for this idea: #7031

@flyingsilverfin
Member

flyingsilverfin commented Apr 9, 2024

Optional retrieval has already been discussed, but I am raising the following option again:

match $x isa person, has email "[email protected]";
get $x.username, $x.email, $x.name;

#6322

The idea is to loosen the strong coupling between the match, which forms the constraints, and the get, which forms the representation. It would also give an easy and intuitive way to write queries for both query flows. For example, take all the persons with a specific email: over gRPC, a missing attribute would be returned as a predefined MissingAttribute object, but if the result is requested as JSON it would just be null. Maybe like so:

match $x isa person, has email "[email protected]";
get as JSON $x.username, $x.email, $x.name;

Will return

[{username:bob123, email:[email protected], name: null}]

Have you seen fetch queries? They do exactly that https://typedb.com/docs/manual/reading/fetch

Side note: let's try to keep this issue for collecting big UX issues, mostly for @james-whiteside, and move discussion to Discord, the forum, or more precise issues.

@james-whiteside
Member Author

Allow composition of interface implementations into high-level traits:
vaticle/typeql#327

@james-whiteside
Member Author

Allow negations and disjunctions without exterior bindings:
vaticle/typeql#292

@james-whiteside
Member Author

james-whiteside commented Apr 22, 2024

Allow variables in disjunction branches to be fetched, and referenced in subqueries.
