Allow arbitrary strings as predicates #106

Open
HolgerKnublauch opened this issue Nov 20, 2023 · 30 comments

Comments

@HolgerKnublauch

THIS IS ME BRAINSTORMING ONLY, so don't kill me.

Currently, RDF requires predicates of a triple to be IRIs. I guess that choice was made so that

  1. ontologies can attach information such as rdfs:range and rdfs:label to the properties themselves
  2. it becomes more likely that predicates are scoped in the context of a namespace and thus don't clash with other namespaces, which means that it is likely that (SPARQL) queries against these properties only find the subjects that we want

But:
ad 1: Global property axioms are not necessary and do not play a role in SHACL where everything is scoped by shapes and classes. And rdfs:range and rdfs:domain are typically horribly misunderstood.
ad 2: Even with unique identifiers people from other graphs may reference your predicate in unexpected ways and your queries still need to filter by subjects.

Even within a single namespace it is quite common that the same URI is used for different purposes. For example, an ex:role property could point from an ex:Agent to an ex:Role or from an ex:Organization to an ex:Role, and both could have different local meanings depending on the context.

So the benefits of URIs as predicates are IMHO overrated.

Proposal: Moving forward, RDF could also allow predicates to be arbitrary strings.

a) That is how most map-based data structures like JSON objects or Python dictionaries operate, meaning that the mapping between RDF and other languages becomes easier. I think the same applies to property graphs.

b) Allowing strings would make the syntax more compact. For example one could write

ex:David firstName "David" .

c) People don't need to invent artificially "unique" names - their application logic and queries are most likely already checking for the context anyway, e.g.

SELECT ?person
WHERE {
    ?person firstName "David" .
    ?person a ex:Person .
}

is already scoping the use of firstName to instances of Person, making the property uniquely identified at query time. And when mapped to languages like GraphQL or JavaScript, any access to predicates is already scoped to the context object.
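To make point (c) concrete, here is a minimal in-memory sketch. It assumes triples are plain JavaScript arrays, that the bare string `firstName` is stored as-is in predicate position, and that the data and the helper names are hypothetical. It evaluates the same two triple patterns as the query above, so `firstName` only has to be unambiguous once it is combined with the `rdf:type` check.

```javascript
// Hypothetical data: a person and a dog that share the bare predicate firstName.
const triples = [
  ["ex:David", "rdf:type", "ex:Person"],
  ["ex:David", "firstName", "David"],
  ["ex:DavidsDog", "rdf:type", "ex:Dog"],
  ["ex:DavidsDog", "firstName", "David"],
];

// Return every subject s for which the triple (s, p, o) exists.
function subjectsWith(p, o) {
  return triples.filter(([, tp, to]) => tp === p && to === o).map(([s]) => s);
}

// Equivalent of the two patterns: ?person firstName "David" . ?person a ex:Person .
function personsNamed(name) {
  const persons = new Set(subjectsWith("rdf:type", "ex:Person"));
  return subjectsWith("firstName", name).filter((s) => persons.has(s));
}

console.log(personsNamed("David")); // [ 'ex:David' ]
```

Without the type pattern, `ex:DavidsDog` would also match, which is exactly the scoping the query is doing.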

As this would be an incremental generalization, existing RDF graphs would not be affected. People are not forced to use strings as predicates.

To minimize the overhead for existing triple stores, string-based predicates could be internally converted to URIs such as

urn:rdfpredicate:firstName

after parsing in Turtle or SPARQL. But in the far future, there could also be an RDF that uses no URIs as predicates, with all frequently used predicates mapped to shorter names. Turtle and SPARQL have already started going down this route by introducing 'a' as an abbreviation for rdf:type. They could also add 'label' as an alias for rdfs:label or 'superClass' as an alias for rdfs:subClassOf.
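A minimal sketch of that parse-time conversion, assuming the parser hands over each predicate token as a string and that any token containing ':' is already an IRI or prefixed name (prefix expansion is left out). The `urn:rdfpredicate:` scheme is the hypothetical one suggested in this proposal; `toPredicateIri` and `objectToTriples` are illustrative names, and `encodeURIComponent` is just one possible way to keep arbitrary strings valid inside an IRI. The second function shows point (a): a flat JSON object maps to triples key by key.

```javascript
// Scheme suggested in the proposal; hypothetical, not a standard.
const PREDICATE_NS = "urn:rdfpredicate:";

// Map a predicate token from a parser to an IRI string.
function toPredicateIri(token) {
  // Turtle and SPARQL already expand the keyword `a` to rdf:type.
  if (token === "a") return "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
  // Tokens containing ':' are assumed to be IRIs or prefixed names and are
  // passed through unchanged (prefix expansion is omitted for brevity).
  if (token.includes(":")) return token;
  // Bare strings become URNs; encoding keeps spaces etc. out of the IRI.
  return PREDICATE_NS + encodeURIComponent(token);
}

// Point (a): each key of a flat JSON object becomes one predicate.
function objectToTriples(subject, obj) {
  return Object.entries(obj).map(([key, value]) => [subject, toPredicateIri(key), value]);
}

console.log(toPredicateIri("firstName")); // urn:rdfpredicate:firstName
console.log(objectToTriples("http://example.org/ns#David", { firstName: "David", age: 42 }));
```

Because the output is an ordinary IRI, existing stores would see nothing new; only the parsers change, as argued later in this thread.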

Also note that schema.org and wikidata use the same namespace for all predicates, so basically it's the same as if no namespace exists in their worlds.

@dbooth-boston
Collaborator

Interesting idea. I think there are two ways this could be interpreted: as globally scoped predicates that have minimal semantic commitment; or as some kind of locally scoped predicates. As shown, it looks like you are treating them as globally scoped.

If they are globally scoped, then inference rules that use them must qualify their intended scope, such as by indicating the class of the subject, as you describe.

If they are locally scoped, then we'll need ways to manipulate scopes, such as we have in programming languages. For example, when a library is imported it pulls a set of identifiers into the current scope, or allows an identifier from a foreign scope to be bound to an identifier in the current scope.

I wonder what other pros and cons there might be of treating them as globally scoped vs locally scoped.

@amirouche

In my experience 'predicates as strings' is more approachable, and less interoperable.

@HughGlaser
Collaborator

I think the issue is perhaps tied up with the question of literals as subjects too.
That is issue #21
Note that that issue discusses literals as predicates a bit too.
Of course each of these steps takes RDF further away from Linked Data.
Even so, personally I quite like such relaxation of the URI requirements.

@HolgerKnublauch
Author

Yes there is some overlap with #21. For many of the customers that we see, the concept of Linked Data is not relevant as they only operate on controlled enterprise graphs. Also I believe even from a Linked Data perspective, having simpler property names shouldn't be a problem because it is far more relevant that the subjects and objects are URIs than the predicate.

@namedgraph

I don't understand how typing one less character (property instead of :property) can justify thousands of man-hours of specification and implementation work this change would incur on the ecosystem.

@amirouche

Did you consider a mini-RDF that can be implemented on top of RDF?

@HolgerKnublauch
Author

@namedgraph: I believe one goal of an EasierRDF project is to align better with what most software people are used to. Backward compatibility is desirable but by definition difficult to achieve forever.

In the case of allowing strings as predicates, there is at least one simple approach, namely to convert them into special URIs, allowing existing infrastructure to be re-used without issues - urn:rdfpredicate:firstName

With this approach, the only software changes would be to the Turtle and SPARQL parsers, to convert these strings into special URIs.

@chiarcos

chiarcos commented Nov 25, 2023 via email

@chiarcos

chiarcos commented Nov 25, 2023 via email

@namedgraph

@HolgerKnublauch I disagree. IMO we should make RDF-based software so flexible and powerful (in ways that would be impossible without RDF) that we can empower non-software people to work with data in new ways. That is a much broader audience than "software people".

Trying to bring RDF to the general "software people" always ends up in attempts to dumb down RDF, because part of the RDF community seems to think that it's its job to accommodate while the "software people" can't be bothered to put in the effort and learn anything new.

@HolgerKnublauch
Author

@namedgraph For how many years has the RDF community already tried to convert everyone else, with little success? 20 years now? It remains a niche technology. Maybe success is just around the corner, maybe not.

People coming from other communities just find it alien and too complex. One of the particularly alien concepts is that properties have a global identity. This is basically unknown in any other language. Combine this with the unusual semantics of rdfs:range and rdfs:domain and you can understand why few people want to invest into understanding this stack.

You are saying it would dumb down RDF, but what is actually the value of global identifiers for properties, leaving aside what RDF Schema tried to do: using property definitions to infer the types of subjects and objects without explicitly requiring type triples. What else is there apart from that use case?

@HolgerKnublauch
Author

Hypothetical syntax that only uses strings in predicate position, while bringing in existing namespace-based predicates:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/ns#> .
@alias a: rdf:type .
@alias label: rdfs:label .

ex:JohnDoe
    a ex:Person ;
    label "John Doe" ;
    firstName "John" ;
    lastName "Doe" ;
    age 42 .
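One way a parser could resolve predicates in this hypothetical syntax, sketched under two assumptions: `@alias name: prefix:local .` binds a bare name to an existing IRI, and every other bare name falls back to the `urn:rdfpredicate:` form suggested earlier in this thread. `@alias` does not exist in Turtle today, and `buildResolver` is an illustrative name; the regular expressions only handle the one-line directives used in the example.

```javascript
// Hypothetical resolver for @prefix/@alias headers followed by bare predicates.
function buildResolver(header) {
  const prefixes = new Map();
  const aliases = new Map();
  for (const line of header.split("\n").map((l) => l.trim())) {
    let m = line.match(/^@prefix\s+(\w*):\s*<([^>]*)>\s*\.$/);
    if (m) {
      prefixes.set(m[1], m[2]);
      continue;
    }
    // @alias name: prefix:local .  (only bound if the prefix is known)
    m = line.match(/^@alias\s+(\w+):\s*(\w*):(\w+)\s*\.$/);
    if (m && prefixes.has(m[2])) aliases.set(m[1], prefixes.get(m[2]) + m[3]);
  }
  // Aliased names map to their target IRI, all others to the URN fallback.
  return (name) => aliases.get(name) ?? "urn:rdfpredicate:" + encodeURIComponent(name);
}

const resolve = buildResolver(`
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @alias a: rdf:type .
  @alias label: rdfs:label .`);

console.log(resolve("label")); // http://www.w3.org/2000/01/rdf-schema#label
console.log(resolve("firstName")); // urn:rdfpredicate:firstName
```

Under this reading, `label` and `a` keep their existing global IRIs, while `firstName`, `lastName` and `age` become locally named predicates.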

@HughGlaser
Collaborator

A real aside, perhaps:...

I find it gently amusing that the example predicates being used are exactly the ones I would really need to look up (using Linked Data?) to see if I can find out what the author might mean.
I wonder which of firstName and lastName might be a family name, for example?

@HolgerKnublauch
Author

@HughGlaser not aside at all but important. Like in OWL and SHACL, a property would have its meaning in the context of a class or shape. So in this case, you would look the properties up by following the rdf:type, here ex:Person. It's still linked, self-describing data. This is exactly like you would look up the meaning of fields in a (Java) object or the parameters of a function - you start at the surrounding entity.

@dbooth-boston
Collaborator

dbooth-boston commented Nov 25, 2023

Would all bare predicates then be implicitly scoped to the class of the subject on which they are used? If so, which class if the subject is in multiple classes?

@HolgerKnublauch
Author

@dbooth-boston The problem of type clashes already exists, for example if you have two rdf:types with two owl:Classes that carry owl:Restrictions with different owl:allValuesFrom on the same property. In SHACL this would mean that all constraints apply.
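A minimal sketch of the "all constraints apply" reading, assuming hypothetical per-class constraints on a bare `age` predicate. A node with several `rdf:type`s is checked against the constraints of every class it belongs to, so the effective constraint is their conjunction. This is a toy model, not SHACL's actual validation algorithm.

```javascript
// Hypothetical constraints keyed by class, then by bare predicate name.
const constraintsByClass = {
  "ex:Person": { age: (v) => Number.isInteger(v) && v >= 0 },
  "ex:Employee": { age: (v) => v >= 16 },
};

// Every constraint of every class the node has must hold.
function violations(node) {
  const result = [];
  for (const cls of node.types) {
    for (const [pred, check] of Object.entries(constraintsByClass[cls] ?? {})) {
      if (pred in node.values && !check(node.values[pred])) result.push(`${cls}/${pred}`);
    }
  }
  return result;
}

const node = { types: ["ex:Person", "ex:Employee"], values: { age: 12 } };
console.log(violations(node)); // [ 'ex:Employee/age' ]
```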

@HughGlaser
Collaborator

I find myself wondering if it should not be better as:
"a" ex:Person ;
"label" "John Doe" ;
"first name" "John" ;
etc., since the idea is "strings as predicates".
And indeed, there might be a difference between something that is explicitly an alias, and things that are not.
So it could be useful that 'a' might be different from '"a"', if you can see what I mean with the different quotes.
(I don't see much point in doing this if we just use typical camelCase strings without a ':' in front.)
I would love to have:
"Billy Bob Brockali" "was born in" "Ballingdon Bottom".
"Ballingdon Bottom" "is located in" "Britain".

@dbooth-boston
Collaborator

[@namedgraph] I don't understand how typing one less character (property instead of :property) can justify thousands of man-hours of specification and implementation work this change would incur on the ecosystem.

  1. The impact on users is much more than a single character of extra typing. Using :property forces users to declare a namespace, which forces them to commit to a global URI. We've seen over many years of experience that this alone presents a barrier, because it forces them to go down the unproductive rabbit hole of figuring out what URI allocation strategy to use and -- depending on the strategy chosen -- where and how to host it. See IRI allocation #12.
  2. While this one simplification may not be enough to justify the cost of changing tools and standards, I view it as one potential ingredient in a combination of simplifications that -- taken together -- may well be worth adopting.

[@namedgraph] Trying to bring RDF to the general "software people" always ends up in attempts to dumb down RDF, because part of the RDF community seems to think that it's its job to accommodate while the "software people" can't be bothered to put in the effort and learn anything new.

If the RDF community were thriving and growing you might have a valid point in blaming developer laziness. But given that RDF is clearly losing out to easier-to-use competitors, I don't buy that argument.

I think developers are rationally deciding that the effort required to "learn something new" with RDF is not worth the payoff, given the availability of easier "good enough" alternatives, even if the RDF approach may seem more appealing in a theoretical sense.

The goal here is to make RDF -- or a successor built on RDF -- significantly easier to use, while retaining RDF's benefits and as much of the tooling and standards as possible.

@TallTed
Member

TallTed commented Nov 26, 2023

Requiring some things — e.g., RDF Subjects and Predicates — always be HTTP/S URIs means that those HTTP/S URIs can be treated as superkeys, which reach across DBMS schema, because they always denote the same thing. This is what delivers the Linked Data magic, and comprises the Giant Global Graph of our Semantic Webs (yes, intentionally plural). (Concerns like temporality do mean that Named Graphs or similar must be brought to bear, but this is handled with another batch of URIs, not arbitrary strings.)

Letting RDF Subjects and Predicates be arbitrary strings would turn RDF into yet another semantically unjoinable mishmash of schemata, and, if merged without great care, could render the current bunch of Semantic Webs a giant global mudpuddle of incoherency.


As to RDF's "failure" because it hasn't replaced tabular relational DBMS (a/k/a SQL) nor labeled property graphs — "horses for courses" comes to mind.

RDF is VERY well suited to data where the overall data structure is not known at project start, where the "schema" will evolve over time — e.g., "schema last" — and where data is sparse, i.e., where the values of some predicates/attributes may not be known for any given subject/entity but you still want to collect all those values that are known.

Tabular relational DBMS and their relational integrity and other restrictions make them VERY well suited to dense data, i.e., where the values of all predicates/attributes for any given subject/entity are known, and you only want to collect the values of any given predicate/attribute when they are known for all subjects/entities.

Changing a tabular relational DBMS schema once deployed can be a HUGE undertaking, and may require updates to all tools in use against that schema. On the other hand, adding a property/attribute to an RDF graph or data set is typically a trivial undertaking, and tools which operate against that data do not typically require updates specific to the new attribute/property.


The idea of the "special treatment of arbitrary strings in subject or predicate position", coercing them into URIs, has some potential for implementability, though it doesn't solve the problem of "local only definition". I cannot dereference your freshly minted URI, so I cannot confirm whether your intended meaning matches mine. This is, to me, a non-starter, overall.

@TallTed
Member

TallTed commented Nov 26, 2023

[@HughGlaser] I would love to have:
"Billy Bob Brockali" "was born in" "Ballingdon Bottom".
"Ballingdon Bottom" "is located in" "Britain".

I think you just want more mature tools, that will show you labels instead of raw URIs, while the URIs are in place behind the screen (a/k/a under the covers).

"Billy Bob Brockali" "was born in" "Ballingdon Bottom" .
"Ballingdon Bottom" "is located in" "Britain" .
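The "labels in front, URIs behind" idea can be sketched as a display layer: the stored triples keep IRIs, and a renderer substitutes `rdfs:label` values where they exist. The `http://example.org/...` IRIs below are made up for this illustration, and the renderer treats every term of a non-label triple as an IRI.

```javascript
const LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

// Hypothetical graph: the data triple uses IRIs, labels are ordinary triples.
const graph = [
  ["http://example.org/id/p1", "http://example.org/ns#bornIn", "http://example.org/id/place1"],
  ["http://example.org/id/p1", LABEL, "Billy Bob Brockali"],
  ["http://example.org/ns#bornIn", LABEL, "was born in"],
  ["http://example.org/id/place1", LABEL, "Ballingdon Bottom"],
];

// IRI -> label lookup built from the rdfs:label triples.
const labels = new Map(graph.filter(([, p]) => p === LABEL).map(([s, , o]) => [s, o]));

// Show a term by its label if it has one, otherwise as a raw <IRI>.
const show = (term) => (labels.has(term) ? `"${labels.get(term)}"` : `<${term}>`);

// Render all non-label triples for display; the stored data is unchanged.
function render(triples) {
  return triples.filter(([, p]) => p !== LABEL).map(([s, p, o]) => `${show(s)} ${show(p)} ${show(o)} .`);
}

console.log(render(graph).join("\n"));
// "Billy Bob Brockali" "was born in" "Ballingdon Bottom" .
```

The user sees the readable strings, while joins and merges still operate on the IRIs.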

@namedgraph

namedgraph commented Nov 26, 2023

@dbooth-boston why do you see it as a competition? RDF will always lose out in the marketing sense because Neo4J alone has received $500M+ in VC funding.

Just stop trying to convert developers to RDF or use mainstream adoption as the success criteria. Many of their problems that do not require data interchange might simply be solved with JSON, or with property graphs for that matter.

The premise of Semantic Web was to deliver a new generation of the Web that is smarter, more automated etc. We haven't really seen that yet, and that's not the fault of the RDF model but of the software development still using legacy architectures. Why not focus our efforts on software that exploits RDF to the max and delivers something previously impossible? We have barely scratched the surface yet.

@afs
Contributor

afs commented Nov 26, 2023

Domain-specific languages would help to make data writing easier in the sense of being more natural to the domain (SHACL-driven?). They could "compile" to Turtle/N-Triples/JSON-LD with little more than guided text processing.
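A toy illustration of such a compile step. It assumes a made-up line-oriented DSL of the form `Subject.key = value` and a hypothetical base namespace `http://example.org/ns#`; a SHACL-driven version would presumably take names and datatypes from shapes instead of these fixed rules. Each DSL line becomes one N-Triples statement.

```javascript
const BASE = "http://example.org/ns#"; // hypothetical namespace for all names

// Escape a string for use inside an N-Triples string literal.
function literal(value) {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n") + '"';
}

// Compile lines like `JohnDoe.firstName = John` into N-Triples.
function compile(source) {
  const out = [];
  for (const raw of source.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const m = line.match(/^(\w+)\.(\w+)\s*=\s*(.*)$/);
    if (!m) throw new Error(`cannot parse: ${line}`);
    const [, subject, key, value] = m;
    // Integers become xsd:integer literals, everything else a plain string.
    const object = /^-?\d+$/.test(value)
      ? `"${value}"^^<http://www.w3.org/2001/XMLSchema#integer>`
      : literal(value);
    out.push(`<${BASE}${subject}> <${BASE}${key}> ${object} .`);
  }
  return out.join("\n");
}

console.log(compile(`JohnDoe.firstName = John
JohnDoe.age = 42`));
```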

@fekaputra

fekaputra commented Nov 26, 2023 via email

@HughGlaser
Collaborator

HughGlaser commented Nov 26, 2023 via email

@HolgerKnublauch
Author

On firstName vs "firstName", note that JavaScript allows both forms equivalently, assuming the string is a valid identifier.

let obj = {
    firstName: "Hugh",
    "lastName": "Glaser"
}

On the general topic, it is rather obvious that the W3C processes will not allow making such changes because by now there are too many established users and vendors who will expect predicates to continue to be (potentially resolvable) URIs. So any discussion here is rather academic, as input for a future WG that is independent of RDF as we know it. Maybe if we frame these topics accordingly, it will raise fewer concerns by those who will want to preserve the status quo.

@namedgraph

namedgraph commented Dec 2, 2023

@dbooth-boston you should enable Discussions :)

@dbooth-boston
Collaborator

[@namedgraph] @dbooth-boston you should enable Discussions :)

Done: #107

@TallTed
Member

TallTed commented Dec 4, 2023

Please be aware that GitHub's "Discussions" are more of a Q&A that appears to have been modeled after the StackOverflow family of sites, than they are a discussion space which calls for threading message trees along the lines of what was once NetNews/Usenet/NNTP ... so what you intended to do may not be doable there, @namedgraph.

@redmer

redmer commented Jan 3, 2024

Note that quasi-aliasing can already be done with prefixes, albeit uncommonly and not while re-using the prefix.

JSON-LD of course also allows mapping JSON keys (aliases) to other URLs.

Re-using @fekaputra's example:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/ns#>

# aliases as prefixes
PREFIX label: <http://www.w3.org/2000/01/rdf-schema#label> 
PREFIX firstName: <http://example.org/ns#firstName>
PREFIX lastName: <http://example.org/ns#lastName>
PREFIX age: <http://example.org/ns#age>

ex:JohnDoe
    a ex:Person ;
    label: "John Doe" ;
    firstName: "John" ;
    lastName: "Doe" ;
    age: 42 .
ex:JohnDoe
    a ex:Person ;
-    label "John Doe" ;
+    label: "John Doe" ;
-    firstName "John" ;
+    firstName: "John" ;
-    lastName "Doe" ;
+    lastName: "Doe" ;
-    age 42 .
+    age: 42 .
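Why the prefix trick works: a prefixed name expands to the IRI bound to the prefix followed by the local part, and the local part may be empty, so `label:` expands to exactly the IRI bound to `label`. It also shows why the prefix cannot be re-used for anything else: `label:x` would expand to a different, unrelated IRI. A minimal sketch of the expansion rule, ignoring Turtle's escape sequences and the finer `PN_LOCAL` grammar; `expand` is an illustrative name.

```javascript
// Prefix bindings from the example above (a subset).
const prefixes = new Map([
  ["ex", "http://example.org/ns#"],
  ["label", "http://www.w3.org/2000/01/rdf-schema#label"],
  ["firstName", "http://example.org/ns#firstName"],
]);

// Expand a prefixed name: IRI bound to the prefix + local part (possibly empty).
function expand(pname) {
  const i = pname.indexOf(":");
  if (i < 0) throw new Error(`not a prefixed name: ${pname}`);
  const prefix = pname.slice(0, i);
  if (!prefixes.has(prefix)) throw new Error(`unknown prefix: ${prefix}`);
  return prefixes.get(prefix) + pname.slice(i + 1);
}

console.log(expand("label:")); // http://www.w3.org/2000/01/rdf-schema#label
console.log(expand("firstName:") === expand("ex:firstName")); // true
console.log(expand("label:x")); // http://www.w3.org/2000/01/rdf-schema#labelx
```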

@amirouche

Oh, sorry, I misread the topic. It would be best to update the topic to mention byte strings. Otherwise, we will talk past each other.


10 participants