JSON-LD serialization apparently loses XSD type information. #2812

Closed
lu-pl opened this issue Jun 29, 2024 · 7 comments · Fixed by #2889
Labels
awaiting feedback More feedback is needed from the author of the PR or Issue.

Comments

@lu-pl
Contributor

lu-pl commented Jun 29, 2024

JSON-LD serialization apparently loses XSD type information.

from rdflib import Graph, Literal, URIRef, XSD


graph: Graph = Graph()
graph.add(
    (
        URIRef("https://test.subject"),
        URIRef("https://test.predicate"),
        Literal("test type", datatype=XSD.string),
    )
)

## 1. check literal type of constructed graph object
# pass
assert all(o.datatype for o in graph.objects())

## 2. check literal type of xml parsed graph object
xml_serialized: str = graph.serialize(format="xml")
graph_xml_parsed: Graph = Graph().parse(data=xml_serialized, format="xml")

# pass
assert all(o.datatype for o in graph_xml_parsed.objects())

## 3. check literal type of json-ld parsed graph object (fails)
json_ld_serialized: str = graph.serialize(format="json-ld")
graph_json_parsed: Graph = Graph().parse(data=json_ld_serialized, format="json-ld")

# fail
assert all(o.datatype for o in graph_json_parsed.objects())
@nicholascar
Member

The default datatype in both JSON-LD and Turtle is xsd:string, as per the JSON-LD type coercion section.

So

[
  {
    "@id": "https://test.subject",
    "https://test.predicate": [
      {
        "@value": "test type",
        "@type": "http://www.w3.org/2001/XMLSchema#string"
      }
    ]
  }
]

is equivalent to

[
  {
    "@id": "https://test.subject",
    "https://test.predicate": [
      {
        "@value": "test type"
      }
    ]
  }
]

If you apply any other datatype to the literal, JSON-LD, Turtle, etc. will preserve it.

So nothing is lost and I see no issue here.
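
The equivalence claimed above can be checked mechanically on the two documents. A minimal sketch using only the standard library; drop_explicit_xsd_string is a hypothetical helper written for this illustration, not an rdflib or JSON-LD API:

```python
import json

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def drop_explicit_xsd_string(node):
    """Recursively remove '"@type": xsd:string' from JSON-LD value objects."""
    if isinstance(node, list):
        return [drop_explicit_xsd_string(item) for item in node]
    if isinstance(node, dict):
        # Only value objects with a string "@value" are touched.
        is_string_value = isinstance(node.get("@value"), str)
        return {
            key: drop_explicit_xsd_string(value)
            for key, value in node.items()
            if not (is_string_value and key == "@type" and value == XSD_STRING)
        }
    return node


typed_doc = json.loads("""
[{"@id": "https://test.subject",
  "https://test.predicate": [
    {"@value": "test type",
     "@type": "http://www.w3.org/2001/XMLSchema#string"}]}]
""")
plain_doc = json.loads("""
[{"@id": "https://test.subject",
  "https://test.predicate": [{"@value": "test type"}]}]
""")

# Both documents normalise to the same structure.
assert drop_explicit_xsd_string(typed_doc) == plain_doc
assert drop_explicit_xsd_string(plain_doc) == plain_doc
```

This only compares the two JSON-LD documents; it says nothing about how rdflib compares the resulting Literal objects in memory.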

nicholascar added the awaiting feedback label Aug 11, 2024
@lu-pl
Contributor Author

lu-pl commented Aug 11, 2024

True, if a Literal.datatype attribute is None it means the default datatype, XSD.string.

Still I find it confusing that if XSD.string is explicitly passed as datatype to a Literal object, the JSON-LD serializer is not explicit about it while the Turtle and XML serializers are.

E.g. serializing the above graph object to turtle results in

@prefix ns1: <https://> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ns1:test.subject ns1:test.predicate "test type"^^xsd:string .

serializing to JSON-LD results in

[
  {
    "@id": "https://test.subject",
    "https://test.predicate": [
      {
        "@value": "test type"
      }
    ]
  }
]

This behavior makes it at least more difficult to test for explicitly passed datatypes, which is where I encountered the issue.

@lu-pl
Contributor Author

lu-pl commented Aug 11, 2024

I think the behavior might be controlled by line 273 in jsonld.py, I will need to take a closer look though.

@nicholascar
Member

From the RDF 1.2 Spec, Section 3.3:

Please note that concrete syntaxes MAY support simple literals consisting of only a lexical form without any datatype IRI, language tag, or base direction. Simple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string (which is commonly abbreviated as xsd:string).

So the behaviour is fine but I agree that if xsd:string is given, there should be no problem with it being preserved and perhaps that is a more natural behaviour.

Please feel free to create a PR to implement this behaviour! You might check the Turtle and HexTuples serializations too...

lu-pl added a commit to lu-pl/rdflib that referenced this issue Aug 13, 2024
Supplying an XSD type argument to the rdflib.Literal datatype
parameter should be reflected in JSON-LD serializations.

Closes: RDFLib#2812
lu-pl added a commit to lu-pl/rdflib that referenced this issue Aug 14, 2024
Supplying an XSD type argument to the rdflib.Literal datatype
parameter should be reflected in JSON-LD serializations.

Closes: RDFLib#2812
nicholascar pushed a commit that referenced this issue Sep 29, 2024
…zation (#2889)

* feat: Reflect explicitly XSD-typed Literals in JSON-LD serialization

Supplying an XSD type argument to the rdflib.Literal datatype
parameter should be reflected in JSON-LD serializations.

Closes: #2812

* test: Add/modify tests for XSD-typed JSON-LD serialization

Modify test "t#0018" in JSON-LD test-suite: Add XSD types to the
expected JSON-LD output.

Add test "t#0020" in JSON-LD test-suite: Add another test with mixed
explicit typing in the input source.

---------

Co-authored-by: Ashley Sommer <[email protected]>
@niklasl
Member

niklasl commented Oct 3, 2024

This change is problematic. In RDF 1.1 a Literal always has a datatype (see #1326 and #2460). Once that is fixed, this change for JSON-LD will add "@type": "http://www.w3.org/2001/XMLSchema#string" to all string literals. (To be fair, it would probably add it to Turtle/TriG and RDF/XML too, so a more exhaustive fix is likely needed.)

The behavior in the jsonld implementation was a compromise which #2889 undid. A more correct fix should coordinate with the fix of the aforementioned problems.

In the JSON-LD 1.1 algorithm spec, step 2.4, this can be controlled uniformly with the useNativeTypes option (see 2.4.1 specifically for xsd:string).
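
To make the role of such a flag concrete, here is a simplified sketch of literal-to-value-object conversion. It is loosely modeled on one reading of the spec's RDF to Object Conversion step, not a faithful transcription of it, and it is not rdflib's implementation. The function name and the use_native_types parameter are assumptions for illustration; xsd:double and JSON literals are left out.

```python
import re

XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def literal_to_value_object(lexical, datatype=XSD_NS + "string", language=None,
                            use_native_types=False):
    """Simplified RDF-literal-to-JSON-LD conversion (illustrative only)."""
    result = {"@value": lexical}
    if language is not None:
        # Language-tagged strings carry "@language" instead of "@type".
        result["@language"] = language
    elif datatype == XSD_NS + "string":
        # xsd:string stays implicit: no "@type" entry is written.
        pass
    elif (use_native_types and datatype == XSD_NS + "boolean"
          and lexical in ("true", "false")):
        # Native JSON boolean, no "@type".
        result["@value"] = lexical == "true"
    elif (use_native_types and datatype == XSD_NS + "integer"
          and re.fullmatch(r"[+-]?[0-9]+", lexical)):
        # Native JSON number, no "@type".
        result["@value"] = int(lexical)
    else:
        # Every other datatype is written out explicitly.
        result["@type"] = datatype
    return result


assert literal_to_value_object("test type") == {"@value": "test type"}
assert literal_to_value_object("42", XSD_NS + "integer") == {
    "@value": "42",
    "@type": XSD_NS + "integer",
}
assert literal_to_value_object(
    "42", XSD_NS + "integer", use_native_types=True
) == {"@value": 42}
```

In this sketch xsd:string is never written as "@type", with or without the flag, which is the behaviour #2889 moved away from for explicitly typed literals.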

@lu-pl
Contributor Author

lu-pl commented Oct 3, 2024

Thanks for pointing this out!

I still feel like #2889 is justified until the above mentioned changes are in place though.

@nicholascar
Member

Maintainers are considering a complete review of all the parsers and serializers. The plan is to be specific about the parameters that each can accept, to enable some that are not currently available, and to document them all properly. So things like useNativeTypes should be available in g.serialize(format="json-ld"...), and g.serialize(format="turtle") & g.serialize(format="longturtle") would decompose down to g.serialize(format="turtle", vcs_optimised=True) etc.

But, as per the comment above, we don't have this functionality yet, so we'd best leave the PR as implemented above and then enable choice when we support useNativeTypes.
