generating IRIs? #346

Open
flyingmachine opened this issue Jan 20, 2023 · 4 comments
Labels: idea (feature request or idea for fluree team), needs specification

Comments

@flyingmachine
Contributor

Best practice for working with RDF data is to use IRIs to identify business entities. However, it is unclear how developers should generate IRIs. What tools do we want to provide for generating IRIs, and what guidance do we want to give?

Considerations for generating IRIs:

  1. Some entities have natural identifiers, such as a SKU or a Library of Congress Control Number. The entity already has some globally unique identifier associated with it, managed by an institution, so the recommendation would be to use the natural identifier in an IRI.
  2. For some entities we want to avoid natural identifiers. Consider user accounts for a SaaS app: we might purposely want to generate an identifier that has no relation to the entity's data, to avoid accidentally exposing info.
  3. Developers might expect or need an experience similar to a db like Postgres, where the db provides resources that ensure newly inserted entities will not be assigned an existing identifier.
  4. Developers might expect or need to be able to make use of temporary IDs in transactions that get converted into full IRIs on insertion. The identifiers generated on insertion may need to be both non-natural (random or sequential) and distinct from existing identifiers.
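
To make points 1 and 2 concrete, here is a minimal Python sketch of the two minting styles. The `https://example.org/` namespace and both function names are hypothetical, chosen only for illustration; nothing here reflects an existing Fluree API.

```python
import uuid
from urllib.parse import quote

BASE = "https://example.org/"  # hypothetical namespace, not a real Fluree default


def iri_from_natural_id(kind: str, natural_id: str) -> str:
    """Mint an IRI from an identifier that an institution already manages."""
    # Percent-encode everything outside the unreserved set so that spaces
    # or slashes in the natural id stay inside a single path segment.
    return f"{BASE}{kind}/{quote(natural_id, safe='')}"


def opaque_iri() -> str:
    """Mint an IRI that carries no information about the entity's data."""
    return f"urn:uuid:{uuid.uuid4()}"
```

The first function suits SKU-like identifiers; the second suits cases such as user accounts, where the identifier should not leak anything about the entity.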

For the last point, an example might be that we want to insert two new entities with a "mutual best friend" relationship. There are a lot of ways we could do this incorrectly that we want to avoid. How can we insert data that looks something like:

[{"@id": tempid-0, "bestFriend": {"@id": tempid-1}},
 {"@id": tempid-1, "bestFriend": {"@id": tempid-0}}]

and have it get turned into something like:

[{"@id": "ex:200", "bestFriend": {"@id": "ex:201"}},
 {"@id": "ex:201", "bestFriend": {"@id": "ex:200"}}]

Instead of ex:200 or ex:201, we could have UUIDs or whatever.
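
One possible shape for that transformation, as a rough Python sketch rather than a proposal for the actual implementation: `resolve_tempids`, `is_tempid`, and `mint` are hypothetical names, and the sketch assumes `mint` is responsible for returning identifiers that do not collide with existing ones.

```python
from itertools import count


def resolve_tempids(nodes, is_tempid, mint):
    """Replace every temporary @id with a minted IRI, consistently
    across the whole transaction, and return the mapping that was used."""
    mapping = {}

    def lookup(node_id):
        if not is_tempid(node_id):
            return node_id  # already a full identifier, leave untouched
        if node_id not in mapping:
            mapping[node_id] = mint()  # first sighting: allocate a new id
        return mapping[node_id]

    def resolve(value):
        if isinstance(value, dict):
            return {k: (lookup(v) if k == "@id" else resolve(v))
                    for k, v in value.items()}
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value

    return resolve(nodes), mapping


# Usage: sequential ids starting at 200, as in the example above.
counter = count(200)
txn = [{"@id": "tempid-0", "bestFriend": {"@id": "tempid-1"}},
       {"@id": "tempid-1", "bestFriend": {"@id": "tempid-0"}}]
resolved, mapping = resolve_tempids(
    txn,
    is_tempid=lambda i: i.startswith("tempid-"),
    mint=lambda: f"ex:{next(counter)}",
)
```

Swapping `mint` for a UUID generator changes the identifier style without touching the traversal.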

My recommendation here is to start by defining the UI we want to support before diving into the details of how it would work internally.

@flyingmachine flyingmachine added the idea feature request or idea for fluree team label Jan 20, 2023
@bplatz
Contributor

bplatz commented Jan 22, 2023

For point (4.), there is a standard... it states that a blank node (starts with _:) is a temporary id, and the system it enters is free to remap it to any other internal id.

We should consistently remap these to our format (starts with _:f), and then return the mapping to the user.

We could also allow, if we thought this was insufficient, an alternative which is to use an integer for @id and treat it the same way as above.
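
A minimal Python sketch of that classification and of the mapping returned to the user. The `_:f<n>` numbering scheme and both function names are assumptions made for illustration; the sketch also does not special-case labels that already start with `_:f`.

```python
from itertools import count


def is_temp_id(node_id) -> bool:
    """Treat blank node labels and bare integers as temporary ids."""
    if isinstance(node_id, bool):  # bool is a subclass of int; exclude it
        return False
    if isinstance(node_id, int):
        return True
    return isinstance(node_id, str) and node_id.startswith("_:")


def build_mapping(ids, start=1):
    """Return {supplied id: "_:f<n>"} for each temporary id, in first-seen order."""
    counter = count(start)
    mapping = {}
    for node_id in ids:
        if is_temp_id(node_id) and node_id not in mapping:
            mapping[node_id] = f"_:f{next(counter)}"
    return mapping
```

For example, `build_mapping(["_:a", 7, "ex:1", "_:a"])` maps `"_:a"` and `7` to fresh `_:f` ids and leaves `"ex:1"` out of the mapping.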

@flyingmachine
Contributor Author

I've been resistant to the suggestion that we treat blank node identifiers in transactions as temporary ids because I feel like those semantics aren't obvious from the specs I've been reading. I've found only a couple mentions that would let me infer the semantics:

It is worth noting that blank node identifiers may be relabeled during processing. (https://www.w3.org/TR/json-ld11/#identifying-blank-nodes)

Blank node identifiers may be automatically introduced by algorithms such as flattening, but they are also useful for authors to describe such relationships directly. (https://www.w3.org/TR/json-ld11/#embedding)

However, I do think that the suggestion makes sense. Now I'm just wondering about how to convey it so that it's not confusing to users. One thing I'm wondering is, how would we handle blank node ids that already start with _:f?

@bplatz
Contributor

bplatz commented Feb 2, 2023

how would we handle blank node ids that already start with _:f

Good question, and idea:

_:f... is a Fluree internal identifier, so if supplied we'd look it up, as we'd assume you are updating an existing piece of info.

If it doesn't already exist, we'd have to decide between:
a) creating a new _:f... and sending back the tempid for the mapping
b) throwing an exception saying they should use a different prefix for tempids
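
The two options could be sketched in Python as a policy switch. Everything named here (`MissingPolicy`, `resolve_fluree_id`, the `existing` set, the `mint` callback) is hypothetical and only illustrates the decision, not how Fluree stores or looks up identifiers.

```python
from enum import Enum


class MissingPolicy(Enum):
    CREATE = "create"  # option (a): mint a new id and report the mapping
    REJECT = "reject"  # option (b): refuse and ask for a different prefix


def resolve_fluree_id(node_id, existing, mint, policy):
    """Resolve a supplied `_:f...` id against the ids already stored.

    Returns (resolved_id, mapping_entry), where mapping_entry is None
    when the id already exists and the transaction is treated as an update.
    """
    if not node_id.startswith("_:f"):
        raise ValueError(f"{node_id} is not a _:f identifier")
    if node_id in existing:
        return node_id, None  # known id: assume an update
    if policy is MissingPolicy.CREATE:
        new_id = mint()
        return new_id, (node_id, new_id)
    raise ValueError(
        f"{node_id} does not exist; use a different prefix for temporary ids"
    )
```

Option (a) is more forgiving for imported data, while option (b) surfaces accidental collisions with the internal prefix early.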

@bplatz
Contributor

bplatz commented Feb 2, 2023

Also worth keeping in mind there are a couple of use cases here... and ideally one solution works for both, but it may not:

  1. User is importing existing standards-based data (here we'd always accept/re-map blank nodes)
  2. User is fabricating new data, but chooses not to create a unique identifier for it


3 participants