design: Sub-Graph Modules (sgm) #54

Closed
23 tasks
halcyondude opened this issue Apr 26, 2022 · 9 comments · Fixed by #50
Assignees
Labels
data Data Model, GraphQL documentation Improvements or additions to documentation p1-high sgm Sub-Graph Module (sgm)
Milestone

Comments

@halcyondude
Collaborator

halcyondude commented Apr 26, 2022

Sub-Graph Modules

Goals

  • Facilitate interactive, dynamic expansion of the graph, using the core model as a nucleus and/or seed.
  • Learn from k8s! Don't make "special" kinds of things (e.g. Pod, Deployment, Ingress) part of a "built-in" data model and then add a separate mechanism for extensibility (CRDs). Instead, structure the core data model with the same compositional mechanisms that extensions use.
  • Work with a broad, arbitrary set of targets, environments, toolchains, and compositional frameworks.
  • Enable a community to gel around this project such that work can happen safely in parallel.
  • Facilitate self-service and automation:
    • modern CI (GitHub Actions) to validate SGMs at the PR level
    • autogenerated, comprehensive documentation
    • exploration and composition
  • Be easy to distribute over existing transports.

Tasks

  • implement inheritance model (see below)

  • implement dependency mechanism

  • implement core data model as { sgm-base, sgm-cncf, sgm-xyz, ... }

    • base: (e.g. Object)
    • landscape: for all Linux Foundation Landscapes (see https://landscapes.dev)
    • sgm-cncf: (Card, Member, Project, TOC, TAG, EUG, Licenses, GitRepo)
      • sgm-cncf-docs (DD's, Charters, governance)
    • sgm-git (Commit, Author, Branch, ...)
    • sgm-github (Issue, PR, Comment, Workflow/Action, Teams (nested))
  • Sub-Graph Modules: Design and Architecture

    • Create Summary info, slides, and a blog post ("How to extend our graph")
    • reach out to neo4j for feedback on sub-graph module approach
    • reach out to CNAB.io project to assess viability
    • sgm:blog + template + examples

Types of Sub-Graph Modules (SGM)

Each of the base types below is an Interface that acts as a base class with shared properties. Reasons to structure the model this way include:

  1. It enables treating classes of things polymorphically while leaving each concrete type's own portion of state undisturbed.
  2. It lowers the barrier to entry for new contributions.
  3. It limits the blast radius of a change, relative to the model as a whole.
  4. It facilitates pruning, which reduces the test surface needed to validate changes in CI. Even casual data sets can be non-trivial in size and potential cost, so an intentional, structured approach is warranted.

| base types | derived types |
|---|---|
| blogs | CNCF, thenewstack, medium.*, LinkedIn Posts, ... |
| boards | GH Discuss, StackOverflow |
| corp | crunchbase, yahoofinance |
| email | cncf project lists, k8s lists |
| packages | brew, choco, crate, deb, deno, go, maven, npm, pip, rpm |
| rtc | slack, discord, gitter |
| social | twitter, linkedin |
| threats | nist |
| learning | youtube, books, online courses (public / open only!) |

Each module shall have:

  • base metadata (name, version, ...)
  • GraphQL Schema fragment
  • Cypher, JavaScript, or another expression of how to orchestrate growing, pruning, mutating, and refactoring the graph
  • description / documentation covering entities
  • images (PNG, SVG):
    • the module's portion of the model (from arrows.app or similar), used for visual diffs later
    • (optional, preferred) SVG/PNG used by Bloom and other front ends to annotate nodes
  • sample data, patterns, and queries
  • (optional) a label map associating the module's own names/terms with what they are called in the broader data model that the SGM is loaded into. This reduces fragility, mitigates the inevitable label name mismatches that result from parallel development, and makes modules more portable.
  • CI
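
To make the list above a bit more concrete, here is a minimal sketch of what a module manifest with a label map might look like. The class name, field names, and the `Repo` to `GitRepo` mapping are all assumptions for illustration; nothing here is a settled format.

```python
from dataclasses import dataclass, field


@dataclass
class SgmManifest:
    """Hypothetical manifest for one Sub-Graph Module (all field names are assumptions)."""

    name: str                              # e.g. "sgm-github"
    version: str                           # module version string
    schema_file: str = "schema.graphql"    # GraphQL schema fragment shipped with the module
    depends_on: tuple = ()                 # names of other modules this one needs
    label_map: dict = field(default_factory=dict)  # module label -> host-graph label

    def host_label(self, module_label: str) -> str:
        """Translate a module-local label to the host graph's label.

        Labels without an entry in the map pass through unchanged.
        """
        return self.label_map.get(module_label, module_label)


# Illustrative values only; the mapping below is not taken from the real model.
manifest = SgmManifest(
    name="sgm-github",
    version="0.1.0",
    depends_on=("sgm-git",),
    label_map={"Repo": "GitRepo"},
)

print(manifest.host_label("Repo"))   # label renamed by the map
print(manifest.host_label("Issue"))  # no entry, so it passes through
```

Keeping the map optional and defaulting to pass-through means a module that already uses the host graph's names needs no map at all.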

Taking this approach facilitates creation of a rich set of capabilities impacting model training, CI, and developer experience.

By using snapshots of the graph (Graph Projections, TODO: doc link) in a manner similar to virtual machine snapshot trees (ESX, Hyper-V, ...), CI can

  • quickly set up base cases and test variations as a matrix
  • use smart, cross-SGM, dependency-aware CI such as https://zuul-ci.org or similar workflows
  • run automated ML model experimentation and training at scale
  • provide per-PR live instances
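
As a rough sketch of the dependency-aware part: given a map of which module depends on which, CI could test only the changed module and its dependents, in dependency order. The dependency map below is hypothetical, and `affected_by` / `ci_order` are illustrative helper names, not part of any existing tool.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: module -> set of modules it depends on.
DEPENDS_ON = {
    "sgm-base": set(),
    "sgm-git": {"sgm-base"},
    "sgm-github": {"sgm-git"},
    "sgm-cncf": {"sgm-base", "sgm-git"},
    "sgm-cncf-docs": {"sgm-cncf"},
}


def affected_by(changed, depends_on):
    """Return the changed module plus everything that transitively depends on it."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for module, deps in depends_on.items():
            if module not in affected and deps & affected:
                affected.add(module)
                grew = True
    return affected


def ci_order(changed, depends_on):
    """Order the affected modules so dependencies are tested before dependents."""
    affected = affected_by(changed, depends_on)
    # Restrict the graph to the affected modules only.
    subgraph = {module: depends_on[module] & affected for module in affected}
    return list(TopologicalSorter(subgraph).static_order())


print(ci_order("sgm-cncf", DEPENDS_ON))
```

A change to `sgm-cncf` then skips `sgm-base`, `sgm-git`, and `sgm-github` entirely, which is the test-surface pruning described earlier.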

We'll also benefit from a sustainable, portable, usable data model that is documented.

(TODO: update w/ final set)

.
├── blogs
│   └── sgm-blogcncf
├── boards
│   ├── sgm-ghdiscuss
│   └── sgm-stackoverflow
├── core
│   └── generated
├── corp
│   ├── sgm-crunchbase
│   └── sgm-yahoofinance
├── email
├── packages
│   ├── sgm-brew
│   ├── sgm-choco
│   ├── sgm-crate
│   ├── sgm-deb
│   ├── sgm-deno
│   ├── sgm-go
│   ├── sgm-maven
│   ├── sgm-npm
│   ├── sgm-pip
│   └── sgm-rpm
├── rtc
│   ├── sgm-discord
│   └── sgm-slack
├── social
│   ├── sgm-linkedin
│   └── sgm-twitter
├── threats
│   └── sgm-nist
└── learning
    └── sgm-youtube
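
With a layout like the one above, tooling could discover modules purely from directory names. A minimal sketch, assuming the convention is `<base type>/sgm-<name>` (the function name `discover_modules` is made up here):

```python
from pathlib import Path
import tempfile


def discover_modules(root):
    """Map each base-type directory to the sorted sgm-* directories inside it."""
    found = {}
    for base_dir in sorted(Path(root).iterdir()):
        if not base_dir.is_dir():
            continue
        found[base_dir.name] = sorted(
            child.name
            for child in base_dir.iterdir()
            if child.is_dir() and child.name.startswith("sgm-")
        )
    return found


# Build a small throwaway copy of part of the tree to try it out.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("rtc/sgm-discord", "rtc/sgm-slack", "threats/sgm-nist", "email"):
        (Path(tmp) / rel).mkdir(parents=True)
    layout = discover_modules(tmp)

print(layout)
```

A base type with no modules yet (like `email`) shows up with an empty list, so it stays visible in generated docs.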

ACTIVE DEVELOPMENT

Closely related to this issue is: #4 (branch)

How GraphQL Interfaces Work

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_directive_inheritance

Any directives present on an interface or its fields will be "inherited" by any object types implementing it. For example, the type definitions above could be refactored to have the @relationship directive on the actors field in the Production interface instead of on each implementing type as it is currently:

interface Production {
    title: String!
    actors: [Actor!]! @relationship(type: "ACTED_IN", direction: IN, properties: "ActedIn")
}

type Movie implements Production {
    title: String!
    actors: [Actor!]!
    runtime: Int!
}

type Series implements Production {
    title: String!
    actors: [Actor!]!
    episodes: Int!
}

interface ActedIn @relationshipProperties {
    role: String!
}

type Actor {
    name: String!
    actedIn: [Production!]! @relationship(type: "ACTED_IN", direction: OUT, properties: "ActedIn")
}
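
Since GraphQL requires every implementing type to restate the interface's fields, an SGM's derived types could be generated rather than hand-written. Below is a small sketch of that idea; `render_type` is a hypothetical helper, and it relies on the directive inheritance described in the quoted docs, so the restated fields carry no directives.

```python
def render_type(name, interface, interface_fields, own_fields):
    """Render SDL for an object type implementing `interface`.

    The interface's fields are emitted first, then the type's own fields.
    """
    lines = [f"type {name} implements {interface} {{"]
    for field_name, field_type in {**interface_fields, **own_fields}.items():
        lines.append(f"    {field_name}: {field_type}")
    lines.append("}")
    return "\n".join(lines)


# Fields of the Production interface quoted above, without directives.
PRODUCTION_FIELDS = {"title": "String!", "actors": "[Actor!]!"}

movie_sdl = render_type("Movie", "Production", PRODUCTION_FIELDS, {"runtime": "Int!"})
print(movie_sdl)
```

The output reproduces the `Movie` type from the quoted example.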

https://neo4j.com/docs/graphql-manual/current/type-definitions/interfaces/#_overriding

In addition to inheritance, directives can be overridden on a per-implementation basis. Say you had an interface defining some Content, with some basic authorization rules:

interface Content
    @auth(rules: [{ operations: [CREATE, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    author: [Author!]! @relationship(type: "HAS_CONTENT", direction: IN)
}

type User {
    username: String!
    content: [Content!]! @relationship(type: "HAS_CONTENT", direction: OUT)
}

type PublicContent implements Content {
    title: String!
    author: [Author!]!
}

type PrivateContent implements Content
    @auth(rules: [{ operations: [CREATE, READ, UPDATE, DELETE], allow: { author: { username: "$jwt.sub" } } }]) {
    title: String!
    author: [Author!]!
}

Core Data Model

core-png

@halcyondude halcyondude added this to the data-model-v1 milestone Apr 26, 2022
@halcyondude halcyondude added documentation Improvements or additions to documentation p1-high data Data Model, GraphQL sgm Sub-Graph Module (sgm) labels Apr 26, 2022
@halcyondude halcyondude self-assigned this Apr 26, 2022
@halcyondude halcyondude moved this from Triage to Done in landscape-graph Apr 26, 2022
@halcyondude halcyondude moved this to Triage in landscape-graph Apr 26, 2022
@halcyondude halcyondude moved this from Done to In Progress in landscape-graph Apr 26, 2022
@halcyondude halcyondude linked a pull request Apr 26, 2022 that will close this issue
@jexp

jexp commented Apr 27, 2022

That's why labels are like tags. You can add them on the fly and they are useful to tag things, group them or denote status etc.

So you don't have to create a complex ontology structure; just tag your nodes with the labels that represent the roles they play.

@AlexxNica

Hey there, @halcyondude! Congrats on the awesome work and research you're doing! I'm using a lot of your research to guide my own, which I started a while back for a Filecoin Plus project.

Through my research, I'm trying to follow similar concepts you've described, and I feel like the "Sub-Graph Modules" concept would benefit from the Apollo Federation architecture. Do you know about it already? If not, taking a look may be worth it!

Here are some starting points:

One thing, though, is that it seems Neo4j doesn't readily integrate with it, but it seems easy to make it do so. Here's a repository from the Apollo team that demonstrates it working with Neo4j: https://github.com/apollosolutions/neo4j-subgraph

@halcyondude
Collaborator Author

halcyondude commented May 5, 2022

Hi there! I had a look at Apollo Federation, and while it's pretty cool, I'm not sure it's the best fit here, mainly because subscriptions are not supported. However, much of the conceptual material on schema composition is quite relevant and makes sense.

For now, to keep things simple and avoid another layer of indirection, I'm planning to use GraphQL fragments and a simple directory/manifest approach.

Another concern with Apollo Federation is the impact on the client/query layer, as well as requiring a gateway, the supergraph/subgraph idiom, and such to be part of the runtime. In the ideal case (I posit), the compositional model used for the data layer is immaterial to the final data model, with no coupling between the two.
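
A rough sketch of the directory/manifest approach, assuming each module directory ships one schema fragment file and the manifest is just an ordered list of module directories. The file name `schema.graphql`, the function `compose_schema`, and the sample types are assumptions for illustration.

```python
from pathlib import Path
import tempfile


def compose_schema(root, module_dirs, fragment_name="schema.graphql"):
    """Concatenate each listed module's schema fragment into one SDL string.

    `module_dirs` is the manifest: an ordered list of module directories
    relative to `root`. A missing fragment is an error, not silently skipped.
    """
    parts = []
    for module_dir in module_dirs:
        fragment = Path(root) / module_dir / fragment_name
        if not fragment.is_file():
            raise FileNotFoundError(f"{module_dir} has no {fragment_name}")
        # "#" starts a comment in GraphQL SDL, so the banner is harmless.
        parts.append(f"# --- {module_dir} ---\n{fragment.read_text().strip()}")
    return "\n\n".join(parts) + "\n"


# Try it on two throwaway modules with made-up types.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "rtc/sgm-slack": "type SlackChannel { name: String! }",
        "rtc/sgm-discord": "type DiscordServer { name: String! }",
    }
    for module_dir, sdl in samples.items():
        target = Path(tmp) / module_dir
        target.mkdir(parents=True)
        (target / "schema.graphql").write_text(sdl + "\n")
    combined = compose_schema(tmp, ["rtc/sgm-slack", "rtc/sgm-discord"])

print(combined)
```

Plain concatenation keeps the composed schema free of any gateway or runtime layer; the result is ordinary SDL that can be handed to whichever GraphQL server is in use.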

@halcyondude
Collaborator Author

image

@AlexxNica

@halcyondude That's great! Are you planning on centralizing everything into a single database vendor (in this case, Neo4j for now)?

@halcyondude
Collaborator Author

halcyondude commented May 12, 2022

@halcyondude That's great! Are you planning on centralizing everything into a single database vendor (in this case, Neo4j for now)?

Using Neo4j for the time being, primarily due to the existence of the APOC and GDS libraries, and the free availability of Neo4j Desktop as a native experience across platforms, with a low[er] barrier to entry for new contributors. As the GraphQL JavaScript library does all the translation of GraphQL --> openCypher, the standardization is really on openCypher, which is implemented by other graph databases as well (https://opencypher.org/projects).

@halcyondude
Collaborator Author

Recently published: Apollo's stack just leveled up in terms of OSS offerings...

https://www.apollographql.com/blog/announcement/backend/the-supergraph-a-new-way-to-think-about-graphql

https://www.apollographql.com/blog/announcement/backend/apollo-router-our-graphql-federation-runtime-in-rust

It's likely that after a more detailed review we'll move forward with some of these components.

@halcyondude
Collaborator Author

halcyondude commented Aug 11, 2022

Update, after findings in #4

The SGMs implemented (initially) are exposed to the supergraph via GraphQL endpoints. This provides strong typing and interoperability with a deep canon of existing libraries and algorithms, including ontology/taxonomy/semantic frameworks, as well as documentation and visualization tools.

A variety of back-end database stores can be used behind that interface. In my view, a graph database with the neo4j-graphql JavaScript library is compelling, as it removes the need for writing resolvers, handling sorting/filtering/pagination, and authoring mutations. Being able to drop into Cypher via @ directives is a nice escape hatch, as well as a way to expose Cypher query capabilities and results as GraphQL.

Projects
Status: Done

3 participants