-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BaseResource, adjust code paths to work with BaseResource types #12
Conversation
* Added utility method to get kind from resource, respecting any potential BaseResource that has its `kind` field specified by any inheriting objects * Refactored looping code to be configurable, for potential future need to re-loop after initial relationship creation
I'd like to rebase before merging to clean up the commit history, CI was giving me a bit of a hard time, especially with |
Hahaha no worries, squash merging is absolutely fine (it's also what I did the previous one with ;) ) For this PR as a whole, I think we need to have a bit more of a discussion on it. Just to help line everything up :) The first point is - do you know why we have the two stages of relationship generation? I ask as I think the recursing in the neo4j functionality is trying to achieve the same thing Can you elaborate a tad on your objectives with BaseResource vs Resource? Is it purely so we have a class that can be used for non "resource" definitions (ie the SecretReference)? If so, were you thinking they be added as nodes to Neo4j? Have you skimmed the PolicyRule logic in IceKube? As that's the current "method" for doing it. Not saying it has to stay that way of course, just pointing it out as a lets compare against whats existing to figure out the better way forward based on what functionality and features we want out of that side of things. |
I'm actually not sure, this is my first time getting to use a graph database, so I'm not sure exactly why relationships can appear after a first pass, but not subsequent passes. Any reading material is appreciated
I want to have a way to track non-resource definitions and was wanting them to appear as nodes in Neo4j. The reason I have a base class is to reduce some code duplication + add the object hashing functionality to attempt to make each sub-"resource" definition distinct (although I know there can be duplicates if we're basing it off hashes).
Having a shared base class is helpful when deserializing data from Neo4j, we can re-use the logic in I'm currently testing out the |
It's not a graph database specific thing, it's more the way IceKube works. When querying data from the API server, there is no guarantee that all resources are enumerated. A good example of this are users, most clusters don't have a resource for these. But users clearly do exist as you have them in role bindings, etc. Or sometimes, when on client engagements, they don't want us accessing secrets, so we can't query for any of them. When resources are creating relationships, they will figure out what other resources they refer to (the The second pass is to then make sure we account for those newly created resources as part of the RBAC processing. We avoid doing them in the first run as they are the longest running part of relationship generation and no point running them when not all resources are computed. For example, let's say a service account can impersonate a user, if that user doesn't exist in the database yet, we wouldn't see it. So we wait till after we are sure all resources we could know about are created. I raised this point, as I think that is similar to what your recursive neo4j functionality is doing. Going through nodes its creating and adding them to the create functionality. Were the nodes to just create the appropriate relationships wanted, then that would happen automatically.
So at the moment, that would be included in the "raw" field in a node. What are you trying to achieve with having them as nodes in Neo4j? Would it be separate to a relationship between two nodes where one node is specified as a non-resource definition?
OK, I'll take a new look when pushed :) |
My personal use case would be to track down deployment/dependency chains more effectively than I was able to in the past with ArgoCD, as well as querying and figuring out ownership/references to objects in a quick manner. I'm thinking that this tool would be extremely useful when debugging cluster issues during incidents/disaster recovery, and having a more complete graph of the cluster itself would be useful in that scenario. Since there's no support for Maps in Neo4j, the solution I came up with was to separate each object out into its own node so that its data was queryable. |
So something like a malicious pod, tracking down which replica set and therefore which deployments thats from. So you know what resources to check logs for on who created it, etc? Am I correct in thats the kind of view you're trying to get? |
Yeah, that's a good use case but one I didn't think about - I'm not professionally involved in security, I'm more of an enthusiast in that respect, but I'm in the DevOps/SRE side of things, and I'd honestly rather query the graph to track down resources than to fiddle around with I'd rather be able to do
than having to remember the spec:
It's just way more elegant and much better to quickly browse and see relationships between resources. Being able to map network topologies would also be really cool, and I'm sure it could be supported in the future as well. |
OK, that makes sense. I don't think you need to make a new resource type for the intermediary references if this is your goal. You could turn
into
We would just add appropriate relationships that parse the references to other resources, and add them to the main resources relationships. Similar to how pods reference service accounts that they use, or role bindings reference the role and the subject defined within. For the various resource types, we identify how they relate / refer to other resources, and parse that in and add relationships. If the reference model is the same across resource types, we can separate that out into its own class for easy re-use, similar to what we did with PolicyRule. Does that make sense, or do you see something I'm missing? |
That makes sense to me, I do prefer the shorter syntax, but it comes with the loss of information if not included in the main object. For example: Lines 61 to 67 in 86ce93b
The If there's a way to get the shorter syntax working while also not throwing away the information + getting relationships for free by defining types, then I'm game for it. I'd settle for a "verbose" mode that dumps non-resource objects into Neo4j, but have the default be to only do light contextualization, i.e. what we're already doing |
Are there other areas you can think where that information might be lost? Pods / Containers was the only one I really came across before and spent a while debating adding containers as another resource, before ending up not as I at the time didn't see much use for it at the time. Considering in Kubernetes land, Pods are considered the "smallest" resource, and when I was thinking of reasons why I might want to care about a container, most things would have been true for the pod overall so didn't see the need for it then. Of course, things can change. What kind of distinct information as an example were you aiming to get from the container? |
One use case could be container image scanning, i.e. scanning for known vulnerabilities or CVEs w/ Snyk or something along those lines. Also would be very helpful to know if a Pod's container is an init container, ephemeral, or long lived. I think exposing that sort of information to Neo4j queries would be very powerful for any new applications built on top of the graph in neo4j. Having shortcuts, i.e. directly defining A lot of my wants at the moment I don't have examples to demonstrate, but I honestly would've loved to have this tool at my last job and would've made debugging clusters so much easier, but only if I could dig in to every resource at full granularity. |
I don't think IceKube will gain the ability to do image scanning, that might be too much of a scope creep :P But I do see the benefit of knowing the various images, and the inits, etc. So we should add something for that. I'm hesitant to add a new node if there is no clear use-case for the time being, especially if we can encode that same information into the pods properties itself. Partially because other parts of the codebase assume every node is a kubernetes resource and we would need to refactor those as well. What might be worth doing is going further in adding relationships between all the resource types, and then seeing if we hit a limitaton in that where we don't get enough purely from that relationship 🤔 So with that, for the time being I don't think we need to do the refactor for BaseResource provided the data we want to expose to users can still be done with relationships? |
I agree, if that's the case, I can close the PR and focus on modeling additions the same as Do you think it'd be better to rename my fork to prevent confusion? I'll link back to this repo for attribution if that's alright with you :) - Also thank you for taking the time to walk me through your thought process, I appreciate your time. |
Absolutely fine for you to maintain a separate fork, thanks for trying to contribute back in the first place :D I don't mind if you rename or not. If it's a Github fork it will still say forked from this repo, etc and hopefully it won't be too difficult for us to swap patches back and forth. Anytime, happy to help wherever I can :D |
Commit Log
kind
field specified by any inheriting objectsSummary
This PR adds a new base class for referencing any sub-objects defined in the spec of a
Resource
. This is so that we can map relationships between theResource
and any type that has a relationship to theResource
(e.g. DEFINES, USES, CONSUMES, REFERENCES)An example of a type that can be represented using
BaseResource
is SecretReference - It doesn't appear as an API resource within K8s, but tracking the definition of the SecretReference object can be useful in tracing where references originate from in a more generic and composable way.I'm open to discussion for any concerns or alternative implementations we might want to pursue. Some important things to note is that we expect some
BaseResource
types to define their ownapiVersion
andkind
fields, since some objects use those fields to determine which object K8s should refer to (e.g. ObjectReference), so includingapiVersion
andkind
in theBaseResource
class might be premature or not a wise move