Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BaseResource, adjust code paths to work with BaseResource types #12

Closed
wants to merge 7 commits into from

Conversation

gsalaz98
Copy link
Contributor

@gsalaz98 gsalaz98 commented Dec 5, 2023

Commit Log

  • Add BaseResource, adjust code paths to work with BaseResource types
  • Added utility method to get kind from resource, respecting any potential BaseResource that has its kind field specified by any inheriting objects
  • Refactored looping code to be configurable, for potential future need to re-loop after initial relationship creation

Summary

This PR adds a new base class for referencing any sub-objects defined in the spec of a Resource. This is so that we can map relationships between the Resource and any type that has a relationship to the Resource (e.g. DEFINES, USES, CONSUMES, REFERENCES)

An example of a type that can be represented using BaseResource is SecretReference - It doesn't appear as an API resource within K8s, but tracking the definition of the SecretReference object can be useful in tracing where references originate from in a more generic and composable way.

I'm open to discussion for any concerns or alternative implementations we might want to pursue. Some important things to note is that we expect some BaseResource types to define their own apiVersion and kind fields, since some objects use those fields to determine which object K8s should refer to (e.g. ObjectReference), so including apiVersion and kind in the BaseResource class might be premature or not a wise move

  * Added utility method to get kind from resource, respecting
    any potential BaseResource that has its `kind` field specified
    by any inheriting objects

  * Refactored looping code to be configurable, for potential
    future need to re-loop after initial relationship creation
@gsalaz98
Copy link
Contributor Author

gsalaz98 commented Dec 5, 2023

I'd like to rebase before merging to clean up the commit history, CI was giving me a bit of a hard time, especially with isort and black fighting over line additions/removals

@Skybound1
Copy link
Collaborator

I'd like to rebase before merging to clean up the commit history, CI was giving me a bit of a hard time, especially with isort and black fighting over line additions/removals

Hahaha no worries, squash merging is absolutely fine (it's also what I did the previous one with ;) )

For this PR as a whole, I think we need to have a bit more of a discussion on it. Just to help line everything up :)

The first point is - do you know why we have the two stages of relationship generation? I ask as I think the recursing in the neo4j functionality is trying to achieve the same thing

Can you elaborate a tad on your objectives with BaseResource vs Resource? Is it purely so we have a class that can be used for non "resource" definitions (ie the SecretReference)? If so, were you thinking they be added as nodes to Neo4j?

Have you skimmed the PolicyRule logic in IceKube? As that's the current "method" for doing it. Not saying it has to stay that way of course, just pointing it out as a lets compare against whats existing to figure out the better way forward based on what functionality and features we want out of that side of things.

@gsalaz98
Copy link
Contributor Author

gsalaz98 commented Dec 5, 2023

The first point is - do you know why we have the two stages of relationship generation? I ask as I think the recursing in the neo4j functionality is trying to achieve the same thing

I'm actually not sure, this is my first time getting to use a graph database, so I'm not sure exactly why relationships can appear after a first pass, but not subsequent passes. Any reading material is appreciated

Can you elaborate a tad on your objectives with BaseResource vs Resource? Is it purely so we have a class that can be used for non "resource" definitions (ie the SecretReference)? If so, were you thinking they be added as nodes to Neo4j?

I want to have a way to track non-resource definitions and was wanting them to appear as nodes in Neo4j. The reason I have a base class is to reduce some code duplication + add the object hashing functionality to attempt to make each sub-"resource" definition distinct (although I know there can be duplicates if we're basing it off hashes).

Have you skimmed the PolicyRule logic in IceKube? As that's the current "method" for doing it. Not saying it has to stay that way of course, just pointing it out as a lets compare against whats existing to figure out the better way forward based on what functionality and features we want out of that side of things.

Having a shared base class is helpful when deserializing data from Neo4j, we can re-use the logic in Resource.__new__ to properly assign a resource type to an available class if it's found. It makes adding new types significantly easier in my opinion and think we should pursue at least having a separate class for sub-"resources".

I'm currently testing out the PersistentVolume changes and am probably going to backport some required changes to BaseResource in this PR if we decide to go down that path. I can commit and we can discuss shortly afterwards if that works out for you

@Skybound1
Copy link
Collaborator

The first point is - do you know why we have the two stages of relationship generation? I ask as I think the recursing in the neo4j functionality is trying to achieve the same thing

I'm actually not sure, this is my first time getting to use a graph database, so I'm not sure exactly why relationships can appear after a first pass, but not subsequent passes. Any reading material is appreciated

It's not a graph database specific thing, it's more the way IceKube works. When querying data from the API server, there is no guarantee that all resources are enumerated. A good example of this are users, most clusters don't have a resource for these. But users clearly do exist as you have them in role bindings, etc. Or sometimes, when on client engagements, they don't want us accessing secrets, so we can't query for any of them.

When resources are creating relationships, they will figure out what other resources they refer to (the DEFINES, USES, etc nomenclature from the main description from this PR is imho on the same lines) and will tell Neo4j to create a relationship between itself and the referred too resource. The Neo4j queries are done in a way that if the referred resource doesn't exist, then it will be created. Due to this, we get new resources being created when doing the first pass on relationships.

The second pass is to then make sure we account for those newly created resources as part of the RBAC processing. We avoid doing them in the first run as they are the longest running part of relationship generation and no point running them when not all resources are computed. For example, let's say a service account can impersonate a user, if that user doesn't exist in the database yet, we wouldn't see it. So we wait till after we are sure all resources we could know about are created.

I raised this point, as I think that is similar to what your recursive neo4j functionality is doing. Going through nodes its creating and adding them to the create functionality. Were the nodes to just create the appropriate relationships wanted, then that would happen automatically.

Can you elaborate a tad on your objectives with BaseResource vs Resource? Is it purely so we have a class that can be used for non "resource" definitions (ie the SecretReference)? If so, were you thinking they be added as nodes to Neo4j?

I want to have a way to track non-resource definitions and was wanting them to appear as nodes in Neo4j. The reason I have a base class is to reduce some code duplication + add the object hashing functionality to attempt to make each sub-"resource" definition distinct (although I know there can be duplicates if we're basing it off hashes).

So at the moment, that would be included in the "raw" field in a node. What are you trying to achieve with having them as nodes in Neo4j? Would it be separate to a relationship between two nodes where one node is specified as a non-resource definition?

Have you skimmed the PolicyRule logic in IceKube? As that's the current "method" for doing it. Not saying it has to stay that way of course, just pointing it out as a lets compare against whats existing to figure out the better way forward based on what functionality and features we want out of that side of things.

Having a shared base class is helpful when deserializing data from Neo4j, we can re-use the logic in Resource.__new__ to properly assign a resource type to an available class if it's found. It makes adding new types significantly easier in my opinion and think we should pursue at least having a separate class for sub-"resources".

I'm currently testing out the PersistentVolume changes and am probably going to backport some required changes to BaseResource in this PR if we decide to go down that path. I can commit and we can discuss shortly afterwards if that works out for you

OK, I'll take a new look when pushed :)

@gsalaz98
Copy link
Contributor Author

gsalaz98 commented Dec 5, 2023

So at the moment, that would be included in the "raw" field in a node. What are you trying to achieve with having them as nodes in Neo4j? Would it be separate to a relationship between two nodes where one node is specified as a non-resource definition?

My personal use case would be to track down deployment/dependency chains more effectively than I was able to in the past with ArgoCD, as well as querying and figuring out ownership/references to objects in a quick manner.

I'm thinking that this tool would be extremely useful when debugging cluster issues during incidents/disaster recovery, and having a more complete graph of the cluster itself would be useful in that scenario. Since there's no support for Maps in Neo4j, the solution I came up with was to separate each object out into its own node so that its data was queryable.

@Skybound1
Copy link
Collaborator

My personal use case would be to track down deployment/dependency chains more effectively than I was able to in the past with ArgoCD, as well as querying and figuring out ownership/references to objects in a quick manner.

I'm thinking that this tool would be extremely useful when debugging cluster issues during incidents/disaster recovery, and having a more complete graph of the cluster itself would be useful in that scenario. Since there's no support for Maps in Neo4j, the solution I came up with was to separate each object out into its own node so that its data was queryable.

So something like a malicious pod, tracking down which replica set and therefore which deployments thats from. So you know what resources to check logs for on who created it, etc? Am I correct in thats the kind of view you're trying to get?

@gsalaz98
Copy link
Contributor Author

gsalaz98 commented Dec 5, 2023

So something like a malicious pod, tracking down which replica set and therefore which deployments thats from. So you know what resources to check logs for on who created it, etc? Am I correct in thats the kind of view you're trying to get?

Yeah, that's a good use case but one I didn't think about - I'm not professionally involved in security, I'm more of an enthusiast in that respect, but I'm in the DevOps/SRE side of things, and I'd honestly rather query the graph to track down resources than to fiddle around with kubectl and manually do it, which can get tedious when trying to fetch relationships across multiple different objects.

I'd rather be able to do

MATCH (p:Pod {namespace: "monitoring"})-->()-[:CONSUMES]->(pv:PersistentVolume) RETURN pv
# or even shorthandedly
MATCH (:Pod {namespace: "monitoring"})-->()-->(pv:PersistentVolume) RETURN pv

than having to remember the spec:

kubectl get pods -nmonitoring -ojson | jq '.items[].spec.volumes[] | select (.persistentVolumeClaim != null) | .persistentVolumeClaim.claimName'

It's just way more elegant and much better to quickly browse and see relationships between resources. Being able to map network topologies would also be really cool, and I'm sure it could be supported in the future as well.

@Skybound1
Copy link
Collaborator

Yeah, that's a good use case but one I didn't think about - I'm not professionally involved in security, I'm more of an enthusiast in that respect, but I'm in the DevOps/SRE side of things, and I'd honestly rather query the graph to track down resources than to fiddle around with kubectl and manually do it, which can get tedious when trying to fetch relationships across multiple different objects.

I'd rather be able to do

MATCH (p:Pod {namespace: "monitoring"})-->()-[:CONSUMES]->(pv:PersistentVolume) RETURN pv
# or even shorthandedly
MATCH (:Pod {namespace: "monitoring"})-->()-->(pv:PersistentVolume) RETURN pv

than having to remember the spec:

kubectl get pods -nmonitoring -ojson | jq '.items[].spec.volumes[] | select (.persistentVolumeClaim != null) | .persistentVolumeClaim.claimName'

It's just way more elegant and much better to quickly browse and see relationships between resources. Being able to map network topologies would also be really cool, and I'm sure it could be supported in the future as well.

OK, that makes sense. I don't think you need to make a new resource type for the intermediary references if this is your goal.

You could turn

MATCH (p:Pod {namespace: "monitoring"})-->()-[:CONSUMES]->(pv:PersistentVolume) RETURN pv

into

MATCH (p:Pod {namespace: "monitoring"})-[:CONSUMES]->(pv:PersistentVolume) RETURN pv

We would just add appropriate relationships that parse the references to other resources, and add them to the main resources relationships. Similar to how pods reference service accounts that they use, or role bindings reference the role and the subject defined within.

For the various resource types, we identify how they relate / refer to other resources, and parse that in and add relationships. If the reference model is the same across resource types, we can separate that out into its own class for easy re-use, similar to what we did with PolicyRule.

Does that make sense, or do you see something I'm missing?

@gsalaz98
Copy link
Contributor Author

gsalaz98 commented Dec 5, 2023

That makes sense to me, I do prefer the shorter syntax, but it comes with the loss of information if not included in the main object.

For example:

class Pod(Resource):
service_account: Optional[ServiceAccount]
node: Optional[Node]
containers: List[Dict[str, Any]]
capabilities: List[str]
host_path_volumes: List[str]
privileged: bool

The privileged flag can be set by any of the containers' securityContext, but we lose out on individual Pod->Container relationships and we can't determine which container is the one with privileged: true if we are only querying the graph.

If there's a way to get the shorter syntax working while also not throwing away the information + getting relationships for free by defining types, then I'm game for it. I'd settle for a "verbose" mode that dumps non-resource objects into Neo4j, but have the default be to only do light contextualization, i.e. what we're already doing

@Skybound1
Copy link
Collaborator

Are there other areas you can think where that information might be lost?

Pods / Containers was the only one I really came across before and spent a while debating adding containers as another resource, before ending up not as I at the time didn't see much use for it at the time. Considering in Kubernetes land, Pods are considered the "smallest" resource, and when I was thinking of reasons why I might want to care about a container, most things would have been true for the pod overall so didn't see the need for it then.

Of course, things can change. What kind of distinct information as an example were you aiming to get from the container?

@gsalaz98
Copy link
Contributor Author

gsalaz98 commented Dec 5, 2023

Are there other areas you can think where that information might be lost?

Pods / Containers was the only one I really came across before and spent a while debating adding containers as another resource, before ending up not as I at the time didn't see much use for it at the time. Considering in Kubernetes land, Pods are considered the "smallest" resource, and when I was thinking of reasons why I might want to care about a container, most things would have been true for the pod overall so didn't see the need for it then.

Of course, things can change. What kind of distinct information as an example were you aiming to get from the container?

One use case could be container image scanning, i.e. scanning for known vulnerabilities or CVEs w/ Snyk or something along those lines. Also would be very helpful to know if a Pod's container is an init container, ephemeral, or long lived.

I think exposing that sort of information to Neo4j queries would be very powerful for any new applications built on top of the graph in neo4j. Having shortcuts, i.e. directly defining (:Pod)-[:IS_PRIVILEGED]->(x) as a shortcut rather than (:Pod)-->(:Container)-[:IS_PRIVILEGED]->(x) is in my opinion worthwhile and we should aim to add as many as we can.

A lot of my wants at the moment I don't have examples to demonstrate, but I honestly would've loved to have this tool at my last job and would've made debugging clusters so much easier, but only if I could dig in to every resource at full granularity.

@Skybound1
Copy link
Collaborator

I don't think IceKube will gain the ability to do image scanning, that might be too much of a scope creep :P

But I do see the benefit of knowing the various images, and the inits, etc. So we should add something for that.

I'm hesitant to add a new node if there is no clear use-case for the time being, especially if we can encode that same information into the pods properties itself. Partially because other parts of the codebase assume every node is a kubernetes resource and we would need to refactor those as well.

What might be worth doing is going further in adding relationships between all the resource types, and then seeing if we hit a limitaton in that where we don't get enough purely from that relationship 🤔

So with that, for the time being I don't think we need to do the refactor for BaseResource provided the data we want to expose to users can still be done with relationships?

@gsalaz98
Copy link
Contributor Author

gsalaz98 commented Dec 5, 2023

I agree, if that's the case, I can close the PR and focus on modeling additions the same as PolicyRule is defined, starting with PersistentVolume/PVCs. I'll probably end up maintaining a separate fork focusing on my needs of getting the whole cluster definition loaded into Neo4j, but will try and contribute back as much as I can.

Do you think it'd be better to rename my fork to prevent confusion? I'll link back to this repo for attribution if that's alright with you :) - Also thank you for taking the time to walk me through your thought process, I appreciate your time.

@Skybound1
Copy link
Collaborator

Absolutely fine for you to maintain a separate fork, thanks for trying to contribute back in the first place :D

I don't mind if you rename or not. If it's a Github fork it will still say forked from this repo, etc and hopefully it won't be too difficult for us to swap patches back and forth.

Anytime, happy to help wherever I can :D

@gsalaz98 gsalaz98 closed this Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants