-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Rapid7 InsightVM source #1010
base: master
Are you sure you want to change the base?
Conversation
This is really cool, and thank you! |
Like mde, good for review and merge. |
docs/schema/rapid7.md
Outdated
TBD | ||
* Azure Tenant contains one or more Rapid7 Hosts. | ||
``` | ||
(AzureTenant)-[RESOURCE]->(Rapid7Host) | ||
``` | ||
* Azure Subscription contains one or more Rapid7 Hosts. | ||
``` | ||
(AzureSubscription)-[RESOURCE]->(Rapid7Host) | ||
``` | ||
* Azure Virtual Machine is one single Rapid7 Host. | ||
* Similarly for other cloud providers and Onpremises. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still TBD? Your changes do not create these relationships.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On relationships, I would be interested into some advices.
It does not seem most of the sources of cartography are doing cross-sources relations.
But this is an important point to me.
- link Cloud VM with tools present on it.
- problem, tool may have not always the information that makes relation certain. In case of Azure
- best case is to have resourceid (mde, rapid7 but does not seem always retrieved)
- crowdstrike gets only subscriptionid, resourcegroup and hostname
- worst case, only have hostname
I started to make AzureVM relation based on resourceid and where not available AzureSubscription based on subscriptionid.
I removed the latter as I found it making graph less readable.
I was thinking to use PRESENT_IN if relation based on resourceid and MAYBE_PRESENT_IN if relation based just on hostname.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feedback on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much context on how Azure works. But our preference is to only be certain about the relationships we create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem is always the same when dealing with multiple tools. is there a reliable key to use to join?
Azure resourceid could be but sadly not fetched reliably by most tools, at least in my context.
hostname/short_hostname is the default join key but hardly reliable (hostname reused, different case depending on tool, different length enforcement depending on tool - some enforce 15-chars windows limit...)
At this point, I'm plan to make relation base on resourceid or hostname/short_hostname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more to add here?
As said, would need a bit of help on mypy error as less familiar with the tool. Also, for some reason, CLA check is not green, but when going to https://oss.lyft.com/cla, it is marked signed and current. I did it a few weeks ago. |
Ah,CLA check is green now |
on my side, linter looks all good. |
Input: dataframe row configurations column | ||
""" | ||
|
||
if configurations is None or not isinstance(configurations, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the data returned from rapid7s api sometimes have empty fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine to resolve?
azure = json.loads(line["value"]) | ||
else: | ||
azure = line["value"] | ||
instance_id = azure["instanceId"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will and "azure"
value always be an instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if azure type, I believe there will be instanceid
but not all azure resources are correctly marked azure type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do try to address the pylint comments.
I also have a mypy error that I'm not sure how to address
|
…l through local file (dirpath) or downloadReport API (report-id), s/list/List/
I was on vacation for the past month, but I'll be taking another look at this PR again next week. |
Even w 48GB RAM, only up to 13k... | ||
""" | ||
|
||
if authorization[2] and authorization[4]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're checking for positions in this 6-tuple, It's better to convert the authorization
to a class or named tuple
https://docs.python.org/3/library/collections.html?highlight=namedtuple#collections.namedtuple
nodes = neo4j_session.run( | ||
""" | ||
MATCH (n:Rapid7Host) | ||
RETURN n.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to add one more test to check for the relationship to the Azure hosts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did a quick attempt. testing needed
|-------|--------------| | ||
| firstseen| Timestamp of when a sync job first discovered this node | | ||
| lastupdated | Timestamp of the last time the node was updated | | ||
| r7_id | id | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we know this is a Rapid7Host node, there's no need to prefix each of the properties with "r7_".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, it depends on each product which may have different output.
Typically for IP/MAC address, if multiple cards, there is no certainty that each product will pick the same. same for os product, version... no shared taxonomy sadly.
if still want to remove prefix, please say if everywhere or only part of fields:
size_interval = 500 | ||
result_array: List[Any] = [] | ||
df_rapid7_tmp = pandas.DataFrame() | ||
while count < limit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have a limit? Is it because of API ratelimitting? If that's the case, consider using the @backoff
decorator.
https://github.com/lyft/cartography/blob/master/cartography/util.py#L206
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No API rate-limiting AFAIK, more response body size and experience that it is more likely to fail if more depending on context. at least, not on doc https://help.rapid7.com/insightvm/en-us/api/index.html#operation/findAssets
Another reason to use the report option.
else: | ||
nexpose_verify_cert2 = True | ||
|
||
count = total_resources = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like count
is unused.
… add tags to downloadreport, cleaning
Currently, this can get data from
I don't know if should keep the API one which has limitations mostly for large inventory. Mypy typing errors that I didn't solve (or was impacting functionalities), all related to extract_rapid7_configurations_* functions.
|
is there something that I can help to move this forward? |
Add Ingestion of Rapid7 InsightVM assets as source. (follow-up of #1000)
Include initial test data and draft schema
Bugs
Pending
Reviewed with pylint and black