-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DB scanner module (mongo support only) #99
base: master
Are you sure you want to change the base?
Conversation
Thanks, I will take a look soon 👌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of things that need to be re-done. The main problems are:
- The code is too over-complicated, we need to make it simpler, much simpler because there is a lot of useless checks.
- The nesting is too big and totally uncontrollable, separate big functions into small sub-functions and sub-methods
- The code is really hard-readable. Subfunctions, private methods, and other stuff, again.
- The output JSON is wrong, it has an improper format. Check the code review for more information.
- The JSON file (one file) is written in a cycle, in multiple processes, for hundreds of hosts. It is kinda dangerous in many ways, and totally not effective. Read the review for more information. We can save JSON per host, for example, or can do something else, idk.
- Recursive functions. Just in case.
That's all for the first iteration because without fixes of the stated problems code is really hard to read (and understand).
I will update the review when the problems above will be fixed.
Any news? @tenrowd |
Thanks for the updates. 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start with some basic fixes because they are everywhere and can be seen through the rest of the code. This review is really small, so we can fix something step-by-step to not waste the time on big reviews (for now).
Also, please, take a look at the problem with the temporary JSON files (e.g. they are corrupted sometimes).
And think about database operations optimization - maybe it's better to use something like $in
operators and so on to make one request instead of len(keys)
requests.
:param filename: filename of resulting file | ||
:return: None | ||
""" | ||
with open(filename, mode="a") as result_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes the resulting file (the one that looks like /temporaries/ip.json
) is not close properly, and in this case, the structure of the file is wrong - we have the invalid JSON file in this case. So, I don't know why, but maybe this happens because of the Mongo timeout, or any other timeout, and the file doesn't write properly. Please, think about the idea of how can we fix it.
ip_values = DBScannerDefaultValues.MODULE_RESULTS.keys() | ||
for ip in ip_values: | ||
filename = str(temporary_directory.joinpath(str(ip) + ".json")) | ||
with open(filename, mode="a") as result_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry, I believe that note about the invalid output JSON belongs here. But still, please, check all the functions that are related to the file saving (it's important to have the valid JSONs).
""" | ||
ip_values = DBScannerDefaultValues.MODULE_RESULTS.keys() | ||
for ip in ip_values: | ||
filename = str(temporary_directory.joinpath(str(ip) + ".json")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use format strings with .format()
or f"{}"
if result: | ||
return result | ||
else: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if result:
return result
return
You can use None
in return statement if you want to do it, but you know that return
by default returns None
, sooooo... Yep. Think about it.
Also, you can skip else
statement here, because if you skip if result:
part (when you don't have result), you will automatically go to the part which is the next and the final in terms of the function (or maybe I missed something? Please tell me If so 😄 )
""" | ||
for info in DBScannerDefaultValues.CRITICAL_INFORMATION: | ||
regular = findall(info + DBScannerDefaultValues.NOT_INCLUDED, key) | ||
if regular: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the quantity of the nested loops, you can check the edge cases first, like:
if not regular:
continue
if isinstance(document...):
...
result_list = search_in_list(document[key]) | ||
if result_list: | ||
document_dict.update(result_list) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can omit the else:
here, please check it
self.port = 27017 | ||
self.server_timeout = 10000 # serverSelectionTimeoutMS | ||
self.socket_timeout = 300000 # socketTimeoutMS | ||
self.connection_timeout = 10000 # connectTimeoutMs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to move this stuff in class like DbScannerDefaultConfiguration
or something like this
try: | ||
self.check_time_of_connection() | ||
except DBScannerConnectionTimeout as err: | ||
if err: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to check that? 😺
I believe that we almost always have the err
object. Or something special here?
return None, err | ||
if collection_list: | ||
return collection_list, None | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can omit else:
here, check it please
keys_to_take = [] | ||
regulars = [] | ||
projection = {} | ||
for key in keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it applicable here?
https://docs.mongodb.com/manual/reference/operator/query/in/
Wassup? How's your progress with fixes? Do you need any help? DM me on TG if you need anything or have any questions. ✌️ |
No description provided.