[cleaner] periodic cleanup of mapping files. #3715

Open
TrevorBenson opened this issue Jul 18, 2024 · 0 comments

TrevorBenson commented Jul 18, 2024

Originally posted by @NikhilKakade-1 in #3097 (comment)

https://github.com/sosreport/sos/blob/main/sos/cleaner/parsers/__init__.py#L97

The time complexity of the provided function becomes:

Worst Case: O(k * n * m)
Best Case: O(k * m)

In my opinion, optimizing the data masking process is essential. In terms of computational complexity, the worst-case scenario is O(k * n * m), where k is the number of lines, n is the number of compiled regexes in self.mapping.compiled_regexes, and m is the average length of the lines. Conversely, the best case is O(k * m).
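To make the k, n and m terms concrete, here is a minimal sketch of the loop shape being described. It is not the sos implementation: `obfuscate_lines` and the `(pattern, replacement)` pairs are hypothetical stand-ins for the parser method and `self.mapping.compiled_regexes`. The point is only that every additional mapping entry adds one more pass over every line.

```python
import re


def obfuscate_lines(lines, compiled_regexes):
    """Apply every (pattern, replacement) pair to every line.

    Hypothetical stand-in for the parser loop, not the sos code itself.
    """
    cleaned = []
    total_subs = 0
    for line in lines:  # k lines
        for pattern, replacement in compiled_regexes:  # n compiled regexes
            # Each substitution pass scans the line, roughly m characters.
            line, count = pattern.subn(replacement, line)
            total_subs += count
        cleaned.append(line)
    return cleaned, total_subs


# Assumed sample data, for illustration only.
regexes = [
    (re.compile(r"\bhost1\.example\.com\b"), "obfuscatedhost1"),
    (re.compile(r"\bhost2\.example\.com\b"), "obfuscatedhost2"),
]
lines = ["ping host1.example.com", "no match here", "ssh host2.example.com"]
cleaned, total = obfuscate_lines(lines, regexes)
```

With a mapping that keeps growing across runs, n grows with it, which is why pruning stale entries would shorten the inner loop.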

I've been mulling over a few questions.

  1. Should we consider periodic cleanup of mapping files as the number of compiled regexes increases over time, potentially impacting total processing time?

Yes, it is worth raising a separate issue for the periodic cleanup of mapping files.

Originally posted by @pmoravec in #3097 (comment)


My response was that automating the removal of obfuscation map regexes would seem to require tracking each entry's temperature (i.e. the most recent time a regex map obfuscation was used). I also thought that it should be left to the user to decide when to enable this removal of past regex maps. However, the consistent hash and stored salt mentioned by @pmoravec might make automated cleanup less of a concern. If I followed everything properly, recreating the regex map for the same sensitive data would likely result in the same obfuscation map entry, given the same hash method and stored salt.
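A rough sketch of both ideas, under stated assumptions: `obfuscated_name` and `prune_stale` are hypothetical helpers, not existing sos functions, and HMAC-SHA256 is only one possible choice of "hash method". The first shows why a stored salt makes a pruned entry reproducible; the second shows what last-used ("temperature") tracking would need in order to prune.

```python
import hashlib
import hmac
import time


def obfuscated_name(value, salt, prefix="host"):
    """Derive a stable obfuscated name from a value and a stored salt.

    Hypothetical helper: same value + same salt always gives the same name.
    """
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def prune_stale(last_used, max_age_seconds, now=None):
    """Keep only entries whose last-used timestamp is recent enough.

    `last_used` maps a sensitive value to the epoch time it was last matched.
    """
    now = time.time() if now is None else now
    return {
        value: stamp
        for value, stamp in last_used.items()
        if now - stamp <= max_age_seconds
    }


salt = b"example-stored-salt"  # assumed to be persisted alongside the mapping
first = obfuscated_name("db01.example.com", salt)
again = obfuscated_name("db01.example.com", salt)  # e.g. after the entry was pruned
kept = prune_stale({"db01.example.com": 100.0, "old.example.com": 10.0},
                   max_age_seconds=50, now=120.0)
```

If the derivation is deterministic like this, dropping an old entry loses nothing: the next run that sees the same value regenerates the same replacement, so pruning could remain an opt-in user setting rather than a correctness concern.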
