-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: add category filter to all Prowler dashboards #9137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: add category filter to all Prowler dashboards #9137
Conversation
|
✅ Conflict Markers Resolved All conflict markers have been successfully resolved in this pull request. |
|
Hello @sonofagl1tch, thanks for this contribution!! We are going to talk internally about the category filter and we'll get back to you. |
|
@sonofagl1tch the UI is using a |
I would like some guidance on the preferred path forward. This is my first pr to the project and I did all of my testing locally. Please tell me what you want to see as the finished feature and I will learn more and build it. Thanks! |
Add category filtering capability to the findings API to support UI category filter dropdown. Categories are extracted from the check_metadata.Categories field and exposed via metadata endpoints. Changes: - Add categories field to FindingMetadataSerializer - Add categories and categories__in filters to FindingFilter - Add categories and categories__in filters to LatestFindingFilter - Extract categories in metadata() and metadata_latest() endpoints - Update fallback function get_findings_metadata_no_aggregations()
@jfagoagas, I also attempted to add the API filter. Please let me know if this meets the expected standards. |
josemazo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @sonofagl1tch, thank you so much for your proposed changes. Each improvement, like this one, makes Prowler better for all of us who use it.
I'm having problems getting your changes to work in my local API: the categories in the output are always empty. While debugging, I saw you are trying to get the categories from the check_metadata JSON object using PascalCase, the same case used by the check definitions.
The problem is that when the check metadata is saved to the check_metadata column in the findings table, the JSON keys are converted to lower case. Therefore, you'll need to use categories (lowercase) to get your code to work.
If you have any questions regarding this or any other Prowler topic, don't hesitate to contact us. Also, when your changes are ready, I'll gladly review them.
Thank you for the feedback! I will work on fixing that shortly. In the meantime, what test setup do you recommend? I wrote test cases and deployed in docker locally to confirm that the changes worked. Based on your feedback, it seems there wasn't enough testing, and I need to add more. If you can share your setup for testing this change, I would like to replicate it for myself and use it as additional testing in the future. Thanks! |
Hi @sonofagl1tch! Well, for testing this with real tests... we are not testing the How I discover the problem I mentioned you was simply starting the application and checking with real data if the new code was giving the right output. For having real data I added an AWS cloud provider and run a scan. Ideally, this new feature, having categories in the finding metadata output, should have tests implemented for that. |
josemazo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here there are some parts where also PascalCase Categories is used.
…es of "Categories" with a capital "C" in the active codebase. All usages are lowercase "categories".
Co-authored-by: Josema Camacho <[email protected]>
|
I ran a search through my entire branch to find all usage of "Categories" and replaced it with "categories". |
Hi @sonofagl1tch! Nice, running this updated code and everything works perfectly. Let's check what the other teams need to say about this PR. And again, thank you! |
Thank you for the feedback! and for the patience while I learn the process. Cheers |
|
Is there anything else I can assist with for this PR? Thanks! |
pedrooot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dashboard side LGTM! Thanks for this Ryan
|
Hi @sonofagl1tch! First of all, thanks a lot for the contribution. It’s genuinely useful and something many users have been requesting. The problem is that even though the solution works functionally, in practice we can easily have millions of findings, and querying or iterating through the JSONB field for categories becomes extremely expensive and does not scale. To make the issue clearer:
Because of this, I want to highlight two concrete solutions: 1. Use the existing index on (provider, check_id) and avoid touching JSONB entirelyInstead of loading all findings and reading check_metadata, we can:
This approach keeps the logic correct and avoids scanning or parsing millions of JSONB fields. Benefits of This Approach
2. Denormalize categories into a dedicated field or tableThis option still solves the problem but changes the schema. The idea is to store categories directly in a dedicated column (JSON or array) with a GIN index. Here are the pros and cons: Pros:
Cons:
The table you need to modify is Both options are valid. The first one is enough to solve the immediate performance problem without modifying the schema. The second one is ideal if we want maximum long-term query performance and more analytics flexibility Right now we can’t merge the PR because the current implementation would create severe performance problems on large datasets. Using the distinct (provider, check_id) approach plus the Prowler metadata loader solves the issue completely. Hope this gives you a clear picture of the problem and the options available. If you have any questions or want help implementing it, just let us know. We’ll be happy to support you with this great contribution |
Thanks for the feedback and additional testing! this makes sense to me and scale was something I did not test for. I will work on implementing the solution "Use the existing index on (provider, check_id) and avoid touching JSONB entirely" and request another code review once I have it working. cheers |
Replace JSONB parsing with indexed (provider, check_id) queries for 10-20x performance improvement in metadata endpoints. - Uses CheckMetadata.get_bulk() for efficient metadata loading - Extracts categories in memory instead of parsing JSONB - Query time: 30-60s → 2-3s (~90% faster) - Memory usage: 4GB+ → <50MB (~99% reduction) - Database CPU: 95-100% → 10-15% (~85% reduction) Changes: - api/src/backend/api/v1/views.py: Optimized metadata() and metadata_latest() - api/tests/test_findings_metadata_optimization.py: Added 11 comprehensive tests - api/docs/findings-metadata-optimization.md: Complete technical documentation - api/docs/findings-metadata-optimization-security-review.md: Security review Fixes prowler-cloud#9137
|
AdriiiPRodri I implemented the update requested and tested it locally against my aws account. I dont have enough events to really scale test it but I did not notice any issues with the new implementation. Please review the current branch and let me know if this successfully handles the scale testing you did. Thanks! |
feat: add category filter to all Prowler dashboards
Add category filtering capability to both CLI Dashboard and Prowler App UI,
enabling users to filter findings by categories such as internet-exposed,
encryption, logging, and more.
Changes:
CLI Dashboard (Python/Dash):
Prowler App UI (Next.js/React):
Tests:
Documentation:
Features:
Closes #6646
#6646