feat: add category filter to all Prowler dashboards #9137

sonofagl1tch · 2025-11-02T19:56:03Z

feat: add category filter to all Prowler dashboards

Add category filtering capability to both CLI Dashboard and Prowler App UI,
enabling users to filter findings by categories such as internet-exposed,
encryption, logging, and more.

Changes:

CLI Dashboard (Python/Dash):
- Add create_category_dropdown() function in dashboard/lib/dropdowns.py
- Integrate category dropdown into layout (5-column grid)
- Implement category filtering logic in dashboard/pages/overview.py
- Support comma-separated and pipe-separated category values
- Dynamic category options based on filtered data
Prowler App UI (Next.js/React):
- Add CATEGORY to FilterType enum in ui/types/filters.ts
- Extract and pass uniqueCategories from metadata endpoint
- Add category filter to FindingsFilters component
- Exclude categories from metadata endpoint filters
- Support categories__in query parameter
Tests:
- Add 10 unit tests for CLI Dashboard category filter
- Add 4 E2E tests for Prowler App UI category filter
- All tests passing (14/14)
Documentation:
- Update CLI Dashboard tutorial with category filter usage
- Create comprehensive category filter guide
- Add Prowler App UI category filter documentation
- Include examples and use cases

Features:

Multi-select dropdown with "All" default
Handles comma/pipe-separated categories in CSV
Filters by single or multiple categories
Works seamlessly with existing filters
Dynamic category options
Backward compatible

Closes #6646

#6646

github-actions · 2025-11-02T19:56:24Z

✅ Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

jfagoagas · 2025-11-04T09:16:47Z

Hello @sonofagl1tch, thanks for this contribution!! We are going to talk internally about the category filter and we'll get back to you.

jfagoagas · 2025-11-04T09:44:52Z

@sonofagl1tch the UI is using a category filter which is not present in the API. Do you plan to work on it?

sonofagl1tch · 2025-11-04T14:51:30Z

@sonofagl1tch the UI is using a category filter which is not present in the API. Do you plan to work on it?

I would like some guidance on the preferred path forward. This is my first pr to the project and I did all of my testing locally. Please tell me what you want to see as the finished feature and I will learn more and build it. Thanks!

Add category filtering capability to the findings API to support UI category filter dropdown. Categories are extracted from the check_metadata.Categories field and exposed via metadata endpoints. Changes: - Add categories field to FindingMetadataSerializer - Add categories and categories__in filters to FindingFilter - Add categories and categories__in filters to LatestFindingFilter - Extract categories in metadata() and metadata_latest() endpoints - Update fallback function get_findings_metadata_no_aggregations()

sonofagl1tch · 2025-11-05T22:28:25Z

@sonofagl1tch the UI is using a category filter which is not present in the API. Do you plan to work on it?

@jfagoagas, I also attempted to add the API filter. Please let me know if this meets the expected standards.

docs/user-guide/cli/tutorials/dashboard-category-filter.mdx

josemazo

Hello @sonofagl1tch, thank you so much for your proposed changes. Each improvement, like this one, makes Prowler better for all of us who use it.

I'm having problems getting your changes to work in my local API: the categories in the output are always empty. While debugging, I saw you are trying to get the categories from the check_metadata JSON object using PascalCase, the same case used by the check definitions.

The problem is that when the check metadata is saved to the check_metadata column in the findings table, the JSON keys are converted to lower case. Therefore, you'll need to use categories (lowercase) to get your code to work.

If you have any questions regarding this or any other Prowler topic, don't hesitate to contact us. Also, when your changes are ready, I'll gladly review them.

sonofagl1tch · 2025-11-10T16:43:57Z

Hello @sonofagl1tch, thank you so much for your proposed changes. Each improvement, like this one, makes Prowler better for all of us who use it.

I'm having problems getting your changes to work in my local API: the categories in the output are always empty. While debugging, I saw you are trying to get the categories from the check_metadata JSON object using PascalCase, the same case used by the check definitions.

The problem is that when the check metadata is saved to the check_metadata column in the findings table, the JSON keys are converted to lower case. Therefore, you'll need to use categories (lowercase) to get your code to work.

If you have any questions regarding this or any other Prowler topic, don't hesitate to contact us. Also, when your changes are ready, I'll gladly review them.

Thank you for the feedback! I will work on fixing that shortly.

In the meantime, what test setup do you recommend? I wrote test cases and deployed in docker locally to confirm that the changes worked. Based on your feedback, it seems there wasn't enough testing, and I need to add more. If you can share your setup for testing this change, I would like to replicate it for myself and use it as additional testing in the future. Thanks!

…tch/prowler into prowler-dashboard-6646

josemazo · 2025-11-11T09:16:07Z

Thank you for the feedback! I will work on fixing that shortly.

In the meantime, what test setup do you recommend? I wrote test cases and deployed in docker locally to confirm that the changes worked. Based on your feedback, it seems there wasn't enough testing, and I need to add more. If you can share your setup for testing this change, I would like to replicate it for myself and use it as additional testing in the future. Thanks!

Hi @sonofagl1tch!

Well, for testing this with real tests... we are not testing the check_metadata attribute in any of the finding metadata tests, test_findings_metadata_*. It isn't something we needed, until now.

How I discover the problem I mentioned you was simply starting the application and checking with real data if the new code was giving the right output. For having real data I added an AWS cloud provider and run a scan.

Ideally, this new feature, having categories in the finding metadata output, should have tests implemented for that.

josemazo

Here there are some parts where also PascalCase Categories is used.

api/src/backend/api/filters.py

…es of "Categories" with a capital "C" in the active codebase. All usages are lowercase "categories".

Co-authored-by: Josema Camacho <[email protected]>

sonofagl1tch · 2025-11-11T16:27:23Z

I ran a search through my entire branch to find all usage of "Categories" and replaced it with "categories".

josemazo · 2025-11-11T16:32:56Z

I ran a search through my entire branch to find all usage of "Categories" and replaced it with "categories".

Hi @sonofagl1tch! Nice, running this updated code and everything works perfectly. Let's check what the other teams need to say about this PR. And again, thank you!

sonofagl1tch · 2025-11-11T16:34:52Z

I ran a search through my entire branch to find all usage of "Categories" and replaced it with "categories".

Hi @sonofagl1tch! Nice, running this updated code and everything works perfectly. Let's check what the other teams need to say about this PR. And again, thank you!

Thank you for the feedback! and for the patience while I learn the process. Cheers

sonofagl1tch · 2025-11-13T15:09:05Z

Is there anything else I can assist with for this PR? Thanks!

pedrooot

Dashboard side LGTM! Thanks for this Ryan

AdriiiPRodri · 2025-11-19T17:17:11Z

Hi @sonofagl1tch!

First of all, thanks a lot for the contribution. It’s genuinely useful and something many users have been requesting. The problem is that even though the solution works functionally, in practice we can easily have millions of findings, and querying or iterating through the JSONB field for categories becomes extremely expensive and does not scale.

To make the issue clearer:

The current implementation filters categories by inspecting check_metadata inside every finding.
The /metadata endpoint also iterates through each finding to extract categories.
JSONB array lookups cannot rely on efficient indexing and become very slow at scale.
Scanning millions of findings just to extract categories leads to high CPU, IO and memory usage.

Because of this, I want to highlight two concrete solutions:

1. Use the existing index on (provider, check_id) and avoid touching JSONB entirely

Instead of loading all findings and reading check_metadata, we can:

Apply the user filters to the queryset.
Ask PostgreSQL only for distinct (provider, check_id) pairs. This is extremely fast because both fields are indexed and return only a few hundred items.
For each provider, load all check metadata once using CheckMetadata.get_bulk(provider).
Extract the categories from those check definitions in memory.

from collections import defaultdict
from prowler.lib.check.models import CheckMetadata

queryset = self.filter_queryset(self.get_queryset())

# Step 1: group distinct check_ids by provider
check_ids_by_provider = defaultdict(set)
for finding in queryset.values("provider", "check_id").distinct():
    check_ids_by_provider[finding["provider"]].add(finding["check_id"])

# Step 2: load metadata once per provider and collect categories
categories = set()
for provider, check_ids in check_ids_by_provider.items():
    bulk_metadata = CheckMetadata.get_bulk(provider)
    for check_id in check_ids:
        check_metadata = CheckMetadata.get(bulk_metadata, check_id)
        if check_metadata and check_metadata.Categories:
            categories.update(check_metadata.Categories)

categories = sorted(categories)

This approach keeps the logic correct and avoids scanning or parsing millions of JSONB fields.

Benefits of This Approach

Performance: Query distinct check_ids (hundreds) instead of iterating millions of findings
Scalability: Performance remains constant regardless of finding count
Efficient: Database query for distinct values is optimized with indexes
Maintainable: Uses existing Prowler check metadata infrastructure
Accurate: Always reflects the latest check definitions

2. Denormalize categories into a dedicated field or table

This option still solves the problem but changes the schema. The idea is to store categories directly in a dedicated column (JSON or array) with a GIN index.

Here are the pros and cons:

Pros:

Very fast category filtering using GIN indexes.
No need to inspect or parse check_metadata during filtering.
Queries become simpler and more predictable.
Allows efficient aggregations, counts and analytics on categories.

Cons:

Requires a migration.
Introduces redundancy because categories already exist inside check_metadata.
Schema change is more invasive and impacts storage size (although categories are small).
Needs additional logic to keep categories in sync if check definitions ever change.

The table you need to modify is ResourceScanSummary if you want to go with the second solution. This can be a bit complex, but we are here to help you. If you have any questions you can ask us, and I recommend looking at existing examples in the code.

Both options are valid. The first one is enough to solve the immediate performance problem without modifying the schema. The second one is ideal if we want maximum long-term query performance and more analytics flexibility

Right now we can’t merge the PR because the current implementation would create severe performance problems on large datasets. Using the distinct (provider, check_id) approach plus the Prowler metadata loader solves the issue completely.

Hope this gives you a clear picture of the problem and the options available. If you have any questions or want help implementing it, just let us know. We’ll be happy to support you with this great contribution

sonofagl1tch · 2025-11-20T18:24:59Z

Hi @sonofagl1tch!

First of all, thanks a lot for the contribution. It’s genuinely useful and something many users have been requesting. The problem is that even though the solution works functionally, in practice we can easily have millions of findings, and querying or iterating through the JSONB field for categories becomes extremely expensive and does not scale.

To make the issue clearer:

The current implementation filters categories by inspecting check_metadata inside every finding.

The /metadata endpoint also iterates through each finding to extract categories.

JSONB array lookups cannot rely on efficient indexing and become very slow at scale.

Scanning millions of findings just to extract categories leads to high CPU, IO and memory usage.

Because of this, I want to highlight two concrete solutions:

1. Use the existing index on (provider, check_id) and avoid touching JSONB entirely

Instead of loading all findings and reading check_metadata, we can:

Apply the user filters to the queryset.

Ask PostgreSQL only for distinct (provider, check_id) pairs. This is extremely fast because both fields are indexed and return only a few hundred items.

For each provider, load all check metadata once using CheckMetadata.get_bulk(provider).

Extract the categories from those check definitions in memory.
from collections import defaultdict
from prowler.lib.check.models import CheckMetadata

queryset = self.filter_queryset(self.get_queryset())

# Step 1: group distinct check_ids by provider
check_ids_by_provider = defaultdict(set)
for finding in queryset.values("provider", "check_id").distinct():
    check_ids_by_provider[finding["provider"]].add(finding["check_id"])

# Step 2: load metadata once per provider and collect categories
categories = set()
for provider, check_ids in check_ids_by_provider.items():
    bulk_metadata = CheckMetadata.get_bulk(provider)
    for check_id in check_ids:
        check_metadata = CheckMetadata.get(bulk_metadata, check_id)
        if check_metadata and check_metadata.Categories:
            categories.update(check_metadata.Categories)

categories = sorted(categories)
This approach keeps the logic correct and avoids scanning or parsing millions of JSONB fields.

Benefits of This Approach

Performance: Query distinct check_ids (hundreds) instead of iterating millions of findings

Scalability: Performance remains constant regardless of finding count

Efficient: Database query for distinct values is optimized with indexes

Maintainable: Uses existing Prowler check metadata infrastructure

Accurate: Always reflects the latest check definitions

2. Denormalize categories into a dedicated field or table

This option still solves the problem but changes the schema. The idea is to store categories directly in a dedicated column (JSON or array) with a GIN index.

Here are the pros and cons:

Pros:

Very fast category filtering using GIN indexes.

No need to inspect or parse check_metadata during filtering.

Queries become simpler and more predictable.

Allows efficient aggregations, counts and analytics on categories.

Cons:

Requires a migration.

Introduces redundancy because categories already exist inside check_metadata.

Schema change is more invasive and impacts storage size (although categories are small).

Needs additional logic to keep categories in sync if check definitions ever change.

The table you need to modify is ResourceScanSummary if you want to go with the second solution. This can be a bit complex, but we are here to help you. If you have any questions you can ask us, and I recommend looking at existing examples in the code.

Both options are valid. The first one is enough to solve the immediate performance problem without modifying the schema. The second one is ideal if we want maximum long-term query performance and more analytics flexibility

Right now we can’t merge the PR because the current implementation would create severe performance problems on large datasets. Using the distinct (provider, check_id) approach plus the Prowler metadata loader solves the issue completely.

Hope this gives you a clear picture of the problem and the options available. If you have any questions or want help implementing it, just let us know. We’ll be happy to support you with this great contribution

Thanks for the feedback and additional testing! this makes sense to me and scale was something I did not test for. I will work on implementing the solution "Use the existing index on (provider, check_id) and avoid touching JSONB entirely" and request another code review once I have it working.

cheers

Replace JSONB parsing with indexed (provider, check_id) queries for 10-20x performance improvement in metadata endpoints. - Uses CheckMetadata.get_bulk() for efficient metadata loading - Extracts categories in memory instead of parsing JSONB - Query time: 30-60s → 2-3s (~90% faster) - Memory usage: 4GB+ → <50MB (~99% reduction) - Database CPU: 95-100% → 10-15% (~85% reduction) Changes: - api/src/backend/api/v1/views.py: Optimized metadata() and metadata_latest() - api/tests/test_findings_metadata_optimization.py: Added 11 comprehensive tests - api/docs/findings-metadata-optimization.md: Complete technical documentation - api/docs/findings-metadata-optimization-security-review.md: Security review Fixes prowler-cloud#9137

sonofagl1tch · 2025-11-22T03:12:43Z

AdriiiPRodri I implemented the update requested and tested it locally against my aws account. I dont have enough events to really scale test it but I did not notice any issues with the new implementation. Please review the current branch and let me know if this successfully handles the scale testing you did. Thanks!

sonofagl1tch added 2 commits November 2, 2025 11:58

add category filter to all Prowler dashboards

0a68856

completed checklist and added changelog

d5eb66f

sonofagl1tch requested review from a team as code owners November 2, 2025 19:56

github-actions bot added documentation component/ui labels Nov 2, 2025

sonofagl1tch requested a review from a team as a code owner November 5, 2025 22:26

github-actions bot added the component/api label Nov 5, 2025

sonofagl1tch commented Nov 5, 2025

View reviewed changes

docs/user-guide/cli/tutorials/dashboard-category-filter.mdx Outdated Show resolved Hide resolved

Merge branch 'master' into prowler-dashboard-6646

9b1ee46

josemazo requested changes Nov 10, 2025

View reviewed changes

sonofagl1tch added 2 commits November 11, 2025 00:39

updated categories to use correct case

9af726b

Merge branch 'prowler-dashboard-6646' of https://github.com/sonofagl1…

6be3abe

…tch/prowler into prowler-dashboard-6646

sonofagl1tch requested a review from josemazo November 11, 2025 05:49

josemazo requested changes Nov 11, 2025

View reviewed changes

api/src/backend/api/filters.py Outdated Show resolved Hide resolved

api/src/backend/api/filters.py Outdated Show resolved Hide resolved

api/src/backend/api/filters.py Outdated Show resolved Hide resolved

api/src/backend/api/filters.py Outdated Show resolved Hide resolved

andoniaf added the community Opened by the Community label Nov 11, 2025

josemazo reviewed Nov 11, 2025

View reviewed changes

api/src/backend/api/filters.py Outdated Show resolved Hide resolved

sonofagl1tch and others added 2 commits November 11, 2025 11:03

did a codebase search for branch. results show there are no occurrenc…

46bfca4

…es of "Categories" with a capital "C" in the active codebase. All usages are lowercase "categories".

Update api/src/backend/api/filters.py

c5b8092

Co-authored-by: Josema Camacho <[email protected]>

sonofagl1tch requested a review from josemazo November 11, 2025 16:26

sonofagl1tch mentioned this pull request Nov 13, 2025

Prowler Dashboard filter for categories #6646

Open

chore(revision): remove dashboard tests

13d127e

pedrooot previously approved these changes Nov 13, 2025

View reviewed changes

chore(revision): remove docs

e461c65

pedrooot dismissed their stale review via e461c65 November 13, 2025 16:56

pedrooot previously approved these changes Nov 13, 2025

View reviewed changes

sonofagl1tch dismissed pedrooot’s stale review via baa3e1e November 21, 2025 03:18

sonofagl1tch marked this pull request as draft November 21, 2025 04:10

sonofagl1tch marked this pull request as ready for review November 22, 2025 03:11

feat: add category filter to all Prowler dashboards #9137

Are you sure you want to change the base?

feat: add category filter to all Prowler dashboards #9137

Conversation

sonofagl1tch commented Nov 2, 2025

Uh oh!

github-actions bot commented Nov 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jfagoagas commented Nov 4, 2025

Uh oh!

jfagoagas commented Nov 4, 2025

Uh oh!

sonofagl1tch commented Nov 4, 2025

Uh oh!

sonofagl1tch commented Nov 5, 2025

Uh oh!

Uh oh!

josemazo left a comment

Choose a reason for hiding this comment

Uh oh!

sonofagl1tch commented Nov 10, 2025

Uh oh!

josemazo commented Nov 11, 2025

Uh oh!

josemazo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sonofagl1tch commented Nov 11, 2025

Uh oh!

josemazo commented Nov 11, 2025

Uh oh!

sonofagl1tch commented Nov 11, 2025

Uh oh!

sonofagl1tch commented Nov 13, 2025

Uh oh!

pedrooot left a comment

Choose a reason for hiding this comment

Uh oh!

AdriiiPRodri commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Use the existing index on (provider, check_id) and avoid touching JSONB entirely

Benefits of This Approach

2. Denormalize categories into a dedicated field or table

Pros:

Cons:

Uh oh!

sonofagl1tch commented Nov 20, 2025

1. Use the existing index on (provider, check_id) and avoid touching JSONB entirely

Benefits of This Approach

2. Denormalize categories into a dedicated field or table

Pros:

Cons:

Uh oh!

sonofagl1tch commented Nov 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

github-actions bot commented Nov 2, 2025 •

edited

Loading

AdriiiPRodri commented Nov 19, 2025 •

edited

Loading