feat(reports): Set a minimum interval for each report's execution #28176

Vitor-Avila · 2024-04-22T18:35:21Z

SUMMARY

It's currently possible to create alerts and reports configured to be executed every minute. While it might be a suitable implementation for some use-cases, some users might create assets with a more recurrent frequency needed, leading to additional load to the DB which can cause slowness, timeouts, etc.

This PR introduces two configs that can be specified in superset_config.py:

# Set a minimum interval threshold between executions (for each Alert/Report)
# Value should be the amount of minutes (integer). Min: 1, Max: 60
ALERT_MINIMUM_INTERVAL_MINUTES = None
REPORT_MINIMUM_INTERVAL_MINUTES = None

This minimum interval is validated for both creation of new reports and also the modification of existing ones.

Note that this PR doesn't include a migration, so changing this config won't automatically change existing reports.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

No UI changes.

TESTING INSTRUCTIONS

Both integration and unit tests added. For manual testing:

Specify a minimum interval in config:

ALERT_MINIMUM_INTERVAL_MINUTES = 2
REPORT_MINIMUM_INTERVAL_MINUTES = 2

Try creating a report/alert that exceeds this limit.

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

eschutho · 2024-04-22T20:32:53Z

superset/commands/report/base.py

+ if report_type == ReportScheduleType.ALERT
+ else "REPORT_MINIMUM_INTERVAL_MINUTES"
+ )
+ minimum_interval = current_app.config.get(config_key)


you may want to also perform some validation to make sure that this value is a number, or it will fail below at line 119.

great catch! Just fixed that

eschutho · 2024-04-22T20:43:17Z

superset/commands/report/base.py

+ # Check if cron is set to every minute (*)
+ # or if it has an every-minute block (1-5)
+ if "-" in minutes_interval or minutes_interval == "*":
+ raise ReportScheduleFrequencyNotAllowed(report_type, minimum_interval)


Typically the first arg when raising will be the error message, so this feels a bit out of pattern. I'd suggest that we can either keep the error message very generic, and let the UI get more specific, or pass these in as optional kwargs (but not required).

Let me know if the new implementation is better -- happy to further change it as well.

eschutho · 2024-04-22T20:53:07Z

tests/unit_tests/commands/report/base_test.py

+ Test the ``validate_report_frequency`` method when there's
+ only a configuration for alerts and user is creating report.
+ """
+ for schedule in TEST_SCHEDULES:


I think @pytest.mark.parametrize would work well for you on these for loop tests.

ohh, wasn't familiar with it! just added it

betodealmeida

I'm a bit worried about manipulating the crontab as a string to figure out the shortest interval — it can miss edge cases, and it forces constraints to the user due to the implementation details (in this case, the interval has to be in minutes, and has a maximum value of 60, which are artificial requirements).

An alternative solution would be to allow any minimum interval between reports:

ALERT_MINIMUM_INTERVAL_MINUTES = timedelta(seconds=90)

Or, as it seems to be more common for intervals in config.py:

ALERT_MINIMUM_INTERVAL_MINUTES = int(timedelta(seconds=90).total_seconds())

Then you can compute a few execution times programmatically and see if it violates the config (untested):

def is_valid_schedule(cron_schedule: str, config_value: int) -> bool:
    it = croniter(cron_schedule)
    previous = next(it)
    for _ in range(1000):  # how many? I have no idea...
        current = next(it)
        interval, previous = current - previous, current
        if interval.total_seconds() < config_value:
            return False

    return True

Vitor-Avila · 2024-04-23T12:27:21Z

I'm a bit worried about manipulating the crontab as a string to figure out the shortest interval — it can miss edge cases, and it forces constraints to the user due to the implementation details (in this case, the interval has to be in minutes, and has a maximum value of 60, which are artificial requirements).

An alternative solution would be to allow any minimum interval between reports:
ALERT_MINIMUM_INTERVAL_MINUTES = timedelta(seconds=90)
Or, as it seems to be more common for intervals in config.py:
ALERT_MINIMUM_INTERVAL_MINUTES = int(timedelta(seconds=90).total_seconds())
Then you can compute a few execution times programmatically and see if it violates the config (untested):
def is_valid_schedule(cron_schedule: str, config_value: int) -> bool:
    it = croniter(cron_schedule)
    previous = next(it)
    for _ in range(1000):  # how many? I have no idea...
        current = next(it)
        interval, previous = current - previous, current
        if interval.total_seconds() < config_value:
            return False

    return True

hey @betodealmeida that was the route I was initially going for too, but "how many repetitions?" (in the for _ in range(x)) is what made me think of another alternative. There's a lot of flexibility in the configuration (like the user could make it every 3 minutes until minute 58, but then re-execute on 59, select multiple days with a different interval between each in a month, etc).

I can see benefits and concerns in both routes:

PR approach: Biggest advantage here is that its more limited scope can validate all executions, since it only cares to the minute piece. Biggest disadvantage is that it's more restrictive (limit can't be higher than 60 minutes), and that it's handling a cron schedule as a string.
Implementing cron validation: No limit on the configuration side (user could create any interval needed like days, hours, etc). The solution might not iterate on all executions.

Since I believe most Orgs would like to limit this in the minutes range (like at least 5/10 minutes interval), I thought it made more sense to go with the first route. But I'm happy to migrate to the second approach -- we could start with a loop with 1440 as the repetition count (amount of minutes in a day) and go from there.

Vitor-Avila · 2024-04-23T12:28:17Z

thanks for the feedback, @eschutho 🙏 I'll put this on hold until we align on best route to move forward, before working on these.

john-bodley · 2024-04-26T17:19:46Z

@Vitor-Avila how do these setting reflected in the UI given one can specifying whatever cron schedule they desire?

Vitor-Avila · 2024-05-03T21:34:49Z

@Vitor-Avila how do these setting reflected in the UI given one can specifying whatever cron schedule they desire?

@john-bodley I haven't implemented any UI validation, so the user would face the toast error message when trying to save:

codecov-commenter · 2024-05-03T22:17:02Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.28%. Comparing base (2e5f3ed) to head (5f18562).
Report is 43 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #28176       +/-   ##
===========================================
+ Coverage   60.49%   83.28%   +22.79%     
===========================================
  Files        1931      520     -1411     
  Lines       76241    36922    -39319     
  Branches     8566        0     -8566     
===========================================
- Hits        46122    30752    -15370     
+ Misses      28015     6170    -21845     
+ Partials     2104        0     -2104

Flag	Coverage Δ
hive	`49.19% <13.55%> (+0.03%)`	⬆️
javascript	`?`
mysql	`77.56% <96.61%> (?)`
postgres	`77.69% <96.61%> (?)`
presto	`53.81% <13.55%> (+0.01%)`	⬆️
python	`83.28% <100.00%> (+19.80%)`	⬆️
sqlite	`77.16% <96.61%> (?)`
unit	`57.74% <42.37%> (+0.12%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Vitor-Avila · 2024-05-04T01:09:57Z

superset/commands/report/base.py

+ logger.error(
+ "Invalid value for %s: %s", config_key, minimum_interval, exc_info=True
+ )
+ return


I decided to avoid raising a ValidationError here to prevent blocking users from creating alerts/reports until the configuration is fixed by an admin. Open to feedback, tho

betodealmeida

Awesome! I just left a couple suggestions.

superset/commands/report/exceptions.py

betodealmeida · 2024-05-07T15:55:03Z

superset/config.py

+ALERT_MINIMUM_INTERVAL = 0
+REPORT_MINIMUM_INTERVAL = 0


Maybe this would be clearer about the expected units of ALERT_MINIMUM_INTERVAL and REPORT_MINIMUM_INTERVAL.

Suggested change

ALERT_MINIMUM_INTERVAL = 0

REPORT_MINIMUM_INTERVAL = 0

ALERT_MINIMUM_INTERVAL = int(timedelta(seconds=0).total_seconds())

REPORT_MINIMUM_INTERVAL = int(timedelta(seconds=0).total_seconds())

ohh that's a great suggestion! I think I'll use int(timedelta(minutes=0).total_seconds()) since seconds are not really exposed in the report configuration, but it's def better than 0. 🙌

pull-request-size bot added the size/XL label Apr 22, 2024

github-actions bot added the doc Namespace | Anything related to documentation label Apr 22, 2024

rusackas requested a review from eschutho April 22, 2024 20:01

eschutho reviewed Apr 22, 2024

View reviewed changes

betodealmeida reviewed Apr 23, 2024

View reviewed changes

Vitor-Avila added 3 commits May 3, 2024 19:05

feat(reports): Set a minimum interval for each report's execution

58cfa47

feat(reports): Set a minimum interval for each report's execution

8152441

addressing PR feedback

c8b30f3

Vitor-Avila force-pushed the report-minimal-interval branch from 36c72c5 to c8b30f3 Compare May 3, 2024 22:10

Fixing tests

5659987

Vitor-Avila commented May 4, 2024

View reviewed changes

Merge branch 'master' into report-minimal-interval

5f18562

betodealmeida approved these changes May 7, 2024

View reviewed changes

dosubot bot added the lgtm This PR has been approved by a maintainer label May 7, 2024

addressing PR feedback pt2

42e0517

eschutho approved these changes May 8, 2024

View reviewed changes

eschutho merged commit 1bf0401 into apache:master May 8, 2024
29 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(reports): Set a minimum interval for each report's execution #28176

feat(reports): Set a minimum interval for each report's execution #28176

Vitor-Avila commented Apr 22, 2024

eschutho Apr 22, 2024

Vitor-Avila May 3, 2024

eschutho Apr 22, 2024

Vitor-Avila May 3, 2024

eschutho Apr 22, 2024

Vitor-Avila May 3, 2024

betodealmeida left a comment •

edited

Vitor-Avila commented Apr 23, 2024

Vitor-Avila commented Apr 23, 2024

john-bodley commented Apr 26, 2024

Vitor-Avila commented May 3, 2024

codecov-commenter commented May 3, 2024 •

edited

Vitor-Avila May 4, 2024

betodealmeida left a comment

betodealmeida May 7, 2024

Vitor-Avila May 7, 2024

feat(reports): Set a minimum interval for each report's execution #28176

feat(reports): Set a minimum interval for each report's execution #28176

Conversation

Vitor-Avila commented Apr 22, 2024

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

betodealmeida left a comment • edited

Choose a reason for hiding this comment

Vitor-Avila commented Apr 23, 2024

Vitor-Avila commented Apr 23, 2024

john-bodley commented Apr 26, 2024

Vitor-Avila commented May 3, 2024

codecov-commenter commented May 3, 2024 • edited

Codecov Report

Choose a reason for hiding this comment

betodealmeida left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

betodealmeida left a comment •

edited

codecov-commenter commented May 3, 2024 •

edited