superset feat(reports): Set a minimum interval for each report's execution

SUMMARY

It's currently possible to create alerts and reports configured to be executed every minute. While it might be a suitable implementation for some use-cases, some users might create assets with a more recurrent frequency needed, leading to additional load to the DB which can cause slowness, timeouts, etc.

This PR introduces two configs that can be specified in superset_config.py:

# Set a minimum interval threshold between executions (for each Alert/Report)
# Value should be the amount of minutes (integer). Min: 1, Max: 60
ALERT_MINIMUM_INTERVAL_MINUTES = None
REPORT_MINIMUM_INTERVAL_MINUTES = None

This minimum interval is validated for both creation of new reports and also the modification of existing ones.

Note that this PR doesn't include a migration, so changing this config won't automatically change existing reports.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

No UI changes.

TESTING INSTRUCTIONS

Both integration and unit tests added. For manual testing:

Specify a minimum interval in config:

ALERT_MINIMUM_INTERVAL_MINUTES = 2
REPORT_MINIMUM_INTERVAL_MINUTES = 2

Try creating a report/alert that exceeds this limit.

ADDITIONAL INFORMATION

[ ] Has associated issue:
[ ] Required feature flags:
[ ] Changes UI
[ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
[x] Introduces new feature or API
[ ] Removes existing feature or API

Apr 22 '24 18:04 Vitor-Avila

I'm a bit worried about manipulating the crontab as a string to figure out the shortest interval — it can miss edge cases, and it forces constraints to the user due to the implementation details (in this case, the interval has to be in minutes, and has a maximum value of 60, which are artificial requirements).

An alternative solution would be to allow any minimum interval between reports:
ALERT_MINIMUM_INTERVAL_MINUTES = timedelta(seconds=90)
Or, as it seems to be more common for intervals in config.py:
ALERT_MINIMUM_INTERVAL_MINUTES = int(timedelta(seconds=90).total_seconds())
Then you can compute a few execution times programmatically and see if it violates the config (untested):
def is_valid_schedule(cron_schedule: str, config_value: int) -> bool:
    it = croniter(cron_schedule)
    previous = next(it)
    for _ in range(1000):  # how many? I have no idea...
        current = next(it)
        interval, previous = current - previous, current
        if interval.total_seconds() < config_value:
            return False

    return True

hey @betodealmeida that was the route I was initially going for too, but "how many repetitions?" (in the for _ in range(x)) is what made me think of another alternative. There's a lot of flexibility in the configuration (like the user could make it every 3 minutes until minute 58, but then re-execute on 59, select multiple days with a different interval between each in a month, etc).

I can see benefits and concerns in both routes:

PR approach: Biggest advantage here is that its more limited scope can validate all executions, since it only cares to the minute piece. Biggest disadvantage is that it's more restrictive (limit can't be higher than 60 minutes), and that it's handling a cron schedule as a string.
Implementing cron validation: No limit on the configuration side (user could create any interval needed like days, hours, etc). The solution might not iterate on all executions.

Since I believe most Orgs would like to limit this in the minutes range (like at least 5/10 minutes interval), I thought it made more sense to go with the first route. But I'm happy to migrate to the second approach -- we could start with a loop with 1440 as the repetition count (amount of minutes in a day) and go from there.

Apr 23 '24 12:04 Vitor-Avila

thanks for the feedback, @eschutho 🙏 I'll put this on hold until we align on best route to move forward, before working on these.

Apr 23 '24 12:04 Vitor-Avila

@Vitor-Avila how do these setting reflected in the UI given one can specifying whatever cron schedule they desire?

Apr 26 '24 17:04 john-bodley

@Vitor-Avila how do these setting reflected in the UI given one can specifying whatever cron schedule they desire?

@john-bodley I haven't implemented any UI validation, so the user would face the toast error message when trying to save:

May 03 '24 21:05 Vitor-Avila

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.28%. Comparing base (2e5f3ed) to head (5f18562). Report is 31 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #28176       +/-   ##
===========================================
+ Coverage   60.49%   83.28%   +22.79%     
===========================================
  Files        1931      520     -1411     
  Lines       76241    36922    -39319     
  Branches     8566        0     -8566     
===========================================
- Hits        46122    30752    -15370     
+ Misses      28015     6170    -21845     
+ Partials     2104        0     -2104

Flag	Coverage Δ
hive	`49.19% <13.55%> (+0.03%)`	:arrow_up:
javascript	`?`
mysql	`77.56% <96.61%> (?)`
postgres	`77.69% <96.61%> (?)`
presto	`53.81% <13.55%> (+0.01%)`	:arrow_up:
python	`83.28% <100.00%> (+19.80%)`	:arrow_up:
sqlite	`77.16% <96.61%> (?)`
unit	`57.74% <42.37%> (+0.12%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

May 03 '24 22:05 codecov-commenter