feat(reports): Set a minimum interval for each report's execution
SUMMARY
It's currently possible to create alerts and reports configured to be executed every minute. While it might be a suitable implementation for some use-cases, some users might create assets with a more recurrent frequency needed, leading to additional load to the DB which can cause slowness, timeouts, etc.
This PR introduces two configs that can be specified in superset_config.py:
# Set a minimum interval threshold between executions (for each Alert/Report)
# Value should be the amount of minutes (integer). Min: 1, Max: 60
ALERT_MINIMUM_INTERVAL_MINUTES = None
REPORT_MINIMUM_INTERVAL_MINUTES = None
This minimum interval is validated for both creation of new reports and also the modification of existing ones.
Note that this PR doesn't include a migration, so changing this config won't automatically change existing reports.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
No UI changes.
TESTING INSTRUCTIONS
Both integration and unit tests added. For manual testing:
- Specify a minimum interval in config:
ALERT_MINIMUM_INTERVAL_MINUTES = 2
REPORT_MINIMUM_INTERVAL_MINUTES = 2
- Try creating a report/alert that exceeds this limit.
ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [x] Introduces new feature or API
- [ ] Removes existing feature or API
I'm a bit worried about manipulating the crontab as a string to figure out the shortest interval — it can miss edge cases, and it forces constraints to the user due to the implementation details (in this case, the interval has to be in minutes, and has a maximum value of 60, which are artificial requirements).
An alternative solution would be to allow any minimum interval between reports:
ALERT_MINIMUM_INTERVAL_MINUTES = timedelta(seconds=90)Or, as it seems to be more common for intervals in
config.py:ALERT_MINIMUM_INTERVAL_MINUTES = int(timedelta(seconds=90).total_seconds())Then you can compute a few execution times programmatically and see if it violates the config (untested):
def is_valid_schedule(cron_schedule: str, config_value: int) -> bool: it = croniter(cron_schedule) previous = next(it) for _ in range(1000): # how many? I have no idea... current = next(it) interval, previous = current - previous, current if interval.total_seconds() < config_value: return False return True
hey @betodealmeida that was the route I was initially going for too, but "how many repetitions?" (in the for _ in range(x)) is what made me think of another alternative. There's a lot of flexibility in the configuration (like the user could make it every 3 minutes until minute 58, but then re-execute on 59, select multiple days with a different interval between each in a month, etc).
I can see benefits and concerns in both routes:
-
PR approach: Biggest advantage here is that its more limited scope can validate all executions, since it only cares to the minute piece. Biggest disadvantage is that it's more restrictive (limit can't be higher than 60 minutes), and that it's handling a cron schedule as a string.
-
Implementing cron validation: No limit on the configuration side (user could create any interval needed like days, hours, etc). The solution might not iterate on all executions.
Since I believe most Orgs would like to limit this in the minutes range (like at least 5/10 minutes interval), I thought it made more sense to go with the first route. But I'm happy to migrate to the second approach -- we could start with a loop with 1440 as the repetition count (amount of minutes in a day) and go from there.
thanks for the feedback, @eschutho 🙏 I'll put this on hold until we align on best route to move forward, before working on these.
@Vitor-Avila how do these setting reflected in the UI given one can specifying whatever cron schedule they desire?
@Vitor-Avila how do these setting reflected in the UI given one can specifying whatever cron schedule they desire?
@john-bodley I haven't implemented any UI validation, so the user would face the toast error message when trying to save:
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 83.28%. Comparing base (
2e5f3ed) to head (5f18562). Report is 31 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #28176 +/- ##
===========================================
+ Coverage 60.49% 83.28% +22.79%
===========================================
Files 1931 520 -1411
Lines 76241 36922 -39319
Branches 8566 0 -8566
===========================================
- Hits 46122 30752 -15370
+ Misses 28015 6170 -21845
+ Partials 2104 0 -2104
| Flag | Coverage Δ | |
|---|---|---|
| hive | 49.19% <13.55%> (+0.03%) |
:arrow_up: |
| javascript | ? |
|
| mysql | 77.56% <96.61%> (?) |
|
| postgres | 77.69% <96.61%> (?) |
|
| presto | 53.81% <13.55%> (+0.01%) |
:arrow_up: |
| python | 83.28% <100.00%> (+19.80%) |
:arrow_up: |
| sqlite | 77.16% <96.61%> (?) |
|
| unit | 57.74% <42.37%> (+0.12%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.