volsync icon indicating copy to clipboard operation
volsync copied to clipboard

Support for randomness / hashed substitution in a trigger schedule

Open adampetrovic opened this issue 1 year ago • 16 comments

Describe the feature you'd like to have. Currently the trigger schedule directive on ReplicationSource supports a crontab style schedule or a static directive such as daily, yearly etc.

For those that are using a templatized ReplicationSource, many PVCs will inherit the schedule from a single template and therefore have the same trigger time, leading to many backup operations being triggered at the same time.

See example of such a template here: https://github.com/adampetrovic/home-ops/blob/main/kubernetes/templates/volsync/minio.yaml#L34

Some cron implementations use H as a hashed substitution for a value:

'H' is used in the Jenkins continuous integration system to indicate that a "hashed" value is substituted. Thus instead of a fixed number such as '20 * * * *' which means at 20 minutes after the hour every hour, 'H * * * *' indicates that the task is performed every hour at an unspecified but invariant time for each task. This allows spreading out tasks over time, rather than having all of them start at the same time and compete for resources

With this approach, volsync could take the name of the job as the hash value to evaluate to say a minute within the hour.

What is the value to the end user? (why is it a priority?)

Better load spreading of backups

How will we know we have a good solution? (acceptance criteria)

A user is able to defer random execution / jitter of backup times to the volsync operator instead of having to manage it themselves.

Additional context

adampetrovic avatar Oct 12 '24 22:10 adampetrovic

This is an interesting one, potentially something could be done to add a simplified version of this - most likely by:

  • replace H values using a hash based on the replicationsource/dest name
  • put the new trigger schedule into the status
  • if there is a trigger schedule in the status, that will be used (hash will not get computed again)

I don't want to promise this, but sounds like a neat idea if there is time to work on it.

@adampetrovic It looks like the Jenkins implementation may allow for more complex substitutions such as H/10 or H(1-3)?

tesshuflower avatar Oct 17 '24 13:10 tesshuflower

Ideally this is something delegated to the cron library and all volsync needs to worry about is passing through the job name for hashing.

Looking at https://github.com/robfig/cron, the last PR merged was 4 years ago, so I'm not going to blindly assume that route will be a fruitful one.

@adampetrovic It looks like the Jenkins implementation may allow for more complex substitutions such as H/10 or H(1-3)?

Yes it does, again it feels off though to have to think about implementing custom cron parsing in volsync.

If volsync were open to using a fork of the cron library, I could have a go at implementing this.

adampetrovic avatar Oct 17 '24 18:10 adampetrovic

I have implemented basic support (just standard H and steps H/20) in my fork, as a reference.

I have extended the API to add a function ParseWithJobName which allows for a job name to be parsed, which is used for consistent hashing to the same value.

https://github.com/adampetrovic/cron/commit/4c91fead6ecc480e7977a46700d68cf7211780dc

adampetrovic avatar Oct 17 '24 22:10 adampetrovic

I think my initial reaction is I'd rather not point at a fork, but let me think about it a bit.

With a fork:

  • someone needs to maintain it (maybe not much since the original cron project doesn't seem to be getting updates), what happens when fixes are needed etc.
  • if the fork goes away, then I need to get this hash functionality back to maintain behaviour (once hashed cron schedules are in the project)

If we put hashing code into volsync, we could also show the computed schedule in the status, so a user would be able to see what the hash resolved to which might be kind of nice. Disadvantage is it's not so clean and contained in cron parsing code.

tesshuflower avatar Oct 18 '24 14:10 tesshuflower

Rather than forking, what about switching to a different cron library that's more actively maintained? If they are willing to add hash support, of course.

bo0tzz avatar Oct 18 '24 14:10 bo0tzz

I couldn't find an existing library that supports setting a jitter in the cron format. I have to agreed that the jenkins cron format would be ideal.

The author of robfig/cron was asking about to adding H support but closed the issue 😢 Judging by the authors Github activity they might not be coming back to support the project.

@tesshuflower Another option could be (if upstream is truly abandoned) that backube could fork robfig/cron and support what's requested here in the library. I get that adding another project to your org might stretch your resources a bit so I totally get why you wouldn't want to; however, if adding this jitter / randomness / hashed substitution helps with both volsync and snapscheduler getting this feature maybe it can be considered?

onedr0p avatar Oct 18 '24 20:10 onedr0p

Another option could be (if upstream is truly abandoned) that backube could fork robfig/cron and support what's requested here in the library

I was going to suggest the same thing.

Given it's 4 years since the last commit I can't imagine the maintenance overhead being large. I'd be happy to volunteer my time to help with this, if needed

adampetrovic avatar Oct 18 '24 21:10 adampetrovic

Thanks for the offer @adampetrovic but perhaps we just put the specific hashing code into VolSync itself - this gives us more control and we can still keep it separate from the actual cron implementation.

Steps:

  • parse cron schedule with hashing (e.g. H * * * *). I was wrong above, I think we still need to compute this every time, in case a user changes the schedule
  • put the computed schedule into the status (e.g. 20 * * * *)
  • use the computed schedule from the status (same code as current)

This is something I can help with when I have some time - I'll take a look at your hashing function at some point and can perhaps use that (or I can get you to contribute it directly here if you prefer).

Questions:

  • What range of use-cases would we want to support when it comes to H?

tesshuflower avatar Oct 22 '24 16:10 tesshuflower

What range of use-cases would we want to support when it comes to H?

I think the primary set of features that cron already has like:

  • Ranges H(0-30) to support the ability to limit the range within a specific time period (e.g. all backups within the first 30 minutes of the hour)
  • Steps H/15 or H(0-30)/15 - which would allow for say, 15 minute intervals 3, 18, 33, 48 or 3, 18 (ranged)
  • Basic value separators perhaps? H(0-30),45 - 3, 18, 45. This would allow for a combination of randomness (within a range) and a deterministic value statically provided.

adampetrovic avatar Oct 22 '24 19:10 adampetrovic

Are you sure you want to do this in volsync? There's a tonne of parsing / repacking unrelated to the Hash feature that would need to be implemented from scratch that already exists in the cron implementation.

For what it's worth, I've now also added support for hashes with a range in my fork.

adampetrovic avatar Oct 22 '24 19:10 adampetrovic

I probably wasn't clear - if we support hashing, we'd just parse the hash values out, compute and put into the status - then scheduling would look at that resolved cron schedule (e.g H * * * * resolved to 20 * * * *) and figure out the next scheduled time using the existing cron parsing code.

tesshuflower avatar Oct 22 '24 20:10 tesshuflower

Yes, I am aware :)

Volsync would need to basically parse all possible permutations of a cron expression into individual fields and sub-fields within those fields as well as worry about the specifics of the Hash expression. Not impossible by any means, just a lot of edge cases to deal with for something that volsync shouldn't be directly concerned with (IMO).

For example:

0,1,H(5-10)/2,30 * * * * * involves parsing each field (split by <space>), then within each field split by , and then the mechanics of the hash implementation, then replacing the H(5-10)/2 into a comma separated list of values coming out of the hash calculation.

adampetrovic avatar Oct 22 '24 20:10 adampetrovic

I also forgot that the parsing function would need to know which field it is parsing (seconds, minutes, hours etc) in order to generate a valid set of values within a range.

adampetrovic avatar Oct 22 '24 21:10 adampetrovic

agreed, it's not super simple, which I think is why we'd want to keep the set of use-cases we support for H to be constrained to something that would be useful to you, at least at first.

tesshuflower avatar Oct 23 '24 13:10 tesshuflower

For me just a simple H would suffice. For others I can see a use case for wanting to restrict the output to a certain range (e.g first 30 minutes in an hour) with H(0-30)

adampetrovic avatar Oct 23 '24 14:10 adampetrovic

I urgently need this feature. Currently, I perform a backup every 30 minutes(based on a template). Below is the spike in CPU/memory/IO during my backup period:

Image

east4ming avatar Feb 22 '25 13:02 east4ming