Ability to back up a subset of data by time
Proposal: (Reported in Community Slack)
influx backup backs up an entire bucket every time. It'd be nice if it supported backing up a subset of data by time-range.
Current behavior:
influx backup exposes a single scoping parameter, --bucket-id, and backs up the entire bucket every time. As a result, backup files keep growing, and restore times grow along with them.
Desired behavior:
influx backup should be extended to have optional --start and --end parameters, like 1.x influxd backup did. These params would limit the scope of the backup.
This would be very useful. Some of our users have more than 50GB of data in their Influx 1.x databases. Not being able to limit a backup by time range prevents us from upgrading to 2.x.
I wonder if, instead of start/end times, we should provide incremental point-in-time backups similar to Enterprise 1.x.
Roughly, that works by re-copying only the shards that have changed since the previous backup. If backups are run less frequently than the shard group duration, and there are no historical writes, I think this would accomplish the desired outcome of not having ever-growing backups.
See also https://docs.influxdata.com/enterprise_influxdb/v1.9/administration/backup-and-restore/#backup-options, specifically the incremental strategy.
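To make the shard-level idea concrete, here is a minimal sketch of the selection step. It is not how Enterprise 1.x is actually implemented; the Shard type, its last_modified field, and the shards_to_copy helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Shard:
    """Hypothetical stand-in for a shard's backup-relevant metadata."""
    shard_id: int
    last_modified: datetime  # assumed: time of the most recent write to the shard


def shards_to_copy(shards: Iterable[Shard],
                   last_backup: Optional[datetime]) -> List[Shard]:
    """Return the shards an incremental backup would need to re-copy.

    With no previous backup, every shard is copied (a full backup).
    Otherwise only shards modified after the previous backup are copied.
    """
    if last_backup is None:
        return list(shards)
    return [s for s in shards if s.last_modified > last_backup]


if __name__ == "__main__":
    previous = datetime(2021, 3, 1, tzinfo=timezone.utc)
    all_shards = [
        Shard(1, datetime(2021, 2, 10, tzinfo=timezone.utc)),  # cold, unchanged
        Shard(2, datetime(2021, 2, 28, tzinfo=timezone.utc)),  # cold, unchanged
        Shard(3, datetime(2021, 3, 5, tzinfo=timezone.utc)),   # written since backup
    ]
    print([s.shard_id for s in shards_to_copy(all_shards, previous)])
```

Under these assumptions, a historical write into an old shard bumps its last_modified and pulls that shard back into the next backup, which is why the no-historical-writes condition matters for keeping incremental backups small.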
We also have an open Enterprise feature request for something like:
incremental-delete-previous: backup data added since the previous backup, and delete previous restore points
I think that is also what is being requested here.
See also the 2.x docs for default shard group duration: https://docs.influxdata.com/influxdb/v2.0/reference/internals/shards/#shard-group-duration
This would be great to have. Running backups on large databases can be very cumbersome in 2.x. In 1.x we were able to "chunk" backups by time range, which made them less resource intensive (less CPU and memory in influxdb, and less storage space on the server taking the backup). Honestly, I was quite surprised this didn't make it into the new backup tooling.
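For context, the 1.x chunking described above can be sketched roughly as follows. This only builds the command lines for the 1.x influxd backup -portable tool with its -start and -end flags; the database name, destination paths, and the helper names time_chunks and backup_command are made up for the example, and nothing equivalent exists for influx backup in 2.x today.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, List, Tuple


def time_chunks(start: datetime, end: datetime,
                step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)  # the last window may be shorter than step
        yield cursor, nxt
        cursor = nxt


def backup_command(database: str, start: datetime, end: datetime,
                   dest: str) -> List[str]:
    """Build a 1.x 'influxd backup' argument list for one time window.

    Expects timezone-aware datetimes; timestamps are emitted as RFC3339 UTC.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return [
        "influxd", "backup", "-portable",
        "-database", database,
        "-start", start.astimezone(timezone.utc).strftime(fmt),
        "-end", end.astimezone(timezone.utc).strftime(fmt),
        dest,
    ]


if __name__ == "__main__":
    first = datetime(2021, 1, 1, tzinfo=timezone.utc)
    last = datetime(2021, 1, 20, tzinfo=timezone.utc)
    for i, (s, e) in enumerate(time_chunks(first, last, timedelta(days=7))):
        # "mydb" and the /backups path are placeholders, not real defaults.
        print(" ".join(backup_command("mydb", s, e, f"/backups/chunk-{i}")))
```

Each printed command could then be run on its own (for example via subprocess.run), so that only one window's worth of data is read and written at a time.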
Note that direct competitors/alternatives like VictoriaMetrics and TimescaleDB support both incremental and full backups.
Is backing up or exporting a time range still not supported? This is very much needed.
Subscribing to this issue. I hope this feature gets added.