Ability to back up a subset of data by time

Open danxmoran opened this issue 5 years ago • 4 comments

Proposal: (Reported in Community Slack)

influx backup backs up an entire bucket every time. It'd be nice if it supported backing up a subset of data by time-range.

Current behavior: influx backup exposes a single parameter, --bucket-id, and backs up the entire bucket every time. As a result, backup files keep growing, and restore times grow with them.

Desired behavior: influx backup should be extended to have optional --start and --end parameters, like 1.x influxd backup did. These params would limit the scope of the backup.

danxmoran avatar Dec 22 '20 16:12 danxmoran

This would be most useful. Some of our users have > 50GB of data in their Influx 1.x databases. Without the ability to limit a backup in time, this prevents us from upgrading to 2.x.

HogeBlekker avatar Mar 31 '21 10:03 HogeBlekker

I wonder if, instead of start/end times, we should provide incremental point-in-time backups similar to Enterprise 1.x.

Roughly that works by only re-copying shards in the backup that have changed. If the backups are run less frequently than the shard group duration, and there are no historical writes, I think this would accomplish the desired outcome of not having ever-growing backups.
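The shard-level idea described above can be illustrated with a minimal sketch. It assumes a hypothetical manifest that maps each shard ID to a last-modified marker; this is not the actual Enterprise or 2.x manifest format, and `shards_to_copy` is an illustrative name rather than an existing API.

```python
def shards_to_copy(previous, current):
    """Return the IDs of shards that are new or changed since the last backup.

    Both arguments are hypothetical manifests: dicts mapping a shard ID to a
    last-modified marker (for example a timestamp or a checksum).
    """
    return sorted(
        shard_id
        for shard_id, modified in current.items()
        # A shard missing from the previous manifest, or with a different
        # marker, has to be copied again.
        if previous.get(shard_id) != modified
    )


if __name__ == "__main__":
    previous = {1: 100, 2: 200, 3: 300}
    # Shard 3 received writes and shard 4 is new; shards 1 and 2 are cold.
    current = {1: 100, 2: 200, 3: 350, 4: 400}
    print(shards_to_copy(previous, current))  # [3, 4]
```

With no historical writes, cold shards keep the same marker, so each run only copies the shards covering the most recent time range.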

See also the incremental strategy described at https://docs.influxdata.com/enterprise_influxdb/v1.9/administration/backup-and-restore/#backup-options

We also have an open Enterprise feature request for something like:

incremental-delete-previous: backup data added since the previous backup, and delete previous restore points

Which I think is also being requested here.

See also the 2.x docs for default shard group duration: https://docs.influxdata.com/influxdb/v2.0/reference/internals/shards/#shard-group-duration

lesam avatar Sep 08 '21 14:09 lesam

This would be great to have. Running backups on large databases can be very cumbersome in 2.x. In 1.x we were able to "chunk" backups by time range, making the backups less resource intensive (both in CPU and memory in InfluxDB, and in storage space on the server making the backup). Honestly, I was quite surprised this didn't make it into the new backup tooling.
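As a rough illustration of the "chunking" described above, the sketch below splits a time range into consecutive windows and builds one 1.x backup command per window. The helper names are hypothetical, and the flag spellings (-portable, -database, -start, -end) are an assumption based on the 1.x influxd backup usage, so check them against `influxd backup -h` for your version. The 2.x influx backup command has no equivalent flags at the time of this thread.

```python
from datetime import datetime, timedelta, timezone


def time_windows(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def rfc3339(ts):
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def backup_command(database, window_start, window_end, out_dir):
    """Build an argv list for one chunk (assumed 1.x flag spellings)."""
    return [
        "influxd", "backup", "-portable",
        "-database", database,
        "-start", rfc3339(window_start),
        "-end", rfc3339(window_end),
        out_dir,
    ]


if __name__ == "__main__":
    start = datetime(2021, 1, 1, tzinfo=timezone.utc)
    end = datetime(2021, 1, 3, 12, tzinfo=timezone.utc)
    for ws, we in time_windows(start, end, timedelta(days=1)):
        # Only prints the commands; run them with subprocess if desired.
        print(" ".join(backup_command("mydb", ws, we, "/tmp/backup")))
```

Each window ends exactly where the next one begins, so how the tool treats points that fall on a boundary timestamp is worth verifying before relying on this for complete coverage.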

Note that direct competitors/alternatives like VictoriaMetrics and TimescaleDB support both incremental and full backups.

jaapz avatar Sep 18 '23 12:09 jaapz

Is backing up or exporting a time range still not supported? This is very much needed.

hooklab avatar Aug 06 '24 07:08 hooklab

Marking this. I hope this feature will be added.

DamonDBT avatar Feb 13 '25 02:02 DamonDBT