Pruning of assets from storage nodes that they are no longer obliged to store
Problem
Currently, storage nodes don't delete assets that are no longer assigned to them in the runtime, so dead assets keep accumulating on the nodes. As a result, the nodes' disk usage keeps growing unless operators clean up the space manually, which is very risky. Not pruning assets also means that DataObjects/Bags can't be moved effectively between buckets/operators: moving them does not physically free up the space on the original node, so the Storage Lead can't use the globally available disk space effectively.
Proposal
TBD
There should be a means for the storage server operator to run a command:
- that shows the diff between what is assigned to the bucket (i.e. according to the QN) and what is stored on the server.
- the command should have modes:
  - View
  - Action (delete), which should have data loss mitigation:
    - Provide a warning before execution.
    - Check the availability of the object where it should be stored, if anywhere.
    - Force a replication if the object does not exist where it should be.
  - Auto pruning: set the storage server to prune automatically (this mode should also be available as an option of the storage server command). Auto pruning should have data loss mitigation:
    - A back-off period between detection (logs and metrics should be generated at this stage) and actioning the pruning.
    - Check the availability of the object where it should be stored, if anywhere.
    - Force a replication if the object does not exist where it should be.
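The View and Action modes listed above boil down to a set difference followed by a guarded delete loop. Below is a minimal Python sketch under stated assumptions: object IDs are strings, the set assigned to the bucket (from the QN) and the set stored on disk are passed in already fetched, and `assigned_elsewhere`, `is_held_by`, `replicate_to`, `delete_local` and `confirm` are hypothetical hooks, not existing storage node APIs. `replicate_to` is assumed to return `True` only once the copy is confirmed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class PruneReport:
    to_prune: List[str] = field(default_factory=list)
    replicated_first: List[str] = field(default_factory=list)  # copied elsewhere before deletion
    deleted: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)  # kept because replication failed


def compute_diff(assigned: Set[str], stored: Set[str]) -> Dict[str, Set[str]]:
    """View mode: compare what the bucket is assigned with what is on disk."""
    return {
        "orphaned": stored - assigned,  # on disk but no longer assigned: prune candidates
        "missing": assigned - stored,   # assigned but not (yet) on disk
    }


def prune(
    orphaned: Iterable[str],
    assigned_elsewhere: Callable[[str], List[str]],  # hypothetical: buckets the object now belongs to
    is_held_by: Callable[[str, str], bool],          # hypothetical: (bucket, object) -> present?
    replicate_to: Callable[[str, str], bool],        # hypothetical: force replication, True on success
    delete_local: Callable[[str], None],             # hypothetical: remove the local file
    confirm: Callable[[int], bool],                  # warning step: operator approves N deletions
) -> PruneReport:
    """Action mode: delete orphaned objects only when no copy would be lost."""
    report = PruneReport(to_prune=sorted(orphaned))
    if not report.to_prune or not confirm(len(report.to_prune)):
        return report  # nothing to do, or the operator declined the warning

    for obj in report.to_prune:
        safe_to_delete = True
        replicated = False
        # The object may be assigned to no bucket at all (e.g. deleted in the runtime).
        for bucket in assigned_elsewhere(obj):
            if is_held_by(bucket, obj):
                continue  # already available where it should be
            if replicate_to(bucket, obj):
                replicated = True
            else:
                safe_to_delete = False  # keep our copy; it may be the last one
        if replicated:
            report.replicated_first.append(obj)
        if safe_to_delete:
            delete_local(obj)
            report.deleted.append(obj)
        else:
            report.skipped.append(obj)
    return report
```

Calling `compute_diff` alone corresponds to the View mode; `prune` corresponds to the Action mode, where `confirm` receives the number of objects about to be removed so the operator is warned before anything is deleted.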
Shouldn't this happen automatically?
That is the ideal, with two more considerations:
- Data loss mitigation.
- The ability of the operator to force it through a command.
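The automatic mode's back-off, together with the operator's ability to force a prune, can be sketched as a small tracker that remembers when each object was first detected as orphaned. This is a hypothetical Python sketch: `AutoPruner`, `observe` and `due` are illustrative names, and treating `force=True` as skipping the back-off is an assumption, since the thread does not say how a forced run interacts with it. The availability checks and forced replication would still run on whatever `due` returns before anything is deleted.

```python
import logging
import time
from typing import Callable, Dict, Set

log = logging.getLogger("pruner")


class AutoPruner:
    """Releases orphaned objects for pruning only after a back-off period."""

    def __init__(self, backoff_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.backoff_seconds = backoff_seconds
        self.clock = clock
        # object ID -> time it was first detected as orphaned
        # (a metrics gauge, not shown, could be fed from len(self.first_seen))
        self.first_seen: Dict[str, float] = {}

    def observe(self, orphaned: Set[str]) -> None:
        """Record the current orphaned set; called on every detection pass."""
        now = self.clock()
        for obj in orphaned:
            if obj not in self.first_seen:
                self.first_seen[obj] = now
                log.info("object %s detected as orphaned; pruning after back-off", obj)
        # Objects assigned to this bucket again are no longer candidates.
        for obj in list(self.first_seen):
            if obj not in orphaned:
                del self.first_seen[obj]

    def due(self, force: bool = False) -> Set[str]:
        """Objects whose back-off has elapsed; force=True (assumed) skips the wait."""
        now = self.clock()
        return {
            obj
            for obj, seen in self.first_seen.items()
            if force or now - seen >= self.backoff_seconds
        }

    def mark_pruned(self, obj: str) -> None:
        self.first_seen.pop(obj, None)
```

Objects that become assigned again during the back-off (for example, a bag moved back to this bucket) are dropped from tracking, which is the point of waiting before deleting.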
@zeeshanakram3 Should we close this already, or is there more work to be done?