Pool Managers-Pools: Duplicate Restores on Multiple Pools Following Unexpected Pool Restart
Hello,
Following the unexpected pool restarts described in https://github.com/dCache/dcache/issues/7652, we have observed that restores that were ongoing on a restarted pool are being rescheduled in duplicate across multiple pools. We have three pool managers in the USATLAS setup (dCache 9.2.17). The duplicate, concurrent restores for the same file then appear as only a single restore on the pool managers.
Example:
The restore for 0000269A5C57C40241629D7EFBC3654FB04A started on pool dc269_12 which was automatically restarted due to a memory error. The timestamps in billing are:
Start time: 2024-08-31 21:12:31
End time: 2024-08-31 21:28:49
The restore was then rescheduled concurrently on three pools: dc263_12, dc267_12 and dc278_12 (still ongoing).
The start times in billing are:
dc263_12: 2024-08-31 21:21:10
dc267_12: 2024-09-03 14:11:33
Only dc278_12 is associated with the restore on the pool managers:
0000269A5C57C40241629D7EFBC3654FB04A@internal-net-external-net-world-net-*/* m=15 r=0 [dc278_12] [Waiting for stage: dc278_12 09.03 14:11:39] {0,}
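In case it helps with checking other files for the same pattern, below is a minimal sketch of how overlapping restores could be flagged once the billing entries have been extracted. It is not tied to the actual billing format: the record layout (pnfsid, pool, start, end) and the function name `overlapping_restores` are assumptions made for illustration, and a missing end time is assumed to mean the restore is still running. The sample records use only the billing timestamps quoted above.

```python
from datetime import datetime
from itertools import combinations

FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts):
    """Parse a timestamp string; None means no end time was recorded."""
    return datetime.strptime(ts, FMT) if ts else None


def overlapping_restores(records):
    """Return (pnfsid, pool_a, pool_b) for restores of one file that overlap in time.

    `records` is a hypothetical, already-extracted list of
    (pnfsid, pool, start, end) tuples; it is not the raw billing format.
    """
    by_file = {}
    for pnfsid, pool, start, end in records:
        by_file.setdefault(pnfsid, []).append((pool, parse(start), parse(end)))

    overlaps = []
    for pnfsid, entries in by_file.items():
        for (p1, s1, e1), (p2, s2, e2) in combinations(entries, 2):
            # Assumption: a restore without an end time is still running.
            end1 = e1 or datetime.max
            end2 = e2 or datetime.max
            if s1 < end2 and s2 < end1:
                overlaps.append((pnfsid, p1, p2))
    return overlaps


PNFSID = "0000269A5C57C40241629D7EFBC3654FB04A"
records = [
    (PNFSID, "dc269_12", "2024-08-31 21:12:31", "2024-08-31 21:28:49"),
    (PNFSID, "dc263_12", "2024-08-31 21:21:10", None),
    (PNFSID, "dc267_12", "2024-09-03 14:11:33", None),
]

for pnfsid, pool_a, pool_b in overlapping_restores(records):
    print(f"{pnfsid}: {pool_a} overlaps with {pool_b}")
```

With these records the sketch reports dc269_12 overlapping with dc263_12, and dc263_12 overlapping with dc267_12, under the stated assumption that the two later restores have no end time yet.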
Thank you in advance for any help.