
Pool Managers-Pools: Duplicate Restores on Multiple Pools Following Unexpected Pool Restart

Status: Open. Opened by vingar.

Hello,

Following the unexpected pool restarts described in https://github.com/dCache/dcache/issues/7652, we have observed that restores that were in progress on a restarted pool are rescheduled in duplicate across multiple pools. We run three pool managers in the USATLAS setup (dCache 9.2.17). The duplicate, concurrent restores of the same file then appear as only a single restore on the pool managers.

Example:

The restore of 0000269A5C57C40241629D7EFBC3654FB04A started on pool dc269_12, which was automatically restarted due to a memory error. The billing timestamps for this restore are:

Start time: 2024-08-31 21:12:31
End time: 2024-08-31 21:28:49

The restore was then rescheduled concurrently on three pools: dc263_12, dc267_12 and dc278_12 (still ongoing).

The billing timestamps are:

Start time on dc263_12: 2024-08-31 21:21:10
Start time on dc267_12: 2024-09-03 14:11:33

and dc278_12, which is the only pool the pool managers associate with this restore:

0000269A5C57C40241629D7EFBC3654FB04A@internal-net-external-net-world-net-*/* m=15 r=0 [dc278_12] [Waiting for stage: dc278_12 09.03 14:11:39] {0,}
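To make the overlap in the timestamps above explicit, here is a small sketch that compares each rescheduled start time with the billing end time of the original restore on dc269_12. It only uses the times quoted in this report and is not a dCache tool. One assumption is labeled loudly: the pool manager entry "09.03 14:11:39" for dc278_12 is read as 3 September 2024 (month.day, with the year taken from the billing records).

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def ts(text: str) -> datetime:
    """Parse a billing-style timestamp such as '2024-08-31 21:12:31'."""
    return datetime.strptime(text, FMT)


# Billing end time of the original restore on dc269_12 (value from this report).
ORIGINAL_POOL = "dc269_12"
ORIGINAL_END = ts("2024-08-31 21:28:49")

# Start times of the rescheduled restores. dc263_12 and dc267_12 come from billing;
# dc278_12 comes from the pool manager entry "09.03 14:11:39", ASSUMED to mean
# 3 September 2024 (month.day; the year is not shown in that entry).
RESCHEDULED = {
    "dc263_12": ts("2024-08-31 21:21:10"),
    "dc267_12": ts("2024-09-03 14:11:33"),
    "dc278_12": ts("2024-09-03 14:11:39"),
}


def offsets_from_end(starts: dict, end: datetime) -> dict:
    """Seconds from the original restore's end to each rescheduled start (negative = started earlier)."""
    return {pool: int((start - end).total_seconds()) for pool, start in starts.items()}


offsets = offsets_from_end(RESCHEDULED, ORIGINAL_END)
overlapping = sorted(pool for pool, secs in offsets.items() if secs < 0)

for pool in sorted(offsets):
    print(f"{pool}: {offsets[pool]:+d} s relative to end of restore on {ORIGINAL_POOL}")
print("started before the original restore ended:", overlapping)
```

Under these assumptions, only dc263_12 started before the end time billing recorded for dc269_12 (459 s earlier), while the dc267_12 and dc278_12 restores were started about six seconds apart, roughly 2.7 days later.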

Thank you in advance for any help.

vingar, Sep 05 '24 13:09