
Bulk release requests do not work when using a relative path

Open: ageorget opened this issue (4 comments)

Hi,

I found that the release process does not work when the release request uses a relative path (without the prefix), and this could explain why our Atlas staging buffer is full most of the time.

To reproduce it, I sent a staging request for this file: /atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1

cat stageAtlas.json
{
"files": [
{"path": "/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1","diskLifetime":"PT1H"}
]
}

curl --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X POST "https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/stage" -H  "accept: application/json" -H  "content-type: application/json" -d @stageAtlas.json
{
  "requestId" : "1b72f21e-d66a-4af7-a784-6178a3c3a35c"
}

level=INFO ts=2024-08-12T16:07:39.771+0200 event=org.dcache.frontend.request request.method=POST request.url=https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/stage response.code=201 response.reason=Created location=https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/stage/1b72f21e-d66a-4af7-a784-6178a3c3a35c socket.remote=[2001:660:5009:84:134:158:239:7]:35504 user-agent=curl/7.29.0 user.dn="CN=1855496286,CN=GEORGET Adrien [email protected],O=Centre national de la recherche scientifique,C=FR,DC=tcs,DC=terena,DC=org" user.mapped=3327:124 request.entity="{\"files\":[{\"path\"[...]fetime\":\"PT1H\"}]}" response.entity="{\n  \"requestId\" : \"1b72f21e-d66a-4a[...]" duration=15
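For scripted use, the stage body can be generated instead of written by hand. This is a minimal Python sketch that assumes only the body format shown in stageAtlas.json: a `files` list of objects, each with a `path` and a `diskLifetime` given as an ISO 8601 duration such as `PT1H`. The helpers `iso8601_duration` and `stage_body` and the example path are hypothetical. The sketch only builds the JSON. Submitting it, for example with the curl command above, is left out.

```python
import json
from datetime import timedelta


def iso8601_duration(lifetime: timedelta) -> str:
    """Format a non-negative timedelta as an ISO 8601 duration, e.g. PT1H."""
    total = int(lifetime.total_seconds())
    if total < 0:
        raise ValueError("lifetime must be non-negative")
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes:
        out += f"{minutes}M"
    if seconds or out == "PT":
        # Always emit at least one component, so zero becomes PT0S.
        out += f"{seconds}S"
    return out


def stage_body(paths, lifetime: timedelta) -> str:
    """Build a body for POST /api/v1/tape/stage in the stageAtlas.json format."""
    duration = iso8601_duration(lifetime)
    return json.dumps({"files": [{"path": p, "diskLifetime": duration} for p in paths]})


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(stage_body(["/atlasmctape/example/file.root"], timedelta(hours=1)))
```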

Staging succeeds and the file is pinned in the disk cache:

\s pool-atlas-read-li425a rep sticky ls 000098FBFE5589274CABB284DA5BBB379C4B
self : expires 8/12/24, 4:12 PM
PinManager-0649a68f-2bc8-48e6-8138-c40d0b4bf130 : expires 8/14/24, 4:37 PM

Then I release the file using its relative path:

cat archiveinfo.json
{
"paths": ["/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1"]
}

curl --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X POST "https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/release/1b72f21e-d66a-4af7-a784-6178a3c3a35c" -H  "accept: application/json" -H  "content-type: application/json" -d @archiveinfo.json

level=INFO ts=2024-08-12T16:10:47.568+0200 event=org.dcache.frontend.request request.method=POST request.url=https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/release/1b72f21e-d66a-4af7-a784-6178a3c3a35c response.code=200 response.reason=OK socket.remote=[2001:660:5009:84:134:158:239:7]:35512 user-agent=curl/7.29.0 user.dn="CN=1855496286,CN=GEORGET Adrien [email protected],O=Centre national de la recherche scientifique,C=FR,DC=tcs,DC=terena,DC=org" user.mapped=3327:124 request.entity="{\"paths\":[\"/atlas[...]68.pool.root.1\"]}" duration=11

After 30 minutes, the pin is still active:

\s pool-atlas-read-li425a rep sticky ls 000098FBFE5589274CABB284DA5BBB379C4B
PinManager-0649a68f-2bc8-48e6-8138-c40d0b4bf130 : expires 8/14/24, 4:37 PM
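To check from a script whether a pin survived a release, you can scan the `rep sticky ls` output for owners whose names start with `PinManager-`. This minimal sketch assumes the output format shown above, with one `<owner> : expires <date>` entry per line, and assumes the listing has already been captured as text. The function name `pinmanager_owners` is hypothetical.

```python
def pinmanager_owners(sticky_ls_output: str) -> list[str]:
    """Return the sticky-flag owners that belong to PinManager.

    Assumes each entry looks like '<owner> : expires <date>', as in the
    'rep sticky ls' output above. Other lines are ignored.
    """
    owners = []
    for line in sticky_ls_output.splitlines():
        owner, sep, _ = line.partition(" : ")
        owner = owner.strip()
        if sep and owner.startswith("PinManager-"):
            owners.append(owner)
    return owners


# Listing captured after the staging request (copied from above).
before = """self : expires 8/12/24, 4:12 PM
PinManager-0649a68f-2bc8-48e6-8138-c40d0b4bf130 : expires 8/14/24, 4:37 PM"""
print(pinmanager_owners(before))  # ['PinManager-0649a68f-2bc8-48e6-8138-c40d0b4bf130']
```

A non-empty result after a release means the PinManager sticky flag is still in place, which is the situation shown here.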

If I instead release the file using its full path, the file is unpinned from disk immediately:

cat archiveinfo.json
{
"paths": ["/pnfs/in2p3.fr/data/atlas/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1"]
}

curl --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X POST "https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/release/1b72f21e-d66a-4af7-a784-6178a3c3a35c" -H  "accept: application/json" -H  "content-type: application/json" -d @archiveinfo.json

level=INFO ts=2024-08-12T16:17:19.146+0200 event=org.dcache.frontend.request request.method=POST request.url=https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/release/1b72f21e-d66a-4af7-a784-6178a3c3a35c response.code=200 response.reason=OK socket.remote=[2001:660:5009:84:134:158:239:7]:35522 user-agent=curl/7.29.0 user.dn="CN=1855496286,CN=GEORGET Adrien [email protected],O=Centre national de la recherche scientifique,C=FR,DC=tcs,DC=terena,DC=org" user.mapped=3327:124 request.entity="{\"paths\":[\"/pnfs/[...]68.pool.root.1\"]}" duration=27

In the PinManager log:
Aug 12 16:17:20 ccdcamcli08 dcache@PinManagerDomain[129736]: 12 Aug 2024 16:17:20 (PinManager) [BackgroundUnpinner-201460] Unpining [955776409] 000098FBFE5589274CABB284DA5BBB379C4B (1b72f21e-d66a-4af7-a784-6178a3c3a35c) by 3327:124 2024-08-12 16:07:39 to 2024-08-14 16:07:45 is READY_TO_UNPIN on pool-atlas-read-li425a:PinManager-0649a68f-2bc8-48e6-8138-c40d0b4bf130

[ccdcamcli06] (bulk@bulkDomain) ageorget > \s pool-atlas-read-li425a rep sticky ls 000098FBFE5589274CABB284DA5BBB379C4B
[ccdcamcli06] (bulk@bulkDomain) ageorget > 
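Until the frontend is patched, a client could work around the problem by always sending full paths in release requests. This minimal sketch assumes that the prefix to prepend is `/pnfs/in2p3.fr/data/atlas`, the prefix seen in the working request above. Other sites and VOs will use different prefixes. `SITE_PREFIX`, `to_full_path` and `release_body` are hypothetical names. The sketch only builds the `{"paths": [...]}` body and does not contact the server.

```python
import json
import posixpath

# Assumption: the namespace prefix from the working (full-path) release above.
SITE_PREFIX = "/pnfs/in2p3.fr/data/atlas"


def to_full_path(path: str, prefix: str = SITE_PREFIX) -> str:
    """Prepend the site prefix unless the path already carries it."""
    normalized = posixpath.normpath(path)
    if normalized == prefix or normalized.startswith(prefix + "/"):
        return normalized
    # posixpath.join would drop the prefix for an absolute path, so concatenate.
    return prefix + "/" + normalized.lstrip("/")


def release_body(paths, prefix: str = SITE_PREFIX) -> str:
    """Build a body for POST /api/v1/tape/release/{requestId}."""
    return json.dumps({"paths": [to_full_path(p, prefix) for p in paths]})
```

This only sidesteps the bug on the client side. The server-side patch discussed below in this thread is the actual fix.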

The bulk service also does not report when a release request has not been carried out. Could you please check this?

Adrien

ageorget, Aug 12 '24 14:08

The same patch I made for staging likely needs to be applied to release as well.

DmitryLitvintsev, Aug 12 '24 16:08

OK, like last time, I have built an RPM with a patch:

https://drive.google.com/file/d/1mgXibWbUUnqM0WsRclAKh-K8x3awIBkx/view?usp=sharing

Could you deploy it on your frontend door? Before doing so, make sure you have tried it on your test system.

DmitryLitvintsev, Aug 12 '24 21:08

Thank you, Dmitry, for the quick fix. I just copied the frontend jar from the RPM like last time, and unpinning now seems to work:

Aug 13 10:03:04 ccdcamcli08 dcache@PinManagerDomain[129736]: 13 Aug 2024 10:03:04 (PinManager) [bulk PinManagerUnpin] Unpinned 0000D63CB52D45B0404F9AF60A8F8F8DDDE9 (955788379)
Aug 13 10:03:06 ccdcamcli08 dcache@PinManagerDomain[129736]: 13 Aug 2024 10:03:06 (PinManager) [bulk PinManagerUnpin] Unpinned 0000235E8344B2FA40B1A56C2E5F002231C4 (955788682)
Aug 13 10:03:06 ccdcamcli08 dcache@PinManagerDomain[129736]: 13 Aug 2024 10:03:06 (PinManager) [bulk PinManagerUnpin] Unpinned 0000A9031930F46E4E8D8540F22030C07F62 (955789107)
Aug 13 10:03:06 ccdcamcli08 dcache@PinManagerDomain[129736]: 13 Aug 2024 10:03:06 (PinManager) [bulk PinManagerUnpin] Unpinned 000040FBBED815A44A759CE710C56EF80CA3 (955789383)
Aug 13 10:03:07 ccdcamcli08 dcache@PinManagerDomain[129736]: 13 Aug 2024 10:03:07 (PinManager) [bulk PinManagerUnpin] Unpinned 0000536A5798A7474E109E7D02D2DD9D8683 (955788506)
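To confirm that a batch of releases took effect, you can scan the PinManager log for `Unpinned` entries. This minimal sketch assumes the line format shown above, `Unpinned <pnfsid> (<pin id>)`, with 36-character uppercase hexadecimal PNFS IDs. The regular expression and the name `unpinned_ids` are illustrative and are not part of dCache.

```python
import re

# Matches e.g. "... [bulk PinManagerUnpin] Unpinned 0000D63C...DDE9 (955788379)"
UNPINNED = re.compile(r"\bUnpinned ([0-9A-F]{36}) \((\d+)\)")


def unpinned_ids(log_text: str) -> dict[str, int]:
    """Map each unpinned PNFS ID found in the log text to its pin ID."""
    return {m.group(1): int(m.group(2)) for m in UNPINNED.finditer(log_text)}
```

You can then compare the keys of the result with the set of PNFS IDs whose release was requested. Any ID that is missing from the result did not produce an `Unpinned` entry.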

ageorget, Aug 13 '24 07:08

Yes. Sorry for all this; it should have been fixed in one go.

DmitryLitvintsev, Aug 13 '24 11:08