dcache
NFS: unable to copy a file.
Here is the situation.
[root@mu2ebuild01 litvinse]# time cp /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art .
It just hangs; nothing happens. I can't kill the client (see issue #6192).
I see that no mover starts on the pool while cp is running:
[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > mover ls
[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore >
For comparison, take another file located on the same pool. The IDs of the two files:
"bad file":
[root@mu2ebuild01 litvinse]# cat /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/".(id)(dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art)"
000014625314BA2C40208DD8E3DE2884B86D
"good file":
[root@mu2ebuild01 litvinse]# cat /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/".(id)(dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000009.art)"
00009261CCDCE6BA43ADAB99FCAD46BD9083
[root@mu2ebuild01 litvinse]#
Both are located on the same pool:
[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > rep ls 000014625314BA2C40208DD8E3DE2884B86D
000014625314BA2C40208DD8E3DE2884B86D <C-------X--L(0)[0]> 2064392593 si={mu2e.persistent}
[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > rep ls 00009261CCDCE6BA43ADAB99FCAD46BD9083
00009261CCDCE6BA43ADAB99FCAD46BD9083 <C-------X--L(0)[0]> 2046187316 si={mu2e.persistent}
I can copy the second ("good") file:
[root@mu2ebuild01 litvinse]# time cp /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000009.art .
real 0m21.162s
user 0m0.048s
sys 0m4.290s
and I see a mover on the pool while cp is running:
[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > mover ls
83886213 : RUNNING : 00009261CCDCE6BA43ADAB99FCAD46BD9083 IoMode=[READ] h={NFSv4.1/pNFS,OS=[6179c3cf000200c40000048f, seq: 0],cl=[131.225.240.47]} bytes=330301440 time/sec=3 LM=0 si={mu2e.persistent}
[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore >
I see 3 hanging processes accessing the "bad" file:
[root@mu2ebuild01 litvinse]# lsof /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
cp 10691 rlc 3r REG 0,47 2064392593 6402197645 /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
cp 10787 mu2epro 3r REG 0,47 2064392593 6402197645 /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
mu2e 28316 mu2epro 4r REG 0,47 2064392593 6402197645 /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
[root@mu2ebuild01 litvinse]#
So I kill them and drop the caches for good measure:
[root@mu2ebuild01 litvinse]# kill -9 10691 10787 28316
[root@mu2ebuild01 litvinse]# lsof /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
[root@mu2ebuild01 litvinse]# echo 3 > /proc/sys/vm/drop_caches
I try cp again:
[root@mu2ebuild01 litvinse]# time cp /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art .
It just hangs again; nothing happens.
I unmount and remount:
[root@mu2ebuild01 litvinse]# umount -l /pnfs/mu2e
[root@mu2ebuild01 litvinse]# mount /pnfs/mu2e
This does not help: no mover starts and cp still hangs.
This is dCache 7.2.
I migrated the file to another pool; same story. And yes, I can read the file using other protocols:
[root@mu2ebuild01 litvinse]# dccp dcap://fndca1:24136/pnfs/fnal.gov/usr/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art .
2064392593 bytes (1.92 GiB) in 27 seconds (72.9 MiB/s)
[root@mu2ebuild01 litvinse]# echo $?
0
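To check whether other files in the same directory show the same behaviour without leaving more hung cp processes around, a time-limited read probe might help. This is only a sketch: `probe_read` is a hypothetical helper (not part of dCache), the 30 second default is arbitrary, and it relies only on coreutils `timeout` and `dd`. If the client is stuck in an unkillable state as in issue #6192, `timeout` itself may not return either.

```shell
# probe_read FILE [SECONDS]
# Hypothetical helper: read at most the first MiB of FILE and give up
# after SECONDS (default 30). Prints OK or FAIL and returns dd/timeout's status.
probe_read() {
    local file="$1"
    local limit="${2:-30}"
    local rc=0
    # timeout sends TERM after $limit seconds, then KILL 5 seconds later.
    timeout -k 5 "$limit" dd if="$file" of=/dev/null bs=1M count=1 2>/dev/null || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK $file"
    else
        echo "FAIL rc=$rc $file"
    fi
    return "$rc"
}
```

For example, `for f in /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/*.art; do probe_read "$f" 30; done` would list which files can be opened over NFS and which cannot, which can then be compared against `mover ls` on the pool.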