NFS: unable to copy a file.

Open DmitryLitvintsev opened this issue 4 years ago • 1 comments

Here is the situation.

[root@mu2ebuild01 litvinse]# time cp  /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art . 

The command just hangs and nothing happens. I can't kill the client (see issue #6192).

I see that no mover starts on the pool while cp is running:

[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > mover ls 
[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > 

For comparison, here is another file located on the same pool. The PNFS IDs of the two files are:

"bad file":
[root@mu2ebuild01 litvinse]# cat /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/".(id)(dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art)"
000014625314BA2C40208DD8E3DE2884B86D

"good file":

[root@mu2ebuild01 litvinse]# cat /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/".(id)(dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000009.art)"
00009261CCDCE6BA43ADAB99FCAD46BD9083
[root@mu2ebuild01 litvinse]# 

Both are located on the same pool:

[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > rep ls 000014625314BA2C40208DD8E3DE2884B86D
000014625314BA2C40208DD8E3DE2884B86D <C-------X--L(0)[0]> 2064392593 si={mu2e.persistent}

[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > rep ls 00009261CCDCE6BA43ADAB99FCAD46BD9083
00009261CCDCE6BA43ADAB99FCAD46BD9083 <C-------X--L(0)[0]> 2046187316 si={mu2e.persistent}

I can copy the second file:

[root@mu2ebuild01 litvinse]# time cp /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000009.art . 

real	0m21.162s
user	0m0.048s
sys	0m4.290s

and I see a mover on the pool while cp is running:

[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > mover ls 
83886213 : RUNNING : 00009261CCDCE6BA43ADAB99FCAD46BD9083 IoMode=[READ] h={NFSv4.1/pNFS,OS=[6179c3cf000200c40000048f, seq: 0],cl=[131.225.240.47]} bytes=330301440 time/sec=3 LM=0 si={mu2e.persistent}

[fndca3b] (p-mu2e-stkendca1903-9@p-mu2e-stkendca1903-9Domain) enstore > 

I see three hung processes accessing the bad file:

[root@mu2ebuild01 litvinse]# lsof  /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
COMMAND   PID    USER   FD   TYPE DEVICE   SIZE/OFF       NODE NAME
cp      10691     rlc    3r   REG   0,47 2064392593 6402197645 /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
cp      10787 mu2epro    3r   REG   0,47 2064392593 6402197645 /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
mu2e    28316 mu2epro    4r   REG   0,47 2064392593 6402197645 /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
[root@mu2ebuild01 litvinse]# 
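
To tell whether such processes are merely sleeping or blocked in uninterruptible I/O wait, their kernel state can be checked. The following is a minimal sketch, assuming a Linux client with /proc mounted; `proc_state` is a hypothetical helper name, not part of dCache or of the NFS client tools.

```shell
# Hypothetical helper: print the one-letter scheduler state of a PID,
# taken from the "State:" line of /proc/<pid>/status (Linux only).
# D means uninterruptible sleep, which is typical of a process
# blocked on NFS I/O; S is ordinary sleep, R is running.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Example use with the PIDs from the lsof output above:
#   for pid in 10691 10787 28316; do
#       echo "$pid $(proc_state "$pid")"
#   done
```

A state of D for all three PIDs would suggest they are waiting on the NFS server rather than on anything local.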

So I kill them and drop the caches for good measure:

[root@mu2ebuild01 litvinse]# kill -9 10691 10787 28316 
[root@mu2ebuild01 litvinse]# lsof  /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art
[root@mu2ebuild01 litvinse]# echo 3 > /proc/sys/vm/drop_caches 

Try cp again:

[root@mu2ebuild01 litvinse]#  time cp  /pnfs/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art . 

It just hangs again; nothing happens.

I unmount and remount:

[root@mu2ebuild01 litvinse]# umount -l /pnfs/mu2e
[root@mu2ebuild01 litvinse]# mount  /pnfs/mu2e

This does not help: no mover starts and cp still hangs.
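
A bounded read probe can reproduce the hang without tying up an interactive shell. The following is a sketch only, assuming GNU coreutils (`timeout`, `dd`) on the client; `probe_read` is a hypothetical helper name. If the reader ends up in uninterruptible wait, the signal sent by `timeout` may have no effect and the probe itself can block.

```shell
# Hypothetical helper: try to read the first MiB of FILE, giving up
# after SECS seconds (default 30). An exit status of 124, set by
# GNU timeout, means the read did not finish in time.
probe_read() {
    file=$1
    secs=${2:-30}
    timeout "$secs" dd if="$file" of=/dev/null bs=1M count=1 2>/dev/null
}

# Example use (path shortened here for illustration):
#   probe_read /pnfs/mu2e/persistent/.../some_file.art 30
#   echo "exit status: $?"
```

Running the probe against both the bad and the good file gives a quick yes/no comparison with a fixed upper bound on the wait.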

This is dCache 7.2.

DmitryLitvintsev, Oct 28 '21 00:10

I migrated the file to another pool, with the same result. And yes, I can read it using other protocols:

[root@mu2ebuild01 litvinse]# dccp dcap://fndca1:24136/pnfs/fnal.gov/usr/mu2e/persistent/users/mu2epro/valjob/reco_031021/dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art . 
2064392593 bytes (1.92 GiB) in 27 seconds (72.9 MiB/s)
[root@mu2ebuild01 litvinse]# echo $?
0
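
The dcap copy can be checked against the size that the pool and lsof report for this file (2064392593 bytes). The following is a minimal sketch, assuming GNU `stat`; `check_size` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if FILE is exactly EXPECTED bytes.
# Uses GNU stat, where "-c %s" prints the file size in bytes.
# Returns 2 if the file cannot be examined at all.
check_size() {
    file=$1
    expected=$2
    actual=$(stat -c %s "$file") || return 2
    [ "$actual" -eq "$expected" ]
}

# Example use on the local copy made by dccp:
#   check_size dig.brownd.CeEndpointMixTriggered.MDC2020k.001210_00000000.art 2064392593 \
#       && echo "size matches"
```

A matching size only shows that the transfer was complete; it does not verify the file's checksum.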

DmitryLitvintsev, Oct 28 '21 00:10