Device or resource busy
Sometimes when I check out an SVN directory on an MFS mount point, it shows the error "Device or resource busy". Does anyone know why?
version: 3.0.88
It is hard to guess why it happens. In general mfsmount never returns EBUSY (Device or resource busy), so this error must be somehow generated in the kernel. Maybe you have open files, or directories that are set as the CWD of some processes, inside the directory you want to check out into?
If you tell us more about this case we may try to reproduce it on our MFS instance, but very likely it is not connected (at least directly) with MFS.
CentOS 6.9, kernel 2.6.32-696, FUSE 2.8.5, MooseFS 3.0.101
I have a random occurrence of the same issue, "device or resource busy".
Problem identification:
It happens when `shutil.copytree` does an `os.makedirs` (before the 2.7.15 shutil rewrite, with 2.7.15, and with the system Python 2.6; I couldn't test with Python 3.x).
Problem reproduction :
```python
import errno
import os
from datetime import datetime

for i in range(33300):
    try:
        tStamp = datetime.utcnow().strftime('%Y-%m-%d_%H-%M-%S-%f')
        pNum = str(i).zfill(3)
        fName = tStamp + pNum
        os.makedirs(fName + "/seconddir" + "/otherdir" + "/lastdir", 0755)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
```
The first time, it works fine, and I erased all the dirs manually afterwards (`rm -rf 2018* ; sync`). The second time, only 5491 directories were created. The third time, only 1 directory. The last test, 471 directories created. I created a really new directory somewhere else and retried: same behaviour, the first test is good, the others fail with a random number of dirs actually created before the raise in my stup** script.
No visible errors on the MooseFS side. But this never happens (same hardware, same OS, same software, same conditions) on the NFS parts. (Readers, please do not overinterpret: MooseFS has other goals and does the right job, we are just trying to understand this case.)
Your track, acid-maker, with CWD (and the fuse tag) seems interesting too. How could I improve my tests and help identify the problem?
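One way such tests could be made more informative (a sketch of my own, not something the developers requested; the log file name and format are arbitrary) is to record the errno of each failure and the inode number of each directory as it is created, so failures can later be correlated with inode reuse:

```python
import errno
import os
import time

def make_and_record(path, log_path="mkdir_inodes.log"):
    """Try to create `path`, logging its inode number on success
    and the errno on failure. Returns True on success, False on EEXIST,
    and re-raises any other OSError."""
    try:
        os.makedirs(path)
        ino = os.stat(path).st_ino
        with open(log_path, "a") as log:
            log.write("%f OK %s ino=%d\n" % (time.time(), path, ino))
        return True
    except OSError as e:
        with open(log_path, "a") as log:
            log.write("%f ERR %s errno=%d (%s)\n"
                      % (time.time(), path, e.errno, os.strerror(e.errno)))
        if e.errno != errno.EEXIST:
            raise
        return False
```

An EBUSY failure right after an inode number that was seen before would support the inode-reuse theory discussed later in this thread.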
We have a similar problem. Any guidance as to what possibly causes this would be great. Worth mentioning that while this happens, running the same code on a different host (client) with the same spec works. The code we use to reproduce it is:
```python
import os
import shutil
from time import sleep

fig_dir = "/mfsmount/test/test_resource_busy/test1"

for i in range(33300):
    # Delete the directory if it exists.
    print(os.path.exists(fig_dir))
    if os.path.exists(fig_dir):
        print("Deleting folder: {}".format(fig_dir))
        shutil.rmtree(fig_dir)
    # Then create it again to start afresh.
    print("Creating fresh folder: {}".format(fig_dir))
    # sleep(0.001)  # if this is added, the error doesn't happen
    os.makedirs(fig_dir)
```
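As the commented-out `sleep(0.001)` above suggests, a tiny delay before recreating the directory hides the error. Until the root cause is fixed, one possible client-side mitigation (my own sketch, not an official MooseFS recommendation) is to retry `makedirs` with a short sleep whenever EBUSY is returned:

```python
import errno
import os
import time

def makedirs_retry(path, attempts=10, delay=0.001):
    """Retry os.makedirs on EBUSY, sleeping briefly between attempts.
    Re-raises the error if it is not EBUSY or all attempts fail."""
    for attempt in range(attempts):
        try:
            os.makedirs(path)
            return
        except OSError as e:
            if e.errno != errno.EBUSY or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

This only papers over the problem, but it keeps a production job from aborting on a transient EBUSY.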
Hi @pault28, Thank you for your update and python script. Could you please share with us more information about your server it happens on, including:
- `mfsmount` version,
- OS and its version,
- kernel version,
- `libfuse` version,
- information about custom OS / kernel tuning.

Would it be possible for you to grab `.oplog` on the mountpoint where it happens while running the script? Please do it as follows: `cat /mfsmount/.oplog > ~/mfs_oplog.txt` (assuming `/mfsmount` is your MooseFS mount directory). Please note that `.oplog` is not a real file – it is a special, MooseFS-internal node and it is not shown in `ls -a`.
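If shelling out to `cat` is inconvenient, the `.oplog` capture could also be driven from the same Python script that reproduces the problem, by streaming the node in a background thread for the duration of the test. This is only a sketch; `/mfsmount` is an assumption, and note that a read blocked on the `.oplog` stream will not notice the stop event until more data arrives:

```python
import threading

def capture_oplog(oplog_path, out_path, stop_event):
    """Stream the .oplog node into a regular file until stop_event is set
    (or until EOF, which only occurs on regular files, e.g. in tests)."""
    with open(oplog_path, "rb") as src, open(out_path, "wb") as dst:
        while not stop_event.is_set():
            chunk = src.read(4096)
            if not chunk:
                break
            dst.write(chunk)

# Usage sketch (assuming /mfsmount is the MooseFS mountpoint):
#   stop = threading.Event()
#   t = threading.Thread(target=capture_oplog,
#                        args=("/mfsmount/.oplog", "mfs_oplog.txt", stop))
#   t.start()
#   ...run the reproduction script...
#   stop.set(); t.join()
```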
We will try to plan testing on our test cluster.
From this place I would kindly ask and encourage other Community Users to help us debugging the issue – Dear Community, please do not hesitate to run tests with scripts provided above and collect the information mentioned. I hope it will be helpful in tracking this issue and may have a positive impact on resolving it.
Many thanks, Piotr
Re - above comment. Our spec is:
Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-87-generic x86_64)
Kernel 4.4.0-87-generic
Fuse 2.9.4-1ubuntu3.1
Moosefs client: 3.0.101-1 and 3.0.103-1
MooseFS 3.0.101-1
Is this fix https://github.com/moosefs/moosefs/blob/master/NEWS#L10 related?
@oxide94 Thanks for the log-grabbing info. I have done as you requested. Please see https://gist.github.com/pault28/c3900a4e9665f0a1cd4ccaf2e5848f44. I must warn that it's quite a huge file, close to 200,000 lines.
mfsmount version details below:
MFS version 3.0.101-1
FUSE library version: 2.9.4
fusermount version: 2.9.4
PS: We don't have any custom OS / kernel tuning.
@oxide94 - Any word on this?
Hi @pault28, Thanks for bumping the thread. Unfortunately I got very busy recently and I cannot take care of this case myself, but I hope Alex @xandrus will have a moment – if not this week, maybe next week?
Thanks, Piotr
Hi, Thank you for the information. Previously, the problem only appeared on older kernels and FUSE versions, like in CentOS 6, but this is something new.
I have executed your test in our lab, but at this moment we don't see any problems. This can take some time because it is highly probable that errors will not appear until inodes start to be reused in the kernel.
Out of curiosity, would you be so kind as to tell us how quickly you get this error after you umount and mount the MooseFS client again?
By the way, how many make dirs and remove dirs are made on these "ebusy" nodes?
Hi @xandrus, thanks for coming back on this. We are using this on a production system, so umount is highly discouraged. However, I have just run the code on two different servers without experiencing the bug yet. As far as I know it happens unpredictably. What would be good is figuring out whether there is anything we can tune to avoid it happening.
Also, you said you saw this on "older kernels and FUSE like in CentOS 6". Can you mention the exact versions, and was there any recommended fix?
> By the way, how many make dirs and remove dirs are made on these "ebusy" nodes?

As a whole we have a lot of directories. The totals as reported by the mfs master as of today are as follows:
Dirs: 6344723
Files: 8744041
@oxide94 @xandrus - just chasing this up. Any new development on this?
There is also a random error "boost::filesystem::create_directory: Device or resource busy" reported by my mongodb backup script.
mfs version: 3.0.104-1
chunkserver version: 3.0.104-1, 3.0.105-1
mfs client version: 3.0.104-1
FUSE library version: 2.9.2
fusermount version: 2.9.2
@MonkeyFang - what are your mfsmount options?
Fewer occurrences, but we still have "device or resource busy" with MooseFS 3.0.107 & MooseFS 3.0.109 (it impacts all apps that use MooseFS storage).
@tnktls - what are your mfsmount options?
@chogata: `mfsmaster=server,port=9421,mfsxattrcacheto=60,mfsentrycacheto=15,mfsnegentrycacheto=2,mfsdirentrycacheto=30 0 0`
`fuse.mfs rw,relatime,userid=0,groupid=0,allowother 0 0`
@tnktls I just noticed you wrote earlier that you use CentOS 6. The 2.x kernel used in CentOS 6 has a bug: it does not re-use inodes, so after a long/intense operation period it can "run out" of them, resulting in this problem. There is nothing we can do about it; we recommend upgrading to CentOS 7.
If anybody has this problem on another OS/newer kernel, please, report the details of your operating system in this thread.
We recently ran some more tests, and all the scenarios we knew of that could generate the EBUSY error stopped reproducing it on kernel 4.19. So our current recommendation is to upgrade to at least kernel 4.19 whenever possible.
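Given that recommendation, a client could verify its kernel before mounting. This is a convenience sketch of my own, not part of MooseFS; it parses the leading "X.Y" of `platform.release()` (e.g. "4.4.0-87-generic"):

```python
import platform

def kernel_at_least(major, minor):
    """Return True if the running kernel version is at least major.minor,
    parsed from the leading 'X.Y' of platform.release()."""
    parts = platform.release().split(".")
    try:
        running = (int(parts[0]), int(parts[1].split("-")[0]))
    except (IndexError, ValueError):
        return False  # unparseable release string: assume not sufficient
    return running >= (major, minor)

# Example: warn if below the recommended 4.19.
if not kernel_at_least(4, 19):
    print("Warning: kernel older than 4.19; EBUSY on FUSE mounts is more likely.")
```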
@chogata OK, kernel 4.19, I have noted your advice. So: can you remove the mention that MooseFS supports Red Hat 6 (& CentOS 6), to clarify the situation?
This problem is widely encountered by different projects using FUSE. We could reproduce it very easily with 'dd ; mkfs.ntfs' for example (see Red Hat solution 1268853 for a bigger problem), in Ruby, which seems to have implemented a solution (Rfuse::fuse::unlink), and in Go (the deprecated pathfs, go-fuse today). The most relevant seems to be MapR (see https://mapr.com/support/s/article/mkdir-fails-with-EBUSY-on-MapR-Fuse-Mount).
So I'm afraid I don't really understand why MooseFS wouldn't implement a similar solution instead of telling us "upgrade your kernel"! By that logic, MooseFS does not support Red Hat 6 or Red Hat 7 if they require kernel 4.19+.
I really hope for a solution, either upstream in FUSE or an internal MooseFS bypass/protection like in other projects. I remain at your disposal.
This is the repository for community edition version of MooseFS (non Pro). Users of the Pro version should request assistance with their issues via Pro support channels.
Our current recommendation for community users is to use kernel 4.19+ whenever possible. This issue was not closed, which means we still have our attention on this problem. We implemented some changes that got rid of the EBUSY problem for some use cases.
With the current MooseFS architecture, the only solution that could get rid of the problem in every use case with a 100% guarantee of success would be to not re-use inodes at all. But that would mean a huge increase in RAM demand on the Master (and potentially also running out of inodes on very busy clusters), so it's not a viable solution. While we pursue other avenues to reduce the problem, we cannot guarantee we will find a way to completely eradicate it. Others may have done so because their system architecture is different from ours. A complete change of architecture is out of the question – by the time we accomplished that, the systems for which we made the effort would no longer be supported by their publishers anyway.
We have had some Pro channel tickets for a long time (and thanks for providing a semi-release for us, before ~107), and we have followed all your recommendations (now on 109) through the support of our industrial subcontractor. But there are still "ebusy" errors today. xxxxTB on MooseFS on production systems we can't upgrade (these are not dev tools nor a geek toy: strict and really hot production).
Is there any chance that switching to fuse3 (semi-supported by Red Hat) on our RH6 systems would improve the situation for MooseFS?
Second: have you considered making and maintaining your own GNU/Linux distribution?
I'm afraid fuse3 won't help you. The problem lies within the kernel itself. There is no way to "ask" the kernel which inodes it still keeps in its cache. So if there are bugs in the kernel that cause it to keep inodes it absolutely shouldn't keep (inodes it was instructed to "forget"), we don't know about them. And the version of fuse doesn't matter, because it's not fuse that decides which inodes to keep.
We have one more idea for how to prevent the kernel from keeping inodes cached for too long. We are currently preparing to test it, but we cannot yet predict if it will help at all. I will keep you posted.
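The inode-reuse behaviour discussed here can be observed directly from userspace: create a directory, record its inode number, remove it, and repeat. Far fewer distinct inode numbers than cycles means inodes are being reused. This is a diagnostic sketch of my own; run it with `base` on the MooseFS mountpoint to see the pattern there:

```python
import os
import shutil

def observe_inode_reuse(base, cycles=100):
    """Repeatedly create and remove a directory under `base`,
    returning the set of inode numbers it was assigned."""
    path = os.path.join(base, "inode_probe")
    seen = set()
    for _ in range(cycles):
        os.makedirs(path)
        seen.add(os.stat(path).st_ino)
        shutil.rmtree(path)
    return seen
```

For example, `len(observe_inode_reuse("/mfsmount/test"))` much smaller than `cycles` would indicate aggressive reuse on that mount.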