
Option to disable file locking

Open VRehnberg opened this issue 1 year ago • 0 comments

Feature request

Commands such as load_dataset create file locks with filelock.FileLock. It would be good to have a way to disable this.

Motivation

File locking doesn't work on all file systems (in my case, NFS-mounted Weka). If cache_dir held only small files, pointing it at local disk would solve the problem. However, cache_dir is where both the small info files and the processed datasets are written, so this isn't a feasible solution.

Considering https://github.com/huggingface/datasets/issues/6395, I still think this is something that belongs in HuggingFace. Being able to control locking separately for each package is valuable: a user might have their dataset on a file system that doesn't support file locking while still relying on file locking on local disk to control some other type of access.

Your contribution

My suggested solution:

diff --git a/src/datasets/utils/_filelock.py b/src/datasets/utils/_filelock.py
index 19620e6e..58f41a02 100644
--- a/src/datasets/utils/_filelock.py
+++ b/src/datasets/utils/_filelock.py
@@ -18,11 +18,15 @@
 import os
 
 from filelock import FileLock as FileLock_
-from filelock import UnixFileLock
+from filelock import SoftFileLock, UnixFileLock
 from filelock import __version__ as _filelock_version
 from packaging import version
 
 
+if os.getenv('HF_USE_SOFTFILELOCK', 'false').lower() in ('true', '1'):
+    FileLock_ = SoftFileLock
+
+
 class FileLock(FileLock_):
     """
     A `filelock.FileLock` initializer that handles long paths.

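For illustration, the environment check in the patch above can be written as a small standalone helper. This is only a sketch: the function names `soft_lock_requested` and `pick_lock_class` are hypothetical and not part of datasets or filelock, and `HF_USE_SOFTFILELOCK` is the variable proposed in this issue, not an existing setting.

```python
import os


def soft_lock_requested(environ=os.environ):
    """Mirror the condition from the proposed patch (hypothetical helper)."""
    # Only "true" and "1" (case-insensitive) enable the soft lock;
    # an unset variable falls back to "false".
    value = environ.get("HF_USE_SOFTFILELOCK", "false")
    return value.lower() in ("true", "1")


def pick_lock_class(environ=os.environ):
    """Return the filelock class the patch would end up subclassing."""
    # Imported lazily so the parsing helper above works without filelock.
    from filelock import FileLock, SoftFileLock

    return SoftFileLock if soft_lock_requested(environ) else FileLock
```

With this check, values such as "yes" or "on" would not enable the soft lock. SoftFileLock relies on the existence of the lock file rather than on OS-level locking, which is why it may work on file systems where the default lock fails; the trade-off is that a lock file left behind by a crashed process can remain in place until it is removed.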
VRehnberg, Mar 20 '24 15:03