Option to disable file locking
Feature request
Commands such as load_dataset create file locks with filelock.FileLock. It would be good to have a way to disable this.
Motivation
File locking doesn't work on all file systems (in my case, NFS-mounted Weka). If cache_dir only held small files, it could be pointed at local disk and the problem would be solved. However, cache_dir is where both the small info files and the processed datasets are written, so this isn't a feasible solution.
Considering https://github.com/huggingface/datasets/issues/6395, I still think this is something that belongs in HuggingFace. Being able to control locking per package is valuable: a user might have their dataset on a file system that doesn't support file locking while relying on file locking on local disk to control some other type of access.
Your contribution
My suggested solution:
diff --git a/src/datasets/utils/_filelock.py b/src/datasets/utils/_filelock.py
index 19620e6e..58f41a02 100644
--- a/src/datasets/utils/_filelock.py
+++ b/src/datasets/utils/_filelock.py
@@ -18,11 +18,15 @@
import os
from filelock import FileLock as FileLock_
-from filelock import UnixFileLock
+from filelock import SoftFileLock, UnixFileLock
from filelock import __version__ as _filelock_version
from packaging import version
+if os.getenv('HF_USE_SOFTFILELOCK', 'false').lower() in ('true', '1'):
+    FileLock_ = SoftFileLock
+
+
class FileLock(FileLock_):
"""
A `filelock.FileLock` initializer that handles long paths.
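
For readers unfamiliar with the difference: as far as I understand, `SoftFileLock` treats the mere existence of the lock file as the lock, instead of asking the OS for a lock via `flock`/`fcntl`, which is why it can work on file systems where OS-level locking fails. Below is a minimal stdlib-only sketch of that idea together with the environment-variable check from the patch. The names `use_soft_lock` and `soft_lock` are hypothetical helpers for illustration, not part of `filelock` or `datasets`, and `HF_USE_SOFTFILELOCK` is the variable proposed in this issue, not an existing setting.

```python
import os
import tempfile
import time
from contextlib import contextmanager


def use_soft_lock(environ=os.environ):
    # Same check as in the proposed patch; HF_USE_SOFTFILELOCK is only a proposal.
    return environ.get("HF_USE_SOFTFILELOCK", "false").lower() in ("true", "1")


@contextmanager
def soft_lock(lock_path, timeout=10.0, poll_interval=0.05):
    """Hold a lock by exclusively creating lock_path (no flock/fcntl calls)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so only one caller can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release: close the descriptor and delete the lock file.
        os.close(fd)
        os.remove(lock_path)


if __name__ == "__main__":
    lock_file = os.path.join(tempfile.mkdtemp(), "example.lock")
    print("soft lock requested:", use_soft_lock())
    with soft_lock(lock_file):
        print("lock file exists while held:", os.path.exists(lock_file))
```

The trade-off of this approach is that a process that crashes while holding the lock leaves the lock file behind, and it has to be removed manually. That is one reason to keep it opt-in rather than the default.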