[BUG] Cannot sync Dataset when using Alluxio runtime
What is your environment (Kubernetes version, Fluid version, etc.)
k8s version: 1.20.11
fluid version: 0.8.6-2131f34
Describe the bug
I created a Dataset backed by an NFS path. When I use the Alluxio runtime, there is no way to sync the Dataset.
What you expect to happen:
When I update data under the NFS path, there should be a way to sync the Dataset, either by hand or automatically.
How to reproduce it
Create a Dataset backed by an NFS path. When data under the NFS path changes, the Dataset is not synced.
Additional Information
Is this an Alluxio bug or a Fluid bug?
I tried to configure automatic sync in the Alluxio master, but got errors:
Syncing by hand succeeds, but the Dataset is still not synced.
@Crazybean-lwb From my understanding, Alluxio currently supports ActiveSync only for HDFS (see this doc). So when mounting a UFS of type local, ActiveSync cannot be enabled.
For more information, you can make a feature request in the Alluxio community.
ok👌 I will check later.
Another problem: syncing by hand succeeds, but the Dataset is not synced.
e.g.:
alluxio fs ls -R -Dalluxio.user.file.metadata.sync.interval=0 /dir
@Crazybean-lwb Did you mean Fluid Dataset's status (e.g. UfsTotalSize, FileNum) should be updated?
@Crazybean-lwb If "no sync in Dataset" means you can't see the changes in your application Pod, this is because Alluxio Fuse uses the kernel cache to cache metadata. It therefore takes some time before the changes become visible in your application Pod, so please try again later.
And this is the document about the Alluxio Fuse Metadata Cache: Metadata Cache
Yes, the Fluid Dataset should stay in sync with the UFS.
Strange. I changed just one word in a file, then synced in the Alluxio client. I can see the changed file in the Alluxio client, but days later there is still no change in the Fluid Dataset.
@Crazybean-lwb Alluxio Fuse caches two types of data: the metadata of a file and the actual data in the file. In Fluid, the default read-only configuration of Alluxio Fuse is kernel_cache,ro,attr_timeout=7200,entry_timeout=7200, which means Alluxio Fuse will sync the metadata of the file but no data cache invalidation will happen. That's why, when you change a word in a file directly, you can't see the change in the Dataset. If you delete or upload files in your UFS, you can see those changes in the Dataset after some time.
What's more, we want to do some optimization on this configuration. Could you tell us more about your usage scenarios?
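To make the timings in that option string concrete, here is a minimal shell sketch that pulls a value out of the comma-separated options quoted above. The helper name fuse_opt is hypothetical and is not part of Fluid or Alluxio; the sketch also assumes the usual FUSE convention that attr_timeout and entry_timeout are given in seconds.

```shell
#!/bin/sh
# Hypothetical helper (not a Fluid or Alluxio command): print the value of
# KEY from a comma-separated FUSE option string.
fuse_opt() {
    opts=$1
    key=$2
    # Put each option on its own line, then print only the value of key=value.
    printf '%s\n' "$opts" | tr ',' '\n' | sed -n "s/^${key}=//p"
}

# The default read-only options quoted in the comment above.
OPTS='kernel_cache,ro,attr_timeout=7200,entry_timeout=7200'

attr_timeout=$(fuse_opt "$OPTS" attr_timeout)
# Assumes the value is in seconds, so 7200 corresponds to 120 minutes.
echo "attr_timeout: ${attr_timeout}s ($((attr_timeout / 60)) minutes)"
```

Under that assumption, cached attributes and directory entries may be reused for up to two hours before the kernel asks Alluxio Fuse again.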
You are right. A comparison test:
[ufs]
[dataset]
I added some new words to 2023-05-23.txt and created a new file, 2023-06-27.txt.
Minutes later, I can see 2023-06-27.txt in the Dataset, but the change to 2023-05-23.txt is not synced.
A question: how can I know when a new file has been synced successfully?
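One possible way to check, offered as a sketch rather than an official Fluid mechanism: poll for the new file on the mounted Dataset path from inside the application Pod. The function name wait_for_file and the mount path /data used in the usage note are assumptions for illustration. The check only confirms that the file is visible through the mount; it says nothing about Fluid's own Dataset status fields.

```shell
#!/bin/sh
# Hypothetical helper: wait until a path exists, checking every INTERVAL
# seconds and giving up after TIMEOUT seconds.
#   $1 = path to wait for
#   $2 = timeout in seconds (default 600)
#   $3 = polling interval in seconds (default 10)
wait_for_file() {
    path=$1
    timeout=${2:-600}
    interval=${3:-10}
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${waited}s waiting for $path" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "$path visible after about ${waited}s"
}
```

For example, assuming the Dataset is mounted at /data in the Pod, `wait_for_file /data/2023-06-27.txt 1800 30` would check every 30 seconds for up to 30 minutes and return a non-zero status if the file never appears.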