[BUG] Cannot sync Dataset when using Alluxio runtime

Open Crazybean-lwb opened this issue 2 years ago • 9 comments

What is your environment (Kubernetes version, Fluid version, etc.)

k8s version: 1.20.11
fluid version: 0.8.6-2131f34

Describe the bug

I created a Dataset backed by an NFS path. With the Alluxio runtime, there is no way to sync the Dataset.

What you expect to happen:

When I update data under the NFS path, there should be a way to sync the Dataset, either by hand or automatically.

How to reproduce it

Create a Dataset backed by an NFS path. When data under the NFS path changes, the Dataset is not synced.

Additional Information

Is this an Alluxio bug or a Fluid bug?

Crazybean-lwb avatar Jun 12 '23 11:06 Crazybean-lwb

I tried to configure auto sync on the Alluxio master and got errors:

[image]

Syncing by hand succeeds, but the Dataset is still not synced:

[image]

Crazybean-lwb avatar Jun 12 '23 11:06 Crazybean-lwb

@Crazybean-lwb From my understanding, Alluxio currently supports ActiveSync only for HDFS (referring to this doc). So when mounting a UFS of type local, ActiveSync cannot be enabled.

For more information, you can open a feature request with the Alluxio community.

TrafalgarZZZ avatar Jun 14 '23 07:06 TrafalgarZZZ

OK 👌 I will check later. Another problem: syncing by hand succeeds, but the Dataset is still not synced, e.g.: alluxio fs ls -R -Dalluxio.user.file.metadata.sync.interval=0 /dir

Crazybean-lwb avatar Jun 14 '23 12:06 Crazybean-lwb

@Crazybean-lwb Do you mean that the Fluid Dataset's status (e.g. UfsTotalSize, FileNum) should be updated?

TrafalgarZZZ avatar Jun 15 '23 05:06 TrafalgarZZZ

@Crazybean-lwb If "no sync in Dataset" means you can't see the changes in your application Pod, this is because Alluxio Fuse uses the kernel cache to cache metadata. It takes some time for the changes to become visible in your application Pod, so you can try again later.

The Alluxio Fuse metadata cache is documented here: Metadata Cache

zhang-x-z avatar Jun 20 '23 07:06 zhang-x-z

Yes, the Fluid Dataset should stay in sync with the UFS.

Crazybean-lwb avatar Jun 25 '23 11:06 Crazybean-lwb

Strange. I changed just one word in a file and then synced in the Alluxio client. I can see the changed file in the Alluxio client, but days later there is still no change in the Fluid Dataset.

Crazybean-lwb avatar Jun 25 '23 11:06 Crazybean-lwb

@Crazybean-lwb Alluxio Fuse caches two kinds of data: file metadata and the actual file contents. In Fluid, the default read-only configuration of Alluxio Fuse is kernel_cache,ro,attr_timeout=7200,entry_timeout=7200, which means Alluxio Fuse will sync file metadata, but the cached file data is never invalidated. That's why you can't see the change in the Dataset after directly changing a word in a file. If you delete or upload files in your UFS, you will see those changes in the Dataset after some time.

What's more, we want to optimize this configuration. Could you tell us more about your usage scenario?
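
To make the timeouts in that option string concrete, here is a minimal sketch. It assumes the usual FUSE convention that attr_timeout and entry_timeout are given in seconds; parse_fuse_opts is a hypothetical helper written for illustration, not part of Fluid or Alluxio.

```python
def parse_fuse_opts(opts: str) -> dict:
    """Split a comma-separated FUSE option string into a dict.

    Flags without a value (e.g. "ro") map to True.
    """
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.strip().partition("=")
        parsed[key] = value if sep else True
    return parsed


# Option string quoted in the comment above.
opts = parse_fuse_opts("kernel_cache,ro,attr_timeout=7200,entry_timeout=7200")

# Assumption: FUSE timeouts are expressed in seconds.
attr_hours = int(opts["attr_timeout"]) / 3600
entry_hours = int(opts["entry_timeout"]) / 3600
print(f"attr cache: {attr_hours:g} h, entry cache: {entry_hours:g} h")
```

Under that assumption, both metadata timeouts come out to two hours, which fits the "after some time" delay for new or deleted files.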

zhang-x-z avatar Jun 26 '23 06:06 zhang-x-z

You are right. A comparison test:

[ufs] image

[dataset] image

I put some new words in 2023-05-23.txt and created a new file, 2023-06-27.txt. Minutes later I can see 2023-06-27.txt in the Dataset, but the change to 2023-05-23.txt is not synced.

A question: how can I know when a new file has synced successfully?
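
One way to check from the application side, as a minimal sketch: poll the mounted path until the new file shows up. wait_for_file and the /data path in the comment are hypothetical, not Fluid or Alluxio APIs. This only detects newly created files; per the explanation above, edits to an existing file's contents would not become visible this way.

```python
import os
import time


def wait_for_file(path: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll until `path` exists or `timeout_s` elapses.

    Returns True if the file appeared, False if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example usage (hypothetical mount path inside the application Pod):
#   wait_for_file("/data/2023-06-27.txt", timeout_s=7200, interval_s=30)
```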

Crazybean-lwb avatar Jun 27 '23 04:06 Crazybean-lwb