
Left-over DHT linkfiles

jkroonza opened this issue 4 years ago

Description of problem:

In some scenario we've been unable to figure out (suspected race condition), our glusterfs 8.5 host ends up with files flagged as needing heal. (Yes, we're aware 8.5 is old, but it's stable for us with minimal issues, and we can't find anything in the release notes for newer versions that appears to address this.) The volume contains mostly maildir-structured data, and the flagged entries are invariably what I believe to be DHT linkfiles.

We're never able to locate the relevant gfid files on any other bricks, and we suspect a race condition whereby the file is removed at some point prior to linkfile creation, or something similar. Our suspected sequence:

  1. The file gets created by a process.
  2. The file gets renamed.
  3. The file gets removed by another process.
  4. A linkfile is created by yet another (possibly rename) process.

What is unclear is which operations trigger these and how concurrent access affects this. We're fairly certain this is a race condition, and we're actually OK with these files getting created, as long as the heal process can clean them up again. When such a file needs to heal (which it never does), the heal process should notice that it's a linkfile and that the linked-to file is no longer present, and then clean the linkfile up.

A few offhand questions:

  1. Is it safe to simply delete these linkfiles (assuming their link count is 1)? A sketch of the checks we would run first is included after these questions.

  2. Is it safe to remove the xattrop files (which indicate heal is required), or will this happen automatically if 1 is done?

  3. Is the underlying problem fixed in a newer version of glusterfs? If so, could you potentially point me at where?
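
If the answer to 1 is yes, this is roughly what we would check before deleting anything (the sketch mentioned above). It is purely our own illustration and has not been run; the xattrop index location at the end is our assumption of where the heal flag lives, so please correct us if that's wrong:

#!/bin/bash
# Hypothetical pre-deletion check -- NOT run yet; it assumes manual removal
# is acceptable at all, which is exactly what question 1 asks.
BRICK=/mnt/gluster/mail-a                      # example brick root
GFID=cdd95656-5e3f-47af-b355-56889bb50912      # one of the flagged gfids

F="${BRICK}/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}"

# 1. Must look like a lone DHT linkfile: mode 1000 (---------T), 1 link, 0 bytes.
read -r mode links size <<<"$(stat -c '%a %h %s' "$F")"
if [ "$mode" != "1000" ] || [ "$links" != "1" ] || [ "$size" != "0" ]; then
    echo "not a lone linkfile, skipping: $F"
    exit 1
fi

# 2. Must actually carry the dht.linkto xattr.
if ! getfattr -n trusted.glusterfs.dht.linkto "$F" >/dev/null 2>&1; then
    echo "no linkto xattr, skipping: $F"
    exit 1
fi

echo "candidate for removal: $F"
# rm "$F"
# If heal doesn't clear the flag by itself (question 2), the matching index
# entry would presumably be ${BRICK}/.glusterfs/indices/xattrop/${GFID}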

Expected results:

We expect these bogus linkfiles either to never be created, or to be cleaned up by the heal process.

Mandatory info:

- The output of the gluster volume info command:

Volume Name: gv_mail
Type: Distributed-Replicate
Volume ID: fdb043f4-f290-4ca3-b747-8c02d917fa75
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: bagheera:/mnt/gluster/mail-a
Brick2: uriel:/mnt/gluster/mail-b
Brick3: uriel:/mnt/gluster/mail-a
Brick4: bagheera:/mnt/gluster/mail-b
Options Reconfigured:
cluster.granular-entry-heal: enable
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.self-heal-daemon: enable
performance.write-behind: off
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
cluster.data-self-heal: off
performance.readdir-ahead: on
performance.cache-size: 256MB
server.event-threads: 4
client.event-threads: 4
performance.open-behind: off
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING

- The output of the gluster volume status command:

Status of volume: gv_mail
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick bagheera:/mnt/gluster/mail-a          49152     0          Y       26741
Brick uriel:/mnt/gluster/mail-b             49152     0          Y       5099 
Brick uriel:/mnt/gluster/mail-a             49155     0          Y       3019 
Brick bagheera:/mnt/gluster/mail-b          49153     0          Y       26740
Self-heal Daemon on localhost               N/A       N/A        Y       26762
Self-heal Daemon on uriel                   N/A       N/A        Y       20028
 
Task Status of Volume gv_mail
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

bagheera [12:05:09] /mnt/gluster # gluster volume heal gv_mail
Launching heal operation to perform index self heal on volume gv_mail has been successful 
Use heal info commands to check status.
bagheera [12:05:16] /mnt/gluster # gluster volume heal gv_mail info
Brick bagheera:/mnt/gluster/mail-a
Status: Connected
Number of entries: 0

Brick uriel:/mnt/gluster/mail-b
Status: Connected
Number of entries: 0

Brick uriel:/mnt/gluster/mail-a
<gfid:917d15bd-490d-471b-8bbf-2765e73ccdb0> 
<gfid:065eeef2-0fde-4479-ad0d-b6c3bba54e63> 
<gfid:cdd95656-5e3f-47af-b355-56889bb50912> 
<gfid:e8a58e55-ee9a-491a-b2b9-e566f514d2e2> 
<gfid:08797d26-afb4-4cd8-b9c2-bc3dbe51fcdb> 
Status: Connected
Number of entries: 5

Brick bagheera:/mnt/gluster/mail-b
<gfid:a136ec67-369c-4110-855f-be9c35d8ad60> 
<gfid:f9be89c4-1f58-4d35-8ad4-f6e27a87a37c> 
<gfid:b04a1f91-3f04-4951-af69-19d53821245f> 
<gfid:86d25dc6-c204-42dd-b0e6-cf11d1cbaf33> 
<gfid:645dbd48-a75b-4734-a998-20775562bdef> 
<gfid:215f56ba-87cc-4ad4-b1bc-0ae90db6d9d2> 
Status: Connected
Number of entries: 6

- Provide logs present on following locations of client and server nodes: /var/log/glusterfs/

These are too big, and I have no idea what time frame would be sensible here.
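
If a specific window would help, we could pull brick log lines from around the mtime of one of the newer linkfiles (for example the Oct 13 01:32 one on bagheera's mail-b brick). A rough sketch of what we'd run, assuming the usual brick log name under /var/log/glusterfs/bricks/ and the default [YYYY-MM-DD hh:mm:ss timestamp prefix, and ignoring any local-time vs UTC offset:

# Grab roughly a ten-minute window around 2021-10-13 01:32 from the brick log.
# Log path/name and timestamp format are assumptions on our side.
grep -E '^\[2021-10-13 01:(2[7-9]|3[0-7])' /var/log/glusterfs/bricks/mnt-gluster-mail-b.log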

- Is there any crash? Provide the backtrace and coredump

No crash.

Additional info:

Note the available files:

bagheera [12:09:29] /mnt/gluster # for f in 917d15bd-490d-471b-8bbf-2765e73ccdb0 065eeef2-0fde-4479-ad0d-b6c3bba54e63 cdd95656-5e3f-47af-b355-56889bb50912 e8a58e55-ee9a-491a-b2b9-e566f514d2e2 08797d26-afb4-4cd8-b9c2-bc3dbe51fcdb 136ec67-369c-4110-855f-be9c35d8ad60 f9be89c4-1f58-4d35-8ad4-f6e27a87a37c b04a1f91-3f04-4951-af69-19d53821245f 86d25dc6-c204-42dd-b0e6-cf11d1cbaf33 645dbd48-a75b-4734-a998-20775562bdef 215f56ba-87cc-4ad4-b1bc-0ae90db6d9d2; do ls -l */.glusterfs/${f:0:2}/${f:2:2}/${f}; done
ls: cannot access '*/.glusterfs/91/7d/917d15bd-490d-471b-8bbf-2765e73ccdb0': No such file or directory
ls: cannot access '*/.glusterfs/06/5e/065eeef2-0fde-4479-ad0d-b6c3bba54e63': No such file or directory
ls: cannot access '*/.glusterfs/cd/d9/cdd95656-5e3f-47af-b355-56889bb50912': No such file or directory
ls: cannot access '*/.glusterfs/e8/a5/e8a58e55-ee9a-491a-b2b9-e566f514d2e2': No such file or directory
ls: cannot access '*/.glusterfs/08/79/08797d26-afb4-4cd8-b9c2-bc3dbe51fcdb': No such file or directory
ls: cannot access '*/.glusterfs/13/6e/136ec67-369c-4110-855f-be9c35d8ad60': No such file or directory
---------T 1 mail mail 0 Feb  3  2021 mail-b/.glusterfs/f9/be/f9be89c4-1f58-4d35-8ad4-f6e27a87a37c
---------T 1 mail mail 0 Aug  6 18:13 mail-b/.glusterfs/b0/4a/b04a1f91-3f04-4951-af69-19d53821245f
---------T 1 mail mail 0 Jun  6  2021 mail-b/.glusterfs/86/d2/86d25dc6-c204-42dd-b0e6-cf11d1cbaf33
---------T 1 mail mail 0 Oct 13 01:32 mail-b/.glusterfs/64/5d/645dbd48-a75b-4734-a998-20775562bdef
---------T 1 mail mail 0 Aug 17 17:54 mail-b/.glusterfs/21/5f/215f56ba-87cc-4ad4-b1bc-0ae90db6d9d2
uriel [12:10:07] /mnt/gluster # for f in 917d15bd-490d-471b-8bbf-2765e73ccdb0 065eeef2-0fde-4479-ad0d-b6c3bba54e63 cdd95656-5e3f-47af-b355-56889bb50912 e8a58e55-ee9a-491a-b2b9-e566f514d2e2 08797d26-afb4-4cd8-b9c2-bc3dbe51fcdb 136ec67-369c-4110-855f-be9c35d8ad60 f9be89c4-1f58-4d35-8ad4-f6e27a87a37c b04a1f91-3f04-4951-af69-19d53821245f 86d25dc6-c204-42dd-b0e6-cf11d1cbaf33 645dbd48-a75b-4734-a998-20775562bdef 215f56ba-87cc-4ad4-b1bc-0ae90db6d9d2; do ls -l */.glusterfs/${f:0:2}/${f:2:2}/${f}; done
---------T 1 mail mail 0 Oct 28 19:57 mail-a/.glusterfs/91/7d/917d15bd-490d-471b-8bbf-2765e73ccdb0
---------T 1 mail mail 0 Jul 22  2021 mail-a/.glusterfs/06/5e/065eeef2-0fde-4479-ad0d-b6c3bba54e63
---------T 1 mail mail 0 Oct 16  2020 mail-a/.glusterfs/cd/d9/cdd95656-5e3f-47af-b355-56889bb50912
---------T 1 mail mail 0 Sep 29 19:00 mail-a/.glusterfs/e8/a5/e8a58e55-ee9a-491a-b2b9-e566f514d2e2
---------T 1 mail mail 0 Oct  5 17:22 mail-a/.glusterfs/08/79/08797d26-afb4-4cd8-b9c2-bc3dbe51fcdb
ls: cannot access '*/.glusterfs/13/6e/136ec67-369c-4110-855f-be9c35d8ad60': No such file or directory
ls: cannot access '*/.glusterfs/f9/be/f9be89c4-1f58-4d35-8ad4-f6e27a87a37c': No such file or directory
ls: cannot access '*/.glusterfs/b0/4a/b04a1f91-3f04-4951-af69-19d53821245f': No such file or directory
ls: cannot access '*/.glusterfs/86/d2/86d25dc6-c204-42dd-b0e6-cf11d1cbaf33': No such file or directory
ls: cannot access '*/.glusterfs/64/5d/645dbd48-a75b-4734-a998-20775562bdef': No such file or directory
ls: cannot access '*/.glusterfs/21/5f/215f56ba-87cc-4ad4-b1bc-0ae90db6d9d2': No such file or directory
Extended attributes of one of these remaining linkfiles (on uriel, via getfattr -m . -d -e hex):

# file: mail-a/.glusterfs/cd/d9/cdd95656-5e3f-47af-b355-56889bb50912
trusted.afr.gv_mail-client-3=0x000000000000000100000000
trusted.gfid=0xcdd956565e3f47afb35556889bb50912
trusted.gfid2path.b81ecffa01ec7ea1=0x63376139396137632d643435612d346531372d383666372d3164613937373665353433332f313630323830373533332e483536393632395033313532302e6761726d722e696577632e636f2e7a612c533d3130363437
trusted.glusterfs.dht.linkto=0x67765f6d61696c2d7265706c69636174652d3000
trusted.glusterfs.mdata=0x010000000000000000000000005f88e6ed00000000243c2dd9000000005f88e6ed00000000243c2dd9000000005f88e6ed00000000243c2dd9

From what I understand, the dht.linkto attribute points to the specific replica subvolume where the actual file should be located. Based on the ls commands above, that isn't actually the case here, since the GFID no longer exists on those bricks.
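
For readability, the interesting hex values above decode to plain strings; the same xattrs can be re-displayed as text roughly like this (getfattr may render the trailing NUL on the linkto value as \000):

# Same xattrs as in the hex dump above, shown as text instead (on uriel).
getfattr -n trusted.glusterfs.dht.linkto -e text \
    mail-a/.glusterfs/cd/d9/cdd95656-5e3f-47af-b355-56889bb50912
# -> trusted.glusterfs.dht.linkto="gv_mail-replicate-0"
getfattr -n trusted.gfid2path.b81ecffa01ec7ea1 -e text \
    mail-a/.glusterfs/cd/d9/cdd95656-5e3f-47af-b355-56889bb50912
# -> "c7a99a7c-d45a-4e17-86f7-1da9776e5433/1602807533.H569629P31520.garmr.iewc.co.za,S=10647"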

The theory is further based on the following:

bagheera [12:17:42] /mnt/gluster # find */.glusterfs -perm 01000
...
mail-a/.glusterfs/00/00/0000fea0-0c87-4f97-a527-69b597dc9d2d
...
bagheera [12:17:46] /mnt/gluster # ls -lah */.glusterfs/00/00/0000fea0-0c87-4f97-a527-69b597dc9d2d
---------T 2 mail mail   0 Feb 26  2020 mail-a/.glusterfs/00/00/0000fea0-0c87-4f97-a527-69b597dc9d2d
-rw------- 2 mail mail 38K Feb 26  2020 mail-b/.glusterfs/00/00/0000fea0-0c87-4f97-a527-69b597dc9d2d
bagheera [12:18:09] /mnt/gluster # getfattr -m . -d -e hex mail-a/.glusterfs/00/00/0000fea0-0c87-4f97-a527-69b597dc9d2d
# file: mail-a/.glusterfs/00/00/0000fea0-0c87-4f97-a527-69b597dc9d2d
trusted.gfid=0x0000fea00c874f97a52769b597dc9d2d
trusted.gfid2path.fd538547e4744689=0x63633632393234362d346531392d343737632d393763662d6634353136646236336631322f313538323732313931342e483534353336395032363738382e6761726d722e696577632e636f2e7a612c533d3338343135
trusted.glusterfs.dht.linkto=0x67765f6d61696c2d7265706c69636174652d3100
trusted.glusterfs.mdata=0x010000000000000000000000005e566b7a0000000022fc7059000000005e566b7a0000000022fc7059000000005e566b7a0000000022fc7059
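
As an aside, the second hard link of such a healthy gfid file is the real directory entry under the brick; purely as illustration (not needed for the report), GNU find's -samefile can locate it:

# Find the directory entry sharing an inode with the healthy linkfile
# (illustration only; -samefile is a GNU find extension).
find mail-a -samefile mail-a/.glusterfs/00/00/0000fea0-0c87-4f97-a52769b597dc9d2d \
    ! -path 'mail-a/.glusterfs/*'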

From the output above it can plainly be seen that there is a related data file. What is different here is that the link count of the linkfile is 2 as well. So, using another strategy to find these remnant files:

uriel [12:20:08] /mnt/gluster # find */.glusterfs -perm 01000 -links 1
mail-a/.glusterfs/00/c4/00c4d673-6bfe-48c2-b4af-59797a61d51d
mail-a/.glusterfs/7e/fd/7efd344a-5f25-4a9e-9e55-d0a3a05abbdb
mail-a/.glusterfs/7e/6e/7e6e6794-fbd1-41e9-8e0a-09df5dab1133
mail-a/.glusterfs/99/71/9971edf0-9c5c-4c20-a3f4-ea5b6c75d8db
mail-a/.glusterfs/6f/c2/6fc2c8cf-7384-4a9e-9232-00e1e3ba10cf
mail-a/.glusterfs/78/ad/78adaf24-698e-4a14-bc04-6e3a9fb926ac
mail-a/.glusterfs/c5/1f/c51f4e10-9d4a-4e2d-bb9f-37c985466d16
mail-a/.glusterfs/92/e5/92e52778-1561-4829-b125-0d6b46846494
mail-a/.glusterfs/60/b4/60b4b6de-cfa0-40c4-9647-62aa9760250e
mail-a/.glusterfs/60/c0/60c07eb6-eaa0-411c-aada-579bba1215ad
mail-a/.glusterfs/98/dc/98dcf110-af25-4bde-b0ba-cdf0fba20fed
... many more.

bagheera [12:19:28] /mnt/gluster # find */.glusterfs -perm 01000 -links 1
mail-a/.glusterfs/7e/6f/7e6f26fe-3ae1-489b-a8c3-9424ab0b09b1
find: ‘mail-a/.glusterfs/49/f0/49f0a22f-de27-4286-bce0-f3ce5d5e0ecd’: No such file or directory
mail-a/.glusterfs/b2/c4/b2c43192-6ff8-4871-bf89-18ea6e7364c3
mail-a/.glusterfs/ca/62/ca62bad5-3622-440e-b58d-247e979b02a3
mail-a/.glusterfs/8e/04/8e0419ea-ad6c-4e5e-98f7-8b576b750f87
mail-a/.glusterfs/c9/b6/c9b66d89-ce68-43c4-b538-79c21284ac77
mail-a/.glusterfs/05/30/053048f3-6407-4cf5-9a38-6df2de4443df
mail-b/.glusterfs/53/1c/531ce709-c243-4e7a-8283-40dcbdc1c99a
... many more

That's 149 on bagheera and 138 on uriel, respectively. Using a small loop to check for what we believe to be the broken state (the find output above was piped into ~/glusterfs_linkfiles.txt):

bagheera [12:29:06] /mnt/gluster # exec 3<~/glusterfs_linkfiles.txt; while read F<&3; do FL=(*/${F#mail-?/}); [ "${#FL[@]}" -ne 1 ] && echo "${F} is OK"; done
bagheera [12:29:57] /mnt/gluster # 

uriel [12:26:36] /mnt/gluster # exec 3<~/glusterfs_linkfiles.txt; while read F<&3; do FL=(*/${F#mail-?/}); [ "${#FL[@]}" -ne 1 ] && echo "${F} is OK"; done
uriel [12:30:08] /mnt/gluster # 

Note: this is an inverse test; since we expected most of these to be broken, we'd rather be told about the ones that are OK. Ideally we'd want to check across servers, but because the layout is 2 x 2 we expect any linked-to file to be present locally as well.
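
For clarity, the one-liner above is equivalent to roughly the following (our own expanded sketch; same logic, just spelled out):

#!/bin/bash
# Expanded form of the check above: for each linkfile found earlier,
# glob its .glusterfs/xx/yy/gfid path across both local bricks.
# If the glob matches more than one entry, a file with the same gfid
# also exists under the other local brick, i.e. the linkfile is "OK".
cd /mnt/gluster || exit 1
while read -r F; do
    rel="${F#mail-?/}"            # strip the brick prefix (mail-a/ or mail-b/)
    matches=( */"${rel}" )        # expand across both bricks
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "${F} is OK"
    fi
done < ~/glusterfs_linkfiles.txt
# Nothing was printed on either server, so none of the 149 (bagheera) /
# 138 (uriel) linkfiles have a matching gfid file under the other local brick.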

It can be seen that over time a few have built up which didn't even get flagged as requiring heal (either that, or someone merely rm'ed the xattrop files whenever this cropped up).

uriel [12:30:08] /mnt/gluster # find */.glusterfs/?? -type f -links 1 | wc -l
141
bagheera [12:33:57] /mnt/gluster # find */.glusterfs/?? -type f -links 1 | wc -l
151

There are also other files this happens with:

bagheera [12:36:48] /mnt/gluster # find */.glusterfs/?? -type f -links 1 ! -perm 01000 | xargs ls -l
-rw-r--r-- 1 mail mail 714755 May 31  2017 mail-a/.glusterfs/60/8b/608b28f2-3e33-427c-969e-e9eb5fc9bb1a
-rw-r--r-- 1 mail mail     11 Dec 31  2020 mail-a/.glusterfs/9a/49/9a49c85b-e81c-44f7-8ad3-df93f0cddfe0

uriel [12:35:50] /mnt/gluster # find */.glusterfs/?? -type f -links 1 ! -perm 01000 | xargs ls -l
-rw-r--r-- 1 mail mail 74571 Dec 27  2017 mail-a/.glusterfs/24/1e/241ebb2f-ede6-44d7-a9c1-5b587f7cdf01
-rw-r--r-- 1 mail mail 45782 Dec 26  2020 mail-a/.glusterfs/d3/c9/d3c96409-9122-48db-951b-728d50515c61
-rw-r--r-- 1 mail mail    11 Dec 31  2020 mail-b/.glusterfs/9a/49/9a49c85b-e81c-44f7-8ad3-df93f0cddfe0

These were few enough to check across servers; in 2 of the 5 cases a gfid file did exist on the replica brick on the other server, also with a link count of 1.
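
If it helps with diagnosis, the gfid2path xattr on these orphaned data files should give the original parent gfid and basename; a quick way to look, assuming the xattr is present on them as it is on the linkfiles:

# Show where these orphaned gfid files were originally linked from,
# assuming they carry a trusted.gfid2path.* xattr like the linkfiles do.
for f in mail-a/.glusterfs/60/8b/608b28f2-3e33-427c-969e-e9eb5fc9bb1a \
         mail-a/.glusterfs/9a/49/9a49c85b-e81c-44f7-8ad3-df93f0cddfe0; do
    getfattr -m 'trusted.gfid2path' -d -e text "${f}"
done
# The value is <parent-gfid>/<basename>, and the parent directory can be
# resolved via <brick>/.glusterfs/<first two>/<next two>/<parent-gfid>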

- The operating system / glusterfs version:

Gentoo Linux, glusterfs version 8.5. Upgrades are viable but we'd prefer to clean out the heal status first.

jkroonza commented Feb 02 '22 10:02

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] commented Sep 21 '22 00:09

Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.

stale[bot] commented Oct 22 '22 18:10