
GlusterFS does not heal sparse files correctly and fills up the whole new brick on a disperse volume after a failed brick is reset


Description of problem: The Gluster heal process fills up all free space on a replaced brick of a disperse volume if the volume contains sparse files.

The exact command to reproduce the issue:

  1. Create a new disperse volume (tested with 4+2), e.g. with gluster volume create vol1 disperse-data 4 redundancy 2 transport tcp node1:/gluster/nvme1/brick node1:/gluster/nvme2/brick node2:/gluster/nvme1/brick node2:/gluster/nvme2/brick node3:/gluster/nvme1/brick node3:/gluster/nvme2/brick
  2. Place some sparse files on the volume, e.g. with cp -avp --sparse=always source destination-vol1/ (a sketch for creating such a file follows this list)
  3. Reset a brick, e.g. with gluster volume reset-brick vol1 node3:/gluster/nvme1/brick start followed by gluster volume reset-brick vol1 node3:/gluster/nvme1/brick node3:/gluster/nvme1/brick commit force
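
For step 2, a sparse source file can be created with coreutils before copying it in; the file name, size, and mount point below are illustrative assumptions, not taken from the report:

truncate -s 10G /tmp/sparse-source          # allocates no blocks, apparent size 10 GiB
du -h --apparent-size /tmp/sparse-source    # ~10G apparent size
du -h /tmp/sparse-source                    # ~0 allocated, i.e. the file is sparse
cp -avp --sparse=always /tmp/sparse-source /mnt/vol1/   # /mnt/vol1 is an assumed mount point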

Actual results: The volume starts healing, but sparse files on the new brick are written out fully allocated: their on-disk size matches their apparent size, eventually filling up the whole brick. In addition, such a volume starts reporting a wrong size in df when mounted. The healing process never finishes, leaving some files unhealed, and the new brick reports No space left on device (see the brick log fragment below).
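
One way to observe this symptom is to compare the apparent and allocated sizes of the same healed file on a surviving brick and on the reset brick; the file path here is a hypothetical placeholder, and note that on a disperse volume each brick stores an encoded fragment rather than the whole file:

du -h --apparent-size /gluster/nvme2/brick/path/to/file   # surviving brick: large apparent size
du -h /gluster/nvme2/brick/path/to/file                   # surviving brick: small allocated size (sparse)
du -h --apparent-size /gluster/nvme1/brick/path/to/file   # reset brick: same apparent size
du -h /gluster/nvme1/brick/path/to/file                   # reset brick: allocated ~ apparent (sparseness lost)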

Expected results: A dispersed volume containing sparse files should be healed correctly after a brick reset, preserving the sparseness of the files.

Mandatory info: - The output of the gluster volume info command:

Volume Name: vol1
Type: Disperse
Volume ID: ***
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/nvme1/brick
Brick2: node1:/gluster/nvme2/brick
Brick3: node2:/gluster/nvme1/brick
Brick4: node2:/gluster/nvme2/brick
Brick5: node3:/gluster/nvme1/brick
Brick6: node3:/gluster/nvme2/brick
Options Reconfigured:
cluster.server-quorum-type: none
storage.health-check-interval: 600
storage.health-check-timeout: 30
auth.allow: ***
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.cache-invalidation: on
network.ping-timeout: 5
server.allow-insecure: on
network.remote-dio: disable
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 8
cluster.eager-lock: enable
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
cluster.lookup-optimize: off
performance.readdir-ahead: off
cluster.readdir-optimize: off
cluster.enable-shared-storage: enable

- The output of the gluster volume status command:

Status of volume: vol1
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/gluster/nvme1/brick             49168     0          Y       1239337
Brick node1:/gluster/nvme2/brick             49169     0          Y       1239344
Brick node2:/gluster/nvme1/brick             49168     0          Y       1363957
Brick node2:/gluster/nvme2/brick             49169     0          Y       1363964
Brick node3:/gluster/nvme1/brick             49157     0          Y       848916
Brick node3:/gluster/nvme2/brick             49158     0          Y       848923
Self-heal Daemon on localhost                N/A       N/A        Y       848936
Self-heal Daemon on node1                    N/A       N/A        Y       1239357
Self-heal Daemon on node2                    N/A       N/A        Y       1363977

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

Launching heal operation to perform index self heal on volume vol1 has been successful
Use heal info commands to check status.
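
For reference, the heal status mentioned above can be inspected with the standard heal-info commands (vol1 as in this report; the summary form is available on recent releases):

gluster volume heal vol1 info
gluster volume heal vol1 info summary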

- Provide logs present on following locations of client and server nodes: /var/log/glusterfs/bricks/gluster-nvme1-brick.log

[2022-05-19 17:23:17.588485 +0000] W [dict.c:1532:dict_get_with_ref] (-->/usr/lib64/glusterfs/9.1/xlator/features/index.so(+0x3bdc) [0x7f0915443bdc] -->/lib64/libglusterfs.so.0(dict_get_str+0x3c) [0x7f0924c5318c] -->/lib64/libglusterfs.so.0(dict_get_with_ref+0x85) [0x7f0924c519b5] ) 0-dict: dict OR key (link-count) is NULL [Invalid argument]
[2022-05-19 17:23:17.601320 +0000] E [MSGID: 113072] [posix-inode-fd-ops.c:2068:posix_writev] 0-vol1-posix: write failed: offset 0, [No space left on device]
[2022-05-19 17:23:17.601396 +0000] E [MSGID: 115067] [server-rpc-fops_v2.c:1324:server4_writev_cbk] 0-vol1-server: WRITE info [{frame=12201148}, {WRITEV_fd_no=3}, {uuid_utoa=***-dc3d-4041-8e11-835327df299c}, {client=CTX_ID:***-GRAPH_ID:4-PID:1027276-HOST:my-host-name.cz-PC_NAME:vol1-client-4-RECON_NO:-0}, {error-xlator=vol1-posix}, {errno=28}, {error=No space left on device}]

- Is there any crash? Provide the backtrace and coredump: No

Additional info: I am also concerned about the very high PIDs, even shortly after a node restart, but that may not be related.

- The operating system / glusterfs version: CentOS 8 / GlusterFS 9.1, also tested on GlusterFS 9.4

strzinek avatar May 22 '22 09:05 strzinek

bump

strzinek avatar Jul 31 '22 17:07 strzinek

{error=No space left on device}], not sure if you still need this, but it seems you have run out of space

david-peters-aitch2o avatar Dec 28 '22 11:12 david-peters-aitch2o

Yes, that is the result of this reported error.

strzinek avatar Dec 28 '22 11:12 strzinek

Thank you for your contributions. Noticed that this issue has not had any activity in the last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] avatar Aug 12 '23 14:08 stale[bot]

The error still persists and I think it is serious. I am losing confidence in the future of gluster.

strzinek avatar Sep 21 '23 20:09 strzinek

There was a fix that might be related - https://github.com/gluster/glusterfs/issues/2317 - I wonder in which version this is still happening.
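
To determine the running release, the installed version can be checked with (a generic check, not taken from this thread):

gluster --version                 # prints the installed CLI/client version
rpm -q glusterfs-server           # on RPM-based systems such as CentOS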

mykaul avatar Sep 24 '23 08:09 mykaul