
scrub status request timed out

Open · ndTEC opened this issue on Sep 14, 2022 · 0 comments

Description of problem:

After upgrading GlusterFS from 8.6 to 9.4, the command to get the bitrot scrub status is no longer working.

The exact command to reproduce the issue: gluster volume bitrot gvol0 scrub status

The full output of the command that failed: Error : Request timed out

Expected results: The bitrot scrub status for the volume is printed.
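A minimal reproduction on one of the affected nodes, using the volume name gvol0 from this report; the surrounding peer/status checks are suggestions for narrowing down whether a brick is unreachable, not commands taken from the original report:

```
# The command that times out
gluster volume bitrot gvol0 scrub status

# Suggested sanity checks (not from the original report): confirm that all
# peers are connected and all brick processes are online before suspecting
# the scrubber itself
gluster peer status
gluster volume status gvol0
```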

Mandatory info:

- The output of the gluster volume info command:

```
Volume Name: gvol0
Type: Replicate
Volume ID: ebef8839-0c96-48e1-b606-270d14996585
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: g1:/storage/gluster/gfs_vg/brick1
Brick2: g3:/storage/gluster/gfs_vg/brick1
Brick3: g4:/storage/gluster/gfs_vg/brick1
Brick4: g2:/storage/gluster/gfs_vg/brick1
Options Reconfigured:
cluster.server-quorum-type: server
cluster.shd-max-threads: 4
network.inode-lru-limit: 50000
performance.io-thread-count: 16
client.event-threads: 8
cluster.lookup-optimize: on
disperse.eager-lock: off
server.event-threads: 16
performance.write-behind-window-size: 256Mb
server.allow-insecure: on
auth.allow: 192.168.2.*
transport.address-family: inet
nfs.disable: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
cluster.readdir-optimize: on
performance.parallel-readdir: on
performance.cache-size: 256MB
performance.io-cache: on
performance.md-cache-timeout: 600
performance.client-io-threads: true
server.outstanding-rpc-limit: 256
performance.strict-o-direct: on
disperse.shd-wait-qlength: 2048
features.bitrot: on
features.scrub: Active
features.scrub-freq: monthly
cluster.server-quorum-ratio: 60%
```
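For reference, the bitrot-related options shown above (features.bitrot: on, features.scrub: Active, features.scrub-freq: monthly) map onto the standard bitrot sub-commands of the gluster CLI. The following is only a sketch of how such a configuration is typically produced, not a record of what was run on this cluster:

```
# Enable bitrot detection; this sets features.bitrot/features.scrub and
# starts the bitd and scrubber daemons for the volume
gluster volume bitrot gvol0 enable

# Run the scrubber monthly (features.scrub-freq: monthly)
gluster volume bitrot gvol0 scrub-frequency monthly

# Query the scrubber -- the request that times out in this report
gluster volume bitrot gvol0 scrub status
```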

- The output of the gluster volume status command:

- The output of the gluster volume heal command:

- Provide logs present on the following locations of client and server nodes: /var/log/glusterfs/
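As a pointer for collecting these logs, the bitrot daemon and scrubber normally write to their own files under /var/log/glusterfs/. The file names below are the conventional ones and may differ slightly between versions; the grep pattern is taken from the messages quoted further down:

```
# Bitrot daemon and scrubber logs (conventional names; may vary by version)
ls -l /var/log/glusterfs/bitd.log /var/log/glusterfs/scrub.log

# Management daemon and per-brick logs
ls -l /var/log/glusterfs/glusterd.log /var/log/glusterfs/bricks/

# Locate the call_bail / disconnect messages quoted below
grep -n -e "call_bail" -e "Transport endpoint is not connected" /var/log/glusterfs/scrub.log
```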

- Is there any crash? Provide the backtrace and coredump:

```
[2022-09-14 12:14:47.194407 +0000] E [rpc-clnt.c:179:call_bail] 0-gvol0-client-0: bailing out frame type(GlusterFS 4.x v1), op(LOOKUP(27)), xid = 0x19, unique = 27, sent = 2022-09-14 11:44:45.200389 +0000, timeout = 1800 for 192.168.2.66:49152
[2022-09-14 12:14:47.194443 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-gvol0-client-0: remote operation failed. [{path=gfid:00000000-0000-0000-0000-000000000008}, {gfid=00000000-0000-0000-0000-000000000008}, {errno=107}, {error=Transport endpoint is not connected}]
[2022-09-14 12:14:47.194478 +0000] E [MSGID: 118027] [bit-rot-scrub.c:1622:br_lookup_bad_obj_dir] 0-gvol0-bit-rot-0: failed to lookup the bad objects directory (gfid: 00000000-0000-0000-0000-000000000008 (Transport endpoint is not connected))
[2022-09-14 12:14:47.194510 +0000] I [socket.c:3809:socket_submit_outgoing_msg] 0-socket.glusterfsd: not connected (priv->connected = -1)
[2022-09-14 12:14:47.194518 +0000] E [rpcsvc.c:1567:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3, Program: Gluster Brick operations, ProgVers: 2, Proc: 11) to rpc-transport (socket.glusterfsd)
[2022-09-14 12:14:47.194540 +0000] E [glusterfsd-mgmt.c:262:glusterfs_submit_reply] 0-glusterfs: Reply submission failed
```
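The excerpt shows the scrubber's client translator 0-gvol0-client-0 bailing out a LOOKUP of the bad-objects directory after the 1800-second frame timeout, with errno 107 (Transport endpoint is not connected) toward 192.168.2.66:49152. Assuming default brick ordering, client-0 corresponds to Brick1 (g1) in the volume info above. A possible follow-up check from the node running the scrubber, not part of the original report:

```
# Confirm the brick process for Brick1 is online and listening
gluster volume status gvol0

# Check TCP reachability of the brick port named in the call_bail message
nc -zv 192.168.2.66 49152
```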

Additional info:

- The operating system / glusterfs version: AlmaLinux 8.6 / GlusterFS 9.4

Note: Please hide any confidential data which you don't want to share in public, like IP addresses, file names, hostnames, or any other configuration.
