
some bricks is offline after gluster container recreate

Open toyangdon opened this issue 7 years ago • 4 comments

When I recreate the gluster container, some bricks on this node are offline. I found the following in glusterd.log:

[2018-11-13 02:01:34.895633] I [glusterd-utils.c:5962:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_1024c2e1e5d792d11db3d178da54cca1/brick
[2018-11-13 02:01:34.899988] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-11-13 02:01:34.900198] W [socket.c:3216:socket_connect] 0-management: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"
[2018-11-13 02:01:37.330378] I [glusterd-utils.c:5962:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_cb20e11db86329774d7366eb5d632ef9/brick
[2018-11-13 02:01:37.333050] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-11-13 02:01:39.170395] I [glusterd-utils.c:5868:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_fd34cfc29f41624e4f1e76ed061f9dad/brick
[2018-11-13 02:01:39.170437] I [MSGID: 106143] [glusterd-pmap.c:282:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_fd34cfc29f41624e4f1e76ed061f9dad/brick on port 49173

Some bricks are considered "already-running bricks", but in fact they are not running. I tried removing the volume mount "glusterfs-run", and after that all of the bricks came up normally when the gluster container was recreated.
What is the purpose of the "glusterfs-run" volume mount in glusterfs-daemonset.yaml? Can I remove it?
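For anyone debugging this, here is a sketch of a workaround along the lines of what's described above. It assumes (to be verified against your deployment) that the glusterfs-run emptyDir is mounted at /run and that gluster keeps brick pidfiles under /run/gluster; the helper name is hypothetical, not part of gluster-kubernetes.

```shell
# Hypothetical helper: remove stale brick pidfiles left behind in the
# glusterfs-run emptyDir before glusterd starts, so glusterd does not
# mistake a dead brick for an "already-running brick".
# Assumption: the daemonset mounts glusterfs-run at /run and brick
# pidfiles live under /run/gluster.
clean_stale_pidfiles() {
  rundir="${1:-/run/gluster}"
  # Delete only *.pid files; leave sockets and other runtime state alone.
  find "$rundir" -type f -name '*.pid' -delete 2>/dev/null || true
}
```

Running this in an init step (or just removing the shared volume, as tried above) forces glusterd to start fresh brick processes instead of trusting leftover pidfiles.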

toyangdon avatar Nov 14 '18 07:11 toyangdon

We're running into this same problem. Does anyone know the purpose of the glusterfs-run volume with an emptyDir type: https://github.com/gluster/gluster-kubernetes/blob/master/deploy/kube-templates/glusterfs-daemonset.yaml#L101?

kcao3 avatar Mar 14 '19 15:03 kcao3

We are also observing this issue, running on gluster 4.1.7.

glusterd claims it found an already-running brick, and that brick is never actually started. The containers that rely on it hang or enter a crash loop. It's easy to reproduce: hard-power-off the VM hosting gluster, or kill -9 the gluster processes.

Should this issue be moved to bugzilla? https://bugzilla.redhat.com/

bischoje avatar May 17 '19 19:05 bischoje

Same for me (gluster 4.1.7). Any news on this issue?

ghost avatar Jul 18 '19 22:07 ghost

Could it be that the brick's pidfile contains a PID that now points to some other running process? If this is still reproducible, can you check the PID?
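A quick way to test that hypothesis (a sketch; the pidfile path and function name are illustrative, adjust to wherever your gluster keeps brick pidfiles, typically under /run/gluster):

```shell
# Hypothetical check: does the PID recorded in a brick pidfile belong to a
# live glusterfsd process, or was the PID reused by something else after a
# hard kill / container restart?
check_brick_pid() {
  pidfile="$1"
  [ -r "$pidfile" ] || { echo "no pidfile"; return 1; }
  pid=$(cat "$pidfile")
  # "comm=" prints only the process name with no header; output is empty
  # if no process with that PID exists.
  if ps -p "$pid" -o comm= 2>/dev/null | grep -q glusterfsd; then
    echo "pid $pid is a running glusterfsd"
  else
    echo "pid $pid is stale or was reused by another process"
  fi
}
```

If the pidfile reports a PID that is dead or reused, that would fool glusterd's already-running detection exactly as described in the log excerpt above.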

kinsu avatar Jan 09 '20 15:01 kinsu