RP tuner fails to complete on RHEL

Open pmw-rp opened this issue 2 years ago • 1 comments

When using a RHEL image on Azure, the tuner fails to make progress and times out after 15 mins.

The RHEL image was defined in vars.tf as follows:

variable "vm_image" {
  description = "Source image reference for the VMs"
  type = object({
    publisher = string
    offer     = string
    sku       = string
    version   = string
  })
  default = {
    publisher = "RedHat"
    offer     = "RHEL"
    sku       = "8-lvm-gen2"
    version   = "latest"
  } 
}

During the issue, we see the following line repeatedly in the logs:

May 25 10:01:21 redpanda0 rpk[32651]: WARN  2023-05-25 10:01:21,464 [shard 0] cluster - cluster_discovery.cc:247 - Error requesting cluster bootstrap info from {host: 10.0.1.4, port: 33145}, retrying. std::__1::system_error (error system:113, No route to host)

However, the port is open:

[root@redpanda0 ~]# netstat -plutan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1674/sshd           
tcp        0      0 10.0.1.5:33145          0.0.0.0:*               LISTEN      32651/redpanda      
tcp        0      0 0.0.0.0:5355            0.0.0.0:*               LISTEN      1025/systemd-resolv 
...
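
Note that a LISTEN entry only shows the socket is bound locally; it does not show that a peer can reach it through the host firewall. A minimal sketch of a reachability check to run from another broker, assuming bash with /dev/tcp support and coreutils timeout are available (probe_port is a hypothetical helper, not part of rpk):

```shell
#!/usr/bin/env bash
# probe_port HOST PORT (hypothetical helper): prints "open" if a TCP connect
# succeeds within 3 seconds, otherwise "closed-or-filtered".
probe_port() {
  local host="$1" port="$2"
  # Redirecting to bash's /dev/tcp pseudo-device attempts a TCP connect.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed-or-filtered"
  fi
}

# Example, run on redpanda0 against the peer named in the log line:
# probe_port 10.0.1.4 33145
```

A "closed-or-filtered" result from a peer, while netstat on the target shows LISTEN, would point at packet filtering rather than at the service itself.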

This appears to be caused by firewalld blocking traffic between brokers. After logging in to each broker VM and running sudo systemctl stop firewalld, a retry of the playbook ran to completion.
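
Stopping firewalld outright is a blunt workaround. A narrower alternative might be to open only the port from the log line. The sketch below assumes the default firewalld zone applies to the broker interface and that 33145/tcp is the only port being blocked, neither of which has been confirmed here; open_broker_port is a hypothetical helper that only prints the commands unless APPLY=1 is set:

```shell
# open_broker_port [PORT] (hypothetical helper): dry-run by default.
open_broker_port() {
  local port="${1:-33145}"
  local run="echo"                      # dry-run: print the commands
  if [ "${APPLY:-0}" = "1" ]; then
    run="sudo"                          # APPLY=1: actually run them
  fi
  # Persist the rule, then reload so it takes effect immediately.
  $run firewall-cmd --permanent --add-port="${port}/tcp"
  $run firewall-cmd --reload
}

# Dry run:  open_broker_port 33145
# Apply:    APPLY=1 open_broker_port 33145
```

Other Redpanda listeners (Kafka API, admin API, and so on) may need the same treatment; the port list should come from the cluster configuration, not from this example.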

pmw-rp avatar May 25 '23 11:05 pmw-rp

It would be interesting to see what happens with firewalld enabled: run the rpk redpanda tuners individually, enabling one at a time, and see how startup fails.

I bet it's the fstrim one. The docs imply that it makes a socket call out to dbus (but that should be happening over a unix socket, not a network socket).

https://docs.redpanda.com/docs/reference/rpk/rpk-redpanda/rpk-redpanda-tune-list/
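
The one-at-a-time experiment could be scripted roughly like this. It is only a sketch, assuming the systemd unit is named redpanda and that rpk redpanda tune accepts a single tuner name such as fstrim; try_tuners is a hypothetical helper, and the tuner names to pass should come from the tune list page above. It only prints the commands unless APPLY=1 is set:

```shell
# try_tuners TUNER... (hypothetical helper): dry-run by default.
try_tuners() {
  local run="echo"                      # dry-run: print the commands
  if [ "${APPLY:-0}" = "1" ]; then
    run="sudo"                          # APPLY=1: actually run them
  fi
  local tuner
  for tuner in "$@"; do
    echo "== tuner: ${tuner} =="
    $run rpk redpanda tune "${tuner}"   # apply just this tuner
    $run systemctl restart redpanda     # restart the broker
    # Show recent broker log lines to see whether startup got stuck.
    $run journalctl -u redpanda -n 20 --no-pager
  done
}

# Dry run:  try_tuners fstrim
# Apply:    APPLY=1 try_tuners fstrim
```

Tuner effects may persist across iterations, so later tuners in the list run with the earlier ones already applied; a reboot between runs would isolate them more cleanly.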

hcoyote avatar May 25 '23 15:05 hcoyote