Calico/VPP cannot run on servers without the AVX512 instruction set
Environment
- Calico/VPP version: 3.23
- Kubernetes version: 1.24.4
- Deployment type: bare-metal
- Network configuration: Calico
Issue description
Calico/VPP cannot run on servers without the AVX512 instruction set.
Calico/VPP logs
kubectl logs -n calico-vpp-dataplane calico-vpp-node-2gk5k
Defaulted container "vpp" out of: vpp, agent
time="2023-12-01T02:10:51Z" level=info msg="-- Environment --"
time="2023-12-01T02:10:51Z" level=info msg="CorePattern: /var/lib/vpp/vppcore.%e.%p"
time="2023-12-01T02:10:51Z" level=info msg="ExtraAddrCount: 0"
time="2023-12-01T02:10:51Z" level=info msg="RxMode: adaptive"
time="2023-12-01T02:10:51Z" level=info msg="TapRxMode: adaptive"
time="2023-12-01T02:10:51Z" level=info msg="Tap MTU override: 0"
time="2023-12-01T02:10:51Z" level=info msg="Service CIDRs: [10.96.0.0/12]"
time="2023-12-01T02:10:51Z" level=info msg="Tap Queue Size: rx:1024 tx:1024"
time="2023-12-01T02:10:51Z" level=info msg="PHY Queue Size: rx:1024 tx:1024"
time="2023-12-01T02:10:51Z" level=info msg="Hugepages 16"
time="2023-12-01T02:10:51Z" level=info msg="KernelVersion 5.15.0-73"
time="2023-12-01T02:10:51Z" level=info msg="Drivers map[uio_pci_generic:%!s(bool=false) vfio-pci:%!s(bool=true)]"
time="2023-12-01T02:10:51Z" level=info msg="vfio iommu: false"
time="2023-12-01T02:10:51Z" level=info msg="-- Interface Spec --"
time="2023-12-01T02:10:51Z" level=info msg="Interface Name: ens8"
time="2023-12-01T02:10:51Z" level=info msg="Native Driver: dpdk"
time="2023-12-01T02:10:51Z" level=info msg="vppIpConfSource: linux"
time="2023-12-01T02:10:51Z" level=info msg="New Drive Name: "
time="2023-12-01T02:10:51Z" level=info msg="PHY target #Queues rx:1 tx:1"
time="2023-12-01T02:10:51Z" level=info msg="-- Interface config --"
time="2023-12-01T02:10:51Z" level=info msg="Node IP4: 192.168.3.7/24"
time="2023-12-01T02:10:51Z" level=info msg="Node IP6: "
time="2023-12-01T02:10:51Z" level=info msg="PciId: 0000:49:00.0"
time="2023-12-01T02:10:51Z" level=info msg="Driver: ice"
time="2023-12-01T02:10:51Z" level=info msg="Linux IF was up ? true"
time="2023-12-01T02:10:51Z" level=info msg="Promisc was on ? false"
time="2023-12-01T02:10:51Z" level=info msg="DoSwapDriver: false"
time="2023-12-01T02:10:51Z" level=info msg="Mac: 40:a6:b7:9e:e1:90"
time="2023-12-01T02:10:51Z" level=info msg="Addresses: [192.168.3.7/24 ens8,fe80::42a6:b7ff:fe9e:e190/64]"
time="2023-12-01T02:10:51Z" level=info msg="Routes: [{Ifindex: 16 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}, {Ifindex: 16 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254}]"
time="2023-12-01T02:10:51Z" level=info msg="PHY original #Queues rx:288 tx:288"
time="2023-12-01T02:10:51Z" level=info msg="MTU 1500"
time="2023-12-01T02:10:51Z" level=info msg="isTunTap false"
time="2023-12-01T02:10:51Z" level=info msg="isVeth false"
time="2023-12-01T02:10:51Z" level=info msg="Running with uplink dpdk"
time="2023-12-01T02:10:51Z" level=info msg="deleting Route {Ifindex: 16 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:10:51Z" level=info msg="deleting Route {Ifindex: 16 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:10:51Z" level=info msg="VPP started [PID 3681601]"
time="2023-12-01T02:10:51Z" level=info msg="Waiting for VPP... [0/10]"
/usr/bin/vpp[3681601]: tls_init_ca_chain:609: Could not initialize TLS CA certificates
/usr/bin/vpp[3681601]: tls_mbedtls_init:644: failed to initialize TLS CA chain
/usr/bin/vpp[3681601]: tls_init_ca_chain:976: Could not initialize TLS CA certificates
/usr/bin/vpp[3681601]: tls_openssl_init:1050: failed to initialize TLS CA chain
time="2023-12-01T02:10:53Z" level=info msg="Waiting for VPP... [1/10]"
time="2023-12-01T02:10:55Z" level=info msg="Waiting for VPP... [2/10]"
time="2023-12-01T02:10:57Z" level=info msg="Waiting for VPP... [3/10]"
time="2023-12-01T02:10:59Z" level=info msg="Waiting for VPP... [4/10]"
time="2023-12-01T02:11:01Z" level=warning msg="Waiting for VPP... [5/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:02Z" level=info msg="Received signal child exited, vpp index 1"
time="2023-12-01T02:11:02Z" level=info msg="VPP exited:true status:0 signaled:false"
time="2023-12-01T02:11:02Z" level=info msg="Done with signal child exited"
time="2023-12-01T02:11:03Z" level=warning msg="Waiting for VPP... [6/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:05Z" level=warning msg="Waiting for VPP... [7/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:07Z" level=warning msg="Waiting for VPP... [8/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:09Z" level=warning msg="Waiting for VPP... [9/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist"
time="2023-12-01T02:11:11Z" level=error msg="Error connecting to VPP (SIGINT -1): Cannot connect to VPP after 10 tries"
time="2023-12-01T02:11:11Z" level=info msg="Terminating Vpp 1 (SIGINT)"
time="2023-12-01T02:11:11Z" level=info msg="Restoring configuration"
time="2023-12-01T02:11:11Z" level=info msg="Received signal interrupt, vpp index 1"
time="2023-12-01T02:11:11Z" level=info msg="Signaled vpp (PID -1) interrupt"
time="2023-12-01T02:11:11Z" level=info msg="Done with signal interrupt"
Using systemctl
Using systemd-networkd
time="2023-12-01T02:11:13Z" level=info msg="restoring address 192.168.3.7/24 ens8"
time="2023-12-01T02:11:13Z" level=info msg="restoring address fe80::42a6:b7ff:fe9e:e190/64"
time="2023-12-01T02:11:13Z" level=info msg="restoring route {Ifindex: 16 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:11:13Z" level=info msg="restoring routes : {Ifindex: 17 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} already exists"
time="2023-12-01T02:11:13Z" level=info msg="restoring route {Ifindex: 16 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254}"
time="2023-12-01T02:11:13Z" level=info msg="restoring routes : {Ifindex: 17 Dst: 192.168.3.0/24 Src: 192.168.3.7 Gw: <nil> Flags: [] Table: 254} already exists"
time="2023-12-01T02:11:13Z" level=info msg="calico-vpp-pid file doesn't exist. Agent probably not started"
time="2023-12-01T02:11:13Z" level=info msg="Timeout : SIGKILL vpp 1"
time="2023-12-01T02:11:13Z" level=info msg="Received signal killed, vpp index 1"
time="2023-12-01T02:11:13Z" level=info msg="Signaled vpp (PID -1) killed"
time="2023-12-01T02:11:13Z" level=info msg="Done with signal killed"
time="2023-12-01T02:11:14Z" level=error msg="VPP run failed with Error running VPP: cannot connect to VPP after 10 tries"
kubectl logs -n calico-vpp-dataplane calico-vpp-node-j6l7s -c agent
2023/12/01 02:53:44 File Content:
2023/12/01 02:53:44 Error reading file:%!(EXTRA *fs.PathError=open /var/run/vpp/vppmanagerlinuxmtu: no such file or directory)
time="2023-12-01T02:54:04Z" level=fatal msg="Error loading configuration: Vpp-host mtu not ready after 20 tries"
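To make the long manager log above easier to scan, here is a minimal sketch that keeps only the warning, error, and fatal entries plus the line reporting how VPP exited. It assumes the `time="..." level=... msg="..."` layout seen in the logs above; the function name `notable_events` is hypothetical and not part of Calico/VPP.

```python
import re

# Matches the key=value layout used by the manager log lines above.
LOGRUS_LINE = re.compile(
    r'^time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$'
)


def notable_events(lines):
    """Return (time, level, msg) for warnings/errors and the VPP exit report."""
    events = []
    for line in lines:
        m = LOGRUS_LINE.match(line.strip())
        if not m:
            continue  # raw VPP stderr lines and kubectl output are skipped
        level, msg = m["level"], m["msg"]
        if level in {"warning", "error", "fatal"} or msg.startswith("VPP exited"):
            events.append((m["time"], level, msg))
    return events


# A few lines copied from the log above.
sample = [
    'time="2023-12-01T02:10:51Z" level=info msg="VPP started [PID 3681601]"',
    "/usr/bin/vpp[3681601]: tls_init_ca_chain:609: Could not initialize TLS CA certificates",
    'time="2023-12-01T02:11:02Z" level=info msg="VPP exited:true status:0 signaled:false"',
    'time="2023-12-01T02:11:11Z" level=error msg="Error connecting to VPP (SIGINT -1): Cannot connect to VPP after 10 tries"',
]

for time, level, msg in notable_events(sample):
    print(time, level, msg)
```

Filtering this way puts the `VPP exited:true status:0 signaled:false` entry next to the connection errors, which may help when comparing runs from different servers.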
I dug into the Calico/VPP code and found that the failure is caused by VPP not starting up, so my guess is that VPP cannot start on a server without the AVX512 instruction set.
Hi @Huxianying,
How did you conclude that VPP failed to start due to the absence of the AVX512 instruction set? Could you share details of the bare-metal server that you are using? Lastly, you seem to be using an older version of Calico/VPP.
I deployed VPP (version 22.02, the same VPP version used by Calico/VPP 3.23) on a bare-metal server without AVX512 support, and I noticed three differences in the logs compared with a bare-metal server that has AVX512 support.
with AVX512 support:
without AVX512 support:
The two screenshots above show the following differences:
- intel_uncore_init: no uncore units found
- topdown-level2: not supported
- memory-stalls: not supported
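Since the comparison above was done by eye on screenshots, a small script can make it repeatable. The sketch below lists the messages that appear in one VPP log but not in another, after masking PIDs so that identical messages from different runs compare equal. The helper names and the sample lines are illustrative assumptions; the samples are stand-ins assembled from messages quoted in this thread, not the real logs from either server.

```python
import re


def normalize(line):
    """Mask PIDs such as 'vpp[3681601]' so runs can be compared."""
    return re.sub(r"\[\d+\]", "[PID]", line.strip())


def lines_only_in(candidate, reference):
    """Return normalized lines present in candidate but absent from reference."""
    seen = {normalize(line) for line in reference}
    result = []
    for line in candidate:
        norm = normalize(line)
        if norm and norm not in seen:
            result.append(norm)
            seen.add(norm)  # report each distinct message once
    return result


# Hypothetical stand-in logs (not the actual output of either server).
with_avx512 = [
    "/usr/bin/vpp[1111]: tls_init_ca_chain:609: Could not initialize TLS CA certificates",
]
without_avx512 = [
    "/usr/bin/vpp[2222]: tls_init_ca_chain:609: Could not initialize TLS CA certificates",
    "intel_uncore_init: no uncore units found",
    "topdown-level2: not supported",
    "memory-stalls: not supported",
]

for line in lines_only_in(without_avx512, with_avx512):
    print(line)
```

Attaching the two full logs as text, rather than screenshots, would let anyone rerun a comparison like this.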
I also noticed this in the VPP code:
So I guess it is possible that topdown and memory-stalls rely on the AVX512 instruction set.
Yes, going by the logs it certainly seems that way but I am not sure if that is reason enough for vpp to fail to start. Could you share the entire vpp logs? Are you able to reproduce it consistently on that server? Also, could you share the bare-metal server details so I can try and reproduce?
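For sharing the server details requested above, one option is to report the CPU model and the AVX512-related flags the kernel advertises. This is a minimal sketch that assumes an x86 Linux host, where `/proc/cpuinfo` has `model name` and `flags` fields; the function name `cpu_summary` is hypothetical.

```python
def cpu_summary(cpuinfo_text):
    """Extract the CPU model and any avx512* flags from /proc/cpuinfo text."""
    model = None
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags":
            flags.update(value.split())
    avx512 = sorted(flag for flag in flags if flag.startswith("avx512"))
    return {"model": model, "avx512_flags": avx512}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(cpu_summary(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

An empty `avx512_flags` list means the kernel reports no AVX512 extensions on that host.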