[ARM platform] Remote snapshotter test failed on pulling image due to microVM time out of sync
On the ARM64 platform, the remote snapshotter test fails while pulling an image because the microVM clock is out of sync. Failed CI build link. When CI runs the test TestSnapshotterMetrics_Isolated, which pulls an image and unpacks it in the remote snapshotter, the pull fails with the error below (the same image pulls fine on the ARM host with ctr).
=== RUN TestSnapshotterMetrics_Isolated
metrics_integ_test.go:54:
Error Trace: metrics_integ_test.go:54
Error: Received unexpected error:
Failed to pull image on microVM[0]: failed to extract layer sha256:fdb3c0ecba2ee0b2b39f778f7da3beb4ee4c75f6f4b8a083211b12971fde4ad6: failed to mount /var/lib/containerd/tmpmounts/containerd-mount3440161598: no such file or directory: unknown
Test: TestSnapshotterMetrics_Isolated
I checked containerd.log for the test and found the error "x509: certificate has expired or is not yet valid: current time 2022-08-07T13:25:13Z is before 2023-02-21T00:00:00Z", which suggests the microVM time is not synced correctly.
{\"key\":\"0/1/extract-101125884-ByGC sha256:fdb3c0ecba2ee0b2b39f778f7da3beb4ee4c75f6f4b8a083211b12971fde4ad6\",\"level\":\"info\",\"mountpoint\":\"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/1/fs\",\"msg\":\"Received status code: 401 Unauthorized. Refreshing creds...\",\"parent\":\"\",\"src\":\"ghcr.io/firecracker-microvm/firecracker-containerd/amazonlinux:latest-esgz/sha256:efc8b66d208d6eaa2e24799081e13c035f25ad585cec5d478845a744f98324b8\",\"time\":\"2022-08-07T13:25:13.491954298Z\"}" jailer=noop runtime=aws.firecracker vmID=0 vmm_stream=stdout
time="2023-04-06T18:05:06.129064764Z" level=debug msg="[ 5.441207] containerd-stargz-grpc[744]: {\"error\":\"Get \\\"https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:efc8b66d208d6eaa2e24799081e13c035f25ad585cec5d478845a744f98324b8?se=2023-04-06T18%3A15%3A00Z\\u0026sig=8ccbSSbYFbIlb%2Fr9eh5ptZlKm5W1pkw50WfKqVLTY4A%3D\\u0026sp=r\\u0026spr=https\\u0026sr=b\\u0026sv=2019-12-12\\\": x509: certificate has expired or is not yet valid: current time 2022-08-07T13:25:13Z is before 2023-02-21T00:00:00Z\"
I tried to set the microVM time in the commit, but it had no effect: the rerun still showed the old time. The strange part is that the AMD platform build has the correct time in the microVM.
Guest kernels need to be compiled with KVM_PTP support, which serves as the mechanism for clock sync:
CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_KVM=y
We can see that all the arm microVM kernel configs are missing CONFIG_PTP_1588_CLOCK_KVM=y in comparison to the x86 configs. This discrepancy exists because the 4.14 arm64 kernel lacks the feature, which has been upstreamed since 5.3. There is a good discussion of the same problem, as experienced by kata-containers, here: https://github.com/kata-containers/packaging/pull/693
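The config comparison can be automated with a small check. This is a rough sketch under assumptions: missingOptions is a hypothetical helper, and the config text in main is a made-up excerpt, not the contents of any file in the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingOptions (hypothetical helper) returns the entries of required that
// are not set to "y" in the given kernel config text.
func missingOptions(config string, required []string) []string {
	enabled := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Lines such as "# CONFIG_FOO is not set" are comments; skip them.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) == 2 && parts[1] == "y" {
			enabled[parts[0]] = true
		}
	}
	var missing []string
	for _, opt := range required {
		if !enabled[opt] {
			missing = append(missing, opt)
		}
	}
	return missing
}

func main() {
	// Made-up excerpt for illustration; a real check would read the config file.
	config := "CONFIG_PTP_1588_CLOCK=y\n# CONFIG_PTP_1588_CLOCK_KVM is not set\n"
	required := []string{"CONFIG_PTP_1588_CLOCK", "CONFIG_PTP_1588_CLOCK_KVM"}
	fmt.Println("missing options:", missingOptions(config, required))
}
```

Only built-in ("=y") options count as enabled here, which matches the two lines quoted above; a variant that also accepts "=m" would be needed if the clock driver were built as a module.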
The CI build logs also show that the failing build was using the 4.14 kernel:
default-vmlinux.bin: OK
--
| chmod 0400 default-vmlinux.bin
| _submodules/firecracker/tools/devtool -y build_kernel --config tools/kernel-configs/microvm-kernel-aarch64-4.14.config
The solution for this issue needs two parts:
- the guest kernel configs need the missing option
- whichever kernel we choose needs to include the ptp_kvm commit: https://github.com/torvalds/linux/blob/16a8829130ca22666ac6236178a6233208d425c3/Documentation/virt/kvm/arm/ptp_kvm.rst#L4
@BinSquare Thanks for taking a look at the issue.
The two-part solution makes sense to me. I did try compiling the kernel with
CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_KVM=y
But since we don't have the ptp_kvm patch, the change did not make any difference.