TridentBackendConfig doesn't get deleted
Describe the bug
Hello,
I am testing an FSxONTAP filesystem with the iSCSI protocol for persistent volumes, in order to deploy the Victoria Metrics time series database into an EKS cluster. I am following "Run containerized applications efficiently using Amazon FSx for NetApp ONTAP and Amazon EKS" with some support from AWS. At some point, I tried to delete the TridentBackendConfig to start all over again. It seems to be stuck in the Deleting phase forever. The documentation does say that it stays in the Deleting phase while it still has dependent objects. I have uninstalled the workload and tried to delete the PVs/PVCs created using this tbc, but it didn't help: the PVCs got deleted, but the PVs are stuck in the Terminating state. What else counts as a backend component? Do I have to delete the FSxONTAP filesystem itself to be able to clean up the tbc? What if I can't afford to lose my persistent volumes? Is FSxONTAP+iSCSI recommended for workloads like the Victoria Metrics database deployed into EKS clusters?
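For reference, my deletion attempts looked roughly like this (the PVC names are inferred from the claims on the PVs listed further down, so treat them as approximate):
kubectl delete tbc backend-fsx-ontap-san -n trident
kubectl delete pvc mysql-volume vmstorage-volume-victoria-metrics-cluster-vmstorage-0 vmstorage-volume-victoria-metrics-cluster-vmstorage-1 -n observability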
kt get tbc
NAME                    BACKEND NAME            BACKEND UUID                           PHASE      STATUS
backend-fsx-ontap-san   backend-fsx-ontap-san   949563cb-6717-4455-a778-7fb16c906630   Deleting   Success
kt get tbc backend-fsx-ontap-san -o yaml
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  creationTimestamp: "2023-12-03T20:07:27Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-12-04T21:17:33Z"
  finalizers:
  - trident.netapp.io
  generation: 2
  name: backend-fsx-ontap-san
  namespace: trident
  resourceVersion: "1049263"
  uid: d835c761-9317-4171-bc22-540b9d5ce864
spec:
  credentials:
    name: backend-fsx-ontap-san-secret
  managementLIF: 198.19.255.172
  storageDriverName: ontap-san
  svm: ekssvm
  version: 1
status:
  backendInfo:
    backendName: backend-fsx-ontap-san
    backendUUID: 949563cb-6717-4455-a778-7fb16c906630
  deletionPolicy: delete
  lastOperationStatus: Success
  message: 'Backend is in a deleting state, cannot proceed with the TridentBackendConfig
    deletion. '
  phase: Deleting
ko get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                                                                  STORAGECLASS      REASON   AGE
pvc-0462c1f0-0a54-43c5-8b1b-7bd3fa6fb205   50Gi       RWO            Delete           Terminating   observability/mysql-volume                                             fsx-basic-block            3d2h
pvc-d2ea4c54-e23d-4a95-b35e-68fd85989937   50Gi       RWO            Delete           Terminating   observability/vmstorage-volume-victoria-metrics-cluster-vmstorage-0   fsx-basic-block            2d21h
pvc-ef602c9c-4a27-4d6d-a542-55e470d2553f   50Gi       RWO            Delete           Terminating   observability/vmstorage-volume-victoria-metrics-cluster-vmstorage-1   fsx-basic-block            2d21h
Environment
- Trident version: 23.07.1
- Trident installation flags used: helm install trident -n trident --create-namespace trident-installer/helm/trident-operator-23.07.1.tgz
- Container runtime: containerd://1.6.19
- Kubernetes version: 1.25
- Kubernetes orchestrator: EKS
- Kubernetes enabled feature gates:
- OS: Amazon Linux 2
- NetApp backend types: ONTAP
- Other:
To Reproduce
kubectl delete tbc backend-fsx-ontap-san
Expected behavior
backend-fsx-ontap-san should be deleted.
The PVs stuck in Terminating are probably the dependency that keeps the TridentBackend from deleting. Can you do a kubectl describe on one of them to see if there is anything helpful about why they are stuck?
Besides that, Trident supports multiple backends in parallel. So even while the current one is still in the Deleting state, you can just add a new backend (or several, if you like).
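For example, a second TBC pointing at the same SVM could look roughly like this (backend-fsx-ontap-san-v2 is just a placeholder name; the spec fields are copied from the existing TBC shown above):
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-fsx-ontap-san-v2
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-san
  managementLIF: 198.19.255.172
  svm: ekssvm
  credentials:
    name: backend-fsx-ontap-san-secret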
Thanks for looking into it @wonderland. I don't see anything popping out from the describe output. I did notice that creating another backend with a different name does work. It is just that leaving some objects in a hung state makes me nervous about the health of the system.
k describe pv pvc-d2ea4c54-e23d-4a95-b35e-68fd85989937
Name:            pvc-d2ea4c54-e23d-4a95-b35e-68fd85989937
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: csi.trident.netapp.io
                 volume.kubernetes.io/provisioner-deletion-secret-name:
                 volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:      [external-attacher/csi-trident-netapp-io]
StorageClass:    fsx-basic-block
Status:          Terminating (lasts 3d5h)
Claim:           observability/vmstorage-volume-victoria-metrics-cluster-vmstorage-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        50Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            csi.trident.netapp.io
    FSType:            ext4
    VolumeHandle:      pvc-d2ea4c54-e23d-4a95-b35e-68fd85989937
    ReadOnly:          false
    VolumeAttributes:  backendUUID=949563cb-6717-4455-a778-7fb16c906630
                       internalName=trident_pvc_d2ea4c54_e23d_4a95_b35e_68fd85989937
                       name=pvc-d2ea4c54-e23d-4a95-b35e-68fd85989937
                       protocol=block
                       storage.kubernetes.io/csiProvisionerIdentity=1701633261584-3547-csi.trident.netapp.io
Events:            <none>
Every Kubernetes PV has an associated "tvol" custom resource created in the 'trident' namespace.
"oc describe tvol ..." could give you hints.
Another place to look is the trident-controller logs.
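For example (assuming the default 'trident' namespace and an operator-based install where the controller runs as the trident-controller deployment; the PV name is taken from the output above):
kubectl get tridentvolumes -n trident
kubectl describe tridentvolume pvc-d2ea4c54-e23d-4a95-b35e-68fd85989937 -n trident
kubectl logs deploy/trident-controller -n trident --tail=200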
That finalizer (Finalizers: [external-attacher/csi-trident-netapp-io]) would be what's holding it up.
@sontivr it looks like a PV was stranded for some reason, and that is what is keeping you from deleting the TBC. As @wonderland mentioned, you could just go ahead and create a new TBC to get around this. To clean up the old TBC, you will need to remove the finalizer (@jamessevener, thank you :)). Before you do that, please make sure that the PV is not associated with a PVC and is not being used by a workload. That should help you resolve your issue.
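As a minimal sketch, the generic Kubernetes way of clearing a stuck finalizer (not Trident-specific, and only after confirming the PV is unused) would be something like:
kubectl patch pv pvc-d2ea4c54-e23d-4a95-b35e-68fd85989937 -p '{"metadata":{"finalizers":null}}'
Repeat for each stuck PV; once they are gone, the TridentBackendConfig deletion should be able to complete.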