Better feedback on problems
Describe the feature you'd like to have. It would be good if the operator provided more actionable feedback in the CR status so that it was easier to troubleshoot when things aren't working as expected.
What is the value to the end user? (why is it a priority?) End users would be able to more quickly diagnose what's not working and why so that they can get their environment up and running more quickly.
How will we know we have a good solution? (acceptance criteria)
- The `.status.conditions` section of the Source and Destination CRs should provide information about where in the reconcile sequence the operator is.
- Anything else?
Additional context In #66, the operator wasn't able to complete the sync iteration because the snapshot couldn't be created. Unfortunately, there was no error (or status information) generated to indicate that this was the problem. Due to the async nature of kube, it's going to be hard to tell whether something is broken/misconfigured or just taking a while, but by having the operator expose more status information, it can at least narrow down where to look when things aren't progressing as quickly as expected.
Brainstorming a bit:
We have the "Synchronizing" condition:
const (
	ConditionSynchronizing     string = "Synchronizing"
	SynchronizingReasonSync    string = "SyncInProgress"
	SynchronizingReasonSched   string = "WaitingForSchedule"
	SynchronizingReasonManual  string = "WaitingForManual"
	SynchronizingReasonCleanup string = "CleaningUp"
)
While the expressiveness of "reason" is fairly limited (and I don't know that I want to expand the number of reason codes), the free form messages could be more descriptive:
- "Creating keys"
- "Creating Snapshot"
- "Waiting for Job to complete"
- etc.
This was discussed in this week's meeting. Some thoughts/ideas:
- @shawn-hurley mentioned the possibility of having a webhook to validate and reject w/ a message about the misconfiguration
- @alaypatel07 suggested posting events
- This would be very similar to the Pod's model where you get events when things are taking a long time (e.g., volume mount). Like Pod events, we may not be able to say whether it's broken or just taking a long time, but it would expose what's causing the delay.
- @alaypatel07 also suggested adding more Conditions to the RS/RD objects for detectable misconfigurations (like multiple default VSClasses)
@JohnStrunk I am taking a look at webhooks and their applications. Please share any input you have.
/close We have been making progress on this in a number of other issues/PRs