feat(k8s): Make RunContainer detect/report more error conditions like image pull errors
This is an alternate implementation of #279 using the SharedInformer infra to watch events introduced in #519.
Closes #279
Overview
The Docker Runtime surfaces image pull errors naturally because of the blocking synchronous requests to pull an image or start a container.
The Kubernetes Runtime has to watch for events that show any errors with pulling images.
RunContainer changes in k8s runtime
Wait until one of these conditions before returning:
- build canceled
- container is running
- got an image pull error
WaitContainer changes in k8s runtime
Wait until one of these conditions before returning:
- build canceled (new)
- container is terminated (was already doing this)
containerTracker additions
I added signal channels for Running and ImagePulled. I also added a channel for receiving image pull errors in RunContainer.
podTracker changes
I need pod Name and Namespace in more places, so I copy those into the podTracker now.
I was surprised that the SharedInformerFactory I used to create the Informer/Lister for watching the build Pod did NOT work for watching events. Apparently, events don't get labeled, so I had to add a separate eventInformerFactory that uses a FieldSelector instead of a LabelSelector when making the list/watch API calls.
I did not rename informerFactory to podInformerFactory because it can easily be used for other k8s resources in the future (if needed), just not for Events.
I also added the Event event handler funcs (wow that was a mouth full - events about Events).
podTracker.inspectContainerEvent() (called by the Event event handlers) is responsible for
- sending the image pull events to
RunContainer - closing the
ImagePulledsignal channel when an event indicates that the pull was successful
I also adjusted podTracker.inspectContainerStatuses() so that it can also:
- surface image pull errors (in case the error is visible there, but not through the Events API - which is unlikely, but there doesn't seem to be any kind of guarantees about event availability, so this covers our bases)
- close the
Runningsignal channel when the container status shows that it is running.
TODO:
- [x] merge #302
- [x] rebase
- [ ] write tests
Codecov Report
Merging #331 (65cbe59) into master (a9cbe48) will increase coverage by
1.01%. The diff coverage is100.00%.
@@ Coverage Diff @@
## master #331 +/- ##
==========================================
+ Coverage 81.18% 82.20% +1.01%
==========================================
Files 68 68
Lines 5273 5472 +199
==========================================
+ Hits 4281 4498 +217
+ Misses 848 831 -17
+ Partials 144 143 -1
| Impacted Files | Coverage Δ | |
|---|---|---|
| runtime/kubernetes/build.go | 88.60% <100.00%> (+1.55%) |
:arrow_up: |
| runtime/kubernetes/container.go | 91.64% <100.00%> (+9.61%) |
:arrow_up: |
| runtime/kubernetes/pod_tracker.go | 97.37% <100.00%> (+1.51%) |
:arrow_up: |
I'm running into some issues with this, so I'm moving it back to draft for now.