Svcwatcher core after losing master/leader
Is this a BUG REPORT or FEATURE REQUEST?: bug
What happened: The svcwatcher Pod lost its master lease for some reason, so the process exited, dumping a stack trace on the way out:
E0704 17:44:12.997128 1 svcwatcher.go:93] Lost master
F0704 17:44:12.997152 1 svcwatcher.go:97] Lost lease
E0704 17:44:12.997232 1 event.go:269] Unable to write event: 'can't create an event with namespace 'default' in namespace 'kube-system'' (may retry after sleeping)
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000374400, 0xc00028e000, 0x3b, 0x9e)
/go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0x20605c0, 0xc000000003, 0xc00023c0e0, 0x1fce7c9, 0xd, 0x61, 0x0)
/go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:720 +0x372
github.com/golang/glog.(*loggingT).println(0x20605c0, 0xc000000003, 0xc00002feb0, 0x1, 0x1)
/go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:633 +0xe7
github.com/golang/glog.Fatalln(...)
/go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:1141
main.main()
/go/src/github.com/nokia/danm/cmd/svcwatcher/svcwatcher.go:97 +0x9e4
E0704 17:44:20.320002 1 event.go:269] Unable to write event: 'can't create an event with namespace 'default' in namespace 'kube-system'' (may retry after sleeping)
glog: Flush took longer than 10s
What you expected to happen: No core dump before exit.
How to reproduce it: It happens frequently during deployment.
Anything else we need to know?:
Environment:
- DANM version (use danm -version):
# /opt/cni/bin/danm -version
2020/07/22 12:31:18 DANM binary was built from release: v4.2.0-0
2020/07/22 12:31:18 DANM binary was built from commit: c0a4c1570845556cf911a46df475c45a85941bb2
- Kubernetes version (use kubectl version):
# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:41:22Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
So, this is line 97, where it cores: https://github.com/nokia/danm/blob/master/cmd/svcwatcher/svcwatcher.go#L97
It is literally a library call without references to any objects. I think I have already stated earlier that glog is shite :) Maybe the non-newline API wouldn't core, but I absolutely refuse to deep dive into its code. The solution is to remove the usage of the whole library.
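For illustration, a minimal sketch of what exiting without glog could look like, using only the standard library. As far as I know, glog's Fatal family dumps goroutine stacks by design before exiting, which matches the trace in the report. The names `formatFatal`, `exitWithoutStacks` and `exitFunc` are hypothetical and not part of DANM or glog; the message format and exit code 1 are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// exitFunc is a hypothetical hook so the exit behaviour can be stubbed out;
// by default it terminates the process.
var exitFunc = os.Exit

// formatFatal builds the single line printed before exiting.
// The message format is an assumption, not DANM's.
func formatFatal(reason string) string {
	return fmt.Sprintf("svcwatcher exiting: %s", reason)
}

// exitWithoutStacks prints one line to stderr and exits with code 1.
// It does not print any goroutine stack traces.
func exitWithoutStacks(reason string) {
	fmt.Fprintln(os.Stderr, formatFatal(reason))
	exitFunc(1)
}

func main() {
	// Demo only: replace the exit hook so this example terminates normally.
	exitFunc = func(code int) { fmt.Println("would exit with code", code) }
	exitWithoutStacks("Lost lease")
}
```

In svcwatcher, a call like `exitWithoutStacks("Lost lease")` would stand in for the `glog.Fatalln` call on line 97, assuming a plain non-zero exit is enough for the Pod to be restarted.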
The "Unable to write event" error above is more interesting to me.
Regarding the eventing issue: the leader election library creates an event recorder without a namespace defined, so it defaults to default. But our component runs in kube-system, so when we actually want to record an event, it fails. Something like: https://github.com/tsuru/remesher/pull/5
Which is funny, because as far as I can tell the Events are raised using the meta of the provided EndpointsLock: https://github.com/kubernetes/client-go/blob/00dbcca6ee44c678754d3f5fda1bd0e704b26fe2/tools/leaderelection/resourcelock/endpointslock.go#L100, and lo and behold, we do set the proper namespace into the lock: https://github.com/nokia/danm/blob/master/cmd/svcwatcher/svcwatcher.go#L74
soo...
I guess others also have issues with the library :) https://bugzilla.redhat.com/show_bug.cgi?id=1842002
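To make the namespace mismatch discussed above concrete, here is a small stdlib-only model of the check implied by the error in the log. It is a sketch, not client-go's code: `effectiveNamespace` and `checkEventNamespace` are hypothetical names, and the assumption is that an event with no namespace falls back to default while the event sink is scoped to kube-system.

```go
package main

import "fmt"

// effectiveNamespace models the assumed fallback: an empty namespace
// is treated as "default".
func effectiveNamespace(ns string) string {
	if ns == "" {
		return "default"
	}
	return ns
}

// checkEventNamespace models a namespaced event sink rejecting an event
// whose namespace differs from its own. The message text mirrors the log
// line in this issue.
func checkEventNamespace(eventNS, sinkNS string) error {
	eventNS = effectiveNamespace(eventNS)
	if sinkNS != "" && eventNS != sinkNS {
		return fmt.Errorf("can't create an event with namespace '%s' in namespace '%s'", eventNS, sinkNS)
	}
	return nil
}

func main() {
	// Event without a namespace, sink scoped to kube-system: rejected.
	fmt.Println(checkEventNamespace("", "kube-system"))
	// Event carrying the same namespace as the sink: accepted.
	fmt.Println(checkEventNamespace("kube-system", "kube-system"))
}
```

If this model is roughly right, the event only gets through when the object the event is raised on carries kube-system as its namespace at the moment the event is recorded.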
@TothFerenc any comments on the above? I'm kind of of the opinion that this is how stuff works, and we just need to live with it.
Maybe we can create a new TODO issue about log module harmonization (use the same logging engine across all DANM components), and this issue can depend on it. Of course, I will close this issue if the client libraries are fixed in the meantime.