Svcwatcher core after losing master/leader

Open TothFerenc opened this issue 5 years ago • 5 comments

Is this a BUG REPORT or FEATURE REQUEST?: bug

What happened: The svcwatcher Pod lost its master (leader) lease for some reason, so the process exited with the following output:

E0704 17:44:12.997128       1 svcwatcher.go:93] Lost master
F0704 17:44:12.997152       1 svcwatcher.go:97] Lost lease
E0704 17:44:12.997232       1 event.go:269] Unable to write event: 'can't create an event with namespace 'default' in namespace 'kube-system'' (may retry after sleeping)
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000374400, 0xc00028e000, 0x3b, 0x9e)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0x20605c0, 0xc000000003, 0xc00023c0e0, 0x1fce7c9, 0xd, 0x61, 0x0)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:720 +0x372
github.com/golang/glog.(*loggingT).println(0x20605c0, 0xc000000003, 0xc00002feb0, 0x1, 0x1)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:633 +0xe7
github.com/golang/glog.Fatalln(...)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:1141
main.main()
        /go/src/github.com/nokia/danm/cmd/svcwatcher/svcwatcher.go:97 +0x9e4
E0704 17:44:20.320002       1 event.go:269] Unable to write event: 'can't create an event with namespace 'default' in namespace 'kube-system'' (may retry after sleeping)
glog: Flush took longer than 10s

What you expected to happen: No core dump before exit.

How to reproduce it: It happens frequently during deployment.

Anything else we need to know?:

Environment:

  • DANM version (use danm -version):
# /opt/cni/bin/danm -version
2020/07/22 12:31:18 DANM binary was built from release: v4.2.0-0
2020/07/22 12:31:18 DANM binary was built from commit: c0a4c1570845556cf911a46df475c45a85941bb2
  • Kubernetes version (use kubectl version):
# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:41:22Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

TothFerenc • Jul 22 '20 10:07

So, this is the 97th line, where it cores: https://github.com/nokia/danm/blob/master/cmd/svcwatcher/svcwatcher.go#L97 It is literally a library call without references to any objects. I think I have already stated earlier that glog is shite :) Maybe the non-newline API wouldn't core, but I absolutely refuse to deep dive into its code. The solution is removing the usage of the whole library.
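For what it's worth, as far as the glog documentation describes it, the Fatal family of functions writes the stacks of all running goroutines before exiting, so the trace in the report is probably that dump rather than a real core file. A minimal sketch of exiting without it, using only the standard library, could look like the following. The names `errLostLease`, `run` and `exitStatus` are made up for illustration and are not taken from the svcwatcher code.

```go
package main

import (
	"errors"
	"fmt"
)

// errLostLease is a hypothetical sentinel standing in for the "Lost lease" condition.
var errLostLease = errors.New("lost lease")

// run is a placeholder for the leader election loop (assumption: the real
// loop returns once the lease is lost). Here it reports the loss immediately.
func run() error {
	return errLostLease
}

// exitStatus turns the result of run into a one-line message and an exit code.
// It never prints goroutine stacks.
func exitStatus(err error) (msg string, code int) {
	if err == nil {
		return "", 0
	}
	return fmt.Sprintf("svcwatcher exiting: %v", err), 1
}

func main() {
	msg, code := exitStatus(run())
	if msg != "" {
		fmt.Println(msg)
	}
	// A real binary would end with os.Exit(code); the sketch only prints it.
	fmt.Println("exit code:", code)
}
```

The point of the split is that the decision "what do we log and with which code do we exit" becomes a plain function, independent of whichever logging library the components eventually settle on.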

the cannot create event remark above is more interesting for me

Levovar • Aug 03 '20 15:08

Regarding the eventing issue: the leader election library creates an event recorder without a namespace defined, so it defaults to 'default', but our component runs in kube-system, so when we really want to record an event, it fails. Something like: https://github.com/tsuru/remesher/pull/5

Which is funny, because as far as I can tell the Events are raised using the meta of the provided EndpointsLock: https://github.com/kubernetes/client-go/blob/00dbcca6ee44c678754d3f5fda1bd0e704b26fe2/tools/leaderelection/resourcelock/endpointslock.go#L100, and lo and behold, we do set the proper namespace into the lock: https://github.com/nokia/danm/blob/master/cmd/svcwatcher/svcwatcher.go#L74

soo...
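To make the suspected mismatch concrete, here is a simplified, standard-library-only model of it. This is not client-go code: `objectRef`, `event`, `sink`, `makeEvent` and `create` are hypothetical stand-ins, and the defaulting rule is an assumption based on the description above. The sketch only shows how an event built from an object with an empty namespace would end up in 'default' and then be rejected by a writer bound to kube-system, with the same error text as in the log.

```go
package main

import "fmt"

// objectRef is a pared-down stand-in for the object an event refers to.
type objectRef struct {
	Name      string
	Namespace string
}

// event is a pared-down stand-in for a Kubernetes Event.
type event struct {
	Namespace string
	Message   string
}

// makeEvent models a recorder that falls back to "default" when the
// referenced object carries no namespace (assumption, see lead-in).
func makeEvent(ref objectRef, msg string) event {
	ns := ref.Namespace
	if ns == "" {
		ns = "default"
	}
	return event{Namespace: ns, Message: msg}
}

// sink models an event writer bound to a single namespace.
type sink struct {
	Namespace string
}

// create rejects events whose namespace differs from the sink's namespace.
func (s sink) create(e event) error {
	if s.Namespace != "" && e.Namespace != s.Namespace {
		return fmt.Errorf("can't create an event with namespace '%s' in namespace '%s'",
			e.Namespace, s.Namespace)
	}
	return nil
}

func main() {
	s := sink{Namespace: "kube-system"}

	// Object metadata without a namespace: the event lands in "default" and is rejected.
	fmt.Println(s.create(makeEvent(objectRef{Name: "lock"}, "became leader")))

	// Object metadata with the matching namespace: the event is accepted.
	fmt.Println(s.create(makeEvent(objectRef{Name: "lock", Namespace: "kube-system"}, "became leader")))
}
```

If this model is close to what happens, the interesting question is at which point the object metadata handed to the recorder is still empty, even though the lock itself was configured with the right namespace.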

Levovar • Aug 03 '20 15:08

I guess others also have issues with the library :) https://bugzilla.redhat.com/show_bug.cgi?id=1842002

Levovar • Aug 03 '20 15:08

@TothFerenc any comments on the above? I'm kind of of the opinion that this is how stuff works, and we just need to live with it.

Levovar • Aug 10 '20 11:08

Maybe we can create a new TODO issue about log module harmonization (use the same logging engine across all DANM components), and this issue can depend on it. Of course, I will close this issue if the client libraries get fixed in the meantime.

TothFerenc • Aug 10 '20 12:08