pd
Enhance go leak check
Enhancement Task
For example, we may hit a goroutine leak whose top stack frame is runtime_pollWait and which is caused by the dashboard:
Goroutine 1362 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
goroutine 1362 [IO wait]:
internal/poll.runtime_pollWait(0x14dc55908, 0x72)
/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
internal/poll.(*pollDesc).wait(0x14008d8e580?, 0x0?, 0x0)
/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14008d8e580)
/opt/homebrew/opt/go/libexec/src/internal/poll/fd_unix.go:611 +0x250
net.(*netFD).accept(0x14008d8e580)
/opt/homebrew/opt/go/libexec/src/net/fd_unix.go:172 +0x28
net.(*TCPListener).accept(0x140088d1840)
/opt/homebrew/opt/go/libexec/src/net/tcpsock_posix.go:152 +0x28
net.(*TCPListener).Accept(0x140088d1840)
/opt/homebrew/opt/go/libexec/src/net/tcpsock.go:315 +0x2c
github.com/pingcap/tidb-dashboard/pkg/tidb.(*proxy).run(0x14008d9e1e0, {0x104f79778?, 0x14007fd5c20})
/Users/pingcap/go/pkg/mod/github.com/pingcap/[email protected]/pkg/tidb/proxy.go:227 +0x37c
created by github.com/pingcap/tidb-dashboard/pkg/tidb.(*Forwarder).Start in goroutine 1296
/Users/pingcap/go/pkg/mod/github.com/pingcap/[email protected]/pkg/tidb/forwarder.go:57 +0x1d4
Besides, we may also hit a goroutine leak error whose top stack frame is runtime_pollWait as well, but whose root cause is go.etcd.io/etcd/pkg/transport.timeoutConn.Read, which is different from the dashboard case:
internal/poll.runtime_pollWait(0x10f8cc2a0, 0x72)
/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
internal/poll.(*pollDesc).wait(0x140016ac400?, 0x1400159a000?, 0x0)
/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140016ac400, {0x1400159a000, 0x1000, 0x1000})
/opt/homebrew/opt/go/libexec/src/internal/poll/fd_unix.go:164 +0x200
net.(*netFD).Read(0x140016ac400, {0x1400159a000?, 0x14000436160?, 0x140013dcc60?})
/opt/homebrew/opt/go/libexec/src/net/fd_posix.go:55 +0x28
net.(*conn).Read(0x140016b81c0, {0x1400159a000?, 0x14001141bd8?, 0x1?})
/opt/homebrew/opt/go/libexec/src/net/net.go:179 +0x34
go.etcd.io/etcd/pkg/transport.timeoutConn.Read({{0x106cf7120?, 0x140016b81c0?}, 0x14001141c78?, 0x1041334cc?}, {0x1400159a000?, 0x104133074?, 0x1400039b068?})
/Users/pingcap/go/pkg/mod/go.etcd.io/[email protected]/pkg/transport/timeout_conn.go:43 +0xa8
net/http.(*persistConn).Read(0x140013dcc60, {0x1400159a000?, 0x104133570?, 0x140013caf00?})
/opt/homebrew/opt/go/libexec/src/net/http/transport.go:1954 +0x50
bufio.(*Reader).fill(0x14001073d40)
/opt/homebrew/opt/go/libexec/src/bufio/bufio.go:113 +0xf8
bufio.(*Reader).Peek(0x14001073d40, 0x1)
/opt/homebrew/opt/go/libexec/src/bufio/bufio.go:151 +0x60
net/http.(*persistConn).readLoop(0x140013dcc60)
/opt/homebrew/opt/go/libexec/src/net/http/transport.go:2118 +0x14c
created by net/http.(*Transport).dialConn in goroutine 316
/opt/homebrew/opt/go/libexec/src/net/http/transport.go:1776 +0x1144
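Both traces above have internal/poll.runtime_pollWait as their top frame; they only differ further down the stack and in their "created by" line. The following standard-library-only sketch parses a goroutine dump and shows that the top frame alone cannot tell the two apart. The dump is a trimmed copy of the two traces: the second "goroutine" header and the module file paths are made up, because the issue omits or shortens them. The goroutine type and the parseDump/funcName helpers are hypothetical, not part of PD or goleak.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleDump holds trimmed copies of the two traces. The second header and
// the /path/to/... locations are hypothetical placeholders.
const sampleDump = `goroutine 1362 [IO wait]:
internal/poll.runtime_pollWait(0x14dc55908, 0x72)
	/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
net.(*TCPListener).Accept(0x140088d1840)
	/opt/homebrew/opt/go/libexec/src/net/tcpsock.go:315 +0x2c
github.com/pingcap/tidb-dashboard/pkg/tidb.(*proxy).run(0x14008d9e1e0, {0x104f79778?, 0x14007fd5c20})
	/path/to/proxy.go:227 +0x37c
created by github.com/pingcap/tidb-dashboard/pkg/tidb.(*Forwarder).Start in goroutine 1296
	/path/to/forwarder.go:57 +0x1d4

goroutine 2000 [IO wait]:
internal/poll.runtime_pollWait(0x10f8cc2a0, 0x72)
	/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
net.(*conn).Read(0x140016b81c0, {0x1400159a000?, 0x14001141bd8?, 0x1?})
	/opt/homebrew/opt/go/libexec/src/net/net.go:179 +0x34
go.etcd.io/etcd/pkg/transport.timeoutConn.Read({{0x106cf7120?, 0x140016b81c0?}, 0x14001141c78?, 0x1041334cc?}, {0x1400159a000?, 0x104133074?, 0x1400039b068?})
	/path/to/timeout_conn.go:43 +0xa8
net/http.(*persistConn).readLoop(0x140013dcc60)
	/opt/homebrew/opt/go/libexec/src/net/http/transport.go:2118 +0x14c
created by net/http.(*Transport).dialConn in goroutine 316
	/opt/homebrew/opt/go/libexec/src/net/http/transport.go:1776 +0x1144
`

// goroutine is one entry of a text goroutine dump.
type goroutine struct {
	Header    string   // e.g. "goroutine 1362 [IO wait]:"
	Funcs     []string // function names, top of the stack first
	CreatedBy string   // function named on the "created by" line
}

// funcName strips the argument list: "pkg.(*T).f(0x1, 0x2)" -> "pkg.(*T).f".
func funcName(frame string) string {
	if strings.HasSuffix(frame, ")") {
		if i := strings.LastIndex(frame, "("); i > 0 {
			return frame[:i]
		}
	}
	return frame
}

// parseDump splits a dump into goroutines. Lines starting with "goroutine "
// open a new entry, lines starting with "/" are file:line locations and are
// skipped, and every other non-empty line is treated as a frame.
func parseDump(dump string) []goroutine {
	var gs []goroutine
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, "goroutine "):
			gs = append(gs, goroutine{Header: line})
		case len(gs) == 0 || strings.HasPrefix(line, "/"):
			continue
		case strings.HasPrefix(line, "created by "):
			name := strings.TrimPrefix(line, "created by ")
			// Newer Go versions append " in goroutine N"; drop it.
			if i := strings.Index(name, " in goroutine "); i >= 0 {
				name = name[:i]
			}
			gs[len(gs)-1].CreatedBy = name
		default:
			gs[len(gs)-1].Funcs = append(gs[len(gs)-1].Funcs, funcName(line))
		}
	}
	return gs
}

func main() {
	for _, g := range parseDump(sampleDump) {
		fmt.Printf("%s\n  top:        %s\n  created by: %s\n", g.Header, g.Funcs[0], g.CreatedBy)
	}
}
```

Both entries print the same "top" value, while the "created by" lines point at tidb-dashboard and net/http respectively, which is the information a top-frame-only filter throws away.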
If we ignore leaks by their top stack frame, the dashboard leak is no longer exposed either. A better way to treat these two issues is to:
- wait for the etcd timeout (because that goroutine is still exiting)
- troubleshoot the dashboard leak
So we need a more fine-grained check.