PD url in discovery service need to be fix
Bug Report
What version of Kubernetes are you using?
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.17", GitCommit:"a7736eaf34d823d7652415337ac0ad06db9167fc", GitTreeState:"clean", BuildDate:"2022-12-08T11:41:04Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.16-airbnb0", GitCommit:"3051216402e96974819f485e35143cbe2fdf43a8", GitTreeState:"clean", BuildDate:"2023-02-07T20:29:26Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
What version of TiDB Operator are you using?
TiDB Operator Version: version.Info{GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"d9f634343ef71926b43c85df17458fb55dc955b8", GitTreeState:"clean", BuildDate:"2023-02-15T05:18:23Z", GoVersion:"go1.18.10", Compiler:"gc", Platform:"linux/amd64"}
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
N/A What's the status of the TiDB cluster pods?
N/A What did you do?
https://github.com/pingcap/tidb-operator/blob/master/pkg/discovery/discovery.go#L271 need to be fixed and include not only PeerMembers but Members as well.
And for https://github.com/pingcap/tidb-operator/blob/master/pkg/discovery/discovery.go#L260, it needs to actually connect to the pd and fetch the member instead of just construct a url from spec
What did you expect to see?
What did you see instead?
In the discovery.go when it's generating the list of PD urls it only includes the PeerMembers but not local member. We are deploying tidb across 3 k8s clusters. Does it make sense to include the local PD in the VerifyPDEndpoint response?
var returnPDMember string returnPDMembers := []string{pdURL} for _, peerPDMember := range tc.Status.PD.PeerMembers { if peerPDMember.Health { if len(pdEndpoint.scheme) == 0 { peerPDEndpoint := parsePDURL(peerPDMember.ClientURL) returnPDMember = fmt.Sprintf("%s:%s", peerPDEndpoint.pdMemberName, peerPDEndpoint.pdMemberPort) } else { returnPDMember = peerPDMember.ClientURL } returnPDMembers = append(returnPDMembers, returnPDMember) } }
Also we have Heterogeneous tidb and tikv cluster, currently discovery is generating to url from spec, this gave us wrong pd url in our use case. Does it make sense to actually connect to PD and fetch the PD member?
// if local pd doesn't exist, return target cluster pd peer addr if tc.Heterogeneous() && tc.WithoutLocalPD() { addr := controller.PDPeerFullyDomain(tc.Spec.Cluster.Name, tc.Spec.Cluster.Namespace, tc.Spec.Cluster.ClusterDomain) if pdEndpoint.scheme != "" { addr = fmt.Sprintf("%s://%s", pdEndpoint.scheme, addr) } addr = addr + ":" + pdEndpoint.pdMemberPort return addr, nil }
https://github.com/pingcap/tidb-operator/blob/master/pkg/discovery/discovery.go#L271 need to be fixed and include not only
PeerMembersbutMembersas well.
Members should already be included in returnPDMembers := []string{pdURL} where pdURL is the URL to the local Service.
And for https://github.com/pingcap/tidb-operator/blob/master/pkg/discovery/discovery.go#L260, it needs to actually connect to the pd and fetch the member instead of just construct a url from spec this gave us wrong pd url in our use case
Can you show the case with the wrong PD URL? Yeah, connecting and fetching members may be better.
Members should already be included in returnPDMembers := []string{pdURL} where pdURL is the URL to the local Service.
For example, we are running tidb in 3 k8s cluster 1a, 1b and 1e. This is from 1e. When render the TiKV start script it uses https://mussel-prod-pd:2379 when it call verifyPdendPoint which is the pdUrl in returnPDMembers := []string{pdURL}
/tikv-server --pd=https://mussel-prod-pd:2379,https://mussel-prod-pd-0.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1a.mussel-prod.tidb.musta.ch:2379,https://mussel-prod-pd-0.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1b.mussel-prod.tidb.musta.ch:2379,https://mussel-prod-pd-1.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1a.mussel-prod.tidb.musta.ch:2379,https://mussel-prod-pd-1.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1b.mussel-prod.tidb.musta.ch:2379 --advertise-addr=mussel-prod-tikv-0.mussel-prod-tikv-peer.tidb-mussel-prod.svc.us-east-1e.mussel-prod.tidb.musta.ch:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --advertise-status-addr=mussel-prod-tikv-0.mussel-prod-tikv-peer.tidb-mussel-prod.svc.us-east-1e.mussel-prod.tidb.musta.ch:20180 --data-dir=/var/lib/tikv --capacity=0 --config=/etc/tikv/tikv.toml
@csuzhangxc
Can you show the case with the wrong PD URL? Yeah, connecting and fetching members may be better. For example we have a heterogeneous tidb cluster which just contains tidb instance. So we have two tc in this case
- mussel-prod
- mussel-prod-heterogeneous
addr := controller.PDPeerFullyDomain(tc.Spec.Cluster.Name, tc.Spec.Cluster.Namespace, tc.Spec.Cluster.ClusterDomain)
so the tc.Spec.Cluster.Name is mussel-prod-heterogeneous which is wrong since PD is in tc mussel-prod. Does this make sense to you @csuzhangxc ?
Members should already be included in returnPDMembers := []string{pdURL} where pdURL is the URL to the local Service.
For example, we are running tidb in 3 k8s cluster 1a, 1b and 1e. This is from 1e. When render the TiKV start script it uses
https://mussel-prod-pd:2379when it call verifyPdendPoint which is the pdUrl inreturnPDMembers := []string{pdURL}/tikv-server --pd=https://mussel-prod-pd:2379,https://mussel-prod-pd-0.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1a.mussel-prod.tidb.musta.ch:2379,https://mussel-prod-pd-0.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1b.mussel-prod.tidb.musta.ch:2379,https://mussel-prod-pd-1.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1a.mussel-prod.tidb.musta.ch:2379,https://mussel-prod-pd-1.mussel-prod-pd-peer.tidb-mussel-prod.svc.us-east-1b.mussel-prod.tidb.musta.ch:2379 --advertise-addr=mussel-prod-tikv-0.mussel-prod-tikv-peer.tidb-mussel-prod.svc.us-east-1e.mussel-prod.tidb.musta.ch:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --advertise-status-addr=mussel-prod-tikv-0.mussel-prod-tikv-peer.tidb-mussel-prod.svc.us-east-1e.mussel-prod.tidb.musta.ch:20180 --data-dir=/var/lib/tikv --capacity=0 --config=/etc/tikv/tikv.toml
@csuzhangxc
is there any problem when using https://mussel-prod-pd:2379 in 1e (for its local member)?
Can you show the case with the wrong PD URL? Yeah, connecting and fetching members may be better. For example we have a heterogeneous tidb cluster which just contains tidb instance. So we have two tc in this case
- mussel-prod
- mussel-prod-heterogeneous
addr := controller.PDPeerFullyDomain(tc.Spec.Cluster.Name, tc.Spec.Cluster.Namespace, tc.Spec.Cluster.ClusterDomain)
so the
tc.Spec.Cluster.Nameismussel-prod-heterogeneouswhich is wrong since PD is in tcmussel-prod. Does this make sense to you @csuzhangxc ?
you mean, tc.Spec.Cluster.Name in mussel-prod is mussel-prod-heterogeneous, but not tc.Spec.Cluster.Name in mussel-prod-heterogeneous is mussel-prod?
if PD is in mussel-prod:
- for
mussel-prod, I thinktc.Heterogeneous() && tc.WithoutLocalPD()will be forfalse, and it will usepdURL - for
mussel-prod-heterogeneous, it should settc.Spec.Cluster.Nameasmussel-prod, and then useaddr := controller.PDPeerFullyDomain(tc.Spec.Cluster.Name, tc.Spec.Cluster.Namespace, tc.Spec.Cluster.ClusterDomain)tomussel-prod