pixie icon indicating copy to clipboard operation
pixie copied to clipboard

postgres client trace using golang's pg library is not getting traced on Google GKE

Open harold-kfuse opened this issue 4 years ago • 10 comments

I'm using Stirling standalone mode, which should also captures client-side trace (since all client traffic in this case will be treated as out of cluster).

I simply log the captured pgsql_events. For some reason only trace from the postgres server is getting captured. If I use psql to connect to postgres server and run queries, I see both client and server side trace is getting captured.

For the golang client, I'm using https://github.com/go-pg/pg

Note that on a similar deployment on Amazon EKS, I am able to see both client and server side tracing getting captured.

To reproduce.

  1. deploy a postgres server using bitnami helm. I'm using this:
  • name: postgresql version: 10.3.18 repository: https://charts.bitnami.com/bitnami
  1. Create a sample pg client program (the example here https://pkg.go.dev/github.com/go-pg/pg/v10#section-readme will also do) and deploy it on the GKE cluster.

harold-kfuse avatar Sep 16 '21 00:09 harold-kfuse

Pasted here an example client that simply writes/reads from pg.

package main

import (
	"flag"
	"fmt"
	"time"

	pg "github.com/go-pg/pg/v10"
	"github.com/go-pg/pg/v10/orm"
)

var url = flag.String("url", "", "")
var user = flag.String("user", "postgres", "Postgres Username")
var password = flag.String("password", "", "Postgres password")
var db = flag.String("db", "postgres", "Postgres DB")

type User struct {
	Id     int64
	Name   string
	Emails []string
}

func (u User) String() string {
	return fmt.Sprintf("User<%d %s %v>", u.Id, u.Name, u.Emails)
}

type Story struct {
	Id       int64
	Title    string
	AuthorId int64
	Author   *User `pg:"rel:has-one"`
}

func (s Story) String() string {
	return fmt.Sprintf("Story<%d %s %s>", s.Id, s.Title, s.Author)
}

func createSchema(db *pg.DB) error {
	models := []interface{}{
		(*User)(nil),
		(*Story)(nil),
	}

	for _, model := range models {
		err := db.Model(model).CreateTable(&orm.CreateTableOptions{
			Temp: true,
		})
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	flag.Parse()
	db := pg.Connect(&pg.Options{
		Addr:     *url,
		User:     *user,
		Password: *password,
		Database: *db,
	})
	defer db.Close()
	createSchema(db)
	for {
		user1 := &User{
			Name:   "admin",
			Emails: []string{"admin1@admin", "admin2@admin"},
		}
		_, err := db.Model(user1).Insert()
		if err != nil {
			panic(err)
		}

		_, err = db.Model(&User{
			Name:   "root",
			Emails: []string{"root1@root", "root2@root"},
		}).Insert()
		if err != nil {
			panic(err)
		}

		story1 := &Story{
			Title:    "Cool story",
			AuthorId: user1.Id,
		}
		_, err = db.Model(story1).Insert()
		if err != nil {
			panic(err)
		}

		// Select user by primary key.
		user := &User{Id: user1.Id}
		err = db.Model(user).WherePK().Select()
		if err != nil {
			panic(err)
		}

		// Select all users.
		var users []User
		err = db.Model(&users).Select()
		if err != nil {
			panic(err)
		}

		// Select story and associated author in one query.
		story := new(Story)
		err = db.Model(story).
			Relation("Author").
			Where("story.id = ?", story1.Id).
			Select()
		if err != nil {
			panic(err)
		}

		fmt.Println(user)
		fmt.Println(users)
		fmt.Println(story)
		time.Sleep(5)
	}
}

harold-kfuse avatar Sep 16 '21 00:09 harold-kfuse

I uploaded a container with the above program above in dockerhub.

Below is a sample yaml to deploy in kubernetes (run kubectl apply -f pgclient.yaml):

Replace the arguments accordingly. Below assumes that the postgres server has a headless service called postgresql-headless, password is pgpassword and a database called postgres.

apiVersion: v1
kind: Pod
metadata:
  name: pgclient
spec:
  containers:
    - name: pgclient
      image: index.docker.io/haroldkf/pgclient:0.1.0
      imagePullPolicy: Always
      args: ["-url=postgresql-headless.test:5432", "-password=pgpassword", "-db=postgres"]
      resources:
        limits:
          cpu: 500m
          memory: 512Mi
        requests:
          cpu: 100m
          memory: 128M

harold-kfuse avatar Sep 16 '21 01:09 harold-kfuse

@oazizi000

harold-kfuse avatar Sep 16 '21 01:09 harold-kfuse

Update. I think this is related to the node image that comes with GKE. The default node image now is Container-optimized OS with containerd. There are a few other choices in the GKE console. It only worked when I chose Ubuntu with Docker image.

There is also a Container-optimized OS with docker, which also has the same issue as above.

harold-kfuse avatar Sep 17 '21 02:09 harold-kfuse

@harold-kfuse Thanks for reporting this! There was a fix for Container-optimized OS that has been merged into main about security_socket_sendmsg. Would you test with the newest pixie release on your app and let us know if it solves the problem?

chengruizhe avatar Oct 25 '21 21:10 chengruizhe

Hi @chengruizhe ,

I'm still seeing some issue. I can occasionally see both server and client traces now. But sometimes either server or client traces are missing completely. If I restart the pods, then it will eventually start tracing it again.

I remember seeing this issue when I first encountered the issue I mentioned in this PR. I just attribute it to the same issue, but may it's a different issue. When I try to increase log level and add additional logs, I eventually saw that it was due to ParseFailureRate > threshold, so conn tracker gets disabled. I haven't done the same investigation this time around, but symptom seems the same.

harold-kfuse avatar Oct 26 '21 21:10 harold-kfuse

Looks like a parse failure. Tagging @yzhao1012 on this issue. Maybe he could help take a look and debug the issue.

chengruizhe avatar Oct 27 '21 00:10 chengruizhe

Looks like a parse failure. Tagging @yzhao1012 on this issue. Maybe he could help take a look and debug the issue.

@chengruizhe Should I file a new issue for this?

harold-kfuse avatar Oct 27 '21 15:10 harold-kfuse

Thanks @harold-kfuse !

Looks like a parse failure. Tagging @yzhao1012 on this issue. Maybe he could help take a look and debug the issue.

@chengruizhe Should I file a new issue for this?

No need to create new issue, the history is useful for repro.

I eventually saw that it was due to ParseFailureRate > threshold, so conn tracker gets disabled.

Do you have logs at hands? If so, please paste them in this thread. Otherwise, don't need to spend time, I'll try to reproduce it myself.

yzhao1012 avatar Oct 27 '21 18:10 yzhao1012

Thanks @harold-kfuse !

Looks like a parse failure. Tagging @yzhao1012 on this issue. Maybe he could help take a look and debug the issue.

@chengruizhe Should I file a new issue for this?

No need to create new issue, the history is useful for repro.

I eventually saw that it was due to ParseFailureRate > threshold, so conn tracker gets disabled.

Do you have logs at hands? If so, please paste them in this thread. Otherwise, don't need to spend time, I'll try to reproduce it myself.

Thanks. I don't have the logs at hand now.

harold-kfuse avatar Oct 27 '21 18:10 harold-kfuse