
High memory utilization after router upgrade to 0.91.0

Open PeteMac88 opened this issue 1 year ago • 10 comments

Component(s)

router

Component version

0.91.0

wgc version

latest

controlplane version

latest

router version

0.91.0

What happened?

After upgrading from router version 0.84.4 to 0.91.0, we observe much higher memory utilization on the router instances. Immediately after the upgrade, all instances were OOMKilled because the pods hit their memory limits. We run the router in a Kubernetes cluster; these are the resources before the upgrade, which were sufficient for the older router version:

resources:
  requests:
    cpu: 400m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 600Mi

Now memory utilization has increased to around 1000Mi. Do you have an explanation for this?
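Not mentioned in this thread, but a common stopgap while a Go service's memory growth is investigated: since the router is a Go binary, `GOMEMLIMIT` can be derived from the pod's memory limit via the Downward API, so the garbage collector works harder before the container hits its cgroup limit. This is only a sketch against a standard Deployment spec; it does not fix a leak, and the exact values are assumptions:

```yaml
# Sketch: expose the container's memory limit (in bytes) to the Go runtime.
# GOMEMLIMIT is a soft limit; setting it slightly below limits.memory via a
# divisor or a fixed value (e.g. "500MiB") leaves headroom for non-heap memory.
env:
  - name: GOMEMLIMIT
    valueFrom:
      resourceFieldRef:
        resource: limits.memory
```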

Environment information

Environment

Kubernetes 1.25

Router configuration

No response

Router execution config

No response

Log output

No response

Additional context

No response

PeteMac88 avatar Jun 05 '24 06:06 PeteMac88

WunderGraph commits fully to Open Source and we want to make sure that we can help you as fast as possible. The roadmap is driven by our customers and we have to prioritize issues that are important to them. You can influence the priority by becoming a customer. Please contact us here.

github-actions[bot] avatar Jun 05 '24 06:06 github-actions[bot]

We're taking a look

jensneuse avatar Jun 05 '24 08:06 jensneuse

Hi @PeteMac88 Could you check with the 0.90.0 version?

devsergiy avatar Jun 05 '24 08:06 devsergiy

Hey @devsergiy, it's definitely higher than on 0.84.4, but not as high as on 0.91.0. Here is the average utilization under the same load:

0.84.4 ~ 350Mi
0.90.0 ~ 700Mi
0.91.0 ~ 1200Mi

PeteMac88 avatar Jun 05 '24 09:06 PeteMac88

Thanks a lot

Could you also check 0.88.0?

We already know the reasons for the change in 0.90.1, but it is a bit of a surprise that it also spikes before that, on 0.90.0.

devsergiy avatar Jun 05 '24 11:06 devsergiy

@devsergiy Any update here? I needed to downgrade the router yesterday because of the increasing memory consumption.

PeteMac88 avatar Jun 14 '24 08:06 PeteMac88

@PeteMac88 We are aware of the issue, and we will address it in due course.

Thank you for your patience!

Aenimus avatar Jun 17 '24 08:06 Aenimus

Moreover, were you able to check 0.88.0 as requested @PeteMac88?

Aenimus avatar Jun 17 '24 08:06 Aenimus

Are you able to build the router from source like so:

go build -tags=pprof main.go

Then run the router with your regular traffic and run the following command:

go tool pprof http://localhost:6060/debug/pprof/heap

You need to have the go toolchain installed for this.

Once you've got a heap profile, can you upload it here for investigation? Thanks

jensneuse avatar Jun 20 '24 08:06 jensneuse

I also ran into this problem when upgrading to 0.95.6 this week. I observed unbounded memory growth when load testing the router with our batch requests of > 100 list items and ~150 fields across 2 subgraphs (no deep nesting, max 2 levels).

I did not run into this problem with the same queries that only returned a single item.

Reverting to 0.88.0 definitely helped and the containers use less memory now.

jfroundjian avatar Jul 18 '24 15:07 jfroundjian

@PeteMac88 @jfroundjian please reopen the issue if the problem persists with the latest router [email protected]. Thank you!

Aenimus avatar Aug 16 '24 20:08 Aenimus