
OpenSearch Data Nodes memory exhaustion after upgrade from 2.9 to 2.12 (JDK 21 upgrade)

Open rlevytskyi opened this issue 2 years ago • 27 comments

Describe the bug

Hello OpenSearch Team, we've just updated our OpenSearch cluster from version 2.9.0 to 2.12.0. Among other issues, we've noticed that OpenSearch now consumes far more memory than the previous version, i.e. it became unusable with the same configuration, even after providing it with 15% more RAM. To make it responsive again, we had to close many indices.

Related component

Other

To Reproduce

  1. Have a 2.9 cluster of 4 data nodes with 112 GB heap (Xmx) and 13.6 TB of storage
  2. Fill it with 5500 indices (mostly small with 1 shard, but several big ones with 4 shards) up to 75% of capacity
  3. Update 2.9 to 2.12 and add RAM to make it 128 GB
  4. See many GC messages in the logs and an almost inoperable cluster
  5. Close 2000 indices to make it work again

Expected behavior

We didn't expect a significant memory usage increase from a version upgrade.

Additional Details

Plugins: Security plugin for SAML authn and authz

Screenshots: note the almost horizontal heap usage before the upgrade, the increase after the upgrade, and horizontal usage again after closing some indices.

Host/Environment (please complete the following information):

  • OS: Oracle Linux 8.9
  • Docker image: opensearchproject/opensearch:2.12.0

rlevytskyi avatar Feb 26 '24 08:02 rlevytskyi

Thanks @rlevytskyi for reporting the issue. Did you try taking a heap dump? It will help us debug further here. (You can try with a smaller heap; the issue might manifest faster in that case.)

A couple of questions:

  1. Are you running a cluster without dedicated cluster manager nodes?
  2. What is the cluster state size? You can check via the _cluster/state API output.
  3. How many shards are there overall?
  4. When you observed the JVM heap spiking, was it during the upgrade from 2.9 to 2.12 or post upgrade as well? Was it consistently high?

shwetathareja avatar Feb 27 '24 08:02 shwetathareja

Thank you @shwetathareja for your reply! Here are clarifications:

  1. Yes, we are running a cluster without dedicated cluster managers: four nodes run as both data and master-eligible nodes, plus two coordinating nodes:

```
% curl logs:9200/_cat/nodes\?s=name
d data   - v480-data.company.com
m master - v480-master.company.com
d data   - v481-data.company.com
m master * v481-master.company.com
d data   - v482-data.company.com
m master - v482-master.company.com
d data   - v483-data.company.com
m master - v483-master.company.com
- -      - v484-coordinator.company.com
- -      - v485-coordinator.company.com
```

  2. Quite a lot of output (~193 MB):

```
% curl logs:9200/_cluster/state | wc
      0    4989 193053405
```

  3. 26696, as reported by _cat/shards.
  4. It was hitting the top during the upgrade and also post upgrade.
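For context, a back-of-the-envelope check of those numbers against the commonly quoted sizing guidance of at most ~25 shards per GB of JVM heap (a rule of thumb, not a figure from this thread) suggests the cluster was already densely packed before the upgrade:

```python
# Back-of-the-envelope shard density check, using the figures reported
# in this thread: 4 data nodes, 112 GB heap each, 26696 shards total.
# The ~25-shards-per-GB-of-heap limit is a commonly quoted rule of
# thumb, not an exact threshold.
DATA_NODES = 4
HEAP_GB_PER_NODE = 112       # pre-upgrade Xmx per data node
TOTAL_SHARDS = 26696         # from _cat/shards

shards_per_gb = TOTAL_SHARDS / (DATA_NODES * HEAP_GB_PER_NODE)
print(f"{shards_per_gb:.1f} shards per GB of heap (rule of thumb: <= ~25)")
```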

rlevytskyi avatar Feb 27 '24 13:02 rlevytskyi

Re the heap dump, where should we collect it and when? Right now, we see nothing unusual on the data nodes. The coordinating nodes sometimes log something like:

```
[INFO ][o.o.i.b.HierarchyCircuitBreakerService] [v484-coordinator.company.com] attempting to trigger G1GC due to high heap usage [8204216264]
[INFO ][o.o.i.b.HierarchyCircuitBreakerService] [v484-coordinator.company.com] GC did bring memory usage down, before [8204216264], after [3248648136], allocations [71], duration [62]
```

but would a heap dump of those be useful?
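As a side note, the before/after values in those circuit-breaker log lines already quantify how much memory each triggered G1GC reclaims. A small sketch (the log text is taken from the message above; the helper name is made up):

```python
import re

# Hypothetical helper: parse the before/after heap values out of the
# HierarchyCircuitBreakerService log line quoted above and report how
# much memory the triggered G1GC actually reclaimed.
LOG = ("[INFO ][o.o.i.b.HierarchyCircuitBreakerService] [v484-coordinator.company.com] "
       "GC did bring memory usage down, before [8204216264], after [3248648136], "
       "allocations [71], duration [62]")

def gc_reclaimed(line: str) -> int:
    """Return bytes freed by the GC reported in a circuit-breaker log line."""
    before, after = (int(m) for m in re.findall(r"\[(\d+)\]", line)[:2])
    return before - after

freed = gc_reclaimed(LOG)
print(f"GC freed {freed} bytes (~{freed / 1024**3:.1f} GiB)")
```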

rlevytskyi avatar Feb 27 '24 13:02 rlevytskyi

@rlevytskyi one of the major changes in 2.12 is that it is bundled with JDK 21 by default. Any chance you could downgrade the JDK to 17 for your deployment (may require altering the Docker image) to eliminate the JDK version change as a suspect? Thank you.

reta avatar Feb 27 '24 17:02 reta

Thank you Andriy for your reply. I've searched https://github.com/opensearch-project/OpenSearch and was unable to find the appropriate Dockerfile. Could you please point me to the right one?

rlevytskyi avatar Feb 27 '24 19:02 rlevytskyi

I think you need these: https://github.com/opensearch-project/opensearch-build/tree/main/docker/release/dockerfiles, but maybe the simpler way is to "inherit" from the 2.12 image and install/replace the JDK version to run with.

reta avatar Feb 27 '24 20:02 reta

[Triage - attendees 1 2 3 4 5] @rlevytskyi Thanks for filing - we will keep this issue untriaged for 1 week and if it does not have a root cause we will close the issue.

The following were some recent investigations in the security plugin for your consideration.

  • https://github.com/opensearch-project/security/issues/3776
  • https://github.com/opensearch-project/security/issues/4031

peternied avatar Feb 28 '24 16:02 peternied

I am unable to build the OpenSearch image yet. Moreover, the Dockerfile (https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/dockerfiles/opensearch.al2.dockerfile) says:

"This dockerfile generates an AmazonLinux-based image containing an OpenSearch installation (1.x Only). Dockerfile for building an OpenSearch image. It assumes that the working directory contains these files: an OpenSearch tarball (opensearch.tgz), log4j2.properties, opensearch.yml, opensearch-docker-entrypoint.sh, opensearch-onetime-setup.sh."

First, it says "1.x Only". Second, it says I have to put some files there, but I see no way to make sure I use exactly the same files you use.

So my question is: is there a way to build an image exactly like yours, to make sure we have the same configuration?

rlevytskyi avatar Feb 29 '24 10:02 rlevytskyi

@rlevytskyi I believe the new file is right next to that Dockerfile. Take a look at the README.md; maybe that will help if you are looking to construct a Docker image from a custom configuration.

Note: following ""inherit" from the 2.12 image and install/replace the JDK version to run with" seems easier IMO.

peternied avatar Mar 01 '24 00:03 peternied

@rlevytskyi I'm not sure if you've managed to capture and investigate a heap dump of the OpenSearch process; see this guide to capture that information in a Docker environment [1]. This will steer the investigation towards what is consuming the memory. Dumps can also be used to compare 2.9 vs 2.12 for differences.

  • [1] https://iceburn.medium.com/thread-and-heap-dumps-in-docker-containers-9aada82226fb

peternied avatar Mar 01 '24 00:03 peternied

Thank you Peter. However, I am neither a Java programmer nor a Docker enthusiast, and ""inherit" from the 2.12 image and install/replace the JDK version to run with" isn't clear to me. As far as I understand, it could be achieved by changing ENTRYPOINT to /bin/bash, starting a container, installing the new Java inside, setting JAVA_HOME, and running OpenSearch. However, you need to rebuild the image to change ENTRYPOINT, so we end up in recursion...

rlevytskyi avatar Mar 01 '24 07:03 rlevytskyi

Re the heap dump, I managed to capture and even sanitize one using PayPal's tool https://github.com/paypal/heap-dump-tool. However, it's not feasible to get one right now because the cluster is running smoothly.

rlevytskyi avatar Mar 01 '24 07:03 rlevytskyi

Thank you again @peternied for pointing out https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/README.md. I managed to build the 2.12 image with the JDK17 from 2.11.1. Have a nice weekend!

rlevytskyi avatar Mar 01 '24 13:03 rlevytskyi

I managed to create an image based on 2.12 using the following Dockerfile:

```dockerfile
FROM opensearchproject/opensearch:2.12.0
USER root
RUN dnf install -y java-17-amazon-corretto
USER opensearch
ENV JAVA_HOME=/usr
```

Running it on a test installation doesn't reveal any memory usage difference. Looking forward to running the big (prod) installation with it. Do you guys think it is safe?

rlevytskyi avatar Mar 05 '24 13:03 rlevytskyi

[Triage - attendees 1 2 3 4 5]

Do you guys think it is safe?

@rlevytskyi Without a root cause and bugfix it is hard to say what next steps to take. I would recommend testing and having a mitigation plan in case something happens, but your mileage may vary.

Thanks for filing - we will keep this issue untriaged for 1 week and if it does not have a root cause we will close the issue.

Since it has been a week and there is no root cause, we are closing out this issue. Feel free to open a new issue if you find a proximal cause from a heap analysis or a way to reproduce the leak.

peternied avatar Mar 06 '24 16:03 peternied

Want to chime in and say we ran into something similar after upgrading to 2.12. Suddenly all sorts of previously normal operations were tripping the overall parent circuit breaker, and significantly more GC logs were emitted by OpenSearch overall. The problem was most exacerbated by the snapshot and reindex APIs.

I applied the image changes from @rlevytskyi to use JDK17 and it has completely solved the issues and symptoms we were seeing. Average heap dropped considerably and is much more stable.

tophercullen avatar May 05 '24 00:05 tophercullen

Sounds like upgrading to JDK 21 is the change that caused this. Seems like a real problem. I am going to reopen this and edit the title to say something to this effect. @tophercullen do you think you can help us debug what's going on? There are a few suggestions above to take some heap dumps and compare.

dblock avatar May 05 '24 21:05 dblock

Using the above PayPal tool that sanitizes them, I've generated heap dumps from all nodes in a new standalone cluster (nothing else using it) while taking a full cluster snapshot: once on JDK 17 and twice on JDK 21. This is 24 files and ~5 GB compressed. I'm unsure what I'm supposed to compare between them.

From the cluster's stdout logging, there were no GC logs with JDK 17 and a bunch with JDK 21. So it seems repeatable in an otherwise idle cluster, assuming that's not just a red herring.

You might also consider the reproducer in #12694. That seems fairly similar to our real use case and to the operations for which we were seeing circuit breakers tripped. Snapshots never directly tripped breakers and/or failed; they were seemingly just exacerbating the problem.

tophercullen avatar May 06 '24 03:05 tophercullen

Maybe @backslasht has some ideas about what to do with this next?

dblock avatar May 06 '24 11:05 dblock

Using the above PayPal tool that sanitizes them, I've generated heap dumps from all nodes in a new standalone cluster (nothing else using it) while taking a full cluster snapshot: once on JDK 17 and twice on JDK 21. This is 24 files and ~5 GB compressed. I'm unsure what I'm supposed to compare between them.

Maybe sharing a class histogram first could help (even as a screenshot), thanks @tophercullen
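A class histogram can be produced with `jmap -histo <pid>` on each node; to compare a JDK 17 run against a JDK 21 run, a small diff script along these lines could help (a sketch only; it assumes the usual `num: #instances #bytes class name` histogram layout, and the sample data below is made up):

```python
# Sketch: diff two `jmap -histo` class histograms to find the classes
# whose bytes grew the most between a JDK 17 and a JDK 21 run.

def parse_histo(text: str) -> dict:
    """Map class name -> bytes from jmap -histo style output."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        # data rows look like: "  1:  1000  64000  java.lang.String"
        if len(parts) >= 4 and parts[0].rstrip(":").isdigit():
            sizes[parts[3]] = int(parts[2])
    return sizes

def top_growth(histo_old: str, histo_new: str, n: int = 5):
    """Return the n classes with the largest byte growth."""
    old, new = parse_histo(histo_old), parse_histo(histo_new)
    deltas = {k: new.get(k, 0) - old.get(k, 0) for k in set(old) | set(new)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:n]

# made-up sample histograms for illustration
jdk17 = "   1:  1000    64000  java.lang.String\n   2:    10  1048576  [B"
jdk21 = "   1:  1000    64000  java.lang.String\n   2:    40  4194304  [B"
print(top_growth(jdk17, jdk21, 1))
```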

reta avatar May 06 '24 12:05 reta

https://github.com/opensearch-project/OpenSearch/issues/12694 could be related

dblock avatar May 06 '24 20:05 dblock

This might be related to this JDK issue: https://bugs.openjdk.org/browse/JDK-8297639. The G1UsePreventiveGC flag was introduced and set to true by default in JDK 17 (introduced in this commit, renamed in this commit). The related issue is https://bugs.openjdk.org/browse/JDK-8257774. It was introduced to handle

...bursts of short lived humongous object allocations. These bursts quickly consume all of the G1ReservePercent regions and then the rest of the free regions

In JDK 20, this flag was set to false by default, and in JDK 21 it was removed entirely in https://bugs.openjdk.org/browse/JDK-8293861.

Summarizing the community's observations and reproduction efforts around this JDK issue: removing this flag might have caused increased memory usage when sending and receiving documents with chunks > 2 MB. On JDK 20 we can add the G1UsePreventiveGC flag back to bypass this issue, but on JDK 21 it is no longer an option :( We either need to go back to JDK 20 with that flag enabled, or we need to explore other possible ways to fix this.
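For anyone wanting to test that hypothesis on a JDK 17-19 runtime, the flag could be forced on via config/jvm.options. This is a sketch assuming the version-conditional entry syntax that jvm.options supports; on JDK 21 the flag no longer exists and the JVM would reject it:

```
## Sketch: force preventive GC back on, applicable to JDK 17-19 only.
## The flag defaults to false on JDK 20 and was removed in JDK 21.
17-19:-XX:+G1UsePreventiveGC
```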

ansjcy avatar May 06 '24 23:05 ansjcy

@ansjcy that was suggested before (I think on the forum), but we did not use -XX:+G1UsePreventiveGC (AFAIK)

reta avatar May 06 '24 23:05 reta

@rlevytskyi @tophercullen Do you still have your repro? Care to try with JDK 21 and -XX:+G1UsePreventiveGC, please?

dblock avatar May 07 '24 06:05 dblock

@dblock I can do what I did before: create a new cluster, populate it with data, and run snapshots.

However, based on what @ansjcy provided, that option is no longer available in JDK 21. The OpenJDK issue tracker links to a similar issue with Elasticsearch in this regard, which also has no solution on JDK 21.

tophercullen avatar May 07 '24 07:05 tophercullen

However based on what @ansjcy provided, that option is no longer available in JDK21.

Yes, my bad for not reading carefully enough.

dblock avatar May 07 '24 09:05 dblock

but we did not use -XX:+G1UsePreventiveGC

No, but if I'm understanding correctly, this flag was enabled by default in g1_globals.hpp for G1GC in JDK 17.

Also, today I did some more experiments using https://github.com/kroepke/opensearch-jdk21-memory (thanks, @kroepke!). I ran bulk indexing (20 MB workload per request, ~5 MB per document) with a Docker-based setup, each scenario for 1 hour:

  • 2.11 with JDK 17, G1UsePreventiveGC flag enabled [1].
  • 2.11 with JDK 17, G1UsePreventiveGC flag disabled [2].
  • 2.11 with JDK 21 [3].

Captured JVM usage results over the 1-hour runs:

  • for [1], the average JVM usage is 191707377 bytes
  • for [2], the average JVM usage is 196708634 bytes
  • for [3], the average JVM usage is 201973645 bytes

The results show a measurable but not dramatic impact from disabling the G1UsePreventiveGC flag on JDK 17, but there might be other unknown factors impacting JVM usage on JDK 21 as well. We need to run even longer and heavier benchmark tests to understand this better.
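In relative terms, those averages work out to roughly a 2.6% increase from disabling the flag on JDK 17, and about 5.4% going from [1] to JDK 21 (a quick calculation over the numbers above):

```python
# Relative difference between the average JVM heap usage figures
# reported above for the three 1-hour runs (bytes).
avg_jdk17_flag_on  = 191_707_377   # [1] JDK 17, G1UsePreventiveGC enabled
avg_jdk17_flag_off = 196_708_634   # [2] JDK 17, G1UsePreventiveGC disabled
avg_jdk21          = 201_973_645   # [3] JDK 21 (flag removed)

def pct_increase(base: int, new: int) -> float:
    """Percentage increase of `new` over `base`."""
    return (new - base) / base * 100

print(f"[1] -> [2]: +{pct_increase(avg_jdk17_flag_on, avg_jdk17_flag_off):.2f}%")
print(f"[1] -> [3]: +{pct_increase(avg_jdk17_flag_on, avg_jdk21):.2f}%")
```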

ansjcy avatar May 08 '24 23:05 ansjcy

@ansjcy - Do you think G1UsePreventiveGC is the root cause, or is it something else?

@tophercullen - Can you please share the heap dumps?

@dblock - Is there a common share location where these heap dumps can be uploaded?

backslasht avatar May 18 '24 15:05 backslasht

@dblock - Is there a common share location where these heap dumps can be uploaded?

AFAIK no, we don't have a place to host outputs from individual runs. I would just make an S3 bucket and give access to the folks in this thread offline if they don't have a place to put these.

dblock avatar May 20 '24 15:05 dblock

We're seeing background memory use climb over time (pointing to some kind of GC issue or memory leak, as described) on our AWS managed OpenSearch clusters since the 2.13 upgrade. We went from 2.11 (where the issue was not manifesting) to 2.13. We've had to bump all our nodes from 8 GB memory instances to 32 GB memory instances just to keep the cluster from falling over every night.

Apart from the version upgrade, there have been no other changes.

Attached you can see climbing min/avg/max JVM memory pressure over the last week (we've been on 2.13 for more than a week; some adverse cluster events are also visible on this chart).


Is there anything we can pull from our managed clusters to help resolve this? We're sorely over-provisioned now, so we're willing to put in some legwork to solve this.

zakisaad avatar Jun 18 '24 01:06 zakisaad