
Sentry upgrade from 7.118.0 to 8.26.0 leak memory

Open flav-code opened this issue 1 year ago • 40 comments

Is there an existing issue for this?

  • [x] I have checked for existing issues https://github.com/getsentry/sentry-javascript/issues
  • [x] I have reviewed the documentation https://docs.sentry.io/
  • [x] I am using the latest SDK release https://github.com/getsentry/sentry-javascript/releases

How do you use Sentry?

Self-hosted/on-premise

Which SDK are you using?

@sentry/node

SDK Version

8.26.0

Framework Version

No response

Link to Sentry event

No response

Reproduction Example/SDK Setup

my sentry init in 7.118.0

    init({
        dsn: process.env.SENTRY_DSN,
        environment: process.env.NODE_ENV || "none",
        release: require("../../package.json").version,
        serverName: `${cluster.bot}-c=${cluster.id}-first=${cluster.first}-last=${cluster.last}` || "none",
        integrations: [
            new Integrations.Postgres({ module: require("pg") }),
            new Integrations.Modules(),
            new Integrations.FunctionToString(),
            new Integrations.LinkedErrors(),
            new Integrations.Console(),
            new Integrations.Http({ breadcrumbs: true, tracing: true }),
            rewriteFramesIntegration({ root: path.join(__dirname, "..") }),
        ],
        // Performance Monitoring
        tracesSampleRate: 1.0, //  Capture 100% of the transactions
    });

my sentry init in 8.26.0

    Sentry.init({
        dsn: process.env.SENTRY_DSN,
        environment: process.env.NODE_ENV || "none",
        release: require("../../package.json").version,
        serverName: `${cluster.bot}-c=${cluster.id}-first=${cluster.first}-last=${cluster.last}` || "none",
        integrations: [
            Sentry.modulesIntegration(),
            Sentry.functionToStringIntegration(),
            Sentry.linkedErrorsIntegration(),
            Sentry.consoleIntegration(),
            Sentry.httpIntegration({ breadcrumbs: true }),
            Sentry.rewriteFramesIntegration({ root: path.join(__dirname, "..") }),
            Sentry.onUnhandledRejectionIntegration(),
            Sentry.onUncaughtExceptionIntegration(),
            Sentry.redisIntegration(),
            Sentry.postgresIntegration(),
        ],
        // To avoid sending too much data to Sentry, we can reduce the sample rate of traces and profiles
        tracesSampleRate: 1.0,
        profilesSampleRate: 1.0,
    });

Steps to Reproduce

I'll try removing some of the integrations to see what's causing the problem.

Expected Result

A normal memory usage

Image

Actual Result

Abnormal memory usage

Image

flav-code avatar Aug 17 '24 00:08 flav-code

Hey @flav-code thanks for writing in!

We'll look into your issue next week as this week is Hackweek at Sentry (see #13421).

Lms24 avatar Aug 19 '24 09:08 Lms24

Hi, @flav-code would you be able to provide a memory snapshot with the Node/v8 profiler so that we can look at what is holding the references causing the leak? Feel free to also shoot us an email or twitter dm if you don't want to publicly share it. Thanks!

lforst avatar Aug 26 '24 07:08 lforst

I can't give you a snapshot because it contains a lot of private information.

flav-code avatar Aug 26 '24 13:08 flav-code

I made a snapshot, I'll examine it and show you

flav-code avatar Aug 26 '24 14:08 flav-code

Image

flav-code avatar Aug 26 '24 15:08 flav-code

Set, Span and NonRecordingSpan are from Sentry

flav-code avatar Aug 26 '24 15:08 flav-code

Yeah, that looks like Sentry. Would you mind digging around a bit and examining what holds the references to the spans?

lforst avatar Aug 26 '24 15:08 lforst

Image

flav-code avatar Aug 26 '24 15:08 flav-code

Image

flav-code avatar Aug 26 '24 15:08 flav-code

> Set, Span and NonRecordingSpan are from Sentry

This indicates that Sentry objects are retained in memory. It doesn't mean they are causing this retention, though! E.g. the expanded Span object has a distance (from root) of 15, while the sentryRootSpan has a distance of 12. That means something else, closer to the root, is holding a reference to it. You might want to explore the bottom part of the memory profiler - the Retainers section.

amakhrov avatar Aug 26 '24 15:08 amakhrov

Image

flav-code avatar Aug 26 '24 16:08 flav-code

timerListMap[180000] - my understanding is that the app has started a timer for 180s (3min). Any chance it's your app code rather than Sentry?
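To illustrate the retention mechanism behind such a timer list (a minimal sketch, not code from this thread): Node groups timers into per-duration lists internally, and an uncleared timer keeps everything its callback closes over reachable.

```javascript
// A buffer held alive solely by the interval callback's closure below.
const big = Buffer.alloc(1024 * 1024); // 1 MB

const timer = setInterval(() => {
  // As long as this interval exists, `big` cannot be garbage collected.
  void big.length;
}, 180_000); // 180s, matching the timerListMap[180000] entry in the snapshot

// Clearing the timer releases the closure, letting `big` be collected
// once nothing else references it.
clearInterval(timer);
console.log("timer cleared; closure no longer retained");
```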

amakhrov avatar Aug 26 '24 16:08 amakhrov

I don't think so. Before switching to sentry v8, someone warned me about the memory leak.

flav-code avatar Aug 26 '24 18:08 flav-code

I have a few questions to narrow this down further. I am not ruling out that our SDK is causing the leak:

  • What backend framework are you using - if at all?
  • Are you manually creating spans somewhere?
  • Can you share any more code that is relevant to Sentry?
  • What Node.js version are you using? We have found that there are memory leaks in Node.js with regard to timers.
  • Can you double-check whether you have any setTimeouts or setIntervals?
  • One way we have found things may "leak" is if you create a span and don't end it. That way children get attached and they will be held indefinitely.
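The last point can be sketched with a minimal model (not Sentry's actual implementation): an unended root span keeps a reference to every child span attached to it, so children accumulate for as long as it lives.

```javascript
// Toy model of parent/child span retention.
class Span {
  constructor(name) {
    this.name = name;
    this.children = [];
    this.ended = false;
  }
  startChild(name) {
    const child = new Span(name);
    this.children.push(child); // the parent retains the child
    return child;
  }
  end() {
    this.ended = true;
    this.children = []; // only ending the parent releases its children
  }
}

const root = new Span("root");
for (let i = 0; i < 1000; i++) {
  root.startChild(`request-${i}`).end(); // ending a child does not detach it
}
console.log(root.children.length); // all 1000 children still retained
```

If a long-lived root span like this exists in the real process, every span created under it stays reachable until the root ends.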

lforst avatar Aug 27 '24 07:08 lforst

Hello, I use Sentry on a Discord bot, not a website. Node version: v20.17.0

Image

Image

flav-code avatar Aug 27 '24 16:08 flav-code

@flav-code Would you mind answering the questions I asked. It's important we have answers to them so we can rule out certain things.

lforst avatar Aug 28 '24 07:08 lforst

I've already double-checked my code. When you talk about spans, I think of HTML elements, but I don't do web development.

flav-code avatar Aug 28 '24 09:08 flav-code

@flav-code I am talking about Sentry spans. Which you would start with Sentry.startSpan(). (https://docs.sentry.io/concepts/key-terms/tracing/distributed-tracing/#traces-transactions-and-spans)

Do you happen to have very long running requests in your process? Like server-sent-events, or streaming?

lforst avatar Aug 28 '24 09:08 lforst

On my side I never call Sentry.startSpan()

flav-code avatar Aug 28 '24 09:08 flav-code

@flav-code Do you have any very long running requests / request handlers in your code?

lforst avatar Aug 28 '24 09:08 lforst

I wouldn't say long requests, but I have a lot of requests per second, over 110req/s at times, and I receive a lot of messages via websocket.

flav-code avatar Aug 28 '24 10:08 flav-code

Here you can see that the memory leak appeared at the same time as my Sentry update; since then, the Performance tab has been working.

Image

Image

Image

flav-code avatar Aug 28 '24 10:08 flav-code

I believe you saying this is correlated with the update. Can you share a bit more about your program architecture? Is discord opening a websocket request on your server or is your server making a websocket request to discord?

Also, are you initializing Sentry more than once per process?

lforst avatar Aug 28 '24 11:08 lforst

From what I can tell from your profiler screenshots, something is creating a root span that seemingly never finishes; spans keep getting attached to it, which ends up leaking memory. It would be nice to figure out what is creating this root span.

lforst avatar Aug 28 '24 11:08 lforst

  • I initialize sentry only once
  • On each cluster I open 16 websocket connections to discord (clients)

I'll try to remove the profilesSampleRate line, and we'll see if it comes from that.

flav-code avatar Aug 28 '24 11:08 flav-code

If you could somehow also try logging console.log(Sentry.getRootSpan()) somewhere in your websocket code, and share what is logged, that would be cool!

lforst avatar Aug 28 '24 11:08 lforst

Sentry.getRootSpan() requires an argument

flav-code avatar Aug 28 '24 11:08 flav-code

Sorry right. Can you try

    const s = Sentry.getActiveSpan();
    if (s) {
      console.log("root span", Sentry.getRootSpan(s));
    }

lforst avatar Aug 28 '24 11:08 lforst

On the active process I used eval():

    NonRecordingSpan {
      _spanContext: {
        traceId: 'ba6488d048422cfba347c2a2b9b1eca5',
        spanId: '5df94fad8fd85299',
        traceFlags: 0,
        traceState: [TraceState]
      }
    }

flav-code avatar Aug 28 '24 11:08 flav-code

Thanks! We are struggling to understand what is creating this non-recording span. Usually that shouldn't happen unless you set a traces sample rate of <1 or you manually continue a trace.

Can you try setting debug: true in your `Sentry.init()`? It will probably generate a lot of logs, but it may tell you what exactly is continued or not sampled, and so on.
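For reference, that option slots into the reporter's existing init like this (abbreviated config fragment; `debug` is a standard SDK option):

    Sentry.init({
        dsn: process.env.SENTRY_DSN,
        debug: true, // verbose SDK logs: sampling decisions, continued traces, etc.
        tracesSampleRate: 1.0,
        // ...rest of the existing options
    });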

lforst avatar Aug 28 '24 11:08 lforst