Problems with Next.js SDK + OTEL
Just a collection of things I noticed when trying out the Next.js SDK with our newest OTEL implementation. This list is WIP:
Findings:
- Next.js is creating spans by itself as soon as we call Sentry.init().
- Apparently, the @vercel/otel package has support for Edge runtime fetch instrumentation. We need to try this out.
Problems:
- [x] Root spans have garbage names and will therefore pollute the Performance tab
- ✅ Kind of solved by setting forceTransaction: true for our own spans.
- [x] Not using instrumentation.ts to init the SDK will destroy parent-child relationships between Next.js spans and Sentry SDK spans. No friggin idea what is going on here.
- ✅ Solved by forcing users to use instrumentation.ts
- [ ] Since Next.js now creates root spans, the root span doesn't have much data at all (No headers, no tags, no anything).
- [x] Next.js created a lot of timed events which the SDK turned into error events
- ✅ Solved by removing our error event deletion logic: https://github.com/getsentry/sentry-javascript/pull/11221
- [ ] Flushing on Vercel doesn't work properly for spans created by OTEL.
- Apparently, there is a waitUntil coming in Next.js which could solve this. Right now I don't have an idea on how to fix this. I tried adding a SIGTERM handler, I tried adding a custom span processor that flushes. Nothing works.
- [ ] For the time being we will still need our own auto-instrumentation for Edge API routes and Edge server components
- [x] Next.js crashes with the --turbo flag: https://github.com/vercel/next.js/issues/64022
- [ ] There is very weird log output when building and in the dev-server.
- Might be fixed by: https://github.com/open-telemetry/opentelemetry-js/pull/4593
- [ ] Prisma query spans aren't attached to transactions
- [ ] Dev server symbolification requests start traces (needs some mechanism to opt out of tracing for edge, browser, and node)
- [ ] Edge runtime doesn't use otel yet, however we have runtime agnostic code that runs on edge and node, causing problems if we depend on Next.js creating spans, for example for API routes
- [ ] Tracing doesn't work on Next.js 14 from version 14.0.1-canary.1 onward if we disable the Http integration (which we kinda have to). This PR in Next.js introduced the bug: https://github.com/vercel/next.js/pull/57084
- [ ] We need to filter server transactions for all kinds of static resources: CSS, Favicon, Fonts, ...
- [ ] Suspended server components seem to be breaking out of the async context of the parent server component
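For the static resources item above, a minimal sketch of what such a filter could look like. The helper name isStaticAssetTransaction and the pattern list are made up for illustration and are not part of the SDK; the patterns are a guess at what "CSS, Favicon, Fonts, ..." would cover and would need tuning against real transaction names.

```typescript
// Hypothetical helper (not an SDK API): decides whether a server transaction
// name points at a static resource that should not become a transaction.
const STATIC_ASSET_PATTERNS: RegExp[] = [
  /^\/_next\/static\//, // Next.js build output (JS/CSS chunks, fonts)
  /^\/favicon\.ico$/,
  /\.(css|woff2?|ttf|otf|eot|ico|png|jpe?g|gif|svg|webp|map)$/i,
];

function isStaticAssetTransaction(name: string): boolean {
  // Transaction names often look like "GET /path"; strip the method if present
  // and drop any query string before matching.
  const path = name.replace(/^[A-Z]+\s+/, '').split('?')[0];
  return STATIC_ASSET_PATTERNS.some(pattern => pattern.test(path));
}
```

One possible place to call this would be a beforeSendTransaction callback that returns null for matching transactions, but whether the filter should live there or closer to span creation is still open.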
I created a Next.js test application based on this PR where I excluded our HTTP integration. I tried different scenarios (httpIntegration added/excluded in SDK, Next.js version in test app, manually adding startSpan to client function or server route handler).
I will differentiate between versions up to Next.js v14.0.1-canary.0 and versions from v14.0.1-canary.1 onward, because that boundary made the biggest difference in the outcome.
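To make the version boundary explicit, here is a small sketch of a check for "v14.0.1-canary.1 or later". Both function names are hypothetical, and the parser only understands x.y.z and x.y.z-canary.n version strings, which is an assumption about how Next.js versions are tagged; anything else is treated as unknown.

```typescript
// Hypothetical helper: parses "x.y.z" or "x.y.z-canary.n" into a comparable tuple.
function parseNextVersion(version: string): [number, number, number, number] | undefined {
  const match = /^v?(\d+)\.(\d+)\.(\d+)(?:-canary\.(\d+))?$/.exec(version);
  if (!match) return undefined;
  // A stable release sorts after all of its canaries, so give it an infinite canary index.
  const canary = match[4] === undefined ? Infinity : Number(match[4]);
  return [Number(match[1]), Number(match[2]), Number(match[3]), canary];
}

// Hypothetical helper: true for versions from 14.0.1-canary.1 onward.
function isFromCanary1Onward(version: string): boolean {
  const parsed = parseNextVersion(version);
  if (!parsed) return false; // unknown format: treat as not affected
  const boundary = [14, 0, 1, 1];
  for (let i = 0; i < boundary.length; i++) {
    if (parsed[i] !== boundary[i]) return parsed[i] > boundary[i];
  }
  return true; // exactly 14.0.1-canary.1
}
```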
Since this is about checking the spans created by route handlers, most of the logic and console.logs were in packages/nextjs/src/common/wrapRouteHandlerWithSentry.ts. In the test application, I fetched GET /api/delayed-res/3 (a route handler) via a button click in a client component.
Until Next.js 14.0.1-canary.0 (also v13)
When fetching, the rootSpan is defined and looks something like this:
{
  "span_id": "073d3f27a455bf13",
  "trace_id": "85d3b8afd4e840cf9d2f79e496508240",
  "data": {
    "next.span_name": "GET /api/delayed-res/3",
    "next.span_type": "BaseServer.handleRequest",
    "http.method": "GET",
    "http.target": "/api/delayed-res/3",
    "sentry.sample_rate": 1,
    "sentry.parentIsRemote": true
  },
  "description": "GET /api/delayed-res/3",
  "parent_span_id": "a763e5c6d28d2bc7",
  "start_timestamp": 1712306141.038
}
The traces show up correctly (except for the parameter) in Spotlight.
With a startSpan wrapper on the route handler:
Without a startSpan wrapper:
When including the httpIntegration, the spans are not sampled (the same behavior as from v14.0.1-canary.1 onward), and the root span looks the same as it does there.
From Next.js 14.0.1-canary.1 onward
When fetching, rootSpan is defined and looks like this:
{
  "span_id": "e450a61e70c6d723",
  "trace_id": "aa4b89228bce4a5481bb73f672ea3342"
}
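The difference between this root span and the one logged for the earlier Next.js versions can be checked mechanically. The sketch below is only an illustration: the RootSpanJson type is inferred from the two logged objects, and isBareRootSpan is a hypothetical helper, not something that exists in the SDK.

```typescript
// Shape inferred from the logged root spans; everything besides the IDs is optional.
interface RootSpanJson {
  span_id: string;
  trace_id: string;
  parent_span_id?: string;
  description?: string;
  start_timestamp?: number;
  data?: Record<string, unknown>;
}

// Hypothetical helper: a "bare" root span carries nothing besides its IDs,
// i.e. no attributes, no description and no parent.
function isBareRootSpan(span: RootSpanJson): boolean {
  const hasData = span.data !== undefined && Object.keys(span.data).length > 0;
  return !hasData && span.description === undefined && span.parent_span_id === undefined;
}
```

Dropping a check like this next to the console.logs in the route handler wrapper would make it easy to see at a glance which of the two cases a given request falls into.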
Spotlight does not show any traces coming in. Wrapping the route handler code in a startSpan doesn't show traces either.
The spans only show up in Spotlight when adding startSpan around the client function that fetches the server route. BUT they are missing the parent and the three requests seem to have the same trace ID.
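Both symptoms (missing parents, several requests sharing one trace ID) can be detected from a list of span JSON objects. This is a rough sketch under my own interpretation of "missing the parent": a span counts as root-like when parent_span_id is unset or points at a span that is not in the list. SpanSummary and findTraceAnomalies are hypothetical names.

```typescript
interface SpanSummary {
  span_id: string;
  trace_id: string;
  parent_span_id?: string;
}

// Hypothetical helper: a healthy trace has exactly one root-like span, so any
// trace ID with more than one is reported in sharedTraceIds.
function findTraceAnomalies(spans: SpanSummary[]): { rootLikeSpanIds: string[]; sharedTraceIds: string[] } {
  const knownIds = new Set(spans.map(s => s.span_id));
  const rootLike = spans.filter(
    s => s.parent_span_id === undefined || !knownIds.has(s.parent_span_id),
  );

  // Count root-like spans per trace ID.
  const countByTrace: Record<string, number> = {};
  for (const span of rootLike) {
    countByTrace[span.trace_id] = (countByTrace[span.trace_id] ?? 0) + 1;
  }

  const sharedTraceIds = Object.keys(countByTrace).filter(id => countByTrace[id] > 1);
  return { rootLikeSpanIds: rootLike.map(s => s.span_id), sharedTraceIds };
}
```

For the case described above, three parentless spans with the same trace ID would yield three entries in rootLikeSpanIds and that trace ID in sharedTraceIds.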