sentry-javascript icon indicating copy to clipboard operation
sentry-javascript copied to clipboard

Improve paramaterization of transaction names

Open smeubank opened this issue 3 years ago • 6 comments

Describe the idea

We need to revisit how the JS SDK captures transactions (URL routes) and sends them to sentry.

Reasons for doing this and things we need to consider:

  • parameterization: We should parameterize URLs whenever possible

Because

  • grouping: transaction name influences transaction grouping; raw URL names lead to ungroupted transactions
  • indexing and high cardinality: Raw URL transaction names lead to a high cardinality of transactions (because they're not grouped)
  • PII: Raw URL transaction names can contain sensitive data/PII (e.g. IDs, auth tokens, etc.)
  • Dynamic Sampling: Propagating raw URLs in DSC vs. showing parameterized routes in the Sentry UI (and the DS settings) creates a lot of user confusion

Best effort has been decent so far, but the fallback to unparameterized or whole URL maybe sub-optimal.

Examples:

  • Low-cardinality transaction name:
    • {"transaction": "/users/{username}", "transaction_source": "route"}
  • Presumably a high-cardinality transaction name:
    • {"transaction": "/users/123235", "transaction_source": "uri"}
  • User-defined transaction name, cardinality unknown:
    • {"transaction": "my_transaction_name", "transaction_source": "custom"}

Requirement:

what should be sent and where?

baggage header

  • error envelope payload
  • in the envelope header

Possible implementation

There's a couple of things we can do or at least check:

Existing Routing Instrumentations with parameterization

As listed in #5345, we have a lot of popular routers covered with routing instrumentations. However, we might be able to improve paramenterizations in some of them. Hence, for each instrumentation

  • check if there is a way to parameterize earlier
  • try to match routes better (if it turns out that there are cases where our current matching fails)
  • (send source information; tracked in #5345)

Existing Routing Instrumentations without parameterization

TODO: Check to which routers this applies

There are some routing instrumentations that don't parameterize currently.

  • add parameterization whenever possible

TBD: Approximative Parameterization

This has been discussed quite a bit in the past but given that we have to make our best effort for parameterization, let's revisit this topic. The idea is seemingly simple: We could try to add a mechanism that takes a raw URL and tries to guess parts of that URL that might be parameters (e.g. IDs, tokens, etc). The mechanism would then replace these parts with a generic param placeholder.

Example:

/users/1235/credentials ==> users/:id/credentials

There are a lot of possible issues with this because obviously, there are going to be loads of edge cases, where this approximation might be off or miss parameters completely.

Why is this challenging?

  • In the product, we have to explain why DSC might have another (or no) transaction name than what's visible in the UI
  • We have to explain why unparameterized routes are sent and even full URLs.
  • Some frameworks and routers provide unique challenges
  • What about custom routing instrumentations we don't control?

Places to improve parameterization

  • [ ] nextjs (@lobsterkatie ) https://github.com/getsentry/sentry-javascript/issues/5505
  • [x] express integration (#5450, @Lms24)
  • [x] angular (#5416, @Lms24)
  • [x] vue (I think Vue is pretty much optimal, @lforst)
  • [x] react router (pretty much optimal)
    • [x] react router v6 https://github.com/getsentry/sentry-javascript/pull/5515
  • [x] without framework (https://github.com/getsentry/sentry-javascript/pull/5411, @lforst)
  • [x] remix https://github.com/getsentry/sentry-javascript/pull/5491
  • [x] Ember
  • [ ] Gatsby

smeubank avatar Jun 30 '22 12:06 smeubank

@Lms24 let me know when you have time, this needs refinement. We can loop in others of course

  • identify ways we could quantify this problem
  • identify problem areas we want and importantly could address

smeubank avatar Jul 06 '22 13:07 smeubank

Always happy to refine/discuss this. Since this became a very important issue, we should sync with everyone here who has context on how transaction names/parameterization/routing instrumentation currently works

Lms24 avatar Jul 06 '22 13:07 Lms24

@smeubank fleshed out the issue description. Feel free to add/change stuff

Lms24 avatar Jul 07 '22 08:07 Lms24

Apologies if there is a better place for me to ask about this, but I have been unable to get parameterization to work with react-router v6.

Here are some excerpts of what I believe to be the relevant code:

Sentry.init({
  ...
  integrations: [
    new BrowserTracing({
      routingInstrumentation: Sentry.reactRouterV6Instrumentation(
        useEffect,
        useLocation,
        useNavigationType,
        createRoutesFromChildren,
        matchRoutes,
      ),
    }),
  ],
});

const SentryRoutes = Sentry.withSentryReactRouterV6Routing(Routes);

const MyRoutes = () => {
  return (
    <SentryRoutes>
      <Route index element={<Navigate to="/projects" />} />

      <Route path="/account" element={<AccountPage />} />

      <Route path="/projects">
        <Route index element={<ProjectBrowser />} />

        <Route path=":projectId" element={<ProjectPage />}>
          <Route index element={<ProjectPageRoot />} />

          <Route element={<EditorPage />}>
            <Route path="views/:viewId" element={<ViewCanvas />} />
            <Route path="spaces/:spaceId" element={<SpaceCanvas />} />
          </Route>
        </Route>
      </Route>

      <Route path="*" element={<NoMatchPage />} />
    </SentryRoutes>
  );
};

export default MyRoutes

As far as I can tell, I'm not doing anything unusual, except possibly the use of react router's Outlets and Layout Routes.

With this configuration I would expect to see a single transaction for /projects/*/views/*, but instead I'm seeing separate transactions for each individual page.

Is there something obvious that I'm doing wrong here, or does the sentry integration not support parameterization with this usage?

jamesbvaughan avatar Jul 25 '22 20:07 jamesbvaughan

Hi @jamesbvaughan, would you mind opening a dedicated issue for this? Makes it easier to track this for us. cc @AbhiPrasad maybe you have a quick idea?

Lms24 avatar Jul 27 '22 10:07 Lms24

I've created https://github.com/getsentry/sentry-javascript/issues/5513 to track my issue separately. Thanks!

jamesbvaughan avatar Aug 02 '22 18:08 jamesbvaughan

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

github-actions[bot] avatar Aug 30 '22 00:08 github-actions[bot]

Tracking the nextjs and remaining issues in #5505

vladanpaunovic avatar Sep 05 '22 14:09 vladanpaunovic