
`url_ignorelist` on Autocapture config doesn't seem to be working

Open · jperezr21 opened this issue 1 year ago · 4 comments

I want to ignore events from the Vercel screenshots crawler (user agent vercel-screenshot/1.0).

These events have $current_url set to https://project-and-deployment-id.vercel.app/.

I tried adding all of the following:

    autocapture: {
      url_ignorelist: [/.*\.vercel\.app\/.*/, "vercel.app", "vercel.app/.*"],
    },

but I'm still receiving these events.

Am I doing something wrong or is this a bug?

jperezr21 · Sep 06 '24 17:09
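One way to rule out the pattern itself is to test the regex literal from the config above against the example $current_url in plain TypeScript, outside PostHog. This sketch only checks the regex. How posthog-js interprets the plain-string entries ("vercel.app", "vercel.app/.*") is not covered here and should be checked against the library source. The comparison URL `https://example.com/pricing` is a hypothetical placeholder.

```typescript
// The regex entry from the url_ignorelist attempt above.
const vercelPattern: RegExp = /.*\.vercel\.app\/.*/;

// Example $current_url from the report (placeholder subdomain kept as-is).
const reportedUrl = "https://project-and-deployment-id.vercel.app/";

// A non-Vercel URL for comparison (hypothetical).
const otherUrl = "https://example.com/pricing";

// RegExp.prototype.test returns true if the pattern matches anywhere in the string.
// No "g" flag is used, so repeated calls do not carry lastIndex state.
const matchesReported = vercelPattern.test(reportedUrl);
const matchesOther = vercelPattern.test(otherUrl);

console.log(`reported URL matches: ${matchesReported}`); // true
console.log(`other URL matches: ${matchesOther}`); // false
```

If the regex matches the reported URL, the pattern is probably not the problem, which shifts suspicion to which events the ignore list actually applies to.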

Hey,

Are you receiving only $autocapture events? That ignore list only stops $autocapture events (like a click on a div).

I'd expect editing the user agent blocker to work better: in your config you can set additional user agents that we'll detect as bots.

In your config, you'd set custom_blocked_useragents: ['vercel-screenshot/1.0']

(Of course, if you're fine with other events and it's just autocapture you want to block, we can check that too :))

pauldambra · Sep 06 '24 17:09
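As a rough illustration of what adding an entry to custom_blocked_useragents asks for, the sketch below checks whether a user-agent string contains any entry from a block list. The helper name isBlockedUserAgent is hypothetical, and the case-insensitive substring comparison is an assumption about how posthog-js matches these entries, not a copy of its code.

```typescript
// Hypothetical helper: returns true if the user agent contains any blocked entry.
// Assumes case-insensitive substring matching; check the posthog-js source to confirm.
function isBlockedUserAgent(userAgent: string | undefined, blocked: string[]): boolean {
  if (!userAgent) {
    return false; // nothing to match against
  }
  const ua = userAgent.toLowerCase();
  // indexOf keeps this compatible with older TypeScript lib targets.
  return blocked.some((entry) => ua.indexOf(entry.toLowerCase()) !== -1);
}

// The entry suggested above.
const customBlocked = ["vercel-screenshot/1.0"];

console.log(isBlockedUserAgent("vercel-screenshot/1.0", customBlocked)); // true
console.log(isBlockedUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", customBlocked)); // false
```

This kind of check can only help when the crawler actually sends the blocked string in its user agent.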

These are $pageleave events, which I have enabled in the config, so they probably aren't $autocapture events. Here's my whole config:

  posthog.init(env.NEXT_PUBLIC_POSTHOG_KEY, {
    api_host: "/ingest",
    ui_host: "https://app.posthog.com",
    person_profiles: "identified_only",
    capture_pageview: false,
    capture_pageleave: true,
    autocapture: {
      url_ignorelist: [/.*\.vercel\.app\/.*/, "vercel.app", "vercel.app/.*"],
    },
  });

I'll try adding the custom_blocked_useragents param.

Thanks!

jperezr21 · Sep 06 '24 18:09
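To see why $pageleave events can slip through, here is a deliberately simplified model (not posthog-js code) of an ignore list that is scoped to one event type: the URL patterns are consulted only for $autocapture, so every other event is kept regardless of its URL. The function name dropByAutocaptureIgnorelist and its parameters are hypothetical.

```typescript
// Hypothetical, simplified model of an autocapture-only URL ignore list.
function dropByAutocaptureIgnorelist(
  eventName: string,
  currentUrl: string,
  ignorelist: RegExp[],
): boolean {
  // Only $autocapture events are checked; everything else passes through.
  if (eventName !== "$autocapture") {
    return false;
  }
  return ignorelist.some((pattern) => pattern.test(currentUrl));
}

const ignorelist = [/.*\.vercel\.app\/.*/];
const vercelUrl = "https://project-and-deployment-id.vercel.app/";

console.log(dropByAutocaptureIgnorelist("$autocapture", vercelUrl, ignorelist)); // true (dropped)
console.log(dropByAutocaptureIgnorelist("$pageleave", vercelUrl, ignorelist)); // false (still sent)
```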

Turns out Vercel doesn't always set the user agent to that 🤦

Their latest request came with Mozilla/5.0 (iPhone; CPU iPhone OS 17_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148.

I think the PostHog client should have a global url_ignorelist option.

jperezr21 · Sep 06 '24 18:09

Hey,

In the short term, you can detect the URL and not initialize posthog at all. That's equivalent to a URL ignore list (if slightly more work on your part) and should work if you never want to collect data on that URL, similar to how folks check whether they're running on localhost and skip sending data.

I think this is maybe another vote for a global onEvent or similar that lets folks mask / edit / reject events. In your case it'd be if (url === blah) return null 🤔

pauldambra · Sep 07 '24 10:09
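The short-term workaround of not initializing posthog on certain hosts could look roughly like the guard below. shouldInitAnalytics is a hypothetical helper, and the list of skipped hosts (localhost and anything under vercel.app) is an assumption chosen for this thread's case. In a browser you would pass window.location.hostname and call posthog.init only when it returns true.

```typescript
// Hypothetical guard: decide whether analytics should start on this host.
function shouldInitAnalytics(hostname: string): boolean {
  const host = hostname.toLowerCase();
  // Skip local development, as many setups already do.
  if (host === "localhost" || host === "127.0.0.1") {
    return false;
  }
  // Skip vercel.app itself and any subdomain, e.g. project-and-deployment-id.vercel.app.
  if (/(^|\.)vercel\.app$/.test(host)) {
    return false;
  }
  return true;
}

// In the browser (sketch):
// if (shouldInitAnalytics(window.location.hostname)) {
//   posthog.init(key, { /* ...options... */ });
// }
console.log(shouldInitAnalytics("project-and-deployment-id.vercel.app")); // false
console.log(shouldInitAnalytics("www.example.com")); // true
```

Anchoring the regex on a leading dot (or the start of the string) avoids accidentally skipping unrelated hosts that merely end in the same letters, such as notvercel.app.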

We added before_send as an option in posthog-js, which could be used for this.

pauldambra · Dec 26 '24 14:12

This worked for me!

    posthog.init(process.env.NEXT_PUBLIC_POSTHOG_KEY as string, {
      api_host:
        process.env.NEXT_PUBLIC_POSTHOG_HOST || "https://us.i.posthog.com",
      ui_host: "https://us.posthog.com",
      person_profiles: "always", // 'always' also creates profiles for anonymous users; 'identified_only' would not
      capture_pageview: false, // Disable automatic pageview capture, as we capture manually

      // Filter out events from *.vercel.app domains
      before_send: (event) => {
        if (!event) {
          return null;
        }

        // Check if the current host is a vercel.app domain
        const currentHost = window.location.host;
        if (currentHost.endsWith(".vercel.app")) {
          // Return null to prevent the event from being sent
          return null;
        }

        // Otherwise, send the event
        return event;
      },
    });

kevin-hammond · Mar 14 '25 16:03
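A variant of the before_send filter above could read the host from the event's own $current_url property (the field mentioned in the original report) instead of window.location, which keeps the logic testable outside a browser. The CaptureEventLike interface is a hypothetical stand-in for the event object posthog-js passes to before_send. Only the fields used here are modeled, so the real type should be checked before wiring this function into posthog.init.

```typescript
// Hypothetical minimal stand-in for the event passed to before_send.
interface CaptureEventLike {
  event: string;
  properties: Record<string, unknown>;
}

// Returns null to drop events whose $current_url is on vercel.app or a subdomain of it.
function dropVercelAppEvents(event: CaptureEventLike | null): CaptureEventLike | null {
  if (!event) {
    return null;
  }
  const currentUrl = event.properties["$current_url"];
  if (typeof currentUrl !== "string") {
    return event; // no URL to inspect, keep the event
  }
  let hostname: string;
  try {
    hostname = new URL(currentUrl).hostname;
  } catch {
    return event; // unparseable URL, keep the event
  }
  return /(^|\.)vercel\.app$/.test(hostname) ? null : event;
}
```

Keeping events with a missing or unparseable URL is a design choice: it errs on the side of not losing data rather than silently dropping anything the filter cannot inspect.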