feat(insights): launch funnels as a Clickhouse UDF behind a feature flag
Problem
Funnels are not working with more than about 12 steps. They show exponential time growth with each step.
Changes
This is an experiment to rewrite funnels to run in a UDF. This PR launches that feature behind a feature-flag.
What's a UDF? It's a user-defined function. You can define a function and implement it in any language of your choosing. Clickhouse launches your function and has it sit idle, preloaded and hot, waiting for input on stdin. When you call the function from Clickhouse, it pipes the data to the function and listens for its output on stdout.
Why a UDF? This makes the funnels code a lot simpler and easier to reason about. The core of the functionality is in a small python function.
What are the downsides of this approach? The biggest one is that UDFs are slower than native clickhouse. You shouldn't use an UDF for anything clickhouse does well natively. A lot of this speed difference can be mitigated by optimizing your UDF (writing it in C / C++ for example).
How does it work
We use clickhouse to turn events into mostly the same structure we have now - a set of matching steps and exclusions.
The core of the code is in aggregate_funnel.py. For each aggregation_target, it iterates through all the matching events in time order, keeps track of funnel progress, and returns timings and results.
At the end, we use Clickhouse to do a couple aggregations on breakdowns and to calculate averages.
At least locally, it runs much faster than the existing queries.
Product Questions (about strict and unordered funnels)
While working on this, I dug a bit into usage of strict and unordered funnel. Both have shockingly low usage at 0.3% and 1.2% respectively.
select coalesce(di.filters->>'funnel_order_type', 'ordered') as fot, count(*) from posthog_dashboarditem as di where di.filters->>'insight' = 'FUNNELS' group by fot
select coalesce(di.filters->>'funnel_order_type', 'ordered') as fot, coalesce(di.filters->>'funnel_viz_type', 'steps') as viz, count(*) from posthog_dashboarditem as di where di.filters->>'insight' = 'FUNNELS' group by (fot, viz) order by count(*) desc
Strict mode is supposed to not allow any events between steps in the funnel. In reality, this doesn't make sense - if you start tracking something new, it could break all your funnel data. Strict was launched in July of 2021, so it's not exactly new. The fact the usage is so low isn't a great sign.
Unordered mode was also launched around the same time, and maybe has a plausible use case but I think needs some product focus and changes if we're going to support it. An example of an improvement that would make it more useful would be allowing sets of events to be unordered, vs having unordered be a global toggle for the whole funnel. Allowing sets of unordered events would allow people to track out of order user behavior, potentially gated on things like "checkout" vs just having a somewhat aimless unordered funnel that just tells you how many steps they completed.
I think maybe we could think about moving these to being deprecated. Next steps here would be looking to see if any of these strict or unordered funnels get frequent traffic. Thoughts?
Further Questions
How does this work at scale? It runs quickly (much quicker than the old code) locally, but how does it scale for the largest funnels we have?
Follow up
- Add support for unordered mode
- Verify strict mode is working correctly on current funnels (if we decide to keep it around at all)
Assuming that it works at scale, follow up items are
- Port adjacent queries (actors, etc) to the new queries
- Performance improvements. Optimize the tight loop / rewrite it in a faster language (C, rust)
- Clean up legacy code
Does this work well for both Cloud and self-hosted?
It might take some work to figure out how to deploy this for Cloud.
How did you test this code?
Unit testing. Local testing comparing the old funnel to the new funnel on dev.
Size Change: +50 B (0%)
Total Size: 1.12 MB
âšī¸ View Unchanged
| Filename | Size | Change |
|---|---|---|
frontend/dist/toolbar.js |
1.12 MB | +50 B (0%) |
đ¸ UI snapshots have been updated
1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
-
chromium: 0 added, 1 modified, 0 deleted (diff for shard 2) -
webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
@aspicer amazing work! I haven't noticed any problems during manual testing. The 20-step funnel takes 308s to complete locally without the feature flag. It takes 1.4s to complete the same query with the UDFs!
I love this, honestly something we should have done earlier. We can't ship this quite yet, but a lot of the work we are doing right now around shipping configs with ansible will enable us to ship UDFs like this very easily. We'll probably need to move the function definitions over to the infra repo so they are deployed in lock step with the function configs, but otherwise I am super excited for this.
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label â otherwise this will be closed in another week.
This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.
đ¸ UI snapshots have been updated
3 snapshot changes in total. 0 added, 3 modified, 0 deleted:
-
chromium: 0 added, 3 modified, 0 deleted (diff for shard 1, diff for shard 2) -
webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
@aspicer amazing work! I haven't noticed any problems during manual testing. The 20-step funnel takes 308s to complete locally without the feature flag. It takes 1.4s to complete the same query with the UDFs!
Amazing, glad to hear this. Currently working on a funnel myself which will have 30+ steps. Given the speedup, would it also be possible to further increase the step limit of funnels (Since you currently can't add more than 20 steps)? Right now I'm just resolving this by splitting the funnel into multiple insights with 20 steps.
đ¸ UI snapshots have been updated
2 snapshot changes in total. 0 added, 2 modified, 0 deleted:
-
chromium: 0 added, 2 modified, 0 deleted (diff for shard 1) -
webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
đ¸ UI snapshots have been updated
3 snapshot changes in total. 0 added, 3 modified, 0 deleted:
-
chromium: 0 added, 3 modified, 0 deleted (diff for shard 1) -
webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
đ¸ UI snapshots have been updated
1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
-
chromium: 0 added, 1 modified, 0 deleted (diff for shard 2) -
webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
đ¸ UI snapshots have been updated
1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
-
chromium: 0 added, 1 modified, 0 deleted (diff for shard 2) -
webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.