---
date: 2021-11-06
title: How we turned ClickHouse into our event mansion
rootPage: /blog
sidebar: Blog
showTitle: true
hideAnchor: true
author:
- james-greenhill
featuredImage: >-
https://res.cloudinary.com/dmukukwp6/image/upload/posthog.com/contents/images/blog/clickhouse-announcement.png
featuredImageType: full
category: Engineering
tags:
- ClickHouse
---
Recently, PostHog was invited to speak at [OSA Con 2021](https://altinity.com/osa-con-2021/), an open source analytics conference organised by Altinity. It was a fantastic opportunity to talk to the community, alongside other engineers and leaders from projects such as [ClickHouse](https://clickhouse.com/), [Airflow](https://airflow.apache.org/), [Superset](https://superset.apache.org/) & [Preset](https://preset.io/), [Pinot](https://pinot.apache.org/), [Druid](https://druid.apache.org/), [Cube.js](https://cube.dev/), and [Presto](https://prestodb.io/).
Topics ranged from deep dives into reverse ETL processes to discussions about broad trends in the data engineering sector. At PostHog, we chose to talk about our experience [migrating PostHog from Postgres to ClickHouse](/blog/clickhouse-announcement) — why we did it, how it went and what impact it had. ClickHouse is a technology we're heavily invested in, so we wanted to share some points from our presentation here too.
## The night before ClickHouse
The original version of PostHog ran on Postgres. In fact, the _very_ first version was an MVP created by our co-founder, Tim Glaser, over the course of [four intense weeks at Y Combinator](/blog/inflated-risk-seems-riskier).
These early revisions were straightforward web apps that relied entirely on the Django ORM for queries and ran on Postgres because of its capacity to handle relational data. They had easy, one-click deployments set up via Heroku.
This approach was great for getting early traction, but we very quickly started to push Heroku Postgres to the absolute limit. As more teams adopted PostHog and as those teams ingested more events, our performance started to decline. The more users we had and the more they used us, the worse their experience got.
This is what spurred us to consider a new database. We needed a more scalable solution, and it needed to be one which fit with both the characteristics of our product — that it's open source and can be self-hosted — and the needs of our users — that it can offer real-time results and let users define and filter on their own event properties.
## Choosing ClickHouse
Faced with this need, we started investigating replacements for Postgres. We looked at a wide range of OLAP solutions, including [Pinot](https://pinot.apache.org/), [Presto](https://prestodb.io/), [Druid](https://druid.apache.org/), [TimescaleDB](https://www.timescale.com/), CitusDB, and ClickHouse. Some of our team had used these tools before at other companies, such as Uber where Pinot and Presto are both used extensively.
While assessing each tool, we looked at three main factors:
- _Speed:_ Our users want results in real time, so our new database needed to scale really well and give fast results. Ideally it wouldn't be too expensive either.
- _Complexity:_ PostHog users can self-host and install our product themselves, so we didn't want it to be too complicated for users to manage or deploy. We didn't want users to have to install an entire Hadoop stack, for example.
- _Query interface:_ We like standardised tools. We eliminated tools such as Druid because, while it does have a SQL wrapper around it, it's not _exactly_ SQL. That can get messy.
ClickHouse was a good fit for all of these factors, so we started doing a more thorough investigation. We read up on benchmarks and researched the experience of companies such as [Cloudflare, which uses ClickHouse to process 6M requests per second](https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/). Next, we set up a test cluster to run our own benchmarks.
ClickHouse repeatedly performed an order of magnitude better than the other tools we considered, and we discovered other perks along the way, such as the fact that it is column-oriented and written in C++:
- _Compression:_ ClickHouse has _excellent_ compression and the size-on-disk was incredible. ClickHouse even beat out serialization formats such as ORC and Parquet (see the sketch after this list for how to check this yourself).
- _Process from disk:_ Some OLAP solutions, like Presto, require data to live in memory. That's fast, but you need to have a _lot_ of memory for big datasets. ClickHouse processes from disk, which is better for smaller instances too.
- _Real-time data updates:_ ClickHouse basically processes data as it arrives, so there's no need to pre-aggregate data. It's faster for us, and for our users.
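To give a sense of what the compression point above means in practice, here's a minimal sketch of how you can inspect compressed versus uncompressed size per table using ClickHouse's `system.parts` system table. The table name `events` is just an illustrative placeholder:

```sql
-- Compare on-disk (compressed) size with raw (uncompressed) size for a table.
-- Only active parts count towards the current state of the table.
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND table = 'events'
GROUP BY table;
```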
Eventually, we decided we knew enough to proceed and so we spun our test cluster out into an actual production cluster. It's just part of how [we like to bias for speed](/careers).
## Turning ClickHouse into our EventMansion
It wasn't all smooth sailing. [Mutations](https://clickhouse.com/docs/en/sql-reference/statements/alter/#mutations), in particular, tripped us up, as we initially used them a lot instead of taking the time to model the data perfectly for ClickHouse. The documentation suggests not running more than one or two mutations an hour. We were running several hundred per minute.
Our advice? Don't do that. It will get you so far, but it's not a sustainable approach, and it eventually led to us experiencing an outage.
The reason for this is that each mutation forces ClickHouse to rewrite the affected table parts, which then have to be merged back into a single, larger table part. The more mutations you run, the further behind ClickHouse gets and the slower queries can get. This process is very good for small numbers of large parts, but not for large numbers of small parts. It took us a long time to figure this out.
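For context, this is roughly what the mutations in question look like. The sketch below uses a hypothetical `events` table with placeholder column names; the `ALTER TABLE ... UPDATE/DELETE` statements are standard ClickHouse mutation syntax, and `system.mutations` is where you can watch the backlog build up:

```sql
-- A mutation: rewrites every data part that contains matching rows.
ALTER TABLE events UPDATE properties = '{}' WHERE event = 'debug_event';

-- Another mutation, this time deleting rows.
ALTER TABLE events DELETE WHERE team_id = 42;

-- Watching the backlog: each unfinished mutation still has parts to rewrite.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE NOT is_done;
```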
We also had some issues with [ClickHouse's indexes](https://clickhouse.com/docs/en/sql-reference/statements/alter/index/). If you know exactly what data you need to pull and can specify the exact key and row, that lookup can end up slower than other requests, because ClickHouse builds sparse indexes which aren't a true index of the entire dataset.
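To make the sparse index point concrete, here's an illustrative sketch (the schema is simplified and is not our actual table definition). With a `MergeTree` table, the primary index only stores one entry per granule of `index_granularity` rows, so even a fully-specified point lookup still reads an entire granule:

```sql
-- Simplified, illustrative events table.
CREATE TABLE events
(
    uuid UUID,
    event String,
    team_id UInt64,
    distinct_id String,
    timestamp DateTime64(6)
)
ENGINE = MergeTree
ORDER BY (team_id, toDate(timestamp), event)
SETTINGS index_granularity = 8192;  -- one index mark per ~8192 rows

-- Even with the full sorting key specified, ClickHouse reads at least one
-- whole granule (~8192 rows) rather than jumping to a single row, because
-- the sparse primary index only records where each granule starts.
SELECT count()
FROM events
WHERE team_id = 1
  AND toDate(timestamp) = '2021-11-01'
  AND event = 'pageview';
```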
However, beyond this the experience of migrating to ClickHouse was incredibly positive — especially when you compare it against our original priorities:
- _Real-time results:_ Our users want PostHog to stay as fast as it was in the beginning. ClickHouse's performance improvements were such that, with a little cleverness in the way we architect our queries, we're still able to provide answers in real time.
- _Future-proofing:_ We need to constantly grow our feature set. [ClickHouse SQL](https://clickhouse.com/docs/en/sql-reference/) was developed to resemble Postgres and MySQL syntax, which means we've been able to continue adding new features without ClickHouse becoming a bottleneck.
- _Filter on user-defined properties:_ We leverage ClickHouse's JSON deserialization functionality and features such as [materialized columns](/blog/clickhouse-materialized-columns) to provide this, albeit at the cost of some CPU overhead (see the sketch below).
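As a concrete illustration of that last point, here's a minimal sketch of the pattern. The column and property names are hypothetical; the details of our actual setup are in the materialized columns post linked above:

```sql
-- Events arrive with a free-form JSON `properties` column.
-- Extracting a property at query time deserializes JSON on every row:
SELECT count()
FROM events
WHERE JSONExtractString(properties, 'plan') = 'enterprise';

-- A materialized column pays that JSON parsing cost once, when data is
-- written or merged, and stores the extracted value as a plain column:
ALTER TABLE events
    ADD COLUMN mat_plan String
    MATERIALIZED JSONExtractString(properties, 'plan');

-- Queries can then filter on the cheap, pre-extracted column instead:
SELECT count()
FROM events
WHERE mat_plan = 'enterprise';
```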
## Making the switch
Normally, the process of switching to a new database could be quite tricky. Luckily, however, we'd just created [feature flags](/docs/feature-flags) within PostHog and were able to leverage this when switching.
First, we deployed ClickHouse in parallel to Postgres and began ingesting data simultaneously into both. We then put all of our features and internal tools behind a feature flag in PostHog, reimplemented our analytic queries one by one and then used the feature flags to switch over.
Once everything was migrated, we were able to simply stop storing data in Postgres. Meanwhile, our users began deploying PostHog through our [Helm Chart](https://github.com/PostHog/charts-clickhouse), which relies on the amazing [ClickHouse-Operator made by Altinity](https://altinity.com/kubernetes-operator/). Easy.
## The future of ClickHouse on PostHog
It's fair to say that we bet the farm on ClickHouse, but it's definitely turned out to be a good bet so far. ClickHouse has recently [raised a significant amount of funding](https://clickhouse.com/blog/en/2021/clickhouse-raises-250m-series-b/) to secure its long-term future and accelerate its product development, but more importantly, it's created a better experience for PostHog users.
So, naturally, we're thinking about how to do even more with ClickHouse and build deeper connections with the technology.
One of the things we're interested in exploring, for example, is the possibility of building functionality into ClickHouse itself instead of expressing it all as queries. Our hope is that adding these functions to the upstream codebase will be more efficient and we won't need to rely on SQL to do so much heavy lifting. We're looking to [hire a C++ Engineer](https://posthog.com/careers) who can help us achieve this.
We're also currently experimenting with the [MaterializedPostgreSQL engine](https://clickhouse.com/docs/en/engines/database-engines/materialized-postgresql/) as a way to consume the logical replication log coming from Postgres and materialize it in a MergeTree table in ClickHouse. Data drift is a huge pain point for any near real-time data product, and we're hopeful we can find a way to guarantee, to some extent, the quality of data in ClickHouse compared to our source of truth in Postgres.
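For the curious, here's a rough sketch of what that experiment looks like. The connection details and table names are placeholders, and since the engine is experimental, the exact setting names may differ between ClickHouse versions:

```sql
-- The database engine is experimental and gated behind a setting.
SET allow_experimental_database_materialized_postgresql = 1;

-- Mirror a Postgres database via its logical replication log; each
-- replicated table is materialized as a table in ClickHouse and kept
-- up to date as changes stream in.
CREATE DATABASE postgres_mirror
ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'posthog', 'replication_user', 'secret')
SETTINGS materialized_postgresql_tables_list = 'person, person_distinct_id';
```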
But that may be a topic for a future blog post...
> PostHog is an open source developer platform that helps people build successful products. We help you debug and ship your product faster.
<NewsletterForm />