Postgres is great until it isn’t
For most of my career I’ve reached for Postgres first and asked questions later. It’s reliable, it’s everywhere, the docs are great, and there’s an extension for almost anything I’d want to do. For app data, it’s still my default.
The more I worked with telemetry and event data, though, the more I felt Postgres push back. Queries that should’ve been fast got slow. Disk usage exploded. Adding indexes helped a little, until it didn’t. At some point I realized I wasn’t writing bad SQL; I was just using the wrong tool.
That’s when ClickHouse clicked for me.
Row vs column, in plain terms
Postgres stores data row by row. Every row sits together on disk, with B-tree indexes pointing into it, and MVCC wrapped around the whole thing for safe concurrent writes. It’s an OLTP database, built for lots of small transactions, point lookups, single-row updates, and strong ACID guarantees. If you’re running a checkout flow or a SaaS backend, you want this.
ClickHouse stores data column by column. Each column is stored separately on disk, so when you run an aggregation you only read the columns you care about instead of dragging entire rows off disk. It’s an OLAP database, built for scanning a few columns of a huge table at a time.
You can’t really tune one engine to be great at both. The disk layout pushes them in opposite directions.
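If it helps to see the layout difference, here’s a toy Python sketch. It isn’t how either engine is implemented, and the field names and data are made up. It just counts how many values an average over one field has to touch in each layout.

```python
# Toy illustration of row-oriented vs column-oriented layouts.
# Field names and data are invented; neither engine works exactly like this.

rows = [
    {"ts": 1_700_000_000 + i, "service": "api", "status": 200, "duration_ms": 10 + i % 5}
    for i in range(1_000)
]


def avg_duration_row_store(records):
    """Row layout: each record is stored together, so the whole row comes along."""
    values_touched = 0
    total = 0
    for record in records:
        values_touched += len(record)  # every field of the row is read
        total += record["duration_ms"]
    return total / len(records), values_touched


# Column layout: each field is stored as its own contiguous array.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def avg_duration_column_store(cols):
    """Column layout: only the one column the query needs is read."""
    column = cols["duration_ms"]
    return sum(column) / len(column), len(column)


row_avg, row_touched = avg_duration_row_store(rows)
col_avg, col_touched = avg_duration_column_store(columns)
print(row_avg == col_avg, row_touched, col_touched)
```

Both functions return the same average, but the row version touches four values per record while the column version touches one. With wider rows, the gap grows with the number of columns the query doesn’t need.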
When I reach for each
Postgres is still my answer for anything that needs ACID or row-level locking. Payments, orders, user records, app state. If the unit of work is a single record at a time, Postgres almost always wins.
ClickHouse is what I reach for when the data starts looking like events. Logs, metrics, traces, product analytics, anything time-series, anything where I’m running aggregations over a lot of rows. Once you have a few billion rows and you just want to GROUP BY something, you start noticing the difference pretty quickly.
ClickHouse for observability
Observability is probably the use case I keep coming back to with ClickHouse. Telemetry tends to be a lot of mostly append-only writes, with reads that look like aggregations and filters over short time windows, which lines up pretty well with how columnar storage works.
A few numbers that surprised me when I first dug in:
- Telemetry compresses really well in ClickHouse. Timestamps with Delta encoding routinely hit 10x to 20x, and low-cardinality fields like service name or HTTP status can land in the 50x to 100x range. The compression guide goes deeper on the codecs if you’re curious.
- ClickHouse’s own observability platform has scaled past 100 PB across ~500 trillion rows, up from the 19 PiB uncompressed (1.13 PiB compressed) they were running a year earlier.
- A 10-billion-row GROUP BY finishes in under 2 seconds on 64 cores, and newer benchmarks push 100B-row aggregations into the sub-second range on larger clusters.
- Most teams I’ve seen migrate off Datadog or Elastic at scale report 90%+ savings. ClickHouse’s own team puts the number at roughly 200x cheaper than Datadog for their workload, and that gap hasn’t really narrowed in 2026 even with Datadog’s newer Flex Logs tier.
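To get a feel for why delta-encoded timestamps compress so well, here’s a minimal Python sketch. It uses `zlib` from the standard library as a stand-in for ClickHouse’s real codecs, and the timestamps are synthetic, so the sizes it prints are only illustrative and won’t match the ratios above.

```python
import struct
import zlib

# Synthetic millisecond timestamps: roughly one event per second with slight jitter.
timestamps = [1_700_000_000_000 + i * 1000 + (i % 3) for i in range(10_000)]


def delta_encode(values):
    """Keep the first value, then store each value as the difference from the previous one."""
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    values = [deltas[0]]
    for d in deltas[1:]:
        values.append(values[-1] + d)
    return values


def pack(values):
    # 8-byte signed little-endian integers, similar in spirit to an Int64 column.
    return struct.pack(f"<{len(values)}q", *values)


raw = pack(timestamps)
encoded = pack(delta_encode(timestamps))

raw_compressed = zlib.compress(raw)
delta_compressed = zlib.compress(encoded)

print(len(raw), len(raw_compressed), len(delta_compressed))
```

The raw timestamps are all different, so a general-purpose compressor has little repetition to work with. After delta encoding, the column turns into a handful of small repeating values, which is exactly what compressors like.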
The other thing I’ve found useful is how ClickHouse handles high cardinality. Prometheus and most TSDBs I’ve worked with start to struggle when you add labels like user IDs, pod names, or request IDs. ClickHouse handles them with sparse primary indexes, LowCardinality dictionary encoding, and materialized views you can use to pre-aggregate the hot paths. There’s still a tradeoff: Prometheus pays most of the cardinality cost at write/index time, while ClickHouse defers more of it to query time. Neither is free; it’s just a question of where you’d rather absorb it.
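The sparse primary index is the piece that took me longest to picture, so here’s a rough Python sketch of the idea. It’s a simplification, not ClickHouse’s implementation: the keys are made up, and the granule size is tiny for readability (ClickHouse’s default is 8192 rows). The index keeps one entry per granule instead of one per row, then binary-searches those entries to decide which granules to scan.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # tiny for readability; ClickHouse defaults to 8192 rows per granule

# Rows sorted by the key column, as they would be inside a sorted data part.
sorted_keys = [3, 5, 8, 8, 13, 21, 21, 34, 55, 89, 144, 233]

# Sparse index: keep only the first key of each granule.
marks = [sorted_keys[i] for i in range(0, len(sorted_keys), GRANULE_SIZE)]


def granules_for_range(lo, hi):
    """Return indexes of granules that might contain keys in [lo, hi]."""
    # The granule before the first mark >= lo may still hold matching keys,
    # so the lower bound is deliberately conservative.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Any granule whose first key is greater than hi cannot match.
    last = bisect_right(marks, hi) - 1
    return list(range(first, last + 1))


def scan(lo, hi):
    """Read only the candidate granules and filter the rows inside them."""
    hits = []
    rows_read = 0
    for g in granules_for_range(lo, hi):
        granule = sorted_keys[g * GRANULE_SIZE:(g + 1) * GRANULE_SIZE]
        rows_read += len(granule)
        hits.extend(k for k in granule if lo <= k <= hi)
    return hits, rows_read


hits, rows_read = scan(20, 40)
print(hits, rows_read)
```

The index stays small no matter how many distinct keys there are, which is why high-cardinality columns don’t blow it up. The cost moves to query time, where each candidate granule still has to be read and filtered.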
A quick word on OpenTelemetry
If OpenTelemetry is new to you, it’s worth ten minutes to understand the basics.
OTel is a vendor-neutral spec for how you instrument and ship telemetry. There are three pieces that matter:
- The spec itself (OTLP is the wire format).
- SDKs for basically every language you’d want to use.
- The OTel Collector, which sits in front of your backend and receives, processes, and forwards telemetry.
It covers three signals: traces (the path of a request through your services), metrics (numeric time series), and logs (structured events). They share context, so once you have all three you can jump between a slow trace, the metric that spiked at the same time, and the log line that explains why.
The part I like most is that you instrument your code once. If you ever want to switch backends later (off Datadog, out of Splunk, onto ClickHouse), you change the exporter, not your application code.
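Here’s the shape of that idea as a small Python sketch. To be clear, this is not the OpenTelemetry SDK: `Exporter`, `ConsoleExporter`, `InMemoryExporter`, and `handle_checkout` are hypothetical names I made up to show the decoupling with nothing but the standard library.

```python
from typing import Protocol


class Exporter(Protocol):
    """Hypothetical stand-in for a telemetry exporter; not the OpenTelemetry SDK interface."""

    def export(self, span: dict) -> None: ...


class ConsoleExporter:
    """Pretend backend number one: print the span."""

    def export(self, span: dict) -> None:
        print(f"span {span['name']} took {span['duration_ms']} ms")


class InMemoryExporter:
    """Pretend backend number two: collect spans in a list."""

    def __init__(self) -> None:
        self.spans: list[dict] = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


def handle_checkout(exporter: Exporter) -> None:
    # Application code: instrumented once, unaware of which backend gets the data.
    exporter.export({"name": "checkout", "duration_ms": 42})


handle_checkout(ConsoleExporter())

backend = InMemoryExporter()
handle_checkout(backend)  # same application code, different destination
```

The application function never changes between the two calls. Only the object passed in does, which is the property that makes a backend migration a configuration change instead of a re-instrumentation project.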
Pointing OTel at ClickHouse
The OTel Collector Contrib distribution ships a ClickHouse exporter that writes logs, traces, and metrics straight into ClickHouse and can auto-create the schema for you (though the recommendation for production is to turn that off and manage schemas yourself). Logs and traces are in beta and metrics is still alpha as of mid-2026, which is worth knowing before you put it in front of production.
This is the same shape a lot of the newer observability tools are built on: SigNoz, Uptrace, HyperDX, and ClickStack (ClickHouse’s own UI on top of the HyperDX acquisition). Dash0 and monday.com have both posted publicly about going the same route.
Where ClickHouse pays off for testing
The thing I didn’t expect to love about ClickHouse is what it unlocks in test environments.
Load testing or shadow testing against real production traffic has usually been painful for me, mostly because the analytics store I’m pointing it at can’t keep up with both the ingest rate and the query load at the same time. ClickHouse has held up well for me in that spot.
A pattern I’ve been using: dual-write events to my existing analytics store and to ClickHouse, then run the same queries on both and diff the results. It’s a low-risk way to validate a migration without flipping the switch blind. ClickStack’s demo environment does a fun version of this where it replays around 40 hours of real OTel telemetry every day with timestamps shifted into the current window, so the test environment always looks like a live system.
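The diff step can be as simple as the sketch below. It assumes each store’s query result has already been loaded into a dict of group key to aggregate value; the `diff_results` helper and the sample numbers are made up for illustration.

```python
import math


def diff_results(old, new, rel_tol=1e-9):
    """Compare two {group_key: aggregate} results and return the mismatches."""
    mismatches = {}
    for key in old.keys() | new.keys():
        a, b = old.get(key), new.get(key)
        # A key missing on either side, or values that differ beyond the
        # tolerance, counts as a mismatch.
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches[key] = (a, b)
    return mismatches


# Made-up results of running the same GROUP BY on both stores.
existing_store = {"api": 1200, "worker": 340, "cron": 12}
clickhouse = {"api": 1200, "worker": 339, "billing": 7}

print(diff_results(existing_store, clickhouse))
```

The relative tolerance is there because floating-point aggregates like averages can differ in the last few digits between engines without either one being wrong. For exact counts you’d likely want it at or near zero.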
Where ClickHouse isn’t the answer
ClickHouse isn’t a Postgres replacement. Per-row updates are awkward, the transactional guarantees aren’t really there, and replication is eventually consistent. I wouldn’t put a checkout flow on it.
It’s also worth being honest that running your own ClickHouse cluster isn’t free in operational terms. Self-hosted setups need real SRE time for tuning, sharding, upgrades, and capacity planning. If you don’t have anyone to own that, Datadog (or Managed ClickStack on ClickHouse Cloud, which gives you the ClickHouse story without the operational lift) is probably the saner starting point. The 200x-cheaper number only kicks in once your Datadog bill is bigger than the people-cost of operating the stack yourself.
That’s kind of the whole reason I wanted to write this. Postgres is good at what it was built for, ClickHouse is good at what it was built for, and the thing I keep noticing is how often one of them gets stretched into doing the other’s job until it breaks.
Use the right tool for the job. Future you will thank you.