Real-time analytics used to require a Databricks contract, a data engineering team, and a six-figure annual budget. That is no longer true. The open-source tooling has matured to the point where a small team can build a production-grade real-time analytics stack for under $500 per month.
Here is exactly how we built ours, with the specific services, configurations, and costs that got us there.
The Architecture
Our stack has four layers:
- Ingestion: Kafka (Confluent Cloud, 1 CKU) — $200/mo
- Storage + Query: ClickHouse Cloud (Development tier) — $120/mo
- Transformation: dbt Cloud (Team plan) — $100/mo
- Visualisation: Metabase (self-hosted on a $20 Hetzner VPS) — $20/mo
Total: $440/mo. This handles 50 million events per day with sub-second query latency on aggregations across 90-day windows.
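To make the latency claim concrete, here is the shape of query the stack serves in under a second. This is a sketch: it assumes events land in a MergeTree table (called `events` here, an illustrative name) with a `DateTime` column `ts` and a `user_id` column.

```sql
-- Hypothetical 90-day rollup of the kind Metabase dashboards issue.
-- Table and column names are assumptions, not part of the stack above.
SELECT
    toDate(ts) AS day,
    uniq(user_id) AS daily_active_users,
    count() AS event_count
FROM events
WHERE ts >= now() - INTERVAL 90 DAY
GROUP BY day
ORDER BY day;
```

On a column store like ClickHouse, this scans only the three columns involved, which is why it stays fast even across billions of rows.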
Why ClickHouse?
ClickHouse is a column-oriented database optimised for analytical queries. It compresses time-series data extremely well, executes aggregations across billions of rows in milliseconds, and has excellent Kafka integration via the Kafka table engine.
```sql
-- Example: ClickHouse Kafka table engine
CREATE TABLE events_queue (
    event_id UUID,
    user_id UInt64,
    event String,
    ts DateTime
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse',
         kafka_format = 'JSONEachRow';
```
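One detail worth spelling out: a Kafka-engine table is only a consumer, not durable storage, and querying it directly pops messages off the topic. The standard ClickHouse pattern is to pair it with a MergeTree table and a materialised view that continuously moves rows across. A minimal sketch, with the target table name (`events`) and `ORDER BY` key chosen for illustration:

```sql
-- Durable storage for the event stream
CREATE TABLE events (
    event_id UUID,
    user_id UInt64,
    event String,
    ts DateTime
) ENGINE = MergeTree
ORDER BY (ts, user_id);

-- Materialised view acting as the pump: every batch ClickHouse
-- consumes from Kafka via events_queue is inserted into events
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT event_id, user_id, event, ts
FROM events_queue;
```

All dashboard and dbt queries then run against `events`, never against `events_queue`.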
dbt for Real-Time Transformations
dbt is not just for batch transformations. Using dbt + ClickHouse materialised views, you can maintain pre-aggregated tables that update in near-real-time as new events land, giving you fast dashboard queries without the cost of re-scanning raw event data.
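As a sketch of the pre-aggregation pattern, the following maintains a daily per-event-type rollup that updates as rows land, assuming a raw `events` MergeTree table; the table and view names here are illustrative. With the dbt-clickhouse adapter, the same thing can be expressed as a dbt model rather than hand-written DDL.

```sql
-- Pre-aggregated rollup. SummingMergeTree collapses rows with the
-- same (day, event) key by summing event_count at merge time.
CREATE TABLE daily_events (
    day Date,
    event String,
    event_count UInt64
) ENGINE = SummingMergeTree
ORDER BY (day, event);

-- Incrementally populated as new batches hit the raw events table
CREATE MATERIALIZED VIEW daily_events_mv TO daily_events AS
SELECT toDate(ts) AS day, event, count() AS event_count
FROM events
GROUP BY day, event;
```

Dashboards then read `daily_events` (summing `event_count` to merge any not-yet-collapsed partial rows) instead of re-scanning the raw stream on every refresh.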
What This Stack Cannot Do
We should be honest about the limitations. This architecture is not suitable for millisecond-latency use cases (trading, fraud detection), for datasets over ~1TB without moving to a larger ClickHouse tier, or for complex ML feature pipelines. For those use cases, you need Flink or Spark, and the budget to match.