PostgreSQL vs ClickHouse: where to store the firehose logs without pain

Your logs are not “data.” They’re a moving liability: a firehose that shows up 24/7, demands retention, and only becomes valuable when production is already on fire. You don’t get to pause writes to “optimize later.” You get to pick a storage engine that fails in a way you can live with.

PostgreSQL and ClickHouse both work for logs—until they don’t. The trick is knowing which pain you’re signing up for: row-store bloat and vacuum debt, or merge debt and part explosions. Pick wrong and your on-call gets a new hobby: explaining to executives why “just logs” took down a primary database.

The decision frame: what you’re really choosing

When people say “store logs,” they often mean three different things:

  • Ingestion buffer: accept writes at high rate, even during bursts. Durability matters, latency mostly doesn’t.
  • Operational search: find “that one request id” or a specific error message in the last hour. Low latency matters. Indexing matters.
  • Analytics: group-bys over billions of rows, dashboards, anomaly detection, long retention, and heavy aggregations.

PostgreSQL is a general-purpose database with excellent correctness, rich indexing, and transactional semantics. It’s comfortable when you need strong constraints and point-lookups that must be right. It’s less comfortable when you ask it to retain and churn petabytes of mostly-append-only event data while also serving OLTP queries.

ClickHouse is a columnar analytics database optimized for fast scans and aggregations. It eats logs for breakfast—until you design your partitions like it’s Postgres, or you ignore merges, or you treat it like a strict transactional system. It’s not a worse Postgres; it’s a different animal.

So here’s the opinionated rule of thumb:

  • If logs are part of your transactional product (auditing, compliance trails with strict guarantees, or you join them constantly with OLTP tables), Postgres can be a sane default—with time partitioning and disciplined retention.
  • If logs are primarily for search + dashboards + aggregations, and the volume is “firehose,” ClickHouse is the default—with proper MergeTree design and operational guardrails.
  • If you’re trying to use Postgres as a log warehouse because “we already have it,” you’re not saving money; you’re borrowing pain at a variable interest rate.

One paraphrased idea from Werner Vogels (Amazon CTO): you build reliability by designing for failure, not by hoping it won’t happen. Logs are where this philosophy gets tested—because your system fails more often than your budget admits.

Interesting facts and historical context

  1. PostgreSQL’s lineage goes back to the POSTGRES research project at UC Berkeley in the 1980s, with a design bias toward correctness and extensibility.
  2. MVCC in Postgres means updates/deletes create new row versions; for log tables with churn (retention deletes), this becomes storage and vacuum work you must pay for.
  3. JSONB landed in Postgres 9.4 and made “semi-structured logs in a relational DB” feel easy—sometimes too easy, because it tempts teams to skip schema discipline.
  4. ClickHouse was built at Yandex to power real-time analytics at huge scale, which is exactly the shape of “logs-as-analytics.”
  5. Columnar storage isn’t new; column stores have been studied for decades. The modern win is combining compression, vectorized execution, and cheap disks.
  6. MergeTree engines in ClickHouse rely on background merges; ignoring merge health is like ignoring autovacuum in Postgres—eventually it collects its debt.
  7. TTL in ClickHouse can enforce retention at the storage layer. Postgres can do retention, but it’s usually an application-level or job-level practice (drop partitions, delete, vacuum).
  8. TimescaleDB (a Postgres extension) exists largely because time-series patterns stress vanilla Postgres; it’s a signal that “logs as time-series” is special.

Workload patterns that make or break each system

1) Burst writes with predictable keys

Ingestion isn’t about average throughput. It’s about surviving the 99th percentile: deploy storms, retries, cascading failures, and that one customer who turns debug logging on in production “just for an hour.” If you have bursts, you need:

  • Write path that doesn’t fall apart under contention.
  • Backpressure strategy (queue, buffering, batch inserts).
  • Storage layout that keeps writes sequential-ish, or at least amortized.

ClickHouse likes batch inserts and can absorb huge volumes efficiently. Postgres can ingest fast too, but you’ll hit WAL volume, index maintenance, and vacuum overhead earlier.
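A minimal sketch of what “batch inserts” means in practice for ClickHouse, assuming the logs.log_events table used later in this article and a hypothetical file of newline-delimited JSON events accumulated by your log shipper:

cr0x@server:~$ # one insert per batch of a few thousand events, not one insert per event
cr0x@server:~$ clickhouse-client --query "INSERT INTO logs.log_events FORMAT JSONEachRow" < batch_00042.jsonl

The Postgres analogue is COPY or multi-row INSERT; the point in both systems is amortizing per-insert overhead instead of paying it once per log line.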

2) Needle-in-haystack debugging

Operators ask for: “Show me logs for request_id = X” and “Give me the last 200 lines around the error.” Postgres with B-tree indexes on request_id and timestamp can be extremely fast for this, especially if the working set is in memory.
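A hedged sketch of that setup, with illustrative partition and column names (request_id and message are assumptions about your schema):

-- composite B-tree: find one request's rows, newest first
CREATE INDEX CONCURRENTLY idx_logs_20251228_request_ts
    ON logs_2025_12_28 (request_id, ts DESC);

-- the typical needle-in-haystack query this index serves
SELECT ts, service, message
FROM logs_2025_12_28
WHERE request_id = 'X'
ORDER BY ts DESC
LIMIT 200;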

ClickHouse can do it too, but it prefers to scan columns and filter. You can make it good with appropriate primary key ordering and skip indexes (data skipping), but you must design for it.

3) Aggregations and dashboards

If your daily reality is GROUP BY service, status, endpoint, 5-minute bucket across weeks of data, Postgres is fighting its nature. You can add materialized views, rollups, and partitions. It works—until you miss one maintenance window and end up with a backlog of vacuum/analyze and a disk that looks like it ate a duplicate of itself.

ClickHouse is built for this. It compresses well, scans fast, and parallelizes. The operational question becomes: can you keep merges healthy and parts under control?

4) Retention: delete vs drop vs TTL

Retention policy is not “we keep 30 days.” It’s: “How do we delete 29-day-old data without destroying the system?”

  • In Postgres, dropping partitions is clean and fast. Row-by-row deletes are expensive and create vacuum debt.
  • In ClickHouse, TTL can remove old data automatically, but it’s still physical work done by background processes. It’s usually cheaper than Postgres deletes, but it can still saturate disks if misconfigured.

PostgreSQL for logs: what it’s good at, what it punishes

When Postgres is the right call

  • You need strict transactions: logs are part of a business process (audit events, ledger-like trails).
  • You need relational joins across logs and OLTP entities, with consistent semantics.
  • Your volume is moderate: “a lot” but not “firehose,” or the retention window is short and partition drops are routine.
  • You need flexible indexing: B-tree, GIN on JSONB, trigram search, partial indexes.

What Postgres punishes you for

Postgres stores rows, not columns. For log analytics you often read a few fields across a huge time range. That becomes I/O heavy. And with MVCC, “deleting old logs” means dead tuples and vacuum work. Partitioning helps, but it’s not optional.

There’s also WAL. WAL is your friend—until your log table becomes the top contributor to WAL volume, replication lag, and backup churn.

How to make Postgres less painful for logs

  • Partition by time (daily or hourly depending on volume) and drop partitions for retention.
  • Minimize indexes on hot ingest tables. Every index is a write amplification tax.
  • Use JSONB carefully: store the frequently queried keys as typed columns; keep the rest in JSONB.
  • Separate concerns: don’t co-locate heavy log ingestion with critical OLTP on the same primary if you can avoid it.

Short joke #1: Treating Postgres like a log warehouse is like using a sports car to haul gravel—possible, but the suspension will file a complaint.

ClickHouse for logs: what it’s good at, what it punishes

When ClickHouse is the right call

  • Firehose ingestion with batch inserts (or a broker in front) and lots of append-only data.
  • Analytics-first queries: dashboards, aggregations, percentile latencies, top-N, cardinality estimates.
  • Long retention with compression and storage tiering.
  • Concurrency of many readers doing scans and group-bys.

What ClickHouse punishes you for

ClickHouse punishes you for thinking “partition key” means “the thing I filter on most.” In ClickHouse, partitioning is about managing parts and data lifecycle; ordering (primary key) is about query pruning and locality. Get it wrong and you’ll create a small-parts apocalypse where merges never catch up.

It also punishes you for pretending it’s a strict transactional system. Inserts are atomic per batch, but you don’t get the same semantics as Postgres. If you need “exactly once,” you build it above the database with idempotency and deduplication strategies.

How to make ClickHouse less painful for logs

  • Batch inserts (bigger blocks, fewer parts). If your pipeline inserts single rows, you’re paying a tax in parts and merges.
  • Choose ORDER BY to match your most common filters: usually low-cardinality columns like service and environment first, then timestamp, then something like host or trace id.
  • Keep partitions coarse (often by day) unless volume forces smaller. Too many partitions increases metadata and merge overhead.
  • Watch merges and parts like you watch CPU. Merge debt is operational debt.

Short joke #2: A ClickHouse cluster with uncontrolled parts is like a garage full of “organized piles”—technically sorted, emotionally devastating.

Schema and table design that survives reality

Log data is not fully schemaless

Pure schemaless logs are a fantasy. You always have a few fields you query constantly: timestamp, service, environment, severity, host, request_id/trace_id, status, and maybe user_id. Those fields should be typed columns in both Postgres and ClickHouse. Everything else can go into a JSON payload or a Map-like structure.

Postgres: partitioning strategy

Use declarative partitioning by time. Daily partitions are a good start. Hourly partitions can work for very high volume, but you’ll pay in table count and operational overhead. The winning move is making retention a DROP TABLE operation, not a DELETE operation.
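A minimal sketch of that layout, assuming a parent table named logs with a handful of typed columns and a JSONB payload (the exact column set is illustrative):

-- parent table: typed hot columns, everything else in JSONB
CREATE TABLE logs (
    ts         timestamptz NOT NULL,
    service    text        NOT NULL,
    level      text        NOT NULL,
    request_id text,
    payload    jsonb
) PARTITION BY RANGE (ts);

-- one partition per day; automate creating these ahead of time
CREATE TABLE logs_2025_12_28 PARTITION OF logs
    FOR VALUES FROM ('2025-12-28') TO ('2025-12-29');

-- retention becomes a metadata operation, not millions of row deletes
DROP TABLE logs_2025_11_28;

Dropping a partition frees its files immediately and leaves no dead tuples behind, which is exactly the property row-by-row deletes lack.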

ClickHouse: MergeTree strategy

For logs, a common pattern is:

  • PARTITION BY toDate(ts) (or toYYYYMM(ts) for very long retention)
  • ORDER BY (service, environment, toDateTime(ts), trace_id) or similar
  • TTL ts + INTERVAL 30 DAY

The right ORDER BY depends on your filters. If you always filter by service and time range, put service early. If you have a lot of point-lookups by request_id, include it—but don’t pretend it will behave like a unique index.

Compression and data types

In ClickHouse, picking appropriate types (LowCardinality for service names, Enum for log level, IPv4/IPv6 types, DateTime64 for timestamps) buys real compression and speed. In Postgres, careful normalization can help, but too much normalization can hurt ingestion. Logs are a place where “slightly denormalized but typed” often wins.
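A sketch of what those type choices look like together, assuming a reasonably recent ClickHouse and reusing the log_events shape that appears in Task 14 below (the Enum values and compression codec are illustrative):

CREATE TABLE logs.log_events
(
    ts       DateTime64(3),
    service  LowCardinality(String),
    env      LowCardinality(String),
    level    Enum8('debug' = 1, 'info' = 2, 'warn' = 3, 'error' = 4),
    trace_id String,
    message  String CODEC(ZSTD(3))  -- heavier compression for the bulky column
)
ENGINE = MergeTree
PARTITION BY toDate(ts)               -- coarse daily lifecycle management
ORDER BY (service, env, ts, trace_id) -- matches "filter by service/env + time range"
TTL ts + INTERVAL 30 DAY;             -- retention enforced at the storage layer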

Practical tasks: commands, outputs, decisions

These are real checks you can run in production. Each one includes: command, what the output means, and what decision you make from it.

Task 1: Confirm Postgres table and index bloat

cr0x@server:~$ sudo -u postgres psql -d app -c "SELECT relname, n_live_tup, n_dead_tup, ROUND(100.0*n_dead_tup/GREATEST(n_live_tup,1),2) AS dead_pct FROM pg_stat_user_tables ORDER BY dead_pct DESC LIMIT 5;"
  relname       | n_live_tup | n_dead_tup | dead_pct
----------------+------------+------------+----------
logs_2025_12_28 |   98000000 |   34000000 |   34.69
logs_2025_12_27 |   99000000 |   21000000 |   21.21
...

Meaning: Dead tuples are storage and vacuum work. 35% dead in a log partition usually means deletes/updates are happening or autovacuum can’t keep up.

Decision: Stop deleting rows for retention; move to partition drops. If you already partition, investigate why tuples are dying (updates, or delayed vacuum) and tune autovacuum for the partitions.

Task 2: Check whether autovacuum is actually running on hot partitions

cr0x@server:~$ sudo -u postgres psql -d app -c "SELECT relname, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables WHERE relname LIKE 'logs_%' ORDER BY last_autovacuum NULLS FIRST LIMIT 10;"
     relname     |     last_autovacuum      |     last_autoanalyze
-----------------+--------------------------+--------------------------
 logs_2025_12_28 |                          |
 logs_2025_12_27 | 2025-12-30 08:14:02+00   | 2025-12-30 08:15:31+00
...

Meaning: NULL last_autovacuum on a huge active partition is a red flag: vacuum may be blocked, disabled, or starved.

Decision: If this is an ingest-heavy table, consider lowering autovacuum thresholds for that table, or rethink retention and indexing. If vacuum is blocked, find blockers.

Task 3: Find blocking sessions in Postgres

cr0x@server:~$ sudo -u postgres psql -d app -c "SELECT a.pid, a.state, a.query, pg_blocking_pids(a.pid) AS blockers FROM pg_stat_activity a WHERE a.datname='app' AND a.wait_event_type IS NOT NULL;"
 pid  | state  |                 query                  | blockers
------+--------+----------------------------------------+----------
18422 | active | VACUUM (ANALYZE) logs_2025_12_28;      | {17201}
...

Meaning: Vacuum is waiting on a lock held by another PID. If that PID is a long-running transaction, it can prevent cleanup.

Decision: Fix the app behavior (no long idle-in-transaction sessions), or kill offenders. Then re-check vacuum progress.

Task 4: Measure WAL pressure from log ingestion

cr0x@server:~$ sudo -u postgres psql -d app -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),'0/0')) AS total_wal_generated;"
 total_wal_generated
---------------------
 812 GB

Meaning: This is the total WAL ever generated by the cluster (the LSN distance from zero), so on its own it’s a crude signal. Sample it twice a few minutes apart to get a rate; if that rate explodes during log spikes, it will impact replication and backups.

Decision: Consider moving logs off the primary OLTP cluster, reduce indexes, batch inserts, or switch to ClickHouse for analytics logs.

Task 5: Check replication lag (Postgres)

cr0x@server:~$ sudo -u postgres psql -d app -c "SELECT application_name, state, pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS lag FROM pg_stat_replication;"
 application_name |   state   |  lag
------------------+-----------+--------
 standby-a        | streaming | 14 GB

Meaning: 14 GB lag during a log burst usually means the replica can’t keep up with WAL generation or disk/network throughput.

Decision: If logs are the main contributor, decouple them. If not, investigate I/O saturation on replica and WAL receiver settings.

Task 6: Confirm Postgres partition count and size skew

cr0x@server:~$ sudo -u postgres psql -d app -c "SELECT inhparent::regclass AS parent, count(*) AS partitions FROM pg_inherits WHERE inhparent::regclass::text='logs' GROUP BY 1;"
 parent | partitions
--------+------------
 logs   | 62

Meaning: Too many partitions can hurt planning and maintenance; too few can make retention expensive. For size skew, check per-partition sizes with pg_total_relation_size(inhrelid).

Decision: If partitions exceed what your operational cadence can manage, move from hourly to daily. If partitions are huge and deletes occur, go smaller and drop.

Task 7: Check ClickHouse ingestion health via system.parts

cr0x@server:~$ clickhouse-client --query "SELECT table, count() AS parts, sum(rows) AS rows FROM system.parts WHERE database='logs' AND active GROUP BY table ORDER BY parts DESC LIMIT 5;"
log_events	12480	9812234451
log_errors	2210	88233412

Meaning: 12k active parts for one table is often too many. Small parts imply too many small insert batches or too granular partitioning.

Decision: Increase insert batch size, adjust partitioning, and consider settings that reduce part churn. Also verify merge backlog next.

Task 8: Check ClickHouse merge backlog

cr0x@server:~$ clickhouse-client --query "SELECT database, table, sum(rows) AS rows_inactive, sum(bytes_on_disk) AS bytes_inactive FROM system.parts WHERE database='logs' AND active=0 GROUP BY database, table ORDER BY bytes_inactive DESC LIMIT 5;"
logs	log_events	1823311220	412339922944

Meaning: Inactive parts have already been merged into larger parts and are waiting to be cleaned up. A huge volume of them means merges are churning hard and cleanup hasn’t caught up, which usually goes hand in hand with merge pressure and query slowdowns. For merges currently in flight, check system.merges.

Decision: If merges can’t keep up, reduce insert fragmentation, increase resources for background merges, or rework table design (order key, partition key).

Task 9: Check ClickHouse query hot spots

cr0x@server:~$ clickhouse-client --query "SELECT query_duration_ms, read_rows, read_bytes, result_rows, substring(query,1,120) AS q FROM system.query_log WHERE type='QueryFinish' ORDER BY read_bytes DESC LIMIT 5;"
8421	9122333441	188233992110	120	SELECT service, count() FROM log_events WHERE ts >= now()-INTERVAL 7 DAY GROUP BY service

Meaning: A single dashboard query reading 188 GB is not “a bit slow,” it’s a design review. Either the filter is too broad or the ORDER BY is wrong for the access pattern.

Decision: Add pre-aggregation, tighten filters, revise ORDER BY, or build materialized views for dashboards.
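One hedged option for the dashboard case: pre-aggregate per-service counts into 5-minute buckets with a materialized view. The rollup table and view names below are made up for illustration:

-- rollup target: one row per (service, 5-minute bucket); counts are summed on merge
CREATE TABLE logs.log_events_5m
(
    bucket  DateTime,
    service LowCardinality(String),
    events  UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toDate(bucket)
ORDER BY (service, bucket);

-- populated automatically at insert time from the raw table
CREATE MATERIALIZED VIEW logs.log_events_5m_mv TO logs.log_events_5m AS
SELECT toStartOfFiveMinute(ts) AS bucket, service, count() AS events
FROM logs.log_events
GROUP BY bucket, service;

The 7-day dashboard then reads sum(events) from the rollup instead of scanning 188 GB of raw rows; summing at query time is still required because background merges are what collapse duplicate buckets.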

Task 10: Validate ClickHouse partition distribution

cr0x@server:~$ clickhouse-client --query "SELECT partition, count() AS parts, sum(rows) AS rows FROM system.parts WHERE database='logs' AND table='log_events' AND active GROUP BY partition ORDER BY parts DESC LIMIT 5;"
20251230	980	812233441
20251229	910	799334221

Meaning: Hundreds of parts per day partition is a smell. It usually means insert batches are too small or you have too many parallel writers.

Decision: Batch inserts and/or funnel writes through fewer concurrent insert streams per shard.

Task 11: Check disk I/O saturation on a Linux host

cr0x@server:~$ iostat -xm 1 3
Linux 6.5.0 (server) 	12/30/2025 	_x86_64_	(32 CPU)

Device            r/s     w/s   rMB/s   wMB/s  await  %util
nvme0n1         320.0   900.0   180.0   420.0   28.5  99.2

Meaning: 99% util and high await says you’re I/O bound. Both Postgres and ClickHouse will look “slow” for different reasons, but the root is disk.

Decision: Reduce write amplification (indexes/parts), add disks, separate WAL, move to faster storage, or tune merge/vacuum concurrency to avoid self-inflicted I/O storms.

Task 12: Check memory pressure and cache behavior

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           256Gi       210Gi       3.1Gi       2.0Gi        43Gi        39Gi
Swap:           16Gi        12Gi       4.0Gi

Meaning: Heavy swap usage on a database host is a performance cliff. For Postgres it can wreck shared buffers effectiveness; for ClickHouse it can tank merges and query execution.

Decision: Reduce memory footprint (lower concurrency, adjust caches), add RAM, or isolate workloads. If you’re swapping during merges/vacuum, throttle background work.

Task 13: Inspect Postgres slow queries and whether indexes help

cr0x@server:~$ sudo -u postgres psql -d app -c "EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM logs_2025_12_28 WHERE ts >= now()-interval '1 hour' AND service='api';"
Aggregate  (cost=...)
  Buffers: shared hit=120 read=54000
  ->  Seq Scan on logs_2025_12_28 ...

Meaning: A sequential scan reading 54k buffers for a one-hour window suggests either the table is too big, the filter is not selective, or indexes are missing/unused due to poor stats.

Decision: Consider BRIN index on timestamp for big append-only partitions, or re-check partition pruning and stats. If analytics dominates, consider moving that workload to ClickHouse.
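A sketch of the BRIN option, reusing the partition name from the task above:

-- BRIN stores min/max per block range: tiny, cheap to maintain, and effective
-- because append-only log partitions are physically ordered by time anyway
CREATE INDEX CONCURRENTLY idx_logs_20251228_ts_brin
    ON logs_2025_12_28 USING brin (ts);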

Task 14: Check ClickHouse table definition for ORDER BY/partition/TTL sanity

cr0x@server:~$ clickhouse-client --query "SHOW CREATE TABLE logs.log_events"
CREATE TABLE logs.log_events
(
    `ts` DateTime64(3),
    `service` LowCardinality(String),
    `env` LowCardinality(String),
    `level` LowCardinality(String),
    `trace_id` String,
    `message` String
)
ENGINE = MergeTree
PARTITION BY toDate(ts)
ORDER BY (service, env, ts)
TTL ts + INTERVAL 30 DAY
SETTINGS index_granularity = 8192

Meaning: This is generally sane for “service/env/time range” filters. If your top queries filter by trace_id only, this ORDER BY won’t help much.

Decision: Align ORDER BY with your real query predicates. If you need trace-id lookups, consider secondary data skipping indexes or a dedicated lookup store.
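If trace-id lookups matter enough to fix on the existing table, a hedged sketch of a data skipping index (the index name and parameters are illustrative):

-- bloom filter skip index: lets ClickHouse skip granule ranges that definitely
-- don't contain the trace_id, without changing the ORDER BY
ALTER TABLE logs.log_events
    ADD INDEX idx_trace trace_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- build the index for parts that already exist (new parts get it automatically)
ALTER TABLE logs.log_events MATERIALIZE INDEX idx_trace;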

Fast diagnosis playbook

When the log store is “slow,” you don’t have time for ideology. You need a three-pass method to find the bottleneck without guessing.

First: Is it the host (CPU, memory, disk, network)?

  • Disk saturation: run iostat -xm 1. If %util is near 100% and await climbs, you’re I/O bound.
  • Memory pressure: run free -h and check swap. Swap usage on DB hosts is a performance confession.
  • CPU: run mpstat -P ALL 1 (not shown above). If CPU is pegged and iowait low, you’re compute bound (often compression, parsing, regex, or group-bys).
  • Network: on clusters, check if replication or distributed queries saturate NICs.

Decision: If the host is the limit, tune DB later; first stop self-harm (reduce merge/vacuum concurrency, reduce query concurrency, batch inserts), then add capacity or separate roles.

Second: Is it write path or read path?

  • Write path symptoms: ingestion lag, increasing queue depth, growing parts count (ClickHouse), rising WAL and replication lag (Postgres).
  • Read path symptoms: dashboards timing out, high read_bytes in query logs (ClickHouse), sequential scans and buffer reads (Postgres).

Decision: If writes are failing, prioritize batching, partition sanity, and background maintenance health. If reads are failing, prioritize schema/order keys, pre-aggregation, and query constraints.

Third: Identify maintenance debt (vacuum vs merges)

  • Postgres: dead tuples, long transactions blocking vacuum, autovacuum not running or not aggressive enough for log partitions.
  • ClickHouse: too many parts, merges not keeping up, disk busy with merge I/O, TTL cleanup piling up.

Decision: Maintenance debt doesn’t fix itself. You either change ingestion patterns (batching) or change lifecycle patterns (drop partitions / TTL), or you keep paying forever.

Three corporate mini-stories (all anonymized, painfully plausible)

Incident: the wrong assumption (“logs are append-only, so Postgres won’t bloat”)

A mid-sized SaaS company decided to store application logs in Postgres “temporarily.” The assumption was simple: logs are append-only, so MVCC overhead won’t matter. They used one big table with a timestamp index and a JSONB payload. Ingestion was fine for weeks. Dashboards were fine too—until retention kicked in.

Retention was implemented as a nightly DELETE FROM logs WHERE ts < now() - interval '30 days'. It ran for hours. Then for days. Autovacuum began chasing a moving target: dead tuples piling up faster than it could clean them. Disk usage climbed. Replication lag climbed. The primary started spending more time writing WAL and less time serving product queries.

On a Monday morning, a deploy increased log verbosity for a noisy component. WAL volume spiked. The replica fell behind enough that failover safety margins disappeared. Then the disk hit a threshold and the incident response turned into a scavenger hunt: delete old data, vacuum, hope it finishes, repeat.

The fix wasn’t heroic. They partitioned the logs table by day and changed retention to drop partitions. Autovacuum settings were tuned per partition. The “temporary” log system stopped threatening the OLTP database. The bigger lesson: “append-only” is a lie the moment you implement retention as delete.

Optimization that backfired: “Let’s partition ClickHouse by hour for faster deletes”

A different org moved logs into ClickHouse for analytics. Early performance was great. Someone noticed that TTL cleanup was occasionally spiky and decided to “help” by partitioning by hour instead of day, assuming smaller partitions would delete faster.

Ingestion came from many services, each inserting small batches. Hourly partitions multiplied the number of active partitions. Each partition accumulated parts. Parts multiplied into the tens of thousands per table. Merges started falling behind. Query latency rose because the system had to open and consider a huge number of parts, and background merges were saturating disk.

The team responded by increasing background merge threads. That made disks busier, which made queries worse, which made dashboards time out, which caused a rash of “retry” logic in clients. Now ingestion was spiky too, and the whole system developed a strobe-light rhythm: merge storm, query storm, retry storm.

They rolled back to daily partitions, increased insert batch sizes, and reduced the number of concurrent insert streams per shard. TTL became boring again. The moral: smaller partitions aren’t automatically cheaper—parts and merges are the real unit of pain.

Boring but correct practice that saved the day: “separate log storage, enforce budgets, and practice retention”

A financial services platform had strict audit requirements, but also huge operational logs. They refused to mix the two. Audit events lived in Postgres with tight constraints and careful schema. Operational logs went to ClickHouse, fed via a queue and a batcher.

The boring practice was a quarterly “retention drill.” Not a tabletop exercise. A real one: they verified that dropping Postgres partitions worked, that ClickHouse TTL removed the right ranges, and that backups and restores didn’t accidentally resurrect expired data. They also enforced a query budget: dashboards had limits on time range, and anything wider required a pre-aggregated table.

One day, a noisy deployment sent log volume soaring. The queue absorbed the burst. ClickHouse ingestion lagged but stayed stable. Operators had a clear SLO: “logs searchable within X minutes.” They met it by scaling ingestion workers temporarily—without touching the OLTP database.

The outcome wasn’t dramatic, which is the point. The correct practice was separation of concerns plus rehearsed lifecycle operations. When the firehose got louder, the architecture didn’t panic.

Common mistakes: symptoms → root cause → fix

1) Postgres disk keeps growing even after deleting old logs

Symptoms: Disk usage increases; deletes run nightly; vacuum runs constantly; queries slow down over time.

Root cause: MVCC dead tuples + retention deletes. Space isn’t reclaimed quickly; vacuum can’t keep up; bloat accumulates.

Fix: Partition by time and drop partitions. If you must delete, do it in small batches and vacuum aggressively, but treat it as a stopgap.

2) ClickHouse query latency spikes during ingestion bursts

Symptoms: Dashboards time out; disks show high util; parts count rises fast; merges backlog grows.

Root cause: Too many small inserts creating too many parts; merges consume I/O and CPU; read queries compete.

Fix: Batch inserts (larger blocks), reduce concurrent writers per table per shard, keep partitions coarse (daily), and monitor parts/merges as first-class metrics.

3) Postgres replication lag increases when logs spike

Symptoms: Streaming replica falls behind; failover unsafe; WAL disk grows; backups slow.

Root cause: WAL volume dominated by logs; index maintenance amplifies writes; replica I/O can’t keep up.

Fix: Move logs off the OLTP cluster, minimize indexes, batch inserts, or use ClickHouse for the bulk logs and keep only critical audit events in Postgres.

4) ClickHouse has “too many parts” warnings and won’t merge fast enough

Symptoms: Rising active parts; metadata overhead; slow queries; background merges constantly busy.

Root cause: Partitioning too fine (hourly/minutely), or tiny insert batches, or too many shards with uneven load.

Fix: Coarser partitions (day/month), fewer/larger insert batches, and re-evaluate sharding key to avoid skew.

5) “We can’t find specific request logs quickly in ClickHouse”

Symptoms: Point lookups by trace_id/request_id scan huge ranges; latency unpredictable.

Root cause: ORDER BY optimized for service/time analytics, not for id lookups; no skipping index suitable for that predicate.

Fix: Add a dedicated lookup structure: a smaller table keyed by trace_id with pointers (ts/service), or data skipping index; or keep last N hours in Postgres/Redis for needle searches.

6) Postgres planner stops pruning partitions correctly

Symptoms: Queries touch many partitions even with time filters; planning time increases; CPU spikes on coordinator.

Root cause: Non-sargable predicates (functions on ts), mismatched data types/time zones, or too many partitions.

Fix: Keep predicates simple (ts >= constant), standardize timestamp type, reduce partition count, and keep statistics current.
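A quick illustration of the difference, assuming the time-partitioned logs table from earlier:

-- prunes: the planner sees a plain range predicate on the partition column
SELECT count(*) FROM logs
WHERE ts >= now() - interval '6 hours';

-- defeats pruning: the partition column is wrapped in a function
SELECT count(*) FROM logs
WHERE date_trunc('day', ts) = current_date;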

Checklists / step-by-step plan

Step-by-step: choosing Postgres vs ClickHouse for logs

  1. Classify your log use cases: operational search vs analytics vs audit trail. If it’s audit trail, Postgres stays in the picture.
  2. Write down your top 10 queries with real filters and time ranges. If most queries are group-bys over days, ClickHouse is favored.
  3. Quantify ingestion: steady rate, burst rate, average message size, peak concurrent writers.
  4. Set a retention and searchability SLO: “searchable within 5 minutes,” “retain 30 days hot, 180 days cold,” etc.
  5. Decide on lifecycle mechanism: Postgres partition drops or ClickHouse TTL. If you can’t operationalize retention, you will eventually store everything forever by accident.
  6. Pick a minimal schema: typed columns for common predicates; keep payload in JSON or string.
  7. Build a staging load test that includes retention and dashboard queries. Ingest is easy; ingest + retention + queries is where systems go to die.

Operational checklist: Postgres log storage (if you insist)

  • Partition by day; automate creation and dropping.
  • Prefer BRIN on timestamp for massive append-only partitions; keep B-tree indexes minimal.
  • Watch dead tuples and autovacuum lag per partition.
  • Keep logs off the same primary as critical OLTP if at all possible.
  • Establish a hard cap on query time range for ad-hoc analytics; push heavy analytics elsewhere.

Operational checklist: ClickHouse log storage (recommended default for firehose)

  • Batch inserts; enforce minimum batch size and maximum insert frequency.
  • Use daily partitions unless you have a strong reason not to.
  • Set ORDER BY aligned with your most common filters; don’t treat it like a unique index.
  • Monitor: active parts, bytes in merges, query read_bytes, disk util, and insert errors.
  • Define TTL and validate it deletes what you think it deletes.
  • Define limits: max query time range on dashboards; use rollups/materialized views for long windows.

Migration plan: Postgres logs → ClickHouse without drama

  1. Dual-write or replay: start sending logs to ClickHouse while keeping Postgres as-is.
  2. Backfill recent partitions first (last 7–30 days). Don’t backfill the whole history before you prove the new system works.
  3. Validate queries: compare counts and key aggregations (a sketch of an hourly comparison follows this list). Expect small differences if you had duplicates or late events; decide how to handle them.
  4. Move dashboards to ClickHouse, keep Postgres for point-lookups if needed.
  5. Cut retention in Postgres to a shorter window; drop old partitions.
  6. Keep a rollback story: if ClickHouse ingestion fails, logs still land in a queue and can replay.
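A minimal version of the count comparison from step 3, assuming the table names used throughout this article:

cr0x@server:~$ sudo -u postgres psql -d app -c "SELECT date_trunc('hour', ts) AS h, count(*) FROM logs WHERE ts >= now() - interval '24 hours' GROUP BY h ORDER BY h;"
cr0x@server:~$ clickhouse-client --query "SELECT toStartOfHour(ts) AS h, count() FROM logs.log_events WHERE ts >= now() - INTERVAL 24 HOUR GROUP BY h ORDER BY h"

If the hourly counts track each other within your accepted duplicate/late-event margin, the pipeline is healthy enough to move dashboards over.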

FAQ

1) Can Postgres handle “a lot” of logs?

Yes, if “a lot” still fits the operational model: time partitions, drop-based retention, minimal indexes, and you’re not using it as your analytics warehouse. If you need multi-week group-bys across billions of rows, you’re asking for ClickHouse behavior from a system optimized for different guarantees.

2) Can ClickHouse replace Elasticsearch for logs?

Often, for analytics and structured filtering, yes. For full-text search across arbitrary strings with relevance ranking and fuzzy matching, ClickHouse can do some of it but it’s not its core personality. Many teams use ClickHouse for metrics-like log analytics and keep a separate search tier for “grep at scale” use cases.

3) What about JSON logs—Postgres JSONB vs ClickHouse Map/String?

Postgres JSONB is great for flexible indexing and containment queries, but it’s expensive at scale if you index too much. ClickHouse usually works best when the frequently queried fields are typed columns and the rest is a raw JSON string or a Map-like structure, depending on your ingestion tooling. The winning pattern in both: promote hot keys to columns.

4) How do I enforce retention safely?

In Postgres: drop partitions, don’t delete rows. In ClickHouse: TTL is usually the right lever, but validate it and watch merge pressure. Also rehearse retention like it’s a production feature—because it is.

5) What if I need “exactly once” ingestion?

Neither system gives you end-to-end exactly once without help. Postgres can enforce uniqueness constraints, but that can be expensive on high-volume logs. ClickHouse is typically “at least once” friendly; you implement idempotency upstream or use dedup strategies (like storing an event id and deduplicating in queries or via specialized engines/settings).
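A hedged sketch of the dedup-by-event-id idea on the ClickHouse side (table and column names are illustrative, and deduplication is eventual, not transactional):

-- rows with the same sorting key collapse during background merges
CREATE TABLE logs.log_events_dedup
(
    ts       DateTime64(3),
    service  LowCardinality(String),
    event_id String,
    message  String
)
ENGINE = ReplacingMergeTree
PARTITION BY toDate(ts)
ORDER BY (service, ts, event_id);

-- merges are eventual, so queries that must be exact deduplicate explicitly
SELECT count(DISTINCT event_id)
FROM logs.log_events_dedup
WHERE ts >= now() - INTERVAL 1 HOUR;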

6) Should I shard ClickHouse early?

Shard when a single node can’t meet your ingestion or query needs with headroom. Sharding adds operational complexity: rebalancing, distributed queries, and failure modes. Start with a single node or a small replicated setup if you can, but don’t wait until you’re already out of disk.

7) Is TimescaleDB the middle ground?

It can be, especially if you’re committed to Postgres tooling and need time-series optimizations. But it’s still Postgres underneath: retention and heavy analytics remain a cost center. If your core is log analytics at scale, ClickHouse is still the more direct fit.

8) What’s the biggest “gotcha” people miss with ClickHouse?

Parts and merges. If you insert tiny batches, you create tiny parts. Tiny parts create merge debt. Merge debt becomes disk saturation and query latency. Design and operate around this from day one.

9) What’s the biggest “gotcha” people miss with Postgres?

Deletes aren’t free. MVCC means churn creates dead tuples, and vacuum is real work competing with your workload. If retention is row deletes on massive tables, you’ve already chosen a future incident.

Next steps you can actually do this week

  • Write down your top 10 log queries and classify them: point lookup, short-window search, or long-window analytics. That list decides the database more than opinions do.
  • If you’re on Postgres: implement time partitioning and switch retention to partition drops. Then measure dead tuple percentage again.
  • If you’re on ClickHouse: check active parts and merge backlog today. If parts are exploding, fix batching before you add hardware.
  • Set operational limits: dashboards shouldn’t default to 30 days if your cluster is sized for 24 hours. Make “expensive” queries explicit.
  • Separate audit logs from operational logs: keep the former correct and relational; make the latter fast and cheap to aggregate.

The correct answer is rarely “Postgres or ClickHouse.” It’s usually “Postgres for what must be correct and relational, ClickHouse for the firehose.” If you try to make one system be both, you’ll eventually get neither—just a lot of graphs and a calendar full of incident reviews.
