Incremental Materialized View: How to Keep Derived State Fresh in Real Time
Standard materialized views recompute everything on a schedule. Incremental materialized views apply only the delta — bounding refresh cost to what actually changed. Here's how they work across Postgres, ClickHouse, Databricks, streaming databases, and Tacnode, what each approach can and can't do, and where they all hit the same structural limit.
TL;DR: Standard materialized views recompute everything on a schedule — fine for dashboards, dangerous for decisions. Views that support incremental refresh bound the refresh cost to what actually changed. Postgres doesn't support incremental refresh natively (pg_ivm is experimental). ClickHouse has two flavors with different consistency tradeoffs. Databricks and BigQuery do incremental refresh on batch cadences over delta tables. Streaming databases (Materialize, RisingWave) do true continuous maintenance. Tacnode does Postgres-compatible incremental refresh at row-level granularity with the full PostgreSQL query surface and every DML operation propagating incrementally. They all close the preparation gap. Only a context lake also closes the retrieval gap — serving fresh derived state alongside raw state in one consistent snapshot under concurrent load.
You define a materialized view — a velocity counter, a rolling balance, an exposure aggregate — and everything works. Then load increases. Transactions arrive concurrently. Your `REFRESH MATERIALIZED VIEW` runs every five minutes, and in those five minutes, a fraud check reads a velocity counter that's three transactions behind. The fraud passes. The funds move. By the time the view refreshes, the window has closed.
This is the preparation gap: derived state that must be computed from raw events lags behind the events themselves. The materialized view was correct when it was last refreshed. It's wrong now.
The fix is incremental refresh — applying only the delta as source data changes, rather than recomputing the entire result on a schedule. The concept is simple. The implementations vary wildly.
What Is an Incremental Materialized View?
A materialized view is a precomputed query result stored as a table. You define it as a SQL query — an aggregation, a join, a window function — and the database stores the output. Queries hit the stored result instead of recomputing from the underlying data every time.
The problem is how that stored result stays current as source data changes.
In a standard materialized view, a refresh operation reruns the entire query against the base tables and replaces the stored result. This default refresh behavior is fine when source data changes slowly and staleness is measured in hours. It breaks when the derived state is consumed by automated systems making real-time decisions — fraud velocity counters, credit exposure aggregates, session-bound eligibility checks — where "five minutes stale" means wrong.
An incremental materialized view maintains the stored result by applying only the delta. When source data changes, the view updates only the rows of the result that depend on the change — incrementing counters, adjusting sums, recomputing only the affected output rows. No full rescan. The cost is proportional to the volume of changed data, not the entire dataset. Different systems implement this differently — some refresh continuously as events arrive, others apply the delta when you call `REFRESH` — but the key property is the same: cost tracks the change, not the table size.
This distinction matters at scale. If your source table has 100 million rows and 50 new transactions arrive, a full refresh rereads 100 million rows. An incremental refresh processes only the changed data — the view is incrementally refreshed rather than rebuilt from scratch. Under high write throughput, the difference between these two approaches is the difference between a view that's current and a view that's perpetually catching up.
Why Incremental Refresh Matters for Real-Time Systems
Materialized views were originally designed for data warehouse workloads and analytical dashboards — precompute expensive aggregations so the dashboard loads fast. Staleness measured in hours was acceptable because a human was reading the result on a Monday morning.
That contract has changed. Today, materialized views are consumed by automated decision systems that act on derived state in real time. The default refresh that runs overnight in a data warehouse is too slow when decisions happen in milliseconds:
Fraud velocity counters — transaction count per account over the last 60 seconds, evaluated during every card authorization
- Credit exposure aggregates — total outstanding balance per borrower, checked before every new approval
- Portfolio positions — current holdings computed from trade history, queried immediately after every execution
- Session-bound eligibility — real-time qualification status during checkout, based on aggregated behavior within the session
- Risk signals — derived features combining multiple event streams into a single score, consumed by ML models at inference time
In every case, the decision reads derived state and acts on it — often irreversibly. A stale velocity counter means fraud passes. A stale exposure aggregate means a credit limit gets exceeded. A stale portfolio position means the user sees a balance that doesn't reflect what just happened.
The failure mode isn't a slow dashboard. It's a wrong decision. The underlying data changed, but the materialized view refreshes hadn't completed before the decision evaluated it.
This is what makes incremental refresh an infrastructure requirement rather than a performance optimization. When derived state feeds automated decisions, the freshness of that state is a correctness property, not a latency metric. You need to optimize materialized views for decision-time correctness, not just query performance.
Incremental Materialized Views in PostgreSQL
PostgreSQL has native materialized views — but does not support incremental refresh. `CREATE MATERIALIZED VIEW` stores the result of a query. `REFRESH MATERIALIZED VIEW` reruns the query against the base tables and replaces the stored result. Every refresh recomputes from the full source table, regardless of how little changed data there actually is.
`REFRESH MATERIALIZED VIEW CONCURRENTLY` is the closest native approximation. It computes the new result alongside the old one and diffs them, so reads aren't blocked during refresh. But it still runs the full query — the cost is proportional to source table size, not changed data volume. And it requires a unique index on the view, which limits the query structure you can use.
pg_ivm: Experimental incremental refresh support
pg_ivm is a PostgreSQL extension that adds incremental refresh support to PostgreSQL. It hooks into the write path: when source table data changes, pg_ivm incrementally refreshes the view by updating only the affected rows. No full rescan. The cost tracks with the volume of changed data.
pg_ivm supports a subset of SQL — aggregate functions (`SUM`, `COUNT`, `AVG`), `GROUP BY`, joins, and subqueries. It doesn't support all window functions or common table expressions. It's a real implementation of incremental refresh, but it's an extension, not core PostgreSQL, and it's early-stage for production workloads at scale.
The structural limit: Even with pg_ivm, PostgreSQL's materialized views are maintained within a single Postgres instance. At high concurrent write throughput — hundreds or thousands of transactions per second modifying shared state — the incremental refresh competes with transactional writes for the same resources. Postgres was designed for transactional workloads and operational databases, not for maintaining continuously fresh derived state at scale alongside those transactions.
Bottom line: PostgreSQL materialized views work well for analytical precomputation on a default refresh schedule. For derived state consumed by real-time decisions under concurrent load, the native tooling no longer supports incremental refresh (pg_ivm is promising but unproven at scale).
Incremental Materialized Views in ClickHouse
ClickHouse has two distinct materialized view mechanisms, and the terminology can be confusing because both are called "materialized views" but the refresh semantics differ significantly.
Incremental materialized views (the default `CREATE MATERIALIZED VIEW`) are triggered on insert. When new rows land in a source table, ClickHouse runs the view's query against only the new rows and appends results to a destination table. The incremental refresh processes changes from each insert batch, not the full source table.
The catch: these views only see the new data, not the full source. An aggregate query like `SUM` doesn't recompute the global sum — it computes the sum of the new batch. To get correct global aggregates, you pair this with the `AggregatingMergeTree` engine, which merges partial aggregates at query time. This means the view's stored result is not always immediately correct — it becomes correct after background merges complete. Under high write throughput, batch query results can temporarily reflect partial state.
Refreshable materialized views (`CREATE MATERIALIZED VIEW ... REFRESH EVERY`) are the full-recompute model — similar to PostgreSQL's `REFRESH MATERIALIZED VIEW`, run on a configurable schedule. Simpler mental model, but the same scalability problem: cost is proportional to source table size, not the volume of changed data.
The structural limit: ClickHouse incremental materialized views are eventually consistent. The `AggregatingMergeTree` merge process is asynchronous — reads may return partially merged results. There's no snapshot isolation guarantee: a batch query might see some recent inserts reflected in the aggregate but not others. For dashboards and a data warehouse, this is fine. For automated decisions where correctness at the moment of evaluation matters — a fraud check, a credit limit enforcement — "eventually correct after merges" is a gap.
ClickHouse also fans out non-primary-key queries to all shards. At high QPS on filtered queries, adding nodes adds work per query, not capacity. This limits throughput for the concurrent, filtered query patterns that real-time decision systems require.
Bottom line: ClickHouse incremental materialized views are excellent for near-real-time analytics and dashboards where eventual consistency is acceptable. The system doesn't strictly require incremental refresh to be strongly consistent — it trades consistency for write throughput. For decision-time derived state that must be correct when read under concurrent writes, the merge-based model leaves a gap.
Incremental Materialized Views in Databricks and BigQuery
Databricks and BigQuery both support incremental refresh for materialized views — but on batch cadences, not continuous streams. The refresh semantics are fundamentally batch query oriented.
Databricks performs incremental refresh over delta tables using Unity Catalog managed tables. The system tracks which source table data has changed since the last refresh and processes only the delta. An optimal refresh policy depends on your workload — Databricks can do incremental or full refresh depending on the query structure. This is more efficient than a full recompute, but the cadence is scheduled — not continuous as data arrives. The minimum practical interval is minutes, not milliseconds.
BigQuery materialized views support automatic incremental maintenance for a subset of query patterns (single-table aggregations, some joins). When base tables receive fresh data, BigQuery incrementally refreshes the view. The refresh is automatic but not instantaneous — there's a propagation delay, and the view may temporarily store query results that are stale during high-volume writes. The automatic caching behaviors differ from explicit refresh — BigQuery may serve cached results rather than the freshest underlying data.
The structural limit: Both systems are designed for data warehouse and analytical workloads where "minutes fresh" is a major improvement over "hours fresh." Neither targets the sub-second freshness that real-time decision systems require. Incremental maintenance reduces compute cost — you're not rescanning all rows on every refresh — but the cadence is batch, not streaming.
For fraud velocity counters, credit exposure checks, or any derived state consumed at transaction time, a batch-cadence materialized view — even one with incremental maintenance — is still a scheduled snapshot, not a live signal.
Incremental Materialized Views in Streaming Databases
Streaming databases — Materialize, RisingWave, and to some extent ksqlDB — are built specifically around incremental materialized views as the core primitive. These systems offer the best support for continuous maintenance available today.
You define a view as a SQL query. The streaming database maintains it continuously as events arrive from Kafka or CDC sources. Every new event triggers an update — the view's stored result reflects the change immediately. Queries return the pre-maintained result in milliseconds. No scheduled refresh. No full recompute.
Materialize, built on Timely Dataflow and Differential Dataflow, provides particularly strong refresh semantics: materialized views reflect a consistent point in time, and the system can compose views on top of views — enabling multi-stage derivation pipelines expressed entirely in SQL. RisingWave takes a similar approach with cloud object storage as the persistence layer.
For the preparation gap — keeping derived state current as source table data changes — streaming databases are the strongest available solution. They replace the Kafka → Flink → Redis pipeline with a single system that ingests events and maintains derived state continuously, queryable via standard SQL.
The structural limit: Streaming databases sit downstream of your source systems. They ingest from queried data sources via Kafka or CDC, which means there's a pipeline — source database → WAL → Kafka → streaming database → materialized view. Each hop adds latency. Under high throughput, consumer lag compounds with view maintenance lag.
More fundamentally, streaming databases solve the preparation gap but not the retrieval gap. If a decision needs derived state (the velocity counter, incrementally refreshed in the streaming database) and raw state (the account record from Postgres) and a similarity match (from a vector store), those three reads come from three systems at three different points in time. The derived state is fresh. The decision context is still inconsistent — assembled from multiple independent snapshots that never coexisted.
For workloads where the only problem is stale derived state, streaming databases are the right category. For workloads where derived state must be served alongside raw state and other retrieval patterns in one consistent snapshot, the streaming database closes half the gap.
Comparison: Incremental Refresh Support Across Systems
How each system handles incremental refresh, the refresh behavior and semantics, and where the tradeoffs land:
System
Supports Incremental Refresh?
Granularity
Query Surface
DML
Freshness
Best For
PostgreSQL (native)
No — full refresh overwrites everything
—
Full Postgres
—
Minutes to hours (scheduled)
Analytical precomputation
PostgreSQL (pg_ivm)
Yes — on write
Row-level
Subset (basic aggregates, joins, subqueries)
Insert primarily; updates/deletes limited
Sub-second (on commit)
Low-to-moderate write throughput
ClickHouse (incremental MV)
Yes — on insert batch
Per insert batch
Limited (joins constrained, no full window functions)
Insert-only (append model)
Seconds (after merge)
Near-real-time analytics
ClickHouse (refreshable MV)
No — full refresh on schedule
—
Full ClickHouse SQL
Insert
Minutes (scheduled)
Scheduled analytical rollups
Databricks
Yes — over delta tables
Per partition (Delta)
Subset (some joins, certain windows)
Insert/update via Delta
Minutes (scheduled)
Data warehouse batch pipelines
BigQuery
Yes — automatic
Per partition
Limited (single-table aggregations, some joins)
Insert primarily
Minutes (propagation delay)
Cloud data warehouse workloads
Materialize
Yes — continuous
Row-level
Near-full PostgreSQL
Full DML
Sub-second (event-driven)
Streaming-first derived state
RisingWave
Yes — continuous
Row-level
Wide subset of PostgreSQL
Full DML
Sub-second (event-driven)
Cloud-native streaming workloads
Tacnode
Yes — on REFRESH, computed at row granularity
Row-level
Near-full PostgreSQL (joins, window fns, CTEs, subqueries, set ops)
Full DML — insert, update, delete, truncate all propagate incrementally
Sub-second on REFRESH
Real-time decisions on derived + raw state under concurrent load
The Deeper Problem: Fresh Views, Fragmented Context
Every system in the table above addresses the same question: how do you keep a precomputed query result current as source tables change? And every system — from Postgres to Materialize — answers it within its own boundary.
That's the part nobody talks about.
An incremental materialized view keeps derived state fresh inside the system that maintains it. The velocity counter is incrementally refreshed in the streaming database. The aggregated balance is current in ClickHouse. The feature vector is current in the feature store. Each system's refresh operation keeps its own materialized view source tables in sync.
But the decision doesn't read from one system. A fraud check evaluates a velocity counter (derived), an account balance (raw data from Postgres), a device fingerprint (raw), and a behavioral similarity match (vector). If the velocity counter lives in Materialize, the account balance lives in Postgres, and the similarity match lives in Pinecone, the decision is reading from three queried data sources at three different points in time.
Each individual piece of context may be fresh. The composite — everything the decision evaluates together — never existed as a consistent whole. The velocity counter reflects events through timestamp T₁. The account balance reflects state at T₂. The similarity match was computed at T₃. The decision treats them as simultaneous. They aren't.
This is the retrieval gap: the context a decision needs requires multiple retrieval patterns — point lookups, aggregations, similarity search — split across systems with non-overlapping query capabilities. Each system is internally consistent. The cross-system composite is not.
Incremental materialized views solve the preparation gap — they keep derived state current as source table data changes. They don't solve the retrieval gap. And for automated decisions that act on derived state alongside raw data, the retrieval gap is where the wrong decisions happen.
Incremental Materialized Views in Tacnode
Tacnode is a PostgreSQL-compatible context lake — a single system that ingests operational state via CDC, holds raw and derived data together, and serves every retrieval pattern (point lookups, aggregations, full-text search, vector similarity) from one consistent snapshot.
Incremental materialized views are a single option flag on top of standard Postgres syntax:
sql
CREATE MATERIALIZED VIEW sales_mv
WITH (refresh_mode='incremental') AS
SELECT
product_name,
SUM(quantity) AS total_quantity
FROM order_details
GROUP BY product_name;
Three properties separate Tacnode's incremental refresh from other systems that claim it:
Row-level granularity. Most engines that support incremental refresh recompute at the partition level — change one row, the whole partition gets recomputed. Tacnode goes one level finer. The unit of work is a single row. Change one record, only the result rows that depend on it get recomputed. The smallest possible delta.
Near-full PostgreSQL query surface. Filter and project, all standard aggregates, inner / outer / self joins, set operations, subqueries, CTEs, window functions — all maintained incrementally. Most systems with incremental refresh narrow the surface (ClickHouse joins are constrained, BigQuery limits to single-table aggregations, pg_ivm doesn't support window functions). Tacnode supports the patterns you'd write for any read query, and they all work as incremental views.
Full DML coverage. Insert, update, delete, even `TRUNCATE` — every DML operation propagates incrementally at row granularity. Most incremental engines are append-only — change a row, they fall back to a full refresh. Tacnode handles the entire DML surface without falling back.
How refresh works. Tacnode's incremental refresh isn't continuous in the streaming-database sense — you still call `REFRESH MATERIALIZED VIEW` to bring the result current. What changes is the cost of that refresh. Instead of re-scanning the full base table, the engine computes only what changed and applies that delta. The refresh transaction commits the incremental update; the system catalog `pg_stat_refresh_history` records whether each refresh ran incrementally or fell back to a full recompute. Refresh cost is bound by what changed, not by the size of the table.
Beyond the preparation gap. Most of the systems above keep their own materialized views fresh. They don't help when a decision reads derived state from one system, raw state from another, and a similarity match from a third — the retrieval gap. Because Tacnode holds raw records, derived views, and vector indexes in the same system, a fraud check that needs a velocity counter, an account balance, and a behavioral similarity match reads all three from one consistent snapshot. Each derived feature can also be configured with its own refresh policy — kept second-fresh when the business depends on it, allowed to lag when it doesn't.
If your materialized views feed dashboards or a data warehouse, the systems above work well — pick the one that fits your infrastructure. If your materialized views feed automated decisions that act on derived state alongside raw state under concurrent load, the question isn't which system refreshes fastest. It's whether the view and the rest of the context the decision needs are served from the same moment in time.
How to Optimize Materialized Views for Incremental Refresh
Regardless of which system you use, there are data source considerations and query plan choices that determine whether a view can be incrementally refreshed — or whether the system falls back to a full recompute.
Base tables and dimension tables matter. Views work best when base tables are append-heavy (new events, new transactions) rather than heavily updated in place. If your source table receives mostly inserts, the incremental refresh processes changes efficiently. If rows are frequently updated or deleted, some systems must fall back to full refresh semantics because the delta is harder to compute.
Query structure determines refresh behavior. Not every query supports incremental refresh. Systems like Databricks and BigQuery may fall back to full refresh if the query plan includes certain patterns — non-deterministic functions, certain outer joins, or expressions that the optimizer can't decompose into incremental refresh select expressions. When the create operation fails to produce an incremental plan, the system silently defaults to full recomputation. Check your system's documentation for explicitly controlled refresh behavior versus automatic fallback.
Refresh policy controls freshness guarantees. In batch systems, the policy controls how frequently materialized view refreshes occur — query periodically or on a schedule. A serverless pipeline automatically manages refresh attempts in BigQuery, but the cadence isn't sub-second. In streaming databases, the refresh is continuous by default, but you can still configure how the system handles late-arriving data and whether to retain data history or archive data from older windows.
Monitor for operations that silently become full refreshes. Some systems fall back from incremental to full refresh under certain conditions — schema changes, data retention threshold breaches, or when the view definition changes. When a system no longer supports incremental refresh for a given view, the refresh overwrites the stored result with a full recompute, and latency spikes. Store query results metadata and monitor to catch these regressions early.