
OLTP vs OLAP: The False Choice for the Agentic Era

Every architecture guide frames OLTP vs OLAP as a choice: optimize for transactions or optimize for analytics. But automated decision systems — fraud checks, credit approvals, agent actions — need both transactional consistency and analytical power at the same moment. The Composition Impossibility Theorem proves you can't stitch separate OLTP and OLAP systems together to get there. Here's what comes after the tradeoff.

Xiaowei Jiang
CEO & Chief Architect
[Diagram: the architectural gap between OLTP and OLAP systems when automated decisions need both transactional consistency and analytical context]

TL;DR: OLTP handles transactions. OLAP handles analytics. The standard approach is to run both with a pipeline between them. But automated decision systems — fraud checks, credit approvals, agent actions — need transactional consistency and analytical context at the same moment, not minutes later. The Composition Impossibility Theorem proves that composing separate OLTP and OLAP systems cannot achieve decision coherence: the pipeline between them introduces split state, temporal misalignment, and consistency gaps that break automated decisions. The architectural resolution is a Context Lake — a single system where transactional writes, analytical computation, and decision-time serving happen inside one consistent boundary.

OLTP vs OLAP is one of the oldest distinctions in data infrastructure. Every architecture guide, every database course, every vendor comparison starts here: OLTP databases handle transactions; OLAP databases handle analytics. Choose one. Or run both and build a pipeline between them.

For decades, this framing worked. Humans ran the analytics. They tolerated minutes or hours of delay. They cross-referenced dashboards, noticed inconsistencies, adjusted their judgment. The pipeline between OLTP and OLAP could be slow, lossy, and eventually consistent — because a human analyst at the end of it could compensate.

That assumption is now wrong for a growing class of workloads. Fraud detection systems evaluate transactions in milliseconds. Credit decisioning engines approve or deny loans before the applicant's page loads. AI agents coordinate across services, making irreversible choices hundreds of times per second. These systems need transactional consistency and analytical context — not one or the other, and not eventually, but at the exact moment the decision executes.

The standard OLTP vs OLAP framing doesn't have a good answer for this. HTAP tries. NewSQL tries. Both fall short for reasons that are architectural, not incremental. In my formal analysis of decision coherence, I proved that composing separate transactional and analytical systems cannot achieve what automated decisions require. The limitation is structural.

This post explains what OLTP and OLAP actually optimize for, why the tradeoff between them creates a specific failure mode for automated decisions, and what architectural approach resolves it.

What OLTP and OLAP Actually Optimize For

OLTP and OLAP are not database brands. They are optimization targets — each making a set of tradeoffs that serve a specific workload pattern well and serve other patterns poorly.

OLTP (Online Transaction Processing) databases — PostgreSQL, MySQL, Oracle, SQL Server — optimize for high-frequency reads and writes to individual records. They provide ACID guarantees: atomicity, consistency, isolation, durability. When you insert an order, update an account balance, or delete a session record, the database guarantees the operation either fully commits or fully rolls back. Concurrent transactions see consistent state. This is what makes OLTP databases the backbone of operational applications.

The tradeoff: OLTP databases evaluate queries at read time using row-oriented storage. They're fast for point lookups and small-range scans but slow for analytical aggregations across millions of rows. Running `SELECT user_id, SUM(amount) FROM transactions WHERE timestamp > now() - interval '1 hour' GROUP BY user_id` on an OLTP database serving production traffic will either be slow or destabilize the system.
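The two query shapes can be contrasted in a small sketch, using Python's sqlite3 as a stand-in row store (the table and column names are invented for illustration): the indexed lookup touches only one user's rows, while the grouped aggregation must walk the entire table — the access pattern that destabilizes a production OLTP database at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL, ts INTEGER)"
)
conn.execute("CREATE INDEX idx_user ON transactions (user_id)")
conn.executemany(
    "INSERT INTO transactions (user_id, amount, ts) VALUES (?, ?, ?)",
    [(i % 100, 10.0, i) for i in range(10_000)],  # 100 users, 100 rows each
)

# OLTP-shaped query: an indexed lookup that reads one user's rows.
row = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE user_id = ?", (42,)
).fetchone()

# OLAP-shaped query: an aggregation over every row -- a full scan on a
# row-oriented store, cheap here at 10k rows but not at hundreds of millions.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM transactions GROUP BY user_id"
).fetchall()

print(row[0], len(totals))
```

At toy scale both finish instantly; the point is the shape of the work, not the timing — the first query's cost is bounded by one user's history, the second's by the whole table.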

OLAP (Online Analytical Processing) databases — ClickHouse, Snowflake, BigQuery, Redshift — optimize for scanning and aggregating large datasets. They use columnar storage, vectorized execution, and compression to make analytical queries fast. When a data analyst asks "what were last quarter's sales by region and product category," an OLAP database returns the answer in seconds, even over billions of rows.

The tradeoff: OLAP databases sacrifice transactional semantics for scan performance. Most are append-optimized with limited or no support for updates and deletes. They provide eventual consistency, not ACID isolation. Two concurrent queries may see different snapshots. Data arrives via batch ETL or micro-batch pipelines, meaning results reflect state as of the last load — minutes to hours old.

| Property | OLTP | OLAP |
| --- | --- | --- |
| Optimized for | Transactional reads/writes to individual records | Analytical scans across large datasets |
| Storage model | Row-oriented | Columnar |
| Consistency | ACID (serializable isolation) | Eventual (no cross-query snapshot guarantee) |
| Write pattern | High-frequency inserts, updates, deletes | Batch/micro-batch append |
| Query latency | Sub-millisecond for point lookups | Seconds to minutes for aggregations |
| Data freshness | Real-time (data is queryable on commit) | Minutes to hours (pipeline lag) |
| Typical consumer | Application services | Human analysts, BI tools |
| Examples | PostgreSQL, MySQL, Oracle | ClickHouse, Snowflake, BigQuery, Redshift |

The Standard Architecture: Run Both

Because OLTP and OLAP optimize for different workloads, the standard practice is to run both and connect them with a data pipeline:

1. OLTP database handles operational writes (orders, payments, account updates)

2. ETL/ELT pipeline extracts changes and loads them into the OLAP database

3. OLAP database serves dashboards, reports, and ad-hoc analytics
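Step 2 is typically a watermark-based micro-batch copy. A minimal sketch (two sqlite3 databases standing in for the OLTP source and the OLAP replica; schema and batch logic are invented for illustration) makes the lag visible: anything written after the last batch run is simply not there on the analytical side.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")  # operational side
olap = sqlite3.connect(":memory:")  # analytical side
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
olap.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

watermark = 0  # highest id already shipped downstream

def run_batch():
    """Extract rows written since the last watermark, load them into the replica."""
    global watermark
    rows = oltp.execute(
        "SELECT id, amount FROM orders WHERE id > ?", (watermark,)
    ).fetchall()
    olap.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    if rows:
        watermark = rows[-1][0]

oltp.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
run_batch()
oltp.execute("INSERT INTO orders VALUES (3, 30.0)")  # posts after the batch ran

# The replica lags the source until the next batch: 2 orders vs 3.
source_count = oltp.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
replica_count = olap.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(source_count, replica_count)
```

For a human analyst looking at quarterly trends, the missing order is irrelevant; for an automated decision evaluated between batch runs, it is invisible state.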

This architecture has worked for decades because the consumers at each end have different tolerance for latency and consistency. The application needs millisecond reads on current state — OLTP delivers. The analyst needs aggregated views across historical data — OLAP delivers. The pipeline between them can run hourly or daily without anyone noticing, because the analyst is looking at trends, not making split-second decisions.

The architecture is not broken for this use case. It is specifically designed for it. The problem emerges when a new class of consumer appears — one that needs both transactional consistency and analytical context simultaneously.

Where the Tradeoff Breaks: Automated Decisions

Consider a credit approval system at a buy-now-pay-later (BNPL) lender. When a customer applies for a loan, the system must evaluate — in under 200 milliseconds, with no human in the loop:

  • The applicant's current balance and exposure across all active loans (transactional state — OLTP)
  • The applicant's transaction velocity over the past hour, day, and week (analytical aggregation — OLAP)
  • The applicant's behavioral similarity to known fraud patterns (vector similarity — neither OLTP nor OLAP)
  • The lender's total portfolio exposure to this risk segment (cross-account aggregation — OLAP)

This decision needs data from both sides of the OLTP/OLAP divide. Not approximately. Not eventually. At the exact moment the decision executes, with consistency guarantees across all of these reads.

The standard architecture can't deliver this. The OLTP database has the current balance but can't efficiently compute velocity aggregations across millions of transactions. The OLAP database has the aggregations but they're minutes old — computed from the last pipeline run, not reflecting the three transactions that just posted. The vector store has embeddings but they're on a different propagation timeline than either system.

Teams work around this with caching layers, pre-computed features stored in Redis, and increasingly complex coordination logic. Each workaround adds another propagation stage, another consistency boundary, and another place where the decision reads context that doesn't reflect the current moment.

This is not a performance problem. It is a consistency problem created by the OLTP/OLAP separation itself.

HTAP: Almost, But Not Quite

HTAP (Hybrid Transactional/Analytical Processing) databases — TiDB, SingleStore, CockroachDB with analytics extensions — attempt to serve both workloads in a single system. The pitch: one database that handles OLTP and OLAP, eliminating the pipeline between them.

HTAP is a meaningful step. But it defines the problem as combining two workload types in one system. The automated decision workload is not a combination of OLTP and OLAP — it is a different workload with different requirements:

  • Decision-time aggregations must be incrementally maintained, not computed on read. A fraud velocity counter that recomputes on every query doesn't scale at 50,000 decisions per second. It must be pre-maintained and transactionally consistent with the underlying writes.
  • Multiple retrieval patterns must be served from one consistent snapshot. The credit decision above needs a point lookup, a time-windowed aggregation, and a vector similarity query — all reflecting the same moment in state. HTAP databases typically don't support native vector search, requiring an external vector store and reintroducing the consistency split.
  • The system must handle reads and writes within the same transactional boundary at high concurrency. When 10,000 fraud checks per second each need to read and potentially update velocity counters for the same set of accounts, the system must provide serializable isolation without collapsing under contention. Many HTAP databases separate their transactional and analytical engines internally, which reintroduces the consistency gap at the storage layer.
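The first requirement — incremental maintenance inside the transactional boundary — can be sketched with sqlite3 (a stand-in only; the schema and upsert pattern are invented for illustration). The source write and the derived counter commit atomically, so a decision-time read is an O(1) lookup that can never observe one without the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE velocity (user_id INTEGER PRIMARY KEY, tx_count INTEGER)")

def record_transaction(user_id, amount):
    # Source write and derived-counter update commit in one transaction:
    # no reader can see the transaction without the updated counter, or vice versa.
    with conn:
        conn.execute("INSERT INTO transactions VALUES (?, ?)", (user_id, amount))
        conn.execute(
            """INSERT INTO velocity VALUES (?, 1)
               ON CONFLICT(user_id) DO UPDATE SET tx_count = tx_count + 1""",
            (user_id,),
        )

for _ in range(3):
    record_transaction(7, 25.0)

# Decision-time read: a point lookup, not a scan of the transactions table.
count = conn.execute(
    "SELECT tx_count FROM velocity WHERE user_id = 7"
).fetchone()[0]
print(count)
```

Contrast this with recomputing `COUNT(*)` over the transactions table on every fraud check: the pre-maintained counter keeps read cost constant no matter how much history accumulates.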

HTAP solves the pipeline problem (no more ETL between OLTP and OLAP). It does not solve the decision-time problem (all context a decision needs, from one consistent snapshot, at millisecond latency, under high concurrency).

Why You Can't Compose Your Way Out: The Composition Impossibility Theorem

The instinct when the OLTP/OLAP tradeoff breaks is to compose more systems: add Redis for fast lookups, add a feature store for pre-computed features, add a vector database for similarity search, add Kafka + Flink for real-time aggregation. Each system handles one retrieval pattern well.

The problem is not any individual system. The problem is the composition. In my formal analysis, I proved what I call the Composition Impossibility Theorem: you cannot compose separate systems — each maintaining its own state, each on its own propagation timeline — to achieve decision coherence.

Decision coherence means: every agent or automated process evaluating interacting decisions sees a consistent representation of reality at the moment the decision is made. Not eventually consistent. Not "close enough." Consistent in the ACID sense — serializable across all context the decision depends on.

The composed stack — Postgres for transactions, Kafka + Flink for streaming aggregation, Redis for serving, Pinecone for vectors — reconstructs context asynchronously while decisions happen synchronously. Each system maintains context at a different propagation stage. The Redis cache is seconds behind the Postgres write. The Flink-maintained counter is behind the Kafka offset. The vector embeddings reflect yesterday's model run.

At any given moment, the decision is evaluating context that never existed as a consistent whole. It is reading balance-as-of-time-T1, velocity-as-of-time-T2, and embeddings-as-of-time-T3, and treating them as if they represent a single moment. Under moderate load, the gaps are small enough to tolerate. Under high state velocity and concurrency — the conditions that define the hardest decision-time problems — the gaps widen and the consequences compound.
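The misalignment can be simulated directly. In the toy model below, two stores (hypothetical stand-ins for the Postgres balance and the Flink-maintained velocity counter) have each propagated a different prefix of the same event log; the composite the decision reads matches no consistent snapshot that ever existed.

```python
# Event log: each event carries (balance_delta, velocity_delta).
events = [(100, 1), (-30, 1), (50, 1), (-20, 1)]

def state_at(n):
    """True consistent state after the first n events."""
    balance = sum(e[0] for e in events[:n])
    velocity = sum(e[1] for e in events[:n])
    return (balance, velocity)

# Each store sits at a different propagation stage of the log.
balance_store_pos = 4   # fully caught up (time T1)
velocity_store_pos = 2  # two events behind (time T2)

# The decision stitches together one read from each store.
composite = (state_at(balance_store_pos)[0], state_at(velocity_store_pos)[1])

# Every state the system actually passed through:
consistent_snapshots = [state_at(n) for n in range(len(events) + 1)]
print(composite, composite in consistent_snapshots)
```

The composite (balance 100, velocity 2) is not among the five states the log ever produced — the decision is reasoning about a world that never existed.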

This is not an engineering failure. It is a structural limitation. No amount of caching, pre-computation, or operational discipline can eliminate the consistency gap between independently propagating systems. The gap is architectural.

The Four Gaps in Practice

The Composition Impossibility Theorem manifests as four concrete gaps in the OLTP + OLAP + serving composed stack:

Split state. Transactional state lives in the OLTP database. Analytical state lives in the OLAP database. Serving state lives in Redis. These are copies that sync periodically. Any read from the serving or analytics layer is, by definition, behind the transactional source. How far behind depends on the pipeline — milliseconds to minutes — but the gap cannot be zero because the sync is asynchronous.

Temporal misalignment. A decision that needs to evaluate five entities (user balance, velocity counter, portfolio exposure, risk score, behavioral embedding) reads each from a different system at a different propagation stage. The decision evaluates a composite state that never existed at any single moment. For a human analyst, this is a minor annoyance. For an automated fraud check running 10,000 times per second, it is a correctness bug.

Mutation fragility. OLAP databases are append-optimized. Corrections, late-arriving data, and retroactive updates are awkward operations. Streaming systems handle flow, not correction — replaying a stream to fix a bug means reprocessing everything downstream. When the decision-time system needs to update a derived value (a velocity counter, a risk score), the pipeline between systems turns a simple write into a multi-system coordination problem.

Serving mismatch. Decision-time reads are point lookups at high concurrency: "give me the context for this user, right now." OLAP databases optimize for analytical scans across many rows. Adding a caching layer (Redis) for point lookups adds another propagation stage, another consistency boundary, and another system to operate.

Each gap individually is manageable. Together, they form a compound failure mode: the decision reads context that is stale, inconsistent, hard to correct, and served from the wrong access pattern. Under high state velocity and concurrency — which is precisely when correct decisions matter most — every gap widens.

What Comes After the Tradeoff: The Context Lake

The OLTP vs OLAP framing assumes two consumers: an application that writes records and an analyst who queries aggregates. The pipeline between them is acceptable because these consumers have different latency tolerances.

When the consumer is an automated decision system — operating at millisecond latency, requiring ACID consistency across transactional and analytical context, evaluating multiple retrieval patterns against a single consistent snapshot — the two-system architecture and the pipeline between them become the constraint.

The architectural resolution is to collapse the pipeline. Not by making OLTP databases faster at analytics (they'll always be row-oriented). Not by making OLAP databases transactional (they'll always be eventually consistent). Not by composing more systems (the Composition Impossibility Theorem proves this fails). But by building a system designed for the decision-time workload from the ground up.

This is what a Context Lake provides. A Context Lake is a single system where:

  • Transactional writes and analytical computation happen inside the same boundary. No pipeline. No sync lag. A write is queryable in the same transaction that committed it.
  • Derived context is incrementally maintained, not computed on read. Velocity counters, exposure aggregations, behavioral features update continuously as source data changes — not on a batch schedule, and not as a separate stream processing job, but inside the same transactional boundary that handles the source writes.
  • All retrieval patterns — point lookups, aggregations, vector similarity, full-text search — serve from one consistent snapshot. The fraud check that needs a balance, a velocity counter, and a behavioral embedding gets all three from the same moment in state. Not three systems at three propagation stages.
  • PostgreSQL compatibility preserves the tools and skills teams already have. The system speaks the PostgreSQL wire protocol. Existing ORMs, connection pooling, query runners, and BI tools connect without modification.

The Context Lake is not a database category. It is not OLTP, OLAP, HTAP, or NewSQL. It is infrastructure for a specific workload — automated decisions that need consistent context at the moment they execute — that none of the existing categories were designed to serve.

OLTP vs OLAP vs Context Lake: When to Use Each

OLTP, OLAP, and Context Lake each serve a specific consumer with specific requirements. The right choice depends on who is consuming the data and what they need from it.

| Property | OLTP | OLAP | Context Lake |
| --- | --- | --- | --- |
| Primary consumer | Application services | Human analysts, BI tools | Automated decision systems, AI agents |
| Core optimization | Transactional writes | Analytical scans | Decision-time context serving |
| Query latency | Sub-ms (point lookups) | Seconds to minutes | Milliseconds (pre-maintained) |
| Data freshness | Real-time (on commit) | Minutes to hours (pipeline) | Real-time (incremental, in-boundary) |
| Consistency | ACID (serializable) | Eventual | ACID (serializable, cross-pattern) |
| Analytical queries | Slow (row-oriented scans) | Fast (columnar, vectorized) | Fast (incremental materialized views) |
| Vector search | Via extension (pgvector) | Requires external system | Native (HNSW/IVFFlat) |
| Write pattern | High-frequency CRUD | Batch/micro-batch append | Streaming CDC + direct writes |
| Mutation support | Full (ACID updates/deletes) | Limited (append-optimized) | Full (ACID within boundary) |
| Concurrent decision load | Moderate (row locks) | Not designed for this | High (workload isolation) |
| When to choose | Operational applications, CRUD | Business intelligence, reporting | Fraud detection, credit decisioning, agent actions, real-time personalization |

Choosing the Right Architecture

Keep your OLTP database for operational record-keeping — it's the source of truth for application state. PostgreSQL, MySQL, and similar databases are excellent at what they do: ACID-compliant reads and writes at the application layer. A Context Lake doesn't replace your OLTP database; it ingests from it via CDC to maintain decision-time context.

Keep your OLAP database for business intelligence — dashboards, quarterly reporting, ad-hoc exploration, model training on historical data. ClickHouse, Snowflake, and BigQuery are optimized for this and do it well. These workloads tolerate batch latency and eventual consistency because a human is interpreting the results.

Add a Context Lake when automated systems — fraud engines, credit decisioning, pricing systems, AI agents — need to make decisions on context that spans transactional state and analytical aggregations, with ACID consistency, at millisecond latency, under high concurrency. This is the workload that the OLTP/OLAP architecture was not designed for and that composition cannot resolve.

The most common pattern we see in production: OLTP database (PostgreSQL) as the operational source of record, Context Lake (Tacnode) ingesting via CDC and serving decision-time context to automated systems, OLAP database (ClickHouse/Snowflake) serving dashboards and model training from the same data. Each system serves its intended consumer. No system tries to be everything.


Written by Xiaowei Jiang

Former Meta and Microsoft. Built distributed query engines at petabyte scale. Author of the Composition Impossibility Theorem (arXiv:2601.17019).
