Saga pattern

Overview

The Saga pattern manages long-lived, distributed transactions across multiple services by composing a sequence of local ACID transactions with compensating actions to reach eventual consistency. It avoids blocking, strongly consistent distributed transactions (e.g., 2PC), gaining service autonomy and higher availability at the cost of added compensation complexity.

Why use Sagas

- Solves multi-service business flows (orders, inventory, payments, shipping) where 2PC is impractical.
- Trades global atomicity for availability and scalability (BASE: basically available, soft state, eventually consistent).
- Suited to cases where you can define compensations or mitigations for failed steps.

Core concepts

- Local transaction: an atomic, durable operation inside one service that may publish events.
- Saga: a sequence of local transactions; it completes if all succeed, otherwise it runs compensations for the steps already completed.
- Compensation: an application-level undo or mitigation (not necessarily a strict inverse); it should be idempotent wherever possible.
- Coordinator: an optional component that tracks saga state; present in orchestration, absent in choreography.

Idempotency and durable saga state are essential for correct recovery and retries.

Execution model

- Forward flow: start → step1 (local tx) → step2 → … → final step → success.
- On failure: run compensations (typically in reverse order) to return to a consistent business state.
- Lifecycle states: Started, In Progress, Compensating, Compensated, Completed, Failed.
- Sequential and parallel steps are both supported; parallel branches require an explicit compensation ordering or policy.

Implementation styles

- Choreography: event-driven, with no central coordinator. Low coupling and scalable, but harder to observe and reason about; can lead to "event spaghetti".
- Orchestration: a central saga manager. Easier control, persistence, retries and observability; adds a coordination component (which can be distributed for availability).
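
The execution model and the orchestration style above can be sketched in a few lines of Python. This is a minimal, in-memory illustration, not a production design: the names `Saga`, `Step`, `SagaState` and the order/inventory/payment steps are hypothetical, and a real orchestrator would persist saga state durably and retry steps rather than keep everything in process memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class SagaState(Enum):
    STARTED = "Started"
    IN_PROGRESS = "In Progress"
    COMPENSATING = "Compensating"
    COMPENSATED = "Compensated"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]        # the local transaction
    compensation: Callable[[dict], None]  # application-level undo or mitigation


@dataclass
class Saga:
    steps: List[Step]
    state: SagaState = SagaState.STARTED
    log: List[str] = field(default_factory=list)  # stand-in for durable saga state

    def run(self, ctx: dict) -> SagaState:
        self.state = SagaState.IN_PROGRESS
        completed: List[Step] = []
        for step in self.steps:
            try:
                step.action(ctx)
            except Exception as exc:
                self.log.append(f"failed:{step.name}:{exc}")
                return self._compensate(completed, ctx)
            completed.append(step)
            self.log.append(f"done:{step.name}")
        self.state = SagaState.COMPLETED
        return self.state

    def _compensate(self, completed: List[Step], ctx: dict) -> SagaState:
        self.state = SagaState.COMPENSATING
        # Undo the completed steps in reverse order.
        for step in reversed(completed):
            try:
                step.compensation(ctx)
            except Exception as exc:
                # A failed compensation needs retries or human escalation.
                self.log.append(f"compensation_failed:{step.name}:{exc}")
                self.state = SagaState.FAILED
                return self.state
            self.log.append(f"compensated:{step.name}")
        self.state = SagaState.COMPENSATED
        return self.state


def charge_payment(ctx: dict) -> None:
    raise RuntimeError("card declined")  # simulated failure of the third step


saga = Saga([
    Step("create_order",
         lambda c: c.update(order="created"),
         lambda c: c.update(order="cancelled")),
    Step("reserve_inventory",
         lambda c: c.update(stock="reserved"),
         lambda c: c.update(stock="released")),
    Step("charge_payment", charge_payment, lambda c: None),
])
ctx: dict = {}
result = saga.run(ctx)
print(result, ctx)
```

In this run the payment step fails, so the saga releases the inventory and then cancels the order, ending in the Compensated state. The Failed state is reserved here for the case where a compensation itself raises, which is one possible reading of the lifecycle, not the only one.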

Hybrid approaches combine both: orchestration for complex flows and choreography for simple ones.

Practical patterns

- Outbox pattern: atomically write events to an outbox table in the same DB transaction as the state change, then publish them asynchronously.
- Inbox / idempotent consumer: deduplicate messages by tracking processed IDs.
- CDC (e.g., Debezium): publish DB changes reliably into event streams.
- Design compensations to be idempotent or guarded by deduplication; manage timeouts and human steps with timers and escalation.

Common example (e-commerce)

- Steps: create order → reserve inventory → charge payment → schedule shipping → complete order.
- If payment fails, the orchestrator or the event-driven flow triggers inventory release and order cancellation as compensations.
- Implementations: orchestration via Temporal or AWS Step Functions; choreography via Kafka topics whose events the participating services consume.

Failure modes & correctness concerns

- Message loss, duplication, reordering: mitigate with the outbox pattern, durable brokers, idempotency and dedup stores.
- Non-idempotent compensations and irrevocable side effects (e.g., shipped packages) require mitigation strategies (refunds, notifications) and possibly human intervention.
- Concurrency & race conditions: use optimistic concurrency or domain-level conflict resolution; keep saga state durable so it can be recovered.
- Long-running sagas risk resource leaks: use TTLs, reclamation processes and monitoring.

Tools & frameworks

- Orchestrators: Temporal (forked from Cadence), AWS Step Functions, Camunda, Netflix Conductor, Azure Durable Functions.
- Event platforms & integrations: Apache Kafka, Debezium (CDC), outbox/inbox libraries, Spring Cloud, NServiceBus, MassTransit.

Testing, monitoring & observability

Testing: unit tests for local transactions and compensations, integration and end-to-end tests with failure injection, property-based testing or model checking where appropriate, and chaos testing.
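
As a small illustration of failure injection, the sketch below delivers every message several times to an idempotent consumer and checks that each side effect is applied only once. The `InventoryConsumer` class, its in-memory `processed_ids` set and the `deliver_with_duplicates` helper are hypothetical names for this example. In a real service the processed-ID record and the state change would normally be committed in the same local transaction.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class InventoryConsumer:
    """Hypothetical consumer that releases reserved stock at most once per message."""
    stock: Dict[str, int]
    processed_ids: Set[str] = field(default_factory=set)  # the "inbox" of seen IDs

    def handle_release(self, message_id: str, sku: str, qty: int) -> bool:
        if message_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.stock[sku] = self.stock.get(sku, 0) + qty
        self.processed_ids.add(message_id)
        return True


def deliver_with_duplicates(
    consumer: InventoryConsumer,
    messages: Iterable[Tuple[str, str, int]],
    copies: int = 2,
) -> int:
    """Failure injection: redeliver each message `copies` times, as an
    at-least-once broker might, and count how many deliveries took effect."""
    batch = list(messages)
    applied = 0
    for _ in range(copies):
        for message_id, sku, qty in batch:
            if consumer.handle_release(message_id, sku, qty):
                applied += 1
    return applied


consumer = InventoryConsumer(stock={"sku-1": 5})
applied = deliver_with_duplicates(
    consumer, [("m1", "sku-1", 2), ("m2", "sku-1", 1)], copies=3
)
assert applied == 2                  # six deliveries, two effects
assert consumer.stock["sku-1"] == 8  # 5 + 2 + 1, not inflated by duplicates
```

The same shape extends to other injected faults, such as reordering the batch or raising partway through a handler, as long as the assertion is about the final business state, not the number of deliveries.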

Observability: distributed tracing (OpenTelemetry), saga dashboards, metrics (active sagas, durations, compensation counts), structured logging with correlation IDs, dead-letter queues and alerts.

Best practices & anti-patterns

- Best practices: design idempotent compensations, use the outbox pattern, persist saga state, propagate correlation IDs, keep steps local, choose orchestration for complex flows, and set SLAs/timeouts and cleanup policies.
- Anti-patterns: trying to force global ACID semantics, depending on compensations that cannot be recovered if they fail, relying on long-held distributed locks, and overusing choreography for complex flows (event spaghetti).

Advanced topics

- Nested sagas and sub-workflows; parallel and conditional branches with coordinated compensation.
- Compensation policies: best-effort retries, escalation to humans, or alternative mitigations.
- CRDTs can complement sagas by enabling convergence without explicit compensation in some domains.
- Formal verification, model checking, and research on automated compensation synthesis and richer DSLs for safer sagas.

Future directions

- Richer developer ergonomics and typed/DSL-based sagas; AI-assisted compensation generation and verification.
- Hybrid consistency models that selectively use consensus; better transactional messaging that yields simpler exactly-once semantics.
- Serverless-first durable workflow platforms and standardized observability primitives for sagas.

Decision guide (quick)

- Use sagas when multi-service transactions cannot use 2PC and eventual consistency is acceptable.
- Prefer orchestration for complex workflows that need centralized control, retries and observability.
- Prefer choreography for simple, naturally event-driven flows and low coupling.
- Avoid sagas when effects are irrevocable with no acceptable mitigation, or when strong cross-service consistency is mandatory.

Conclusion

The Saga pattern is a pragmatic approach for distributed business transactions: it enables scalable, available microservice architectures by coordinating local transactions and compensations.
Successful adoption requires careful compensation design, durable state, strong observability, idempotency, and rigorous testing to manage the complexity of eventual consistency.
