Garbage collection (GC) is the automatic reclamation of memory that a program no longer needs. Modern GC algorithms are central to many managed languages (Java, C#, Go, JavaScript, Haskell, etc.), and they also appear in hybrid or optional forms (ARC/RC in Swift/Obj‑C, Boehm GC for C/C++). This article is a deep dive into garbage‑collection algorithms: history, fundamental concepts, core and advanced algorithms, implementation techniques, performance tradeoffs, use cases, and future directions.

Table of contents

  • Introduction and scope
  • Historical milestones
  • Fundamental concepts and terminology
  • Tracing vs reference counting: conceptual comparison
  • Core algorithms
    • Reference counting
    • Mark-and-sweep
    • Mark-and-compact
    • Copying collectors (Cheney)
    • Generational collection
    • Incremental and concurrent collectors
    • Parallel collectors
    • Real-time (hard real-time) collectors
    • Region-based and arena allocation
    • Conservative garbage collection
    • Hybrid strategies
  • Implementation details and engineering
    • Root set discovery and stack scanning
    • Tri-color invariant and barriers (write/read)
    • Card marking and remembered sets
    • Large object handling and humongous objects
    • Compaction, fragmentation, pinning
    • Finalizers, weak/soft/phantom references
    • Interaction with OS (virtual memory, page faults, NUMA)
  • Practical examples and pseudo-code
    • Simple reference counting
    • Mark-and-sweep (pseudocode)
    • Cheney’s copying (pseudocode)
    • Basic write-barrier examples
  • Real-world collectors and language implementations
    • Java HotSpot (Serial, Parallel, CMS, G1, Shenandoah, ZGC)
    • .NET CLR GC
    • Go runtime GC
    • CPython (reference counting + cycle collector)
    • Boehm conservative GC
    • V8 and JavaScript engines
    • Functional language GCs (GHC)
    • Swift/Obj‑C ARC
  • Metrics and evaluation: throughput vs latency vs footprint
  • Tuning strategies and heuristics
  • Limitations, safety, and security considerations
  • Future directions and research trends
  • Practical guidance: choosing/designing a GC for different workloads
  • Summary

Introduction and scope Garbage collection relieves programmers from managing memory explicitly (malloc/free, free()). A GC’s role is to detect unreachable objects and reclaim their storage while balancing tradeoffs: throughput (compute time dedicated to GC), latency (pause times/duration), memory footprint, fragmentation, and implementation complexity. This article covers classic algorithms, their modern variants (parallel, concurrent, generational), implementation techniques (barriers, remembered sets), and practical considerations for production systems.

Historical milestones

  • 1960s – John McCarthy proposed an automatic memory management scheme for Lisp; mark-and-sweep-like ideas appeared early.
  • 1969–1970 – Cheney’s algorithm for copying collection (nonrecursive compaction) introduced.
  • 1970s – Reference counting and cycle issues were studied and practical improvements proposed.
  • 1970s–1980s – Tri-color abstraction and on-the-fly/incremental GC (Dijkstra et al.) established formal invariants for concurrent collectors.
  • 1990s–present – Generational collection and parallel/concurrent collectors became mainstream for multi-core servers and low-latency workloads.
  • 2010s–present – Large-heap, low-latency collectors (G1, Shenandoah, ZGC) with concurrent compaction and region-based heaps.

Fundamental concepts and terminology

  • Heap: region of memory used for dynamic allocation.
  • Root set: references directly accessible from program state (global/static variables, CPU registers, stack frames, thread-local storage).
  • Live (reachable) objects: reachable from root set via chains of references. Anything else is garbage.
  • Tracing collector: identifies live objects by tracing from roots and marks reachable objects.
  • Reference counting: each object maintains a count of inbound references; count reaching zero triggers deallocation.
  • Mark-and-sweep: trace to mark live objects then sweep to reclaim unmarked objects.
  • Copying collector: moves live objects to a new space (evacuation), leaving behind compacted memory.
  • Compaction: moving objects to reduce fragmentation and improve locality.
  • Fragmentation: free memory scattered into small blocks; internal vs external fragmentation.
  • Pause time: time the mutator (application threads) is stopped for GC (stop-the-world).
  • Throughput: fraction of total time devoted to application vs GC.
  • Tri-color invariant: abstraction for incremental GC partitioning objects into white (unvisited), gray (discovered but children not processed), black (processed).
  • Barriers: code emitted on field reads/writes to maintain invariants in concurrent/incremental collectors.

Tracing vs reference counting: conceptual comparison

  • Tracing (mark-based): Pros: detects arbitrary cycles, can compact memory and reduce fragmentation, typically requires less per-object overhead. Cons: can require stop-the-world pauses or complicated concurrent algorithms and write-read barrier overhead.
  • Reference counting (RC): Pros: reclamation is immediate and incremental, simple to implement, low latency for single-threaded contexts. Cons: per-object metadata overhead (reference count), cost of updating counts on every pointer write, inability to reclaim cyclic structures without extra mechanisms, concurrency complexity.

Core algorithms

Reference counting Description:

  • Each object has a counter for number of references to it.
  • On assignment of a pointer, increment the new target’s count and decrement the old target’s count; when any decremented count hits zero, object is reclaimed, and its outgoing references are decremented recursively (cascading deallocation).

Pseudocode (simplified):

JavaScript
1function assign(slot, obj): 2 old = slot.value 3 if obj != null: 4 obj.count += 1 5 slot.value = obj 6 if old != null: 7 old.count -= 1 8 if old.count == 0: 9 finalize_and_free(old)

Pros:

  • Deterministic reclamation time; low-latency deallocations.
  • Good for real-time or incremental reclamation.

Cons:

  • Cannot collect cycles without cycle-detection algorithms (trial deletion, periodic cycle collector).
  • Per-write overhead: O(1) updates for each pointer write (can be expensive).
  • Concurrency requires atomic inc/dec or locks; contention on counts.

Practical uses:

  • CPython uses RC for immediate deallocation, supplemented by a cycle detector for container cycles.
  • Swift/Obj‑C use ARC (Automatic Reference Counting) which is compile-time assisted RC with runtime support.

Mark-and-sweep Description:

  • Stop-the-world (classically) tracing collector. Two phases: mark (trace from roots marking reachable objects); sweep (scan heap and reclaim unmarked objects).
  • Optionally followed by a compaction phase to eliminate fragmentation.

Pseudocode:

JavaScript
1function collect(): 2 mark_phase() 3 sweep_phase() 4 5function mark_phase(): 6 for root in roots: 7 if not root.marked: 8 mark(root) 9 10function mark(obj): 11 obj.marked = true 12 for child in obj.fields: 13 if not child.marked: 14 mark(child) 15 16function sweep_phase(): 17 for each object in heap: 18 if not object.marked: 19 free(object) 20 else: 21 object.marked = false // reset for next GC

Pros:

  • Handles cycles naturally.
  • Low per-allocation overhead.

Cons:

  • Stop-the-world pauses (unless made incremental/concurrent).
  • Sweep can leave fragmentation unless compacted.

Mark-compact

  • Variation: after marking, relocate live objects to compact heap, then update references. Eliminates fragmentation but moving objects requires updating all references to moved objects (via forwarding pointers or two-pass updates, or using card-marking remembered sets to limit updates).

Copying collectors (Cheney’s algorithm) Description:

  • Divide heap into two semispaces: from-space and to-space. Allocate in from-space sequentially. When from-space fills, copy live objects to to-space by breadth-first forwarding, then swap spaces.
  • Cheney’s algorithm is non-recursive and uses a scanning pointer: copied objects are scanned to copy their children.

Pseudocode (high-level):

Plain Text
1scan = to-space.start 2free = to-space.start 3 4copy_root(root): 5 if root.forwarding is set: return root.forwarding 6 new = free 7 copy_object(new, root) 8 root.forwarding = new 9 free += object_size 10 return new 11 12while scan < free: 13 for each field in object_at(scan): 14 if field != null: 15 field = copy_root(field) 16 scan += object_size(scan)

Pros:

  • Compaction is natural (to-space is compact).
  • Allocation is pointer bump (very fast).
  • No fragmentation.

Cons:

  • Requires double memory (unless semi-space is resized; but generational copying avoids full doubling).
  • Copying overhead for live objects.
  • Not well-suited for large heaps without generational techniques.

Generational collection Core idea:

  • Most objects die young (weak generational hypothesis). Partition heap into generations, typically nursery (young) and tenured (old). Do frequent minor collections on nursery (fast copying), occasional major collections on tenured space.
  • Track intergenerational pointers from old to young using write barriers and remembered sets; roots include old->young pointers.

Pros:

  • Good throughput and low pause times for many workloads: most collections are minor and fast.
  • Fast allocation (bump-pointer in nursery).

Cons:

  • Complexity (barriers, remembered sets, promotion policies).
  • Workloads with long-lived short-lived objects or huge volumes of promoted objects can defeat benefits.

Incremental and concurrent collectors Motivation:

  • Long stop-the-world pauses unacceptable; design collectors that run concurrently with mutator to reduce pause times.

Key techniques:

  • Tri-color abstraction to maintain correctness during concurrent marking: objects are white (unprocessed), gray (discoverd but children not yet processed), black (fully processed).
  • Barriers (write barriers and read barriers) to ensure that program modifications do not invalidate collector invariants.
  • Snapshot-at-the-beginning (SATB) and incremental update approaches to implement consistent concurrent marking.

Types:

  • Concurrent mark-sweep (CMS): concurrent marker, stop-the-world remarking, concurrent sweep. Requires avoidance of fragmentation via either mark-compact later or acceptance of fragmentation.
  • Concurrent compacting (Shenandoah/ZGC): concurrent evacuation/compaction using region-based heaps and remembered sets; aims to keep pause times low and independent of heap size.

Pros:

  • Low and predictable pause times.
  • Good for interactive or low-latency servers.

Cons:

  • Overheads from barriers (some percentage of mutator time).
  • Complexity and corner cases (e.g., object pinning, integration with native code).

Parallel collectors

  • Utilize multiple CPU cores to parallelize marking, copying, and sweeping. Most modern server-class GCs do this.
  • E.g., Java's Parallel GC (throughput collector), parallel GCs in GHC, .NET parallel GC during stop-the-world phases.

Real-time collectors

  • Guarantee upper bounds on pause times (hard real-time). Approaches include micro‑scheduling GC work interleaved with mutator and incremental compaction.
  • Examples: Metronome (IBM research), real-time enhancements to Java (Real-Time Specification for Java) use scoped memory or specialized collectors.
  • Typically incur higher overhead for strict guarantees.

Region-based and arena allocation

  • Memory is allocated in arenas or regions: allocation is fast, deallocation occurs by freeing the whole region en masse.
  • Useful for scoped lifetimes (function stack-like lifetimes), efficient where object lifetimes are similar.
  • Regions may be combined with GC for objects escaping region lifetimes.

Conservative garbage collection

  • For languages like C/C++, where pointer metadata is not available, conservative GC scans memory and treats any bit pattern that looks like a valid pointer as a root (inhibits some reclaiming).
  • Boehm-Demers-Weiser GC is a well-known conservative collector.
  • Pros: adds automatic memory management to C/C++ without language changes. Cons: may leak memory (false pointers), hard to compact or move objects safely.

Hybrid strategies

  • Many production GCs are hybrids: generational + concurrent + parallel + region-based (e.g., G1, Shenandoah, ZGC).
  • Hybrid designs aim to get benefits of multiple techniques and mitigate individual downsides.

Implementation details and engineering

Root set discovery and stack scanning

  • Roots are discovered by scanning thread stacks, registers, global/static memory.
  • Precise GC requires metadata (stack maps) so the collector knows which words are pointers; conservative GC treats ambiguous words as potential pointers.
  • Managed runtimes often produce precise stack maps at safepoints (JVM, CLR) to allow precise roots.

Tri-color invariant and barriers

  • Tri-color model ensures that, during concurrent marking, a black object never points to a white one (no visited object points to an unvisited one), preserving correctness.
  • Write barrier types:
    • Incremental update (Dijkstra): on a write, if ptr->field changed, ensure old target not made unreachable incorrectly.
    • Snapshot-at-the-beginning (Yuasa/SATB): record old values or record newly created references so that the collector's logical snapshot of reachable objects is consistent.
  • Barriers can be implemented inline in generated code and cost some cycles per pointer write/read.

Card marking and remembered sets

  • To avoid scanning the entire old gen for pointers to young gen, heap is divided into cards (e.g., 512B). A write to an old object marks the card dirty via write barrier. During minor collections, only dirty cards are scanned to find pointers into the young generation.
  • Remembered sets maintain sets of locations needing scanning during evacuation/compaction.

Large object handling

  • Many GCs place large objects in a separate large object space (LOS) to avoid copying/compacting huge objects constantly. Large objects may be allocated with different strategies (e.g., immortal, pinned).

Compaction and object movement

  • Compaction reduces fragmentation and improves locality but requires updating references. Forwarding pointers, pointer-bumping, and remembered sets are used to limit the cost of updating references.
  • Pinning (when native code needs stable addresses) complicates compaction and can degrade performance.

Finalizers, weak/soft/phantom references

  • Finalizers: functions called before object reclamation (dangerous: unpredictable timing, resurrection possible).
  • Weak references: do not prevent object collection; used for caches.
  • Soft references (JVM): reclaimed only under memory pressure.
  • Phantom references: allow post-mortem cleanup after object is enqueued and prior to memory reclaim.

Interaction with OS and VM

  • GCs must cooperate with OS virtual memory, avoid page thrashing, and handle NUMA placement and large pages.
  • Some collectors use OS-level primitives (madvise, mmap) to manage spaces.

Practical examples and pseudo-code

Reference counting (simplified):

JavaScript
1function inc_ref(obj): 2 atomic_increment(obj.count) 3 4function dec_ref(obj): 5 if atomic_decrement_and_test(obj.count): 6 for child in obj.fields: 7 dec_ref(child) 8 free(obj)

Mark-and-sweep (simplified single-threaded):

JavaScript
1function gc(): 2 // Mark 3 stack = empty 4 for root in roots: 5 if not root.marked: 6 root.marked = true 7 stack.push(root) 8 while stack not empty: 9 obj = stack.pop() 10 for child in obj.fields: 11 if not child.marked: 12 child.marked = true 13 stack.push(child) 14 // Sweep 15 for object in heap: 16 if not object.marked: 17 free(object) 18 else: 19 object.marked = false

Cheney’s copying (simplified):

Plain Text
1from_space = alloc_half() 2to_space = alloc_half() 3free = to_space.start 4scan = to_space.start 5 6for root in roots: 7 if root points into from_space: 8 root = copy(root) 9 10function copy(obj): 11 if obj.forwarding != null: 12 return obj.forwarding 13 new = free 14 copy_bytes(new, obj) 15 obj.forwarding = new 16 free += size(obj) 17 return new 18 19while scan < free: 20 for i in 0..fields(scan)-1: 21 field = scan.fields[i] 22 if field != null and in from_space: 23 scan.fields[i] = copy(field) 24 scan = scan + size(scan) 25swap(from_space, to_space)

Write-barrier examples (SATB - Snapshot at the Beginning)

  • On pointer write p->f = v, record the old value if it points to an object in the heap:
JavaScript
1function write_barrier(ptr, old, new): 2 if old != null and color(old) == BLACK: 3 // Ensure old remains reachable for concurrent collector which started 4 mark_gray(old) 5 ptr.f = new

(This is conceptual; actual barriers differ.)

Real-world collectors and language implementations

  • Java HotSpot:
    • Serial GC: single-threaded, simple, used for small heaps.
    • Parallel (Throughput) GC: parallel stop-the-world collector focusing on throughput.
    • CMS (Concurrent Mark-Sweep): low-pause concurrent collector (deprecated).
    • G1 (Garbage First): region-based generational collector attempting predictable pauses and large-heap scalability.
    • Shenandoah (Red Hat) and ZGC (Oracle): concurrent region-based collectors offering very low pause times (sub-10ms or sub-millisecond in some cases) with concurrent compaction.
  • .NET CLR GC: generational, concurrent, with server vs workstation modes; compacting and parallel.
  • Go runtime: concurrent mark-and-sweep with write barriers; recent versions moved towards low-pause concurrent GC with goroutine-aware safepoints.
  • CPython: primary reference counting with cyclic GC for container cycles.
  • Boehm GC: conservative tracing GC for C/C++, supporting parallel threads and incremental modes.
  • V8: high-performance generational collector with concurrent/incremental stages and specialized spaces for large objects and code.
  • Haskell (GHC): generational, per-capability parallel GC, tuned for functional workloads.
  • Swift/Objective-C: ARC (compile-time reference counting) with runtime support for weak references and occasional runtime cycle detection utilities.

Metrics and evaluation: throughput vs latency vs footprint Key metrics:

  • Pause time (distribution, worst-case)
  • Throughput: percentage of CPU time available to application vs GC
  • Memory footprint: heap size overhead, working set
  • Allocation speed and allocation throughput
  • Scalability: how collector performance scales with heap size and cores
  • Predictability: jitter and tail-latency (important for real-time or latency-sensitive systems) Tradeoffs:
  • Lower pause times often come at cost of higher throughput overhead (barriers) and more memory.
  • Copying collectors trade memory overhead for faster allocation and less fragmentation.
  • Generational GCs optimize for short-lived objects but require barrier overhead and remembered sets.

Tuning strategies and heuristics

  • Size nursery and tenured spaces according to allocation rate and promotion rate.
  • Adjust GC threads for parallel collectors to match available cores.
  • Tune tenuring thresholds to avoid premature promotion of short-lived objects.
  • Monitor allocation patterns and use allocation profiling to optimize.
  • Use escape analysis when possible to allocate on stack and avoid heap allocation.

Limitations, safety, and security considerations

  • Security improvements: GC eliminates many classes of use-after-free/dangling pointer bugs.
  • Finalizers can create unpredictability and potential resource leaks.
  • Conservative collectors may retain objects unintentionally due to false pointer interpretation.
  • Interfacing with native code and pinned memory introduces complexity and potential memory pressure or fragmentation.
  • GC can interact poorly with real-time constraints; choose appropriate collectors or use region/pool-based approaches when strict timing constraints exist.

Future directions and research trends

  • Low-latency, large-heap collectors: More focus on collectors that have pause times independent of heap size (region-based concurrent compaction).
  • Hardware support: tagged memory, pointer metadata, hardware write barriers, or memory reclamation primitives to reduce barrier overhead.
  • Non-volatile memory (NVM): persistence semantics and recovery-aware GC; reclamation in persistent heaps is a new challenge.
  • Deterministic and real-time GC: stronger guarantees for embedded and financial systems.
  • Integration with ownership/borrowing type systems (Rust-style): exploring hybrid systems that offer GC where needed and static ownership elsewhere for fewer runtime costs.
  • Distributed GC: reclamation across distributed nodes and for actor systems (Erlang-like) remains an active area.
  • ML/AI and ephemeral process workloads: optimizing GC for heavy allocation/deallocation patterns in data processing and ML with large temporary buffers.
  • Memory-safe languages on systems programming stacks: blending GC and zero-cost abstractions to offer both memory safety and performance.

Practical guidance: choosing/designing a GC for different workloads

  • Short-lived, throughput-oriented batch jobs: parallel generational collectors (e.g., Parallel GC).
  • Low-latency interactive systems: concurrent region-based collectors (Shenandoah/ZGC) or tuned G1.
  • Embedded or hard real-time: avoid general-purpose GC; use region, arena, or specialized real-time collectors; consider language features like scoped allocation.
  • Systems programming and low-level C/C++: use reference counting or conservative GC when needed; prefer explicit memory pools where determinism required.
  • Functional languages with immutability and high allocation rates: generational copying with parallelism is often effective.

Summary Garbage collection algorithms span a spectrum of tradeoffs: immediacy vs global correctness (reference counting vs tracing), pause time vs throughput (concurrent/incremental vs stop-the-world), memory overhead vs fragmentation (copying vs mark-sweep vs compacting), and implementation complexity vs simplicity. Modern production systems combine generational hypotheses, concurrency, parallelism, and region-based designs, augmented by barriers, remembered sets, and sophisticated heuristics to deliver low latency and high throughput at scale.

Choosing the “right” GC involves understanding an application’s allocation patterns, latency requirements, memory constraints, and platform characteristics. Continued research and engineering innovation—particularly for low-latency large heaps, hardware-assisted barriers, and persistent memory—will drive the next generation of GC designs.

Further reading (classic pointers)

  • John McCarthy: early Lisp memory management ideas.
  • Cheney, C. J.: copying algorithm (nonrecursive compacting).
  • Dijkstra, L. Lamport, and colleagues: on-the-fly/incremental GC and tri-color invariant.
  • Baker: real-time (incremental) collectors.
  • S. M. Blackburn and colleagues: generational hypothesis studies.
  • Research and implementation articles on G1, Shenandoah, ZGC, and Go’s GC.

If you want, I can:

  • Provide a focused comparison table of specific real-world collectors (G1 vs ZGC vs Shenandoah vs CMS).
  • Walk through writing a minimal toy garbage collector in C/C++/Go (example implementation).
  • Suggest tuning parameters for a specific runtime (e.g., HotSpot/Java application profile).