Structured Logging — A Deep Dive
Structured logging is a foundational practice for modern observability. Instead of emitting free-form text messages, applications emit logs as structured, typed data (typically JSON or another machine-readable format). This enables powerful querying, correlation, enrichment, automated alerting, analytics, and reliable machine consumption. This article covers the history, concepts, theory, practical implementation, tools, best practices, pitfalls, migrations, examples, and future directions.
Table of contents
- Introduction and motivation
- Brief history and evolution
- Core concepts and theoretical foundations
- Common structured log formats and standards
- Practical implementation (patterns and code examples)
- Logging pipelines, collection, storage and query
- Observability integration: traces, metrics, and logs
- Best practices and schemas
- Performance, costs, and scaling considerations
- Security, privacy and compliance
- Migration strategies from unstructured logs
- Common pitfalls and how to avoid them
- Future directions
- Checklist / quick guidelines
- References and resources
Introduction and motivation
Unstructured logs are free-form text lines written for humans to read. While readable, they are hard for machines to parse reliably. Structured logging encodes each log as a key-value map with typed fields (strings, numbers, booleans, arrays, nested maps). Machine-readable logs enable:
- Fast, precise querying (find events by field instead of regex)
- Enrichment (add host, service, trace IDs centrally)
- Correlation across services (trace_id/request_id)
- Reliable alerting and metrics extraction
- Lower parsing/CPU cost in ingestion
- Better analytics and dashboards
- Easier compliance/PII redaction
- Integration with modern observability stacks (ELK, Splunk, Loki, Grafana, OpenTelemetry)
Practical benefits: when troubleshooting an error, you can filter by user_id, endpoint, and trace id in seconds. For security, structured logs can be parsed and fed to SIEM rules reliably.
Brief history and evolution
- Early systems: plain text logs, syslog format.
- Log aggregation tools (syslog-ng, Logstash, Fluentd) parsed textual logs with regex/grok patterns.
- JSON became a common machine-readable log format in the 2010s.
- Specialized log formats emerged: GELF (Graylog), RFC5424 syslog, Elastic Common Schema (ECS).
- Centralized logging stacks (ELK — Elasticsearch, Logstash, Kibana) popularized structured ingestion.
- Cloud providers and observability vendors embraced structured logs (CloudWatch Logs Insights, Google Cloud Logging, formerly Stackdriver).
- OpenTelemetry (OTel) broadened standards to unify metrics, traces, and logs.
- Increasing emphasis on schema, semantic conventions, and context propagation (trace_id/span_id, service.name).
Core concepts and theoretical foundations
Key concepts:
- Event vs. message: a log is a structured event with a timestamp, severity, and fields.
- Schema vs. schema-less: structured logs are schematized implicitly by field names and types. Some systems maintain explicit schemas.
- Semantic conventions: agreed-upon field names (e.g., service.name, http.method) improve interoperability and querying.
- Correlation: use identifiers (request_id, trace_id) across distributed systems for joins.
- Enrichment: add consistent metadata (host, region, environment) at emit or ingest time.
- Immutable append-only stream model: logs as time-ordered events for reconstruction.
- Observability triangle: metrics (aggregates), traces (distributed causal paths), logs (event detail).
Theoretical benefits include data normalization (structured fields), reduced ambiguity (types), improved signal-to-noise ratio (structured alerting), and deterministic parsing.
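To make the event-versus-message distinction concrete, here is a minimal Python sketch using only the standard library. The two log lines and their field names are made up for illustration. It pulls the same user ID out of a free-form line with a regex and out of a structured line with a field lookup.

```python
import json
import re

# Illustrative log lines (made up for this sketch).
text_line = "2026-05-10T12:34:56Z ERROR payment failed for user 12345: card_declined"
structured_line = json.dumps({
    "timestamp": "2026-05-10T12:34:56Z",
    "level": "error",
    "message": "payment failed",
    "user_id": 12345,
    "error": "card_declined",
})

# Free-form text: the regex is coupled to the exact wording of the message,
# and every captured value comes back as a string.
match = re.search(r"user (\d+): (\w+)", text_line)
user_from_text = int(match.group(1))
error_from_text = match.group(2)

# Structured event: parsing is deterministic and types survive the round trip.
event = json.loads(structured_line)
user_from_event = event["user_id"]
error_from_event = event["error"]

print(user_from_text == user_from_event, error_from_text == error_from_event)
```

If the wording of the text message changes, the regex silently stops matching, while the field lookup keeps working as long as the field name is stable.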
Common structured log formats and standards
- JSON: de-facto standard due to ubiquity and language support. Example: {"timestamp":"2026-05-10T12:34:56Z","level":"error","message":"...","user_id":123}
- GELF (Graylog Extended Log Format): JSON-based format sent over UDP/TCP/HTTP with specific fields for Graylog.
- RFC5424 syslog: supports structured data blocks.
- CEF (Common Event Format) and LEEF: used by SIEM vendors.
- Elastic Common Schema (ECS): recommended canonical field names for Elastic stack.
- OpenTelemetry Logs: OTLP for logs and semantic conventions for fields.
- Custom key=value pairs (less robust but sometimes used in CLI logs).
Important standards and semantic conventions:
- RFC3339 timestamps (e.g., 2026-05-10T12:34:56.123Z).
- Severity conventions: syslog levels (0-emergency to 7-debug) or named levels (trace, debug, info, warn, error, fatal).
- ECS and OpenTelemetry semantic conventions define names for service.name, http.method, db.statement, etc.
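The timestamp and severity conventions above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, and mapping "trace" to 7 and "fatal" to 2 is a choice made for this sketch, since syslog has no exact equivalents for those named levels.

```python
from datetime import datetime, timezone

# Syslog severity numbers (RFC 5424): lower numbers are more severe.
# Assumption: "trace" shares 7 with debug and "fatal" maps to 2 (critical).
SYSLOG_SEVERITY = {
    "fatal": 2,
    "error": 3,
    "warn": 4,
    "info": 6,
    "debug": 7,
    "trace": 7,
}


def rfc3339_utc(moment: datetime) -> str:
    """Format an aware datetime as RFC 3339 in UTC with millisecond precision."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def severity_fields(level: str) -> dict:
    """Return the named level together with its numeric syslog severity."""
    return {"level": level, "severity": SYSLOG_SEVERITY[level]}


example = datetime(2026, 5, 10, 12, 34, 56, 123000, tzinfo=timezone.utc)
print(rfc3339_utc(example))
print(severity_fields("error"))
```

Emitting both the name and the number lets humans read the level while backends sort or threshold on severity.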
Practical implementation (patterns and code examples)
Principles:
- Emit structured objects, not formatted strings.
- Always include a stable timestamp, service identity, environment.
- Include correlation ids (request_id, trace_id).
- Keep a human-readable message field alongside structured fields for quick inspection.
Example structured JSON log: { "timestamp":"2026-05-10T12:34:56.123Z", "level":"error", "message":"payment processing failed", "service.name":"payments", "env":"prod", "trace_id":"abcd1234efgh", "request_id":"req-98765", "user_id":12345, "error":"card_declined", "http.status_code":402 }
Code snippets for common languages:
Python (structlog, recommended for structured logging):
import logging
import sys

import structlog

# structlog renders the JSON itself, so the standard-library handler only writes
# the finished line. Adding a second JSON formatter here would double-encode it.
logging.basicConfig(format="%(message)s", stream=sys.stdout, level=logging.INFO)

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso", utc=True),  # ISO 8601 timestamp in UTC
        structlog.processors.add_log_level,
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer(),  # must be last: turns the event dict into a JSON string
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
)

log = structlog.get_logger("payments")
# Bound fields are attached to every event emitted through this logger.
log = log.bind(service="payments", env="prod")

log.info("charge.created", user_id=123, amount=12.50)

Node.js (pino, high performance):
const pino = require('pino')

// Fields in `base` are attached to every log line.
const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  base: { service: 'payments', env: process.env.NODE_ENV || 'dev' }
})

// Structured fields go in the first argument, the message in the second.
logger.info({ user_id: 123, amount: 12.5 }, 'charge.created')

Node.js (winston):
const { createLogger, format, transports } = require('winston')

const logger = createLogger({
  level: 'info',
  // Add a timestamp, then serialize the whole event as JSON.
  format: format.combine(
    format.timestamp(),
    format.json()
  ),
  defaultMeta: { service: 'payments' },
  transports: [
    new transports.Console()
  ]
})

// The metadata object is merged into the JSON output.
logger.info('charge.created', { user_id: 123, amount: 12.5 })

Go (zerolog, low-allocation):
package main

import (
    "os"

    "github.com/rs/zerolog"
    "github.com/rs/zerolog/log"
)

func main() {
    // Replace the global logger with one that adds a timestamp and the service name to every event.
    log.Logger = zerolog.New(os.Stdout).With().Timestamp().Str("service", "payments").Logger()
    // Typed field methods keep numbers as numbers in the JSON output.
    log.Info().Int("user_id", 123).Float64("amount", 12.5).Msg("charge.created")
}

Java (Logback + Logstash Logback Encoder), example logback.xml:
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
      <providers>
        <timestamp/>
        <logLevel/>
        <loggerName/>
        <threadName/>
        <message/>
        <stackTrace/>
        <mdc/>
      </providers>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>

Then use MDC to inject request_id/trace_id.
C# (.NET) Serilog:
using Serilog;

Log.Logger = new LoggerConfiguration()
    .Enrich.WithProperty("Service", "Payments")
    .WriteTo.Console(new Serilog.Formatting.Json.JsonFormatter())
    .CreateLogger();

// The @ operator serializes the object as structured data instead of calling ToString().
Log.Information("charge.created {@Payment}", new { UserId = 123, Amount = 12.5 });

Tips: use log context mechanisms (MDC in Java, contextvars in Python, request-scoped middleware) to attach request-level data automatically.
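As one possible illustration of the contextvars tip, the sketch below uses only the Python standard library. The names request_id_var, RequestContextFilter, JsonFormatter, and handle_request are hypothetical, and a real service would set the variable in request middleware rather than in a plain function.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variable holding the ID of the request being handled.
request_id_var = contextvars.ContextVar("request_id", default=None)


class RequestContextFilter(logging.Filter):
    """Copy the current request ID from the context variable onto each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render a record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "request_id": record.request_id,
        })


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.addFilter(RequestContextFilter())
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("payments.context_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(request_id: str) -> None:
    """Bind the request ID for the duration of one request, then restore it."""
    token = request_id_var.set(request_id)
    try:
        logger.info("charge.created")  # no request_id argument needed here
    finally:
        request_id_var.reset(token)


handle_request("req-98765")
print(stream.getvalue().strip())
```

Because the ID lives in a context variable, code deep in the call stack logs it without having it passed down explicitly, and concurrent asyncio tasks each see their own value.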
Logging pipelines: collection, processing, enrichment, storage, querying
Typical pipeline:
- Emit logs (JSON) from services to stdout / file / syslog.
- Collect with an agent/collector:
- Fluentd, Fluent Bit, Vector, Filebeat, Promtail (for Loki)
- Process/enrich:
- Add geoIP, host metadata, Kubernetes metadata (such as pod labels), environment, tags, trace IDs.
- Redact sensitive fields, transform/normalize schema.
- Buffer and batch, forward to storage:
- Elasticsearch/OpenSearch, Loki, Splunk, S3 (object store), BigQuery, ClickHouse, InfluxDB (rare), or proprietary vendor ingesters.
- Indexing and querying:
- Map key fields to indices or columnar tables. For full-text search or aggregation, ensure appropriate indexing and mapping.
- Visualization and alerting:
- Kibana, Grafana, Splunk UI, Datadog Logs, Sumo Logic.
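The processing and enrichment stage of the pipeline above can be sketched as a small Python generator. This is a toy stand-in for what a collector such as Fluent Bit or Vector does, not their actual configuration or API. The metadata values and the parse_error field are assumptions made for this example.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical metadata a collector might attach to every event from its host.
HOST_METADATA = {"host.name": "node-1", "region": "eu-west-1"}


def process(lines: Iterable[str], metadata: Dict[str, str]) -> Iterator[dict]:
    """Parse, normalize, and enrich one log line at a time."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            event = None
        if not isinstance(event, dict):
            # Keep lines that are not JSON objects instead of dropping them silently.
            event = {"message": line.rstrip("\n"), "parse_error": True}
        # Normalize: lower-case the level so "ERROR" and "error" query the same way.
        if isinstance(event.get("level"), str):
            event["level"] = event["level"].lower()
        # Enrich: fields set by the emitter win over collector defaults.
        yield {**metadata, **event}


sample = [
    '{"level": "ERROR", "message": "payment processing failed", "service.name": "payments"}',
    "plain text line from a legacy component",
]
for enriched in process(sample, HOST_METADATA):
    print(json.dumps(enriched))
```

Letting emitter fields override collector defaults is one design choice among several; some teams prefer the collector to be authoritative for host metadata.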
Collectors and processors:
- Fluentd / Fluent Bit: flexible piping; many plugins.
- Vector (originally by Timber.io, now maintained by Datadog): high-performance Rust-based collector and transformer.
- Filebeat: lightweight Beats agents for Elastic.
- Promtail: for Loki ingestion.
- Logstash: heavy but powerful pipeline.
Storage choices and trade-offs:
- Elasticsearch/OpenSearch: full-text search and analytics; mapping complexity and cost at scale.
- Loki: stores logs as streams identified by labels and indexes only the labels, not the log content; low-cost at high volume; queried with LogQL.
- Splunk: commercial, feature-rich, powerful ingest and analytics.
- Object storage (S3) with partitioned Parquet/JSON: low-cost long-term storage; slower queries but cheap retention.
- ClickHouse: excellent for analytical queries at scale.
- BigQuery: serverless analytics on massive logs.
Indexing strategies:
- Index time vs query time parsing: structured logs reduce parsing at index time.
- Avoid indexing high-cardinality fields as primary indices.
- Create indices/partitions by time range (e.g., daily) and service.
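As a small illustration of partitioning by time range and service, the following Python sketch derives a daily index name from an event. The "logs-service-date" naming pattern and the index_name helper are assumptions for this example, not a convention required by any particular backend.

```python
from datetime import datetime, timezone


def index_name(event: dict, prefix: str = "logs") -> str:
    """Build a daily, per-service index or partition name from an event."""
    # datetime.fromisoformat does not accept a trailing "Z" on older Python versions.
    moment = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    day = moment.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{prefix}-{event['service.name']}-{day}"


event = {"timestamp": "2026-05-10T12:34:56.123Z", "service.name": "payments"}
print(index_name(event))
```

Using the event's own UTC timestamp, rather than ingestion time, keeps late-arriving events in the partition where queries will look for them.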
Example query patterns:
- Kibana/Elasticsearch: filter by service.name, trace_id, or nested fields.
- Loki LogQL: {service="payments"} |= "charge.created" | json | user_id=123
- Splunk SPL: index=prod service=payments user_id=123 | stats count by error
Observability integration: linking logs, traces and metrics
One of the biggest values of structured logging is easy correlation with traces and metrics.
- Correlate logs and traces using trace_id/span_id or request_id. Emit these IDs in every structured log line for a request or job.
- OpenTelemetry provides semantic conventions for field names and OTLP protocol for logs/traces/metrics.
- Tracing libraries and SDKs can add trace context to logs automatically (context propagation).
- Use logs as event-level detail while metrics provide aggregated signal and traces provide latency and causal relationships.
Example:
- Transaction fails -> trace shows latency spike -> logs filtered by trace_id show the exact error, payload, and user_id.
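The correlation step in the example above amounts to a join on trace_id. Here is a minimal Python sketch of that idea over in-memory log lines. The sample events, service names, and the events_for_trace helper are made up for illustration, and a real backend would run the equivalent filter as a query.

```python
import json
from typing import Iterable, List

# Made-up lines from two services sharing one trace.
raw_lines = [
    '{"timestamp":"2026-05-10T12:34:56.200Z","service.name":"payments","trace_id":"abcd1234","level":"error","message":"payment processing failed","error":"card_declined"}',
    '{"timestamp":"2026-05-10T12:34:56.050Z","service.name":"gateway","trace_id":"abcd1234","level":"info","message":"request received"}',
    '{"timestamp":"2026-05-10T12:34:56.100Z","service.name":"gateway","trace_id":"ffff0000","level":"info","message":"request received"}',
]


def events_for_trace(lines: Iterable[str], trace_id: str) -> List[dict]:
    """Return all events belonging to one trace, oldest first."""
    events = (json.loads(line) for line in lines)
    matching = [event for event in events if event.get("trace_id") == trace_id]
    # RFC3339 UTC timestamps with identical precision sort correctly as strings.
    return sorted(matching, key=lambda event: event["timestamp"])


for event in events_for_trace(raw_lines, "abcd1234"):
    print(event["timestamp"], event["service.name"], event["message"])
```

The result reads as a timeline across services, which is the view a traces UI would link to from a failing span.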
Best practices and schema design
Schema design is critical for long-term value.
Field naming and conventions:
- Use canonical names: service.name, service.version, env, host.name, process.id, trace_id, span_id.
- Follow ECS or OpenTelemetry semantic conventions where possible.
- Use lower_snake_case or dot-separated (be consistent).
- Avoid nesting too deeply; shallow structures are easier to query.
Timestamps:
- Include an explicit timestamp field in ISO 8601 / RFC3339 UTC (e.g., 2026-05-10T12:34:56.123Z).
- Use the emitter's clock and ensure synchronized clocks (NTP).
Levels and severity:
- Use standard levels: trace, debug, info, warn, error, fatal/critical.
- Also include a numeric level for severity mapping (syslog).
Message field:
- Include a concise human-readable message field. Keep structured fields as primary machine source of truth.
Correlation:
- request_id for a logical request, trace_id/span_id for distributed traces.
- Include user/session identifiers when useful.
Context enrichment:
- Enrich logs with environment, region, k8s pod metadata, version, deployment, build id.
PII and sensitive data:
- Avoid emitting PII unredacted. Mask or omit sensitive fields at emit time or redact in the pipeline.
Field types:
- Prefer typed fields (numeric for durations, boolean for flags). Strings for textual data.
- Arrays for lists, nested maps for structured payloads (but avoid excessive depth).
Cardinality:
- Avoid indexing or aggregating on high-cardinality fields (e.g., user_id, session_id) unless necessary.
- Use labels/tags for low-cardinality dimensions (env, region, status_code_class).
Versioning and schema evolution:
- Allow fields to be optional.
- Adopt a schema registry or documentation to coordinate changes across teams.
- Use feature flags or backward-compatible additions.
Example ECS-aligned log: { "@timestamp":"2026-05-10T12:34:56.123Z", "log.level":"error", "message":"payment failed", "service.name":"payments", "trace.id":"abcd1234", "user.id":12345, "http.response.status_code":402 }
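One lightweight way to enforce a schema like the one in the ECS-aligned example is a field and type check that can run in CI or in a test suite. The sketch below is an assumption-laden illustration: the required and optional field sets are chosen for this example only, and unknown fields are deliberately allowed so that backward-compatible additions pass.

```python
from typing import List

# Assumed schema for this sketch: field name -> expected Python type.
REQUIRED_FIELDS = {"@timestamp": str, "log.level": str, "message": str, "service.name": str}
OPTIONAL_FIELDS = {"trace.id": str, "user.id": int, "http.response.status_code": int}


def validate(event: dict) -> List[str]:
    """Return a list of schema problems; an empty list means the event conforms."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing required field: {name}")
        elif not isinstance(event[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        # Optional fields may be absent, but must have the right type when present.
        if name in event and not isinstance(event[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


good = {
    "@timestamp": "2026-05-10T12:34:56.123Z",
    "log.level": "error",
    "message": "payment failed",
    "service.name": "payments",
    "trace.id": "abcd1234",
    "user.id": 12345,
    "http.response.status_code": 402,
}
print(validate(good))
print(validate({"message": "payment failed", "user.id": "12345"}))
```

A type mismatch such as user.id arriving as a string is exactly the kind of drift that causes mapping conflicts in typed backends, so catching it before deployment is cheaper than fixing an index.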
Performance, costs, and scaling
Structured logs can increase log size (field names), raising ingest costs and storage. Strategies to mitigate:
- Use compact field names where helpful, but balance readability (ECS vs abbreviated keys).
- Batch and compress logs in transit (gzip).
- Use binary/compact encodings for internal high-throughput systems (protobuf via OTLP).
- Sample logs from noisy endpoints (for example, by following the trace sampling decision), but ensure errors and critical events are always logged.
- Rate-limit noisy loggers; use deduplication in the pipeline.
- Use tiered storage: hot (searchable), warm/cold (less index), archival (S3/Parquet).
- Consider Loki or ClickHouse for cost-effective storage of large volumes.
- Avoid expensive index mappings on high-cardinality fields; use partitions and selective indices.
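The sampling strategy above can be sketched as a deterministic filter that always keeps warnings and errors. This is an illustrative approach, not a standard algorithm from any library: the should_keep name, the set of always-kept levels, and the choice to hash trace_id are assumptions. Hashing the trace ID means all logs for one trace are kept or dropped together.

```python
import hashlib

# Assumption for this sketch: these levels are never sampled away.
ALWAYS_KEEP = {"warn", "error", "fatal"}


def should_keep(event: dict, sample_rate: float) -> bool:
    """Keep all high-severity events and a stable fraction of everything else."""
    if event.get("level") in ALWAYS_KEEP:
        return True
    key = str(event.get("trace_id") or event.get("request_id") or "")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the key to a 64-bit bucket and compare in integers to avoid float rounding.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)


events = [
    {"level": "info", "trace_id": "abcd1234", "message": "charge.created"},
    {"level": "error", "trace_id": "abcd1234", "message": "payment processing failed"},
]
kept = [event for event in events if should_keep(event, sample_rate=0.1)]
print(len(kept), "of", len(events), "events kept")
```

Random sampling per line would be simpler, but it tends to leave traces with gaps, which makes the remaining lines harder to interpret.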
Async logging:
- Use non-blocking, asynchronous appenders to avoid slowing application threads.
- Buffer with bounded queues and drop policies for overload scenarios.
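In Python, the standard library's QueueHandler and QueueListener give one way to build the non-blocking, bounded setup described above. The sketch below is a minimal illustration: DroppingQueueHandler and CollectingHandler are hypothetical classes written for this example, and the drop-newest policy is just one possible overload behavior.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Enqueue records without blocking; count and drop them when the queue is full."""

    def __init__(self, log_queue: queue.Queue) -> None:
        super().__init__(log_queue)
        self.dropped = 0

    def enqueue(self, record: logging.LogRecord) -> None:
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1  # overload policy: drop the newest record


# Part 1: a tiny queue with no consumer shows the drop policy.
tiny_handler = DroppingQueueHandler(queue.Queue(maxsize=2))
drop_demo = logging.getLogger("async_demo.drop")
drop_demo.setLevel(logging.INFO)
drop_demo.propagate = False
drop_demo.addHandler(tiny_handler)
for i in range(5):
    drop_demo.info("event %d", i)
print("dropped:", tiny_handler.dropped)

# Part 2: a listener thread does the slow I/O away from application threads.
received = []


class CollectingHandler(logging.Handler):
    """Stand-in for a slow sink such as a file or network handler."""

    def emit(self, record: logging.LogRecord) -> None:
        received.append(record.getMessage())


log_queue = queue.Queue(maxsize=1000)
listener = logging.handlers.QueueListener(log_queue, CollectingHandler())
listener.start()

app_log = logging.getLogger("async_demo.app")
app_log.setLevel(logging.INFO)
app_log.propagate = False
app_log.addHandler(DroppingQueueHandler(log_queue))
app_log.info("charge.created")

listener.stop()  # waits for queued records to be handled before returning
print("received:", received)
```

Exporting the dropped counter as a metric is worth considering, since silent log loss under load is otherwise hard to notice.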
Benchmarks and choices:
- Libraries like pino, zerolog, and logrus with JSON output vary in CPU/memory overhead; choose low-allocation options for high throughput.
Security, privacy, and compliance
- Encrypt logs in transit (TLS) and at rest (AES).
- Implement access control (RBAC) on log storage and queries.
- Retention policies aligned with compliance (GDPR: right to erasure; HIPAA: retention).
- Mask or redact sensitive fields (passwords, tokens, PII) at emit time or in ingestion processors.
- Audit who accessed logs and when. Logs often contain sensitive information, so treat them as sensitive data.
- Implement secure log ingestion endpoints and avoid logging secrets in cleartext.
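Redaction at emit time or in an ingestion processor can be as simple as a recursive pass over the event. The following Python sketch is illustrative only: the list of sensitive key names and the email pattern are assumptions for this example and would need to be extended for a real system.

```python
import re

# Assumed deny-list of field names; real deployments will need their own.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number"}
# Deliberately simple email pattern for illustration, not a full address grammar.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def redact(value, key=None):
    """Return a copy of value with sensitive fields masked and emails scrubbed."""
    if key is not None and key.lower() in SENSITIVE_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_PATTERN.sub("[EMAIL]", value)
    return value


event = {
    "message": "login by a@example.com",
    "password": "hunter2",
    "nested": {"token": "abc", "attempts": 1},
}
print(redact(event))
```

A deny-list like this only catches known field names, so it works best combined with the CI checks mentioned elsewhere in this article rather than as the sole safeguard.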
Migration: from unstructured to structured logging
Steps:
- Choose a standard (ECS or OTel semantic conventions).
- Pick libraries for each language (structlog, pino, zerolog, Serilog).
- Start emitting minimal structured fields: timestamp, level, message, service, env.
- Add correlation IDs and adopt context propagation middleware.
- Centralize enrichment in the collector for host/k8s metadata.
- Gradually replace existing log formatting; dual-writing can help (emit both legacy text and structured for a transition window).
- Update dashboards, alerts, and parsers to use structured fields.
- Reprocess historical logs into structured form if needed for analytics (costly).
- Educate teams and enforce conventions via linters/CI checks.
Example migration tactic:
- Phase 1: add JSON logger to new services and features.
- Phase 2: convert critical services to structured logs; update pipelines.
- Phase 3: deprecate regex parsing of text logs.
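Dual-writing during the transition window can be sketched with two handlers on the same Python logger, one emitting the legacy text format and one emitting JSON. The JsonFormatter class and the use of a "fields" key inside `extra` are conventions invented for this sketch, not built-in features of the logging module.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object; structured fields arrive via `extra`."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "service": "payments",
        }
        # "fields" is a hypothetical convention for this sketch.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


legacy_stream = io.StringIO()  # stands in for the existing text log file
json_stream = io.StringIO()    # stands in for the new structured output

legacy_handler = logging.StreamHandler(legacy_stream)
legacy_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
json_handler = logging.StreamHandler(json_stream)
json_handler.setFormatter(JsonFormatter())

logger = logging.getLogger("payments.migration_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(legacy_handler)
logger.addHandler(json_handler)

logger.info("charge.created", extra={"fields": {"user_id": 123, "amount": 12.5}})

print(legacy_stream.getvalue().strip())
print(json_stream.getvalue().strip())
```

Existing regex-based parsers keep working against the text stream while dashboards and alerts are moved to the JSON stream, after which the legacy handler can be removed.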
Examples: Troubleshooting with structured logs
Scenario: A customer reports a failed charge; service returns 402.
Steps with structured logs:
- Search logs for service.name="payments" and user.id=12345 in the time range.
- Filter by trace_id/request_id returned to user.
- Inspect error-level events with fields error, http.status_code, and payment_gateway_response.
- Correlate with spans by trace_id in traces UI to find failing external call.
- Create alert on repeated error field "card_declined" for rate-based detection.
Example Kibana query (ECS): GET /_search { "query": { "bool": { "must": [ { "match": { "service.name": "payments" }}, { "match": { "user.id": 12345 }}, { "range": { "@timestamp": { "gte": "now-15m" }}} ] } } }
Loki LogQL example: {job="payments"} | json | user_id=12345 | status_code=402
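The rate-based alert from the last troubleshooting step can be approximated with a sliding window over structured events. This Python sketch is a simplified illustration: RepeatedErrorAlert, its threshold, and its window are hypothetical, and it assumes events arrive in timestamp order. Production alerting would normally run as a query or rule in the log backend.

```python
from collections import deque
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse an RFC3339 UTC timestamp; older Pythons reject a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class RepeatedErrorAlert:
    """Fire when an error code is seen at least `threshold` times within `window`."""

    def __init__(self, error_code: str, threshold: int, window: timedelta) -> None:
        self.error_code = error_code
        self.threshold = threshold
        self.window = window
        self._seen = deque()

    def observe(self, event: dict) -> bool:
        if event.get("error") != self.error_code:
            return False
        now = parse_ts(event["timestamp"])
        self._seen.append(now)
        # Discard occurrences that have fallen out of the window.
        while now - self._seen[0] > self.window:
            self._seen.popleft()
        return len(self._seen) >= self.threshold


alert = RepeatedErrorAlert("card_declined", threshold=3, window=timedelta(seconds=60))
events = [
    {"timestamp": "2026-05-10T12:34:00Z", "error": "card_declined"},
    {"timestamp": "2026-05-10T12:34:10Z", "error": "timeout"},
    {"timestamp": "2026-05-10T12:34:20Z", "error": "card_declined"},
    {"timestamp": "2026-05-10T12:34:30Z", "error": "card_declined"},
    {"timestamp": "2026-05-10T12:36:00Z", "error": "card_declined"},
]
fired = [alert.observe(event) for event in events]
print(fired)
```

Matching on the structured error field avoids the false positives that come from searching message text for a phrase like "declined".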
Common pitfalls and how to avoid them
- Logging PII or secrets: enforce redaction and checks in CI.
- High-cardinality indexing: avoid indexing fields like user_id as a primary index.
- Diverse schemas across services: adopt central semantic conventions (ECS/OTel).
- Over-logging: sample low-value logs and throttle noisy sources.
- Blocking log writers: use async non-blocking appenders.
- Not including trace IDs: ensure context propagation adds trace_id to logs.
- Relying solely on human message strings: keep key data in structured fields (machines must be able to query them).
Future directions
- Unified telemetry: OpenTelemetry (OTel) standardizes logs, traces, and metrics; expect more cross-product integration and OTLP adoption.
- Semantic schema registries: reusable schemas for logs enabling validation and evolution (inspired by Avro/Protobuf registries).
- AI-assisted log analysis: ML models that cluster, summarize, and surface root causes from structured logs.
- More cost-effective storage and query engines designed for logs at exabyte scales (vectorized query engines, columnar formats, and object-store-based analytics).
- Event-driven observability: logs as events that can trigger serverless workflows or automated remediations.
- Standardized enrichment and context propagation protocols across cloud providers.
Checklist / Quick guidelines
- Emit JSON or another machine-readable structured format.
- Include: timestamp (RFC3339), service.name, env, host/pod, level, message.
- Include correlation: request_id and trace_id/span_id.
- Follow semantic conventions (ECS or OTel) where possible.
- Avoid logging secrets/PII; redact at emission or ingestion.
- Use context mechanisms (MDC/contextvars) to bind request context.
- Use asynchronous, batched, compressed exporters.
- Enrich in collectors for consistent metadata.
- Monitor and control cardinality and retention for cost management.
- Document and version log schema; add CI checks to enforce.
References and resources
- OpenTelemetry (semantic conventions and OTLP): https://opentelemetry.io
- Elastic Common Schema (ECS): https://www.elastic.co/ecs/
- Fluentd / Fluent Bit: https://www.fluentd.org / https://www.fluentbit.io/
- Vector: https://vector.dev
- Logstash: https://www.elastic.co/logstash/
- Loki and LogQL: https://grafana.com/oss/loki/
- Graylog / GELF: https://www.graylog.org/
- Serilog, Logstash Logback Encoder, structlog, pino, zerolog documentation for language-specific guidance.
Structured logging is essential for modern, scalable, and reliable observability. Its benefits—improved searchability, automated alerting, robust correlation, and lower parsing costs—far outweigh the initial costs of adoption. By following consistent schemas, protecting sensitive data, and designing scalable pipelines, teams can unlock powerful debugging, analytics, and automation capabilities across complex distributed systems.