Structured Logging — A Deep Dive
Structured logging is a foundational practice for modern observability. Instead of emitting free-form text messages, applications emit logs as structured, typed data (typically JSON or another machine-readable format). This enables powerful querying, correlation, enrichment, automated alerting, analytics, and reliable machine consumption. This article covers the history, concepts, theory, practical implementation, tools, best practices, pitfalls, migrations, examples, and future directions.
Table of contents
- Introduction and motivation
- Brief history and evolution
- Core concepts and theoretical foundations
- Common structured log formats and standards
- Practical implementation (patterns and code examples)
- Logging pipelines, collection, storage and query
- Observability integration: traces, metrics, and logs
- Best practices and schemas
- Performance, costs, and scaling considerations
- Security, privacy and compliance
- Migration strategies from unstructured logs
- Common pitfalls and how to avoid them
- Future directions
- Checklist / quick guidelines
- References and resources
Introduction and motivation
Unstructured logs are free-form text lines written for humans to read. While readable, they are hard for machines to parse reliably. Structured logging encodes each log event as a key-value map with typed fields (strings, numbers, booleans, arrays, nested maps). Machine-readable logs enable:
- Fast, precise querying (find events by field instead of regex)
- Enrichment (add host, service, trace IDs centrally)
- Correlation across services (trace_id/request_id)
- Reliable alerting and metrics extraction
- Lower parsing/CPU cost in ingestion
- Better analytics and dashboards
- Easier compliance/PII redaction
- Integration with modern observability stacks (ELK, Splunk, Loki, Grafana, OpenTelemetry)
Practical benefits: when troubleshooting an error, you can filter by user_id, endpoint, and trace id in seconds. For security, structured logs can be parsed and fed to SIEM rules reliably.
Brief history and evolution
- Early systems: plain text logs, syslog format.
- Log aggregation tools (syslog-ng, Fluentd) parsed textual logs with regex/grok patterns.
- JSON became a common machine-readable log format in the 2010s.
- Specialized log formats emerged: GELF (Graylog), RFC5424 syslog, Elastic Common Schema (ECS).
- Centralized logging stacks (ELK — Elasticsearch, Logstash, Kibana) popularized structured ingestion.
- Cloud providers and observability vendors embraced structured logs (CloudWatch Logs Insights, Stackdriver).
- OpenTelemetry (OTel) broadened standards to unify metrics, traces, and logs.
- Increasing emphasis on schemas, semantic conventions, and context propagation (trace_id/span_id, service.name).
Core concepts and theoretical foundations
Key concepts:
- Event vs. message: a log is a structured event with a timestamp, severity, and fields.
- Schema vs. schema-less: structured logs are schematized implicitly by field names and types. Some systems maintain explicit schemas.
- Semantic conventions: agreed-upon field names (e.g., service.name, http.method) improve interoperability and querying.
- Correlation: use identifiers (request_id, trace_id) across distributed systems to join related events.
- Enrichment: add consistent metadata (host, region, environment) at emit or ingest time.
- Immutable append-only stream model: logs as time-ordered events for reconstruction.
- Observability triangle: metrics (aggregates), traces (distributed causal paths), logs (event detail).
Theoretical benefits include data normalization (structured fields), reduced ambiguity (types), improved signal-to-noise ratio (structured alerting), and deterministic parsing.
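Correlation and time-ordered reconstruction can be illustrated in a few lines of Python: given JSON events from several services, group them by trace_id and sort each group by timestamp to rebuild a per-request timeline. This is a minimal sketch; the sample events and the function name timelines_by_trace are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical events from two services, arriving out of order at a collector.
RAW_LINES = [
    '{"timestamp":"2026-05-10T12:34:56.300Z","level":"error","service.name":"payments","trace_id":"t1","message":"payment processing failed"}',
    '{"timestamp":"2026-05-10T12:34:56.100Z","level":"info","service.name":"gateway","trace_id":"t1","message":"request received"}',
    '{"timestamp":"2026-05-10T12:34:57.000Z","level":"info","service.name":"gateway","trace_id":"t2","message":"request received"}',
]


def timelines_by_trace(lines):
    """Group structured events by trace_id and order each group by timestamp."""
    groups = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        groups[event["trace_id"]].append(event)
    for events in groups.values():
        # RFC 3339 UTC timestamps with equal precision and a "Z" suffix sort correctly as strings.
        events.sort(key=lambda e: e["timestamp"])
    return dict(groups)


if __name__ == "__main__":
    for trace_id, events in timelines_by_trace(RAW_LINES).items():
        print(trace_id, [(e["service.name"], e["message"]) for e in events])
```

The join works only because every service emits the same field name for the correlation ID; with free text, the same step would need a per-service regex.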
Common structured log formats and standards
- JSON: de-facto standard due to ubiquity and language support. Example:
{"timestamp":"2026-05-10T12:34:56Z","level":"error","message":"...","user_id":123}
- GELF (Graylog Extended Log Format): JSON-like over UDP/TCP/HTTP with specific fields for Graylog.
- RFC5424 syslog: supports structured data blocks.
- CEF (Common Event Format) and LEEF: used by SIEM vendors.
- Elastic Common Schema (ECS): recommended canonical field names for Elastic stack.
- OpenTelemetry Logs: OTLP for logs and semantic conventions for fields.
- Custom key=value pairs (often called logfmt): less robust, but sometimes used in CLI logs.
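To see why key=value output is less robust than JSON, the sketch below renders the same event both ways. logfmt has no formal specification, so the quoting rule used here (quote values containing spaces, double quotes, or "=") is an assumption, and the helper names are hypothetical.

```python
import json


def to_json(event: dict) -> str:
    # JSON keeps types (123 stays a number) and handles escaping for us.
    return json.dumps(event, separators=(",", ":"))


def to_logfmt(event: dict) -> str:
    # Every value becomes text; values with spaces, quotes or "=" must be quoted.
    parts = []
    for key, value in event.items():
        text = str(value)
        if text == "" or any(c in text for c in ' "='):
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


event = {"level": "error", "message": "payment processing failed", "user_id": 123}
print(to_json(event))
print(to_logfmt(event))
```

A consumer reading the logfmt line gets user_id back as the string "123" and must guess its type; the JSON line carries the number directly.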
Important standards and semantic conventions:
- RFC3339 timestamps (e.g., 2026-05-10T12:34:56.123Z).
- Severity conventions: syslog levels (0-emergency to 7-debug) or named levels (trace, debug, info, warn, error, fatal).
- ECS and OpenTelemetry semantic conventions define names for service.name, http.method, db.statement, etc.
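A small sketch of the timestamp and severity conventions above: formatting a timestamp as RFC 3339 in UTC with millisecond precision, and a lookup table for the RFC 5424 numeric severities. The helper name rfc3339_utc is hypothetical.

```python
from datetime import datetime, timezone

# Syslog severities as named in RFC 5424 (0 is the most severe).
SYSLOG_SEVERITY = {
    0: "emergency", 1: "alert", 2: "critical", 3: "error",
    4: "warning", 5: "notice", 6: "informational", 7: "debug",
}


def rfc3339_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339 UTC, millisecond precision, 'Z' suffix."""
    utc = dt.astimezone(timezone.utc)
    # timespec="milliseconds" truncates microseconds to three digits.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(rfc3339_utc(datetime(2026, 5, 10, 12, 34, 56, 123456, tzinfo=timezone.utc)))
```

Normalizing to UTC with a fixed precision at emit time means downstream tools can compare and sort timestamps without per-source timezone handling.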
Practical implementation (patterns and code examples)
Principles:
- Emit structured objects, not formatted strings.
- Always include a stable timestamp, service identity, environment.
- Include correlation IDs (request_id, trace_id).
- Keep a human-readable message field alongside structured fields for quick inspection.
Example structured JSON log: { "timestamp":"2026-05-10T12:34:56.123Z", "level":"error", "message":"payment processing failed", "service.name":"payments", "env":"prod", "trace_id":"abcd1234efgh", "request_id":"req-98765", "user_id":12345, "error":"card_declined", "http.status_code":402 }
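Before reaching for a library, it can help to see how little is needed to produce an event like the one above. A minimal sketch using only Python's standard library; the class name StdlibJsonFormatter and the choice of service.name/env as static fields are illustrative assumptions, not a library API.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Attribute names every LogRecord carries; anything else arrived via extra= and is emitted as a field.
_RESERVED = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class StdlibJsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per line, standard library only."""

    def __init__(self, service: str, env: str):
        super().__init__()
        self.static = {"service.name": service, "env": env}

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, timezone.utc)
        event = {
            "timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            **self.static,
        }
        for key, value in vars(record).items():
            if key not in _RESERVED:
                event[key] = value
        if record.exc_info:
            event["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (e.g. Decimal, UUID) from raising.
        return json.dumps(event, default=str)


if __name__ == "__main__":
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(StdlibJsonFormatter(service="payments", env="prod"))
    logger = logging.getLogger("payments")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("payment processing failed", extra={"user_id": 12345, "error": "card_declined"})
```

Libraries such as python-json-logger and structlog handle more edge cases (field renaming, processors, performance), but the principle is the same: the call site passes fields, and the formatter decides the wire format.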
Code snippets for common languages:
Python (structlog — recommended for structured logging):
```python
import logging

import structlog
from pythonjsonlogger import jsonlogger

# Standard logging + JSON formatter (python-json-logger)
handler = logging.StreamHandler()
handler.setFormatter(
    jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(name)s %(module)s %(message)s")
)
logging.basicConfig(handlers=[handler], level=logging.INFO)
# Fields passed via extra= become top-level JSON keys
logging.getLogger("payments").info("charge.created", extra={"user_id": 123, "amount": 12.50})

# Structlog config: structlog renders JSON itself and prints it to stdout.
# (Sending JSONRenderer output through the stdlib JSON formatter above would double-encode it.)
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer(),
    ],
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
)

log = structlog.get_logger("payments")
log = log.bind(service="payments", env="prod")
log.info("charge.created", user_id=123, amount=12.50)
```
Node.js (pino — high performance):
```javascript
const pino = require('pino')

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  // `base` replaces pino's default pid/hostname fields with these static fields
  base: { service: 'payments', env: process.env.NODE_ENV || 'dev' }
})

// Merge object first, message second
logger.info({ user_id: 123, amount: 12.5 }, 'charge.created')
```
Node.js (winston):
```javascript
const { createLogger, format, transports } = require('winston')

const logger = createLogger({
  level: 'info',
  format: format.combine(
    format.timestamp(),
    format.json()
  ),
  defaultMeta: { service: 'payments' },
  transports: [
    new transports.Console()
  ]
})

logger.info('charge.created', { user_id: 123, amount: 12.5 })
```
Go (zerolog — low-allocation):
```go
package main

import (
	"os"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"
)

func main() {
	// Global logger writing JSON to stdout, with a timestamp and a fixed service field on every event
	log.Logger = zerolog.New(os.Stdout).With().Timestamp().Str("service", "payments").Logger()

	log.Info().Int("user_id", 123).Float64("amount", 12.5).Msg("charge.created")
}
```
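Java (Logback + Logstash encoder): logback.xml example (Logstash Logback Encoder):
```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <!-- One JSON object per line; MDC entries become top-level fields by default -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <customFields>{"service":"payments"}</customFields>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```
Then use MDC to inject request_id/trace_id; LogstashEncoder writes MDC entries as top-level JSON fields by default.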
C# (.NET) Serilog:
```csharp
using Serilog;

Log.Logger = new LoggerConfiguration()
    .Enrich.WithProperty("Service", "Payments")
    .WriteTo.Console(new Serilog.Formatting.Json.JsonFormatter())
    .CreateLogger();

Log.Information("charge.created {@Payment}", new { UserId = 123, Amount = 12.5 });
```
Tips: use log context mechanisms (MDC in Java, contextvars in Python, request-scoped middleware) to attach request-level data automatically.
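In Python, the contextvars approach can be sketched as a logging.Filter that copies request-scoped values onto every record. The variable names, the RequestContextFilter class, and the handle_request function standing in for real middleware are all hypothetical.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Request-scoped values; asyncio tasks run in a copy of the context,
# so concurrent requests do not see each other's IDs.
request_id_var = contextvars.ContextVar("request_id", default=None)
trace_id_var = contextvars.ContextVar("trace_id", default=None)


class RequestContextFilter(logging.Filter):
    """Attach the current request context to every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        record.trace_id = trace_id_var.get()
        return True  # never drop records; this filter only enriches


def handle_request(logger: logging.Logger, incoming_trace_id: Optional[str] = None) -> str:
    """Hypothetical middleware body: set the context once, then log without repeating IDs."""
    request_id = f"req-{uuid.uuid4().hex[:8]}"
    rid_token = request_id_var.set(request_id)
    tid_token = trace_id_var.set(incoming_trace_id)
    try:
        logger.info("charge.created")
        return request_id
    finally:
        # Restore the previous values so nothing leaks into the next request.
        request_id_var.reset(rid_token)
        trace_id_var.reset(tid_token)
```

Attaching the filter to a handler (handler.addFilter(RequestContextFilter())) rather than to a logger means it also sees records propagated from child loggers; a JSON formatter that emits extra record attributes will then include request_id and trace_id automatically.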
Logging pipelines: collection, processing, enrichment, storage, querying
Typical pipeline:
- Emit logs (JSON) from services to stdout, files, or syslog.
- Collect with an agent/collector:
  - Fluentd, Fluent Bit, Vector, Filebeat, Promtail (for Loki)
- Process/enrich:
  - Add geoIP, host metadata, Kubernetes metadata (namespace, pod labels), environment, tags, trace IDs.
  - Redact sensitive fields; transform/normalize to a common schema.
- Buffer and batch, then forward to storage:
  - Elasticsearch/OpenSearch, Loki, Splunk, S3 (object store), BigQuery, ClickHouse, InfluxDB (rare), or proprietary vendor ingesters.
- Indexing and querying:
  - Map key fields to indices or columnar tables. For full-text search or aggregation, ensure appropriate indexing and mapping.
- Visualization and alerting:
  - Kibana, Grafana, Splunk UI, Datadog Logs, Sumo Logic.
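The process/enrich stage can be sketched as a pure function over parsed events: normalize field names, redact sensitive values, and add static metadata. The sensitive-key set, the rename map, and the metadata values are hypothetical policy choices; real collectors (Fluent Bit, Vector, Logstash) express the same steps in their own configuration languages.

```python
from typing import Any, Dict

# Hypothetical policy: keys whose values must never leave the pipeline.
SENSITIVE_KEYS = {"password", "card_number", "authorization"}
# Hypothetical legacy field names mapped onto the canonical schema.
RENAMES = {"traceid": "trace_id", "requestid": "request_id", "userId": "user_id"}


def process(event: Dict[str, Any], static_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize, redact and enrich one parsed event; returns a new dict."""
    out: Dict[str, Any] = {}
    for key, value in event.items():
        key = RENAMES.get(key, key)            # normalize field names
        if key.lower() in SENSITIVE_KEYS:
            value = "[REDACTED]"               # redact sensitive values
        elif isinstance(value, dict):
            value = process(value, {})         # recurse into nested maps
        out[key] = value
    for key, value in static_meta.items():
        out.setdefault(key, value)             # enrich without overwriting emitted fields
    return out
```

In a collector, static_meta would typically carry host name, region, or Kubernetes labels, for example {"host.name": socket.gethostname(), "env": "prod"}. Using setdefault is a design choice: fields the application emitted win over pipeline defaults.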
Collectors and processors:
- Fluentd / Fluent Bit: flexible piping; many plugins.
- Vector: high-performance Rust-based collector and transformer (created by Timber.io, now maintained by Datadog).
- Filebeat: lightweight Beats shipper for the Elastic Stack.
- Promtail: for Loki ingestion.
- Logstash: heavier (JVM-based) but powerful pipeline.
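Collectors usually buffer events and forward them in batches, flushing when a batch is full or has waited too long. A minimal sketch of that size-or-age policy; the Batcher class, its default thresholds, and the sink callable are assumptions, not the settings of any particular collector.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffer events; flush when the batch is full or its oldest event is too old."""

    def __init__(self, sink: Callable[[List[dict]], None], max_events: int = 500,
                 max_age_s: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.sink = sink              # e.g. an HTTP bulk request to the storage backend
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.clock = clock            # injectable so the policy can be tested without sleeping
        self.buffer: List[dict] = []
        self.first_at: Optional[float] = None

    def add(self, event: dict) -> None:
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it is older than max_age_s."""
        if self.buffer and self.clock() - self.first_at >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.first_at = None
            self.sink(batch)
```

The size bound caps memory and request size under load; the age bound caps latency when traffic is light. Real collectors add retries, backpressure, and on-disk buffering on top of this.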
Storage choices and trade-offs:
- Elasticsearch/OpenSearch: full-text search and analytics; mapping complexity and cost at scale.
- Loki: logs as streams identified by labels; only labels are indexed (not log content), which keeps ingest and storage costs low; queried with LogQL.
- Splunk: commercial, feature-rich, powerful ingest and analytics.
- Object storage (S3) with partitioned Parquet/JSON: low-cost long-term storage; slower queries but cheap retention.
- ClickHouse: excellent for analytical queries at scale.
- BigQuery: serverless analytics on massive logs.
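Whether a field is safe to use as a Loki label or a primary index key depends mostly on its cardinality. A rough sketch for counting distinct values per field over a sample of events; the function names and the threshold of 50 are arbitrary assumptions.

```python
import json
from collections import defaultdict


def field_cardinality(events, fields=None):
    """Count distinct values per top-level field across a sample of events."""
    seen = defaultdict(set)
    for event in events:
        for key, value in event.items():
            if fields is None or key in fields:
                # Serialize so unhashable values (lists, dicts) can be counted too.
                seen[key].add(json.dumps(value, sort_keys=True, default=str))
    return {key: len(values) for key, values in seen.items()}


def label_candidates(events, max_distinct=50):
    """Fields whose distinct-value count stays at or below a (hypothetical) threshold."""
    return sorted(k for k, n in field_cardinality(events).items() if n <= max_distinct)
```

Fields like service or level usually stay small; user_id or request_id grow with traffic and belong in the log body, filtered at query time.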
Indexing strategies:
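- Index-time vs. query-time parsing: structured logs avoid fragile regex/grok parsing at ingest; stores such as Elasticsearch map fields at index time, while Loki can parse JSON at query time.
- Avoid using high-cardinality fields (user_id, request_id, trace_id) as primary index keys or Loki labels; keep them as fields that are filtered at query time.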
- Create indices/partitions by time range (e.g., daily) and service.
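Partitioning by time range and service often reduces to a naming scheme. The sketch derives a daily per-service index name and a Hive-style object-storage prefix from an event timestamp; both layouts (the logs- prefix, the dt= and hour= keys) are hypothetical conventions.

```python
from datetime import datetime, timezone


def daily_index(service: str, ts: datetime, prefix: str = "logs") -> str:
    """Daily per-service index name, e.g. for Elasticsearch/OpenSearch (naming is hypothetical)."""
    day = ts.astimezone(timezone.utc).strftime("%Y.%m.%d")
    # Elasticsearch index names must be lowercase.
    return f"{prefix}-{service.lower()}-{day}"


def object_key_prefix(service: str, ts: datetime) -> str:
    """Hive-style partition path for object storage such as S3 + Parquet (layout is hypothetical)."""
    utc = ts.astimezone(timezone.utc)
    return f"service={service}/dt={utc:%Y-%m-%d}/hour={utc:%H}/"
```

Converting to UTC before deriving the partition keeps events from different regions in the same partition for the same instant, and lets queries prune whole days or hours by path.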
Example query patterns:
- Kibana/Elasticsearch: filter by service.name, trace_id, or nested fields.
- Loki LogQL: {service="payments"} |= "charge.created" | json | user_id=123
- Splunk SPL: index=prod service=payments user_id=123 | stats count by error
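These query patterns share a shape: a cheap text filter, a parse step, then field predicates and an aggregation. A small Python sketch of those semantics over JSON lines; it is not a reimplementation of LogQL or SPL, and query and count_by are hypothetical names.

```python
import json
from collections import Counter


def query(lines, contains, **field_equals):
    """Line filter, then JSON parse, then exact-match field filters."""
    for line in lines:
        if contains not in line:       # cheap substring check before parsing
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                   # skip lines that are not valid JSON
        if all(event.get(k) == v for k, v in field_equals.items()):
            yield event


def count_by(events, field):
    """Similar in spirit to SPL's 'stats count by <field>'."""
    return Counter(e.get(field) for e in events)
```

Putting the substring filter before the parse mirrors the usual advice for LogQL: discard most lines cheaply so the expensive JSON parse runs only on candidates.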
Observability integration: linking logs, traces and metrics
One of the biggest values of structured logging is easy correlation with traces and metrics.
- Correlate logs and ...