The Developer's Guide to Logging

Most developers only think about logging when something has already gone wrong. A production incident hits, you open the logs, and you find either a wall of noise or, worse, nothing useful at all. Good logging is a skill that separates reactive debugging from proactive observability.

In my experience working across multiple production systems over the past decade, the teams that invest in logging early are the ones that sleep through the night. I have personally spent countless hours sifting through poorly structured logs at 3am, and those experiences shaped every recommendation in this guide. What follows are practical logging strategies that will help you debug faster, monitor your systems effectively, and avoid the common pitfalls that make logs useless when you need them most.

Why Logging Matters More Than You Think

Logging is your primary window into what your application is doing in production. Unlike local development, you cannot attach a debugger to a live server serving thousands of requests. Your logs are often the only evidence you have when diagnosing a 3am incident.

Beyond debugging, well-structured logs enable capacity planning, security auditing, and business analytics. They feed into alerting systems that catch problems before your users do. According to the Splunk State of Observability report, organisations with mature observability practices resolve incidents 69% faster than those without. A separate New Relic Observability Forecast found that teams practising full-stack observability experience 60% fewer outages annually. Investing in logging early saves you from expensive firefighting later.

Choosing the Right Log Levels

Every logging framework supports severity levels, but surprisingly few teams use them consistently. Here is how to think about each level.

Level | Purpose                          | Production Default | Example
----- | -------------------------------- | ------------------ | -------
TRACE | Extremely granular detail        | Off                | Variable contents inside a loop
DEBUG | Diagnostic information           | Off                | Method entry/exit, intermediate calculations
INFO  | Routine operations               | On                 | Server startup, job completed, user authenticated
WARN  | Unexpected but handled           | On                 | Retry succeeded, cache miss, deprecated endpoint hit
ERROR | Failures needing investigation   | On                 | Database timeout, API error, file write failure
FATAL | Unrecoverable, app shutting down | On                 | Out of memory, critical config missing

TRACE and DEBUG

These are for development and deep diagnostics. TRACE captures extremely granular detail, such as the contents of every variable in a loop. DEBUG records diagnostic information like method entry and exit points or intermediate calculation results.

Keep these turned off in production by default. The volume they generate can overwhelm your storage and make it harder to find the signals that matter. I have seen teams generate over 500GB of logs per day simply because DEBUG was left on in production after a debugging session. One team I worked with discovered their monthly logging bill had ballooned to over £4,000 before they realised the root cause.

INFO

INFO is your workhorse level. Use it for events that confirm your application is behaving normally: server startup, configuration loaded, scheduled job completed, user authenticated. These entries form a timeline of your application’s life.

A good INFO log tells you what happened and when, without requiring you to read surrounding context.

WARN

WARN signals something unexpected that your application handled gracefully. A retry that succeeded, a cache miss that fell back to the database, or a deprecated API endpoint that is still receiving traffic. These are not emergencies, but they deserve attention during regular review.

ERROR and FATAL

ERROR means something failed and needs investigation. A database query timed out, an external API returned an unexpected response, or a file could not be written. FATAL means the application cannot continue and is shutting down.

Always include the exception or error message, a stack trace where available, and enough context to reproduce the problem. Working with teams over the years, I have found that the single biggest logging mistake is logging an error without the context needed to reproduce it.
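To make that concrete, here is a minimal sketch of an ERROR entry that carries enough context to reproduce the failure. The field names (requestId, userId, input) are illustrative choices, not a standard schema:

```javascript
// Sketch: log an error together with the message, stack trace, and the
// context needed to reproduce it. Field names here are illustrative.
function logError(err, context) {
  const entry = {
    timestamp: new Date().toISOString(),
    level: 'ERROR',
    message: err.message,
    stack: err.stack,
    ...context, // requestId, userId, input parameters, etc.
  };
  console.error(JSON.stringify(entry));
  return entry; // returned only to make the sketch easy to inspect
}

// Usage: capture the failing input alongside the error itself.
try {
  JSON.parse('{not valid json');
} catch (err) {
  logError(err, { requestId: 'req-123', userId: 12345, input: '{not valid json' });
}
```

With the offending input recorded in the entry, reproducing the bug is a copy-paste job rather than an archaeology project.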

Common Logging Mistake         | Impact                                 | Fix
------------------------------ | -------------------------------------- | ---
Logging errors without context | Cannot reproduce bugs                  | Always include request ID, user ID, input parameters
Leaving DEBUG on in production | Storage costs, signal buried in noise  | Use runtime-configurable log levels
Logging PII in plain text      | GDPR violations, security risk         | Redact or hash sensitive fields
Inconsistent log formats       | Breaks aggregation and search          | Adopt structured logging with a shared schema
No correlation IDs             | Cannot trace requests across services  | Generate and propagate request IDs

Structured Logging: Stop Writing Plain Text

If you are still logging strings like "User 12345 placed order for £50.00", you are making your future self’s job harder. Structured logging outputs each entry as a set of key-value pairs, typically in JSON format.

{
  "timestamp": "2026-02-09T14:23:01Z",
  "level": "INFO",
  "message": "Order placed",
  "userId": 12345,
  "orderId": "ORD-98765",
  "amount": 50.00,
  "currency": "GBP"
}

This format lets you filter, aggregate, and search across millions of log entries. Want to find all errors for a specific user? That is a simple query. Want to calculate average order value from your logs? Also straightforward. With plain text, both of those tasks require fragile regex parsing.
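Producing entries like the JSON above takes very little code. In a real project you would use pino or winston (see the table below), but this dependency-free sketch shows the shape of a structured logger:

```javascript
// Minimal structured logger sketch: one JSON object per line, with shared
// base fields merged into every entry. Real projects should use pino/winston.
function createLogger(baseFields = {}) {
  const emit = (level) => (fields, message) => {
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      message,
      ...baseFields,
      ...fields,
    };
    process.stdout.write(JSON.stringify(entry) + '\n');
    return entry;
  };
  return { info: emit('INFO'), warn: emit('WARN'), error: emit('ERROR') };
}

const log = createLogger({ service: 'orders' }); // "orders" is a made-up service name
log.info(
  { userId: 12345, orderId: 'ORD-98765', amount: 50.0, currency: 'GBP' },
  'Order placed'
);
```

Because every entry shares the same keys, your aggregation tool can index them once and query them forever.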

Structured Logging Libraries by Language

Language | Recommended Library            | Notes
-------- | ------------------------------ | -----
Node.js  | pino, winston                  | pino is faster; winston has more transports
Python   | structlog, built-in logging    | structlog offers cleaner API
Java     | SLF4J + Logback (JSON encoder) | Industry standard
Go       | zerolog, zap                   | Both offer high-performance structured output
Ruby     | Semantic Logger                | Integrates well with Rails
.NET     | Serilog                        | Excellent structured logging with enrichers

The pino documentation provides excellent examples of structured logging patterns in Node.js if you want to see this in practice.

What to Log (and What Not To)

Always Log

  • Request and response metadata: HTTP method, path, status code, response time, and correlation ID.
  • Business events: User registration, payment processed, subscription cancelled. These are invaluable for both debugging and analytics.
  • State transitions: Order status changes, deployment events, feature flag toggles.
  • Errors with context: The error message alone is rarely enough. Include the input that caused the failure, the state of the system, and any relevant identifiers.

Never Log

  • Secrets: Passwords, API keys, tokens, and connection strings. Even in DEBUG mode, these should be redacted.
  • PII without justification: Email addresses, phone numbers, and other personal data create GDPR liability. If you must log an identifier, use a hashed or tokenised version.
  • High-frequency noise: Logging every iteration of a tight loop or every healthcheck response will bury the useful information.

Correlation IDs: Tracing Requests Across Services

In a distributed system, a single user action might trigger calls across five or ten services. Without a way to link those calls together, debugging becomes guesswork. If you are working with microservices, correlation IDs are not optional; they are essential.

The solution is a correlation ID. Generate a UUID when a request enters your system at the API gateway or load balancer. Pass it downstream via an HTTP header (commonly X-Request-ID or X-Correlation-ID). Every service includes this ID in its log entries.

// Express middleware example (uses Node's built-in crypto module)
const crypto = require('node:crypto');

app.use((req, res, next) => {
  // Reuse an incoming correlation ID if a caller supplied one; otherwise mint one
  req.correlationId = req.headers['x-correlation-id'] || crypto.randomUUID();
  res.setHeader('x-correlation-id', req.correlationId);
  next();
});

When an incident occurs, you search for that single ID and get the complete picture across every service. This practice ties closely into the broader discipline of observability. Building robust, traceable APIs depends on getting this right from the start.
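To guarantee no call site forgets the ID, derive a per-request logger that stamps it on every entry. This follows the "child logger" idea from libraries like pino, implemented inline here to keep the example dependency-free:

```javascript
// Sketch: a child logger bound to one request's correlation ID, so every
// entry it emits carries the ID automatically.
function childLogger(correlationId) {
  return {
    info(fields, message) {
      const entry = { level: 'INFO', correlationId, message, ...fields };
      console.log(JSON.stringify(entry));
      return entry;
    },
  };
}

// In the Express middleware above you would attach this to the request:
//   req.log = childLogger(req.correlationId);
const log = childLogger('3f1c9a2e'); // hypothetical correlation ID
log.info({ orderId: 'ORD-98765' }, 'Order placed');
```

Handlers then call `req.log.info(...)` instead of a global logger, and the correlation ID rides along for free.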

Log Aggregation and Centralisation

Logs sitting on individual servers are useful only if you know which server to check. For any system with more than one instance, centralise your logs using a dedicated platform.

Popular options include the ELK stack (Elasticsearch, Logstash, Kibana), Grafana Loki (lightweight and cost-effective), Datadog, and AWS CloudWatch. The choice depends on your budget, scale, and existing infrastructure.

Whichever tool you choose, ensure your logs are searchable within seconds of being emitted. A log aggregation system with a 15-minute delay is significantly less useful during an active incident.

[Diagram: centralised log aggregation flow — app instances → log shipper (Fluentd/Vector) → aggregation platform (ELK / Loki / Datadog) → dashboards and alerts]

Alerting on Log Patterns

Centralised logs become even more powerful when you build alerts on top of them. Configure your monitoring tool to notify you when specific patterns emerge.

  • Error rate exceeds a threshold (for example, more than 5% of requests returning 500 errors)
  • A specific error message appears for the first time
  • A critical business event stops occurring (for example, zero orders processed in the last 30 minutes)
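The first pattern, an error-rate threshold, is something your log platform (Datadog, Grafana, etc.) provides natively. A sliding-window sketch of the underlying idea, with an illustrative window size and the 5% threshold from the example above:

```javascript
// Sketch of an error-rate alert: track the last N request outcomes and fire
// when the error fraction exceeds a threshold. Window size and threshold
// values here are illustrative.
class ErrorRateMonitor {
  constructor(windowSize = 100, threshold = 0.05) {
    this.windowSize = windowSize;
    this.threshold = threshold;
    this.results = []; // true = request errored
  }
  record(isError) {
    this.results.push(isError);
    if (this.results.length > this.windowSize) this.results.shift();
  }
  shouldAlert() {
    const errors = this.results.filter(Boolean).length;
    return this.results.length > 0 && errors / this.results.length > this.threshold;
  }
}

const monitor = new ErrorRateMonitor(100, 0.05);
for (let i = 0; i < 90; i++) monitor.record(false);
for (let i = 0; i < 10; i++) monitor.record(true); // 10% of the window errored
console.log(monitor.shouldAlert()); // true
```

Prefer your platform's built-in alert rules over hand-rolled monitors; the sketch just shows what they compute.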

Alerting turns your logs from a passive record into an active early warning system. If you are running a mature CI/CD pipeline, your alerting should be as well-tested as your deployment process.

Performance Considerations

Logging is not free. Every log statement involves string formatting, I/O operations, and potentially network calls if you are shipping logs to a remote service. At high throughput, careless logging can measurably impact your application’s latency.

Write logs asynchronously wherever possible. Buffer entries and flush them in batches rather than writing each one individually. Use sampling for extremely high-volume events, logging one in every hundred healthcheck responses rather than all of them.
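The sampling idea can be sketched in a few lines. Deterministic counter-based sampling is shown here; random sampling works equally well:

```javascript
// Sketch: log roughly one in every N high-volume events (e.g. healthchecks).
function makeSampler(rate) {
  let count = 0;
  return () => ++count % rate === 1; // passes the 1st, (rate+1)th, ... event
}

const sampleHealthcheck = makeSampler(100);
for (let i = 0; i < 250; i++) {
  if (sampleHealthcheck()) {
    console.log(JSON.stringify({ level: 'INFO', message: 'healthcheck ok', seq: i }));
  }
}
// Emits only events 0, 100 and 200 out of 250.
```

You keep a representative trace of the event stream at one percent of the storage cost.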

Most importantly, make your log levels configurable at runtime. The ability to temporarily increase verbosity on a single service without redeploying is invaluable for diagnosing production issues. Feature flags can be an effective mechanism for toggling log verbosity in production.
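Real frameworks expose this directly (pino, for instance, lets you reassign the logger's level at runtime). A dependency-free sketch of the mechanism:

```javascript
// Sketch of a runtime-adjustable minimum level: entries below the current
// threshold are dropped, and the threshold can be changed without redeploying.
const LEVELS = { TRACE: 10, DEBUG: 20, INFO: 30, WARN: 40, ERROR: 50 };
let minLevel = LEVELS.INFO;

function log(level, message) {
  if (LEVELS[level] < minLevel) return false; // filtered out
  console.log(JSON.stringify({ level, message }));
  return true;
}

log('DEBUG', 'not emitted while the minimum is INFO'); // filtered
minLevel = LEVELS.DEBUG; // e.g. flipped by a feature flag or admin endpoint
log('DEBUG', 'now emitted'); // logged
```

Wire `minLevel` to a feature flag or an admin endpoint and you can turn a single service verbose mid-incident, then quiet it again.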

Getting Started

If your current logging is inconsistent or minimal, start with three changes. First, adopt structured logging with a consistent schema across all your services. Second, add correlation IDs to every request that crosses a service boundary. Third, centralise your logs into a searchable platform with basic alerting.

These three steps will transform your ability to understand and debug your systems. For guidance on the broader monitoring picture, see our guide to observability vs monitoring. If you want to strengthen how your applications handle failures gracefully before they become log entries, our guide to building resilient APIs is a natural next step. Everything else is refinement.

Frequently asked questions

What are the standard logging levels and when should I use each one?

The standard levels are TRACE (granular detail), DEBUG (diagnostic info for development), INFO (routine operations like startup and shutdown), WARN (unexpected but recoverable situations), ERROR (failures that need attention), and FATAL (unrecoverable errors). In production, you typically set the minimum level to INFO or WARN to avoid excessive noise.

Should I use structured logging or plain text logging?

Structured logging (outputting JSON or key-value pairs) is almost always the better choice for production systems. It makes logs machine-parseable, which is essential for log aggregation tools like the ELK stack, Datadog, or Grafana Loki. Plain text is fine for local development, but structured logs pay for themselves the moment you need to search or filter across thousands of entries.

How much logging is too much logging?

If your logs are generating so much volume that they become expensive to store or impossible to search, you have too much. A good rule of thumb is to log at INFO level for business-significant events, WARN for things that might need human attention, and ERROR for actual failures. Reserve DEBUG and TRACE for development. You can always increase verbosity temporarily when diagnosing an issue.

What should I never log?

Never log passwords, API keys, authentication tokens, credit card numbers, or any personally identifiable information (PII) such as email addresses or national insurance numbers. Beyond the security risk, logging PII can put you in breach of GDPR and other data protection regulations. Use redaction or masking if you need to reference sensitive values.

How do I correlate logs across microservices?

Use a correlation ID (also called a trace ID or request ID). Generate a unique identifier when a request enters your system and pass it through every service call via HTTP headers. Include this ID in every log entry so you can trace a single request across all services using your log aggregation tool.
