Monitoring and Observability with OpenTelemetry: Traces, Metrics, and Logs

Monitoring tells you a system is down. Observability tells you why—without requiring you to predict every possible failure mode in advance. OpenTelemetry (OTel) has emerged as the industry standard for instrumenting applications to produce traces, metrics, and logs in a unified, vendor-neutral format.

The Three Pillars of Observability

Distributed tracing tracks a single request as it propagates across services, showing latency breakdowns for each hop. Metrics provide aggregated counts and measurements over time—request rate, error rate, latency percentiles. Logs record discrete events with structured context.

OpenTelemetry unifies these three signals under a single SDK and export pipeline:

Application → OTel SDK → OTel Collector → Backend (Grafana, Datadog, SigNoz, etc.)

The OTel Collector is the key architectural component. It receives telemetry from instrumented applications, processes it (sampling, filtering, enrichment), and exports it to one or more backends. This decouples instrumentation from your observability vendor.

Instrumenting a Node.js Application

OpenTelemetry provides auto-instrumentation for popular frameworks. For a simple Express application:

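// tracing.js (file name is an example): load this before any application code,
// for instance with `node --require ./tracing.js app.js`, so the
// auto-instrumentations can patch modules as they are first required.
// Set OTEL_SERVICE_NAME in the environment to name the service in your backend.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');

const sdk = new NodeSDK({
  // Both exporters send OTLP over gRPC to the collector on port 4317.
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4317' }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4317' }),
    exportIntervalMillis: 30000, // push metrics every 30 seconds
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush buffered spans and metrics before the process exits.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});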

This single configuration automatically instruments HTTP requests, database calls (pg, mysql2, redis), gRPC calls, and more. Each incoming request gets a trace ID that propagates to downstream service calls via W3C Trace Context headers.
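
To make the propagation concrete, here is a minimal sketch of the traceparent header that W3C Trace Context defines: a version, a 32-hex-character trace ID, a 16-hex-character parent span ID, and 2 hex characters of flags, joined by hyphens. The helper names parseTraceparent and formatTraceparent are illustrative and not part of the OTel API, and the sketch only handles the version 00 layout. In practice the SDK's propagator reads and writes this header for you.

```javascript
// Illustrative helpers (not part of the OTel API) for the W3C traceparent header.
// Format: version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(String(header).trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version "ff" and all-zero IDs are invalid per the specification.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return {
    version,
    traceId,
    spanId,
    // Bit 0 of the flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

function formatTraceparent({ traceId, spanId, sampled }) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// A downstream call keeps the trace ID but carries the caller's span ID as parent.
const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
const outgoing = formatTraceparent({ ...incoming, spanId: 'b7ad6b7169203331' });
console.log(outgoing);
```

The trace ID stays constant across every hop, which is what lets the backend stitch spans from different services into one trace.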

Custom Span Attributes and Events

Auto-instrumentation covers the common cases: inbound and outbound requests, database calls, and similar I/O. It knows nothing about your domain, so add custom spans to capture business-logic details:

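const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('payment-service');

// paymentGateway is assumed to be defined elsewhere in the application.
async function processPayment(orderId, amount) {
  // startActiveSpan makes this span the active one, so spans created inside
  // the callback (for example the gateway's HTTP call) become its children.
  return tracer.startActiveSpan('processPayment', {
    attributes: {
      'payment.order_id': orderId,
      'payment.amount': amount,
      'payment.currency': 'USD',
    },
  }, async (span) => {
    try {
      const result = await paymentGateway.charge(amount);
      span.setAttribute('payment.status', result.status);
      return result;
    } catch (error) {
      span.setAttribute('payment.error', error.message);
      // Adds an exception event that includes the stack trace.
      span.recordException(error);
      // Mark the span as failed so backends can filter on error status.
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}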

Attach structured attributes to spans so you can filter traces by order ID, customer tier, or error type in your observability backend. Record exceptions as span events with the full stack trace.

Log Correlation with Trace Context

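Logs become far more useful when correlated with traces. The OTel API exposes the trace_id and span_id of the currently active span, so your logger can attach them to every record. The pino instrumentation bundled with the auto-instrumentations package can add these fields for you; the explicit mixin here shows the mechanism and works without it: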

const pino = require('pino');
const { trace } = require('@opentelemetry/api');

const logger = pino({
  mixin() {
    const span = trace.getActiveSpan();
    if (span) {
      const spanContext = span.spanContext();
      return {
        trace_id: spanContext.traceId,
        span_id: spanContext.spanId,
      };
    }
    return {};
  },
});

logger.info({ orderId: 'ORD-123' }, 'Payment request initiated');

Now a failed order trace in Grafana links directly to the relevant log lines, and logs in Loki can be filtered by trace_id to show only the lines written during that request.
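
As a rough illustration of what that correlation relies on, the sketch below filters newline-delimited JSON log output (the format pino writes by default) down to the records that share one trace_id. The function name filterLogsByTrace and the sample log lines are made up for this example; a backend such as Loki does the equivalent with a query.

```javascript
// Hypothetical helper: keep only the log records that belong to one trace.
// Input is newline-delimited JSON, one log record per line.
function filterLogsByTrace(ndjson, traceId) {
  return ndjson
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      try {
        return JSON.parse(line);
      } catch {
        return null; // skip lines that are not valid JSON
      }
    })
    .filter((record) => record !== null && record.trace_id === traceId);
}

// Sample output from two concurrent requests (all values are invented).
const sampleLogs = [
  '{"level":30,"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7","orderId":"ORD-123","msg":"Payment request initiated"}',
  '{"level":30,"trace_id":"0af7651916cd43dd8448eb211c80319c","span_id":"b7ad6b7169203331","orderId":"ORD-456","msg":"Payment request initiated"}',
  '{"level":50,"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7","orderId":"ORD-123","msg":"Payment declined"}',
].join('\n');

const forTrace = filterLogsByTrace(sampleLogs, '4bf92f3577b34da6a3ce929d0e0e4736');
console.log(forTrace.map((record) => record.msg));
```

Without the shared trace_id field, the two interleaved requests in the sample could only be separated by guessing from timestamps or order IDs.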

Configuring the OpenTelemetry Collector

The collector is the glue. Run it as a sidecar or DaemonSet:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  otlp:
    endpoint: "grafana-cloud:4317"
    headers:
      authorization: "Bearer ${GRAFANA_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]

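The batch processor is critical: it groups spans and metrics into larger payloads, which sharply reduces export overhead. The memory limiter starts refusing incoming data when the collector's memory use approaches the configured limit, which makes it much less likely that a traffic spike gets the collector OOM-killed. Keep it first in each pipeline's processor list, as it is here, so it can push back before other processors buffer more data.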
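
The batching idea can be sketched in a few lines: buffer items, then export when either the size threshold or the timeout is reached, whichever comes first. This is a conceptual model only, not the collector's implementation; the Batcher class and its option names are hypothetical, chosen to mirror send_batch_size and timeout from the config.

```javascript
// Conceptual sketch of size-or-timeout batching; NOT the collector's implementation.
class Batcher {
  constructor({ sendBatchSize, timeoutMs, exportFn }) {
    this.sendBatchSize = sendBatchSize;
    this.timeoutMs = timeoutMs;
    this.exportFn = exportFn;
    this.items = [];
    this.timer = null;
  }

  add(item) {
    this.items.push(item);
    if (this.items.length >= this.sendBatchSize) {
      this.flush(); // size threshold reached: export immediately
    } else if (this.timer === null) {
      // First item of a new batch: start the timeout clock.
      this.timer = setTimeout(() => this.flush(), this.timeoutMs);
      this.timer.unref(); // do not keep the process alive just for this timer
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.exportFn(batch);
  }
}

// 2500 spans with a batch size of 1024: two full exports plus a remainder.
const exportedSizes = [];
const batcher = new Batcher({
  sendBatchSize: 1024,
  timeoutMs: 1000,
  exportFn: (batch) => exportedSizes.push(batch.length),
});
for (let i = 0; i < 2500; i++) batcher.add({ spanId: i });
batcher.flush(); // stands in for the 1s timeout firing
console.log(exportedSizes); // [ 1024, 1024, 452 ]
```

Three export calls instead of 2500 is the whole point: per-request overhead such as connection handling and serialization framing is paid once per batch rather than once per span.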

Dashboards and Alerting with Grafana

Once data flows into Grafana (or Grafana Cloud), build dashboards that answer specific operational questions:

RED metrics dashboard: Rate, Errors, Duration per service
Database dashboard: query latency percentiles, connection pool depth, slow query traces
Infrastructure dashboard: CPU, memory, disk per pod/host
Business dashboard: order completion rate, payment success rate, user signup funnel
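
To show what the RED dashboard in the list above actually computes, here is a small sketch that derives rate, error ratio, and duration percentiles per service from raw request records. The record shape, the redMetrics and percentile function names, and the sample data are all assumptions for illustration; in a real setup these numbers come from OTel metrics queried by the dashboard, not from application code.

```javascript
// Nearest-rank percentile on an ascending array of numbers.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Hypothetical aggregation: Rate, Errors, Duration per service over one window.
function redMetrics(requests, windowSeconds) {
  const byService = new Map();
  for (const r of requests) {
    if (!byService.has(r.service)) byService.set(r.service, []);
    byService.get(r.service).push(r);
  }
  const result = {};
  for (const [service, rows] of byService) {
    const durations = rows.map((r) => r.durationMs).sort((a, b) => a - b);
    const errors = rows.filter((r) => r.status >= 500).length;
    result[service] = {
      ratePerSecond: rows.length / windowSeconds,
      errorRatio: errors / rows.length,
      p50Ms: percentile(durations, 50),
      p99Ms: percentile(durations, 99),
    };
  }
  return result;
}

// Invented sample: one minute of traffic for two services.
const sample = [
  { service: 'checkout', status: 200, durationMs: 40 },
  { service: 'checkout', status: 200, durationMs: 55 },
  { service: 'checkout', status: 503, durationMs: 900 },
  { service: 'checkout', status: 200, durationMs: 60 },
  { service: 'search', status: 200, durationMs: 12 },
  { service: 'search', status: 200, durationMs: 18 },
];
const red = redMetrics(sample, 60);
console.log(red);
```

In the sample, checkout has a median of 55 ms but a p99 of 900 ms, which is why dashboards plot percentiles rather than averages: one slow failing request is invisible in the median and dominates the tail.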

Set alerts on error rate spikes (5xx > 1% over 5 minutes), p99 latency increases (>500ms for API endpoints), and trace error count as a canary for upstream dependencies.
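
The error-rate rule can be written out as plain logic to make the window and threshold explicit. This is only a sketch of the condition: in Grafana the rule would be expressed as a query over your metrics rather than JavaScript, and the function name shouldAlertOnErrorRate and the record shape are hypothetical.

```javascript
// Hypothetical alert check: fire when more than 1% of requests in the last
// 5 minutes returned a 5xx status. Thresholds mirror the text above.
const WINDOW_MS = 5 * 60 * 1000;
const ERROR_RATIO_THRESHOLD = 0.01;

function shouldAlertOnErrorRate(requests, nowMs) {
  const recent = requests.filter(
    (r) => r.timestampMs <= nowMs && nowMs - r.timestampMs <= WINDOW_MS
  );
  if (recent.length === 0) return false; // no traffic, nothing to evaluate
  const errors = recent.filter((r) => r.status >= 500 && r.status <= 599).length;
  return errors / recent.length > ERROR_RATIO_THRESHOLD;
}

// 200 requests in the window, 3 of them failing: 1.5% > 1%, so the alert fires.
const now = Date.UTC(2024, 0, 1, 12, 0, 0);
const traffic = Array.from({ length: 200 }, (_, i) => ({
  timestampMs: now - i * 1000, // one request per second, newest first
  status: i < 3 ? 502 : 200,
}));
console.log(shouldAlertOnErrorRate(traffic, now)); // true
```

Evaluating a ratio over a window, rather than reacting to individual errors, keeps a single failed request on a quiet service from paging anyone while still catching sustained degradation.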

Implement Observability with SoniNow

OpenTelemetry gives you a vendor-agnostic observability foundation that works across any stack. Our team at SoniNow designs and deploys full observability pipelines so you understand exactly what your system is doing, without guessing.
