Monitoring and Observability with OpenTelemetry: Traces, Metrics, and Logs

Monitoring tells you a system is down. Observability tells you why—without requiring you to predict every possible failure mode in advance. OpenTelemetry (OTel) has emerged as the industry standard for instrumenting applications to produce traces, metrics, and logs in a unified, vendor-neutral format.
The Three Pillars of Observability
Distributed tracing tracks a single request as it propagates across services, showing latency breakdowns for each hop. Metrics provide aggregated counts and measurements over time—request rate, error rate, latency percentiles. Logs record discrete events with structured context.
OpenTelemetry unifies these three signals under a single SDK and export pipeline:
Application → OTel SDK → OTel Collector → Backend (Grafana, Datadog, SigNoz, etc.)
The OTel Collector is the key architectural component. It receives telemetry from instrumented applications, processes it (sampling, filtering, enrichment), and exports it to one or more backends. This decouples instrumentation from your observability vendor.
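The three processing steps named above (sampling, filtering, enrichment) can be pictured as a chain of functions over a batch of spans. The following is only a conceptual sketch in plain JavaScript, not the Collector's implementation: the span shape, the stage names (`dropHealthChecks`, `addEnvironment`, `sampleByTraceId`), the `GET /healthz` route, and the `deployment.environment` attribute are all assumptions made for illustration.

```javascript
// Filtering: drop spans that add noise (health checks in this example).
const dropHealthChecks = (spans) => spans.filter((s) => s.name !== 'GET /healthz');

// Enrichment: stamp every span with an attribute the application did not set.
const addEnvironment = (env) => (spans) =>
  spans.map((s) => ({ ...s, attributes: { ...s.attributes, 'deployment.environment': env } }));

// Sampling: keep a fixed share of traces, decided from the trace ID so that
// every span of a given trace gets the same keep-or-drop decision.
const sampleByTraceId = (ratio) => (spans) =>
  spans.filter((s) => parseInt(s.traceId.slice(0, 8), 16) < ratio * 0x100000000);

// Run the stages in order, feeding each stage's output into the next.
function runPipeline(spans, stages) {
  return stages.reduce((current, stage) => stage(current), spans);
}

// Made-up sample spans, for illustration only.
const spans = [
  { traceId: '0000000a' + '1'.repeat(24), name: 'GET /orders', attributes: {} },
  { traceId: 'fffffff0' + '1'.repeat(24), name: 'GET /orders', attributes: {} },
  { traceId: '0000000a' + '1'.repeat(24), name: 'GET /healthz', attributes: {} },
];

const exported = runPipeline(spans, [
  dropHealthChecks,
  addEnvironment('production'),
  sampleByTraceId(0.5),
]);
console.log(exported.length); // 1
```

Because every stage runs in the Collector rather than in the application, changing the sampling ratio or adding an attribute does not require redeploying instrumented services.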
Instrumenting a Node.js Application
OpenTelemetry provides auto-instrumentation for popular frameworks. For a simple Express application:
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');

const sdk = new NodeSDK({
  // Send traces to the Collector's OTLP gRPC receiver (port 4317).
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4317' }),
  // Export accumulated metrics every 30 seconds.
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4317' }),
    exportIntervalMillis: 30000,
  }),
  // Enable the instrumentations bundled in auto-instrumentations-node.
  instrumentations: [getNodeAutoInstrumentations()],
});

// Start the SDK before the application loads the modules to be instrumented.
sdk.start();
This single configuration automatically instruments HTTP requests, database calls (pg, mysql2, redis), gRPC calls, and more. Each incoming request gets a trace ID that propagates to downstream service calls via W3C Trace Context headers.
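To make the propagation step concrete, here is a minimal sketch of the W3C `traceparent` header that carries the trace ID from one service to the next. The helper names (`buildTraceparent`, `parseTraceparent`, `childTraceparent`) are hypothetical and exist only for this illustration; in a real application the OTel SDK's propagator reads and writes this header for you.

```javascript
const { randomBytes } = require('node:crypto');

// Header layout: version-traceid-parentid-flags, for example
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper: build the header value for an outgoing request.
function buildTraceparent(traceId, spanId, sampled = true) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Hypothetical helper: parse an incoming header; returns null if it is invalid.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version ff is forbidden, and all-zero IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// A downstream service keeps the trace ID but generates its own span ID.
function childTraceparent(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  if (!parent) return null;
  const newSpanId = randomBytes(8).toString('hex');
  return buildTraceparent(parent.traceId, newSpanId, parent.sampled);
}
```

The trace ID stays constant across every hop while each service contributes a new span ID, which is what lets a backend stitch the spans into one trace.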
Custom Span Attributes and Events
Auto-instrumentation covers the common cases: inbound and outbound HTTP, database drivers, and RPC calls. It cannot see your business logic, so add custom spans where you need domain-level detail:
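const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('payment-service');

async function processPayment(orderId, amount) {
  // startActiveSpan makes this span the parent of any spans created inside the callback.
  return tracer.startActiveSpan(
    'processPayment',
    {
      attributes: {
        'payment.order_id': orderId,
        'payment.amount': amount,
        'payment.currency': 'USD',
      },
    },
    async (span) => {
      try {
        // paymentGateway is the application's own client, defined elsewhere.
        const result = await paymentGateway.charge(amount);
        span.setAttribute('payment.status', result.status);
        return result;
      } catch (error) {
        span.setAttribute('payment.error', error.message);
        // Attach the exception as a span event and mark the span as failed.
        span.recordException(error);
        span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
        throw error;
      } finally {
        span.end();
      }
    },
  );
}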
Attach structured attributes to spans so you can filter traces by order ID, customer tier, or error type in your observability backend. Record exceptions as span events with the full stack trace.
Log Correlation with Trace Context
Logs become far more useful when correlated with traces. The active span's context exposes the current trace_id and span_id, so configure your logger to attach them to every record:
const pino = require('pino');
const { trace } = require('@opentelemetry/api');

const logger = pino({
  // mixin() runs on every log call; its return value is merged into the record.
  mixin() {
    const span = trace.getActiveSpan();
    if (span) {
      const spanContext = span.spanContext();
      return {
        trace_id: spanContext.traceId,
        span_id: spanContext.spanId,
      };
    }
    return {};
  },
});

logger.info({ orderId: 'ORD-123' }, 'Payment request initiated');
With trace_id on every log line, a backend such as Grafana can link a failed order's trace to the relevant log lines, and a Loki query can narrow the logs down to a single trace.
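What the shared `trace_id` field buys you can be shown with a small sketch that filters JSON log lines down to one trace, which is roughly what a log backend does when you jump from a trace to its logs. The `logsForTrace` helper and the sample records below are made up for illustration; only the `trace_id` and `span_id` field names come from the logger configuration above.

```javascript
// Hypothetical helper: pick the log records that belong to one trace.
function logsForTrace(jsonLines, traceId) {
  return jsonLines
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
    .filter((record) => record.trace_id === traceId);
}

// Made-up sample output in the newline-delimited JSON shape that pino emits.
const sampleLogs = [
  '{"level":30,"msg":"Payment request initiated","orderId":"ORD-123","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7"}',
  '{"level":30,"msg":"Health check","trace_id":"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","span_id":"bbbbbbbbbbbbbbbb"}',
  '{"level":50,"msg":"Gateway timeout","orderId":"ORD-123","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"53995c3f42cd8ad8"}',
].join('\n');

const related = logsForTrace(sampleLogs, '4bf92f3577b34da6a3ce929d0e0e4736');
console.log(related.map((r) => r.msg));
// [ 'Payment request initiated', 'Gateway timeout' ]
```

Without the shared identifier, the two payment log lines could only be associated by timestamp or order ID, which breaks down under concurrent traffic.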
Configuring the OpenTelemetry Collector
The collector is the glue. Run it as a sidecar or DaemonSet:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  otlp:
    endpoint: "grafana-cloud:4317"
    headers:
      authorization: "Bearer ${GRAFANA_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
The batch processor is critical: it groups spans and metrics into larger payloads, which sharply reduces the number of export requests. The memory limiter, placed first in each pipeline, makes the collector refuse incoming data under memory pressure instead of growing until it is OOM-killed during traffic spikes.
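The batching behavior can be modeled in a few lines. This is a simplified, hypothetical `Batcher` class written for illustration, not the Collector's actual code: it flushes when the buffer reaches a maximum size or when a timeout elapses, mirroring the roles of `send_batch_size` and `timeout` in the configuration above.

```javascript
// Hypothetical, simplified model of a batching stage.
class Batcher {
  constructor({ maxBatchSize, timeoutMs, exportFn }) {
    this.maxBatchSize = maxBatchSize;
    this.timeoutMs = timeoutMs;
    this.exportFn = exportFn;
    this.buffer = [];
    this.timer = null;
  }

  add(item) {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxBatchSize) {
      // Size trigger: the batch is full, so export it now.
      this.flush();
    } else if (!this.timer) {
      // Time trigger: export whatever has accumulated after timeoutMs.
      this.timer = setTimeout(() => this.flush(), this.timeoutMs);
      this.timer.unref(); // do not keep the process alive just for this timer
    }
  }

  flush() {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.exportFn(batch);
  }
}

// Usage: seven spans with a batch size of three produce two full batches
// immediately, and the final flush sends the one remaining span.
const batches = [];
const batcher = new Batcher({
  maxBatchSize: 3,
  timeoutMs: 1000,
  exportFn: (batch) => batches.push(batch),
});
for (let i = 0; i < 7; i++) batcher.add({ spanId: i });
batcher.flush();
console.log(batches.map((b) => b.length)); // [ 3, 3, 1 ]
```

Three export calls instead of seven is a small saving here, but with a batch size of 1024 the same mechanism replaces up to a thousand requests with one.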
Dashboards and Alerting with Grafana
Once data flows into Grafana (or Grafana Cloud), build dashboards that answer specific operational questions:
- RED metrics dashboard: Rate, Errors, Duration per service
- Database dashboard: query latency percentiles, connection pool depth, slow query traces
- Infrastructure dashboard: CPU, memory, disk per pod/host
- Business dashboard: order completion rate, payment success rate, user signup funnel
Set alerts on error rate spikes (5xx > 1% over 5 minutes), p99 latency increases (>500ms for API endpoints), and trace error count as a canary for upstream dependencies.
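The two threshold rules above can be sketched as a small evaluator over one window of request samples. In practice these conditions would be expressed as alert rules in your backend; the `evaluateAlerts` function, the sample shape (`status`, `durationMs`), and the nearest-rank percentile method are assumptions chosen for this illustration.

```javascript
// Nearest-rank percentile over a list of numbers.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical evaluator for one 5-minute window of request samples.
// Defaults match the thresholds in the text: 5xx rate above 1%, p99 above 500 ms.
function evaluateAlerts(requests, { maxErrorRate = 0.01, maxP99Ms = 500 } = {}) {
  const total = requests.length;
  const errors = requests.filter((r) => r.status >= 500).length;
  const errorRate = total === 0 ? 0 : errors / total;
  const p99 = percentile(requests.map((r) => r.durationMs), 99);
  return {
    errorRate,
    p99,
    errorRateAlert: errorRate > maxErrorRate,
    latencyAlert: p99 > maxP99Ms,
  };
}

// Made-up window: 197 fast successes and 3 slow server errors.
const windowSamples = [
  ...Array.from({ length: 197 }, () => ({ status: 200, durationMs: 100 })),
  ...Array.from({ length: 3 }, () => ({ status: 500, durationMs: 900 })),
];
const verdict = evaluateAlerts(windowSamples);
console.log(verdict.errorRateAlert, verdict.latencyAlert); // true true
```

Note that three slow failures out of 200 requests are enough to trip both rules, which is why p99 is a more sensitive signal than average latency.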
Implement Observability with SoniNow
OpenTelemetry gives you a vendor-agnostic observability foundation that works across any stack. Our team at SoniNow designs and deploys full observability pipelines so you understand exactly what your system is doing, without guessing.