
Full Stack High Availability

This tutorial ties together Traefik, the Node.js application, and CloudNativePG into a complete high-availability web application. Each component handles its own failure domain: the database fails over automatically, the app layer stays up through replica spread, and the ingress routes around unhealthy pods.

Internet --> Traefik (ingress) --> Node.js app (3 replicas) --> PostgreSQL (1 primary + 2 read replicas)
                                              |                         |
                                          writes via                reads via
                                        my-postgres-rw            my-postgres-ro

Traffic enters through Traefik, which load-balances across the Node.js deployment. The app connects to two PostgreSQL services: my-postgres-rw for writes (always points to the primary) and my-postgres-ro for reads (distributes across replicas). CloudNativePG manages failover and replication behind these service names.

A PodDisruptionBudget tells Kubernetes how many pods it can take down at once during voluntary disruptions — node drains, cluster upgrades, or autoscaler scale-downs.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: node-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: node-app

With maxUnavailable: 1 and 3 replicas, at least 2 pods always serve traffic. Kubernetes will drain one node, wait for the evicted pod to reschedule elsewhere, then proceed to the next node. Without a PDB, a drain operation could kill all three pods simultaneously.

You can also use minAvailable: 2 to express the same constraint from the other direction. Pick whichever reads more clearly for your team.
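The arithmetic behind a voluntary eviction can be sketched as a toy function. This mirrors the documented PDB semantics, not the actual controller code:

```javascript
// Toy sketch of PDB arithmetic: how many pods a voluntary
// disruption (drain, eviction) may take down right now.
function allowedDisruptions(healthyPods, desiredPods, pdb) {
  if (pdb.maxUnavailable !== undefined) {
    // Evictions allowed until only (desired - maxUnavailable) pods remain.
    return Math.max(0, healthyPods - (desiredPods - pdb.maxUnavailable));
  }
  if (pdb.minAvailable !== undefined) {
    // Evictions allowed while more than minAvailable pods stay healthy.
    return Math.max(0, healthyPods - pdb.minAvailable);
  }
  return 0;
}

console.log(allowedDisruptions(3, 3, { maxUnavailable: 1 })); // 1
console.log(allowedDisruptions(2, 3, { maxUnavailable: 1 })); // 0: one pod is already down
console.log(allowedDisruptions(3, 3, { minAvailable: 2 }));   // 1: same constraint, other direction
```

Note the second case: if one pod is already unhealthy, the budget is spent and a drain will block until the replacement becomes ready.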

Running three replicas on the same node defeats the purpose. If that node goes down, all replicas go with it.

Pod anti-affinity tells the scheduler to prefer placing pods on different nodes:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: node-app
          topologyKey: kubernetes.io/hostname

This uses preferred rather than required. The difference matters:

  • required — the scheduler refuses to place a pod if no suitable node exists. With a 2-node cluster and 3 replicas, the third pod stays Pending forever.
  • preferred — the scheduler tries its best but will co-locate pods if necessary. Good for small clusters where strict spreading is not always possible.

Use required in production clusters with enough nodes. Use preferred during development or on small clusters where you would rather have degraded spread than unschedulable pods.
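For reference, the strict variant swaps the preferred stanza for requiredDuringSchedulingIgnoredDuringExecution, which takes the pod affinity term directly, with no weight. A sketch for this app:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: node-app
        topologyKey: kubernetes.io/hostname
```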

Topology spread constraints go further by controlling how evenly pods distribute:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: node-app

maxSkew: 1 means the difference in pod count between any two nodes can be at most 1. With 3 replicas across 3 nodes, each node gets exactly one pod. With 3 replicas across 2 nodes, one node gets 2 and the other gets 1.

whenUnsatisfiable: ScheduleAnyway is the soft option, analogous to preferred anti-affinity — the scheduler favors even distribution but tolerates extra skew when it has no choice. Use DoNotSchedule when you need a strict guarantee.
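The maxSkew check itself is simple arithmetic; a toy sketch (not the scheduler's implementation):

```javascript
// Toy check of the maxSkew rule: skew = pods on the fullest node
// minus pods on the emptiest eligible node.
function satisfiesMaxSkew(podsPerNode, maxSkew) {
  const counts = Object.values(podsPerNode);
  return Math.max(...counts) - Math.min(...counts) <= maxSkew;
}

// 3 replicas on 3 nodes: perfectly even, skew 0.
console.log(satisfiesMaxSkew({ n1: 1, n2: 1, n3: 1 }, 1)); // true
// 3 replicas on 2 nodes: skew 1, still allowed.
console.log(satisfiesMaxSkew({ n1: 2, n2: 1 }, 1)); // true
// All 3 on one node of two: skew 3, violates maxSkew: 1.
console.log(satisfiesMaxSkew({ n1: 3, n2: 0 }, 1)); // false
```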

CloudNativePG exposes two services: my-postgres-rw routes to the primary and my-postgres-ro round-robins across read replicas. The application connects to both:

const { Pool } = require('pg');

// Writes: my-postgres-rw, always the current primary (via DATABASE_URL).
const writePool = new Pool({ connectionString: process.env.DATABASE_URL });

// Reads: my-postgres-ro, round-robins across replicas.
const readPool = new Pool({ host: 'my-postgres-ro.default.svc', /* credentials etc. omitted */ });

async function query(sql, params, readOnly = false) {
  const pool = readOnly ? readPool : writePool;
  return pool.query(sql, params);
}

Route read-heavy queries (dashboards, reports, search) to the read pool. Writes and reads that must see the latest data go through the write pool. This offloads the primary and reduces replication lag impact on user-facing reads.
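One wrinkle worth handling: a user who just wrote may read a replica before replication catches up. A common mitigation is read-your-writes pinning, sketched below; the grace window and names are illustrative assumptions, not part of the stack above:

```javascript
// Sketch of read-your-writes pinning: after a user writes, route
// that user's reads to the primary briefly so they never observe
// a replica that hasn't replayed the write yet.
const REPLICATION_GRACE_MS = 2000; // assumed worst-case replication lag
const lastWriteAt = new Map();     // userId -> timestamp of last write

function recordWrite(userId) {
  lastWriteAt.set(userId, Date.now());
}

function useReadReplica(userId) {
  const t = lastWriteAt.get(userId);
  return t === undefined || Date.now() - t > REPLICATION_GRACE_MS;
}

recordWrite('alice');
console.log(useReadReplica('alice')); // false: just wrote, read from the primary
console.log(useReadReplica('bob'));   // true: safe to use a replica
```

In a multi-replica app the map would need to live somewhere shared (a cookie, a session store), since the next request may land on a different pod.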

Traefik discovers pod endpoints behind the Kubernetes Service and distributes requests using Weighted Round Robin. The flow works like this:

  1. Traefik watches the Service’s Endpoints object for changes.
  2. When a pod passes its readiness probe, Kubernetes adds it to the Endpoints list. Traefik starts sending traffic.
  3. When a pod fails its readiness probe, Kubernetes removes it from Endpoints. Traefik stops routing to that pod — no manual intervention needed.
  4. During rolling updates, new pods join and old pods leave the Endpoints list gradually. Traffic shifts without downtime.

For stateful applications that need session affinity, configure sticky sessions with a cookie:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: node-app
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: node-app
          port: 3000
          sticky:
            cookie:
              name: server_id
              httpOnly: true
              secure: true

Most stateless apps do not need sticky sessions. Use them only when the application stores in-memory state between requests.

This combines every component into a single deployable stack:

# CloudNativePG Cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3
  storage:
    size: 10Gi
  postgresql:
    parameters:
      shared_buffers: "256MB"
      max_connections: "100"
  monitoring:
    enablePodMonitor: true
---
# Node.js Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-app
  labels:
    app: node-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: node-app
  template:
    metadata:
      labels:
        app: node-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: node-app
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: node-app
      containers:
        - name: node-app
          image: node-app:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: my-postgres-app
                  key: uri
            - name: DATABASE_RO_HOST
              value: my-postgres-ro.default.svc
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 3000
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 3000
            periodSeconds: 20
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: node-app
spec:
  selector:
    app: node-app
  ports:
    - port: 3000
      targetPort: 3000
---
# Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: node-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: node-app
---
# Traefik IngressRoute
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: node-app
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: node-app
          port: 3000
  tls: {}
---
# Pod Monitor (Prometheus)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: node-app
spec:
  selector:
    matchLabels:
      app: node-app
  podMetricsEndpoints:
    - port: metrics
      interval: 30s

Caution — Avoid initialDelaySeconds

initialDelaySeconds delays the first probe by a fixed duration, masking slow starts rather than detecting them. If a container takes longer than expected to start, the delay hides the problem — and if it starts quickly, the delay wastes time. The readiness probe already gates traffic: a failing readiness probe keeps the pod out of Service endpoints without restarting it. For containers with genuinely slow startup (JVM apps, large ML models), use a startupProbe instead — it runs only during startup and prevents liveness kills until it passes, without hiding failures behind an arbitrary timer.
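Applied to this Deployment's container, a startup probe might look like the following sketch (the thresholds are illustrative; here the app gets up to 30 × 5 = 150 seconds to start):

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 3000
  periodSeconds: 5
  failureThreshold: 30  # up to 150s of startup before the kubelet gives up
```

While the startup probe is failing, the liveness and readiness probes are suppressed; they take over only once it passes.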

Run these tests to verify each layer recovers independently.

# Watch pods in one terminal
kubectl get pods -w
# Kill the primary PostgreSQL pod
kubectl delete pod my-postgres-1

CloudNativePG detects the missing primary, promotes a replica, and repoints the -rw service's selector at the new primary. Watch the cluster status to see the promotion happen:

kubectl get cluster my-postgres -w

The application reconnects automatically through the service name — no code changes or restarts needed.
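Reconnects are automatic, but queries in flight during the few seconds of failover can still fail with transient connection errors. A small retry wrapper (illustrative, not part of the app code above) smooths this over:

```javascript
// Sketch of a retry wrapper for transient failover errors:
// retries a failing async operation a few times with linear backoff.
async function withRetry(fn, attempts = 3, delayMs = 500) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Back off before retrying; a CNPG failover usually completes in seconds.
      await new Promise((resolve) => setTimeout(resolve, delayMs * (i + 1)));
    }
  }
  throw lastErr;
}

// Usage with the write pool defined earlier:
// const result = await withRetry(() => writePool.query('SELECT 1'));
```

Retry only idempotent statements, or make writes idempotent, so a statement that succeeded just before the connection dropped cannot apply twice.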

# Kill one app pod
kubectl delete pod node-app-xyz
# The Deployment controller creates a replacement immediately
kubectl get pods -w

The PDB prevents Kubernetes from evicting more than one pod at a time during planned operations. Traefik routes around the missing pod because its endpoint disappears from the Service.

# Drain a node (simulates maintenance)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

The PDB ensures Kubernetes evicts only one node-app pod at a time. CloudNativePG handles its own pod management and will reschedule database pods according to its replication rules.

  • Node.js: 3 replicas spread across nodes. The PDB prevents simultaneous downtime during upgrades or drains.
  • PostgreSQL: Automatic failover with CloudNativePG. WAL-based streaming replication keeps replicas in sync. Read replicas distribute query load.
  • Traefik: Routes only to pods that pass their readiness probe. Unhealthy pods get no traffic.
  • Zero-downtime upgrades: Rolling updates create new pods before terminating old ones. The PDB ensures minimum availability throughout.