Full Stack High Availability
This tutorial ties together Traefik, the Node.js application, and CloudNativePG into a complete high-availability web application. Each component handles its own failure domain: the database fails over automatically, the app layer stays up through replica spread, and the ingress routes around unhealthy pods.
The Architecture
```
Internet --> Traefik (ingress) --> Node.js app (3 replicas) --> PostgreSQL (1 primary + 2 read replicas)
                                                                    | writes            | reads
                                                               my-postgres-rw     my-postgres-ro
```

Traffic enters through Traefik, which load-balances across the Node.js deployment. The app connects to two PostgreSQL services: `my-postgres-rw` for writes (always points to the primary) and `my-postgres-ro` for reads (distributes across replicas). CloudNativePG manages failover and replication behind these service names.
Pod Disruption Budgets
A PodDisruptionBudget tells Kubernetes how many pods it can take down at once during voluntary disruptions — node drains, cluster upgrades, or autoscaler scale-downs.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: node-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: node-app
```

With `maxUnavailable: 1` and 3 replicas, at least 2 pods always serve traffic. Kubernetes will drain one node, wait for the evicted pod to reschedule elsewhere, then proceed to the next node. Without a PDB, a drain operation could kill all three pods simultaneously.
You can also use minAvailable: 2 to express the same constraint from the other direction. Pick
whichever reads more clearly for your team.
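For reference, a sketch of the same budget expressed with `minAvailable` (identical fields otherwise):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: node-app-pdb
spec:
  minAvailable: 2   # with 3 replicas, equivalent to maxUnavailable: 1
  selector:
    matchLabels:
      app: node-app
```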
Anti-Affinity and Topology Spread
Running three replicas on the same node defeats the purpose. If that node goes down, all replicas go with it.
Pod Anti-Affinity
Pod anti-affinity tells the scheduler to prefer placing pods on different nodes:
```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: node-app
          topologyKey: kubernetes.io/hostname
```

This uses `preferred` rather than `required`. The difference matters:

- `required` — the scheduler refuses to place a pod if no suitable node exists. With a 2-node cluster and 3 replicas, the third pod stays Pending forever.
- `preferred` — the scheduler tries its best but will co-locate pods if necessary. Good for small clusters where strict spreading is not always possible.
Use required in production clusters with enough nodes. Use preferred during development or
on small clusters where you would rather have degraded spread than unschedulable pods.
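The strict form drops the `weight`/`podAffinityTerm` wrapper and lists the terms directly — a sketch of the hard variant for the same deployment:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: node-app
        topologyKey: kubernetes.io/hostname   # no two node-app pods on the same node
```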
Topology Spread Constraints
Topology spread constraints go further by controlling how evenly pods distribute:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: node-app
```

`maxSkew: 1` means the difference in pod count between any two nodes can be at most 1. With 3 replicas across 3 nodes, each node gets exactly one pod. With 3 replicas across 2 nodes, one node gets 2 and the other gets 1.
ScheduleAnyway is the soft equivalent — it prefers even distribution but will tolerate skew
if needed. Use DoNotSchedule when you need strict guarantees.
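The same mechanism spreads pods across zones instead of nodes, assuming your nodes carry the standard `topology.kubernetes.io/zone` label (set by most cloud providers):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # spread across zones, not just nodes
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: node-app
```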
Read/Write Split in Practice
CloudNativePG exposes two services: my-postgres-rw routes to the primary and my-postgres-ro
round-robins across read replicas. The application connects to both:
```javascript
const { Pool } = require('pg');

const writePool = new Pool({ connectionString: process.env.DATABASE_URL });
const readPool = new Pool({ host: 'my-postgres-ro.default.svc', ... });

async function query(sql, params, readOnly = false) {
  const pool = readOnly ? readPool : writePool;
  return pool.query(sql, params);
}
```

Route read-heavy queries (dashboards, reports, search) to the read pool. Writes and reads that must see the latest data go through the write pool. This offloads the primary and reduces the impact of replication lag on user-facing reads.
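One subtlety worth sketching is read-your-writes: a read issued immediately after a write can land on a replica that has not replayed that write yet. A common mitigation is to pin a session's reads to the primary once it has written. The sketch below stubs the pools so it is self-contained; in the real app they would be the `pg` pools above, and `makeSession` is a hypothetical helper, not part of any library:

```javascript
// Sketch: after a session performs a write, route even its read-only
// queries to the primary so the user always sees their own changes.
// Stub pools stand in for the pg Pools connected to -rw and -ro.
const writePool = { query: async (sql) => ({ source: 'rw', sql }) };
const readPool = { query: async (sql) => ({ source: 'ro', sql }) };

function makeSession() {
  let hasWritten = false;
  return {
    async query(sql, params, readOnly = false) {
      if (!readOnly) hasWritten = true;
      // Reads go to replicas only while the session has not written.
      const pool = readOnly && !hasWritten ? readPool : writePool;
      return pool.query(sql, params);
    },
  };
}

// Usage: the first read hits a replica; after the INSERT, reads hit the primary.
(async () => {
  const session = makeSession();
  console.log((await session.query('SELECT 1', [], true)).source);  // ro
  await session.query('INSERT INTO t VALUES (1)');
  console.log((await session.query('SELECT 1', [], true)).source);  // rw
})();
```

Scoping the flag per request (or per user session) keeps most read traffic on the replicas while protecting the write-then-read path.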
How Traefik Load Balances
Traefik discovers pod endpoints behind the Kubernetes Service and distributes requests using Weighted Round Robin. The flow works like this:
- Traefik watches the Service’s Endpoints object for changes.
- When a pod passes its readiness probe, Kubernetes adds it to the Endpoints list. Traefik starts sending traffic.
- When a pod fails its readiness probe, Kubernetes removes it from Endpoints. Traefik stops routing to that pod — no manual intervention needed.
- During rolling updates, new pods join and old pods leave the Endpoints list gradually. Traffic shifts without downtime.
For stateful applications that need session affinity, configure sticky sessions with a cookie:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: node-app
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: node-app
          port: 3000
          sticky:
            cookie:
              name: server_id
              httpOnly: true
              secure: true
```

Most stateless apps do not need sticky sessions. Use them only when the application stores in-memory state between requests.
The Complete Manifests
This combines every component into a single deployable stack:
```yaml
# CloudNativePG Cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3
  storage:
    size: 10Gi
  postgresql:
    parameters:
      shared_buffers: "256MB"
      max_connections: "100"
  monitoring:
    enablePodMonitor: true
---
# Node.js Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-app
  labels:
    app: node-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: node-app
  template:
    metadata:
      labels:
        app: node-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: node-app
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: node-app
      containers:
        - name: node-app
          image: node-app:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: my-postgres-app
                  key: uri
            - name: DATABASE_RO_HOST
              value: my-postgres-ro.default.svc
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 3000
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 3000
            periodSeconds: 20
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: node-app
spec:
  selector:
    app: node-app
  ports:
    - port: 3000
      targetPort: 3000
---
# Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: node-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: node-app
---
# Traefik IngressRoute
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: node-app
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: node-app
          port: 3000
  tls: {}
---
# Pod Monitor (Prometheus)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: node-app
spec:
  selector:
    matchLabels:
      app: node-app
  podMetricsEndpoints:
    - port: metrics
      interval: 30s
```

Caution — Avoid `initialDelaySeconds`

`initialDelaySeconds` delays the first probe by a fixed duration, masking slow starts rather than detecting them. If a container takes longer than expected to start, the delay hides the problem — and if it starts quickly, the delay wastes time. The readiness probe already gates traffic: a failing readiness probe keeps the pod out of Service endpoints without restarting it. For containers with genuinely slow startup (JVM apps, large ML models), use a `startupProbe` instead — it runs only during startup and prevents liveness kills until it passes, without hiding failures behind an arbitrary timer.
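A sketch of the `startupProbe` approach for the deployment above — `failureThreshold × periodSeconds` bounds the allowed startup time, and the numbers here are illustrative:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 3000
  periodSeconds: 5
  failureThreshold: 30   # up to 150s of startup before the kubelet gives up
livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  periodSeconds: 20      # only runs once the startup probe has succeeded
```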
Testing Failover
Run these tests to verify each layer recovers independently.
Database Failover
```sh
# Watch pods in one terminal
kubectl get pods -w

# Kill the primary PostgreSQL pod
kubectl delete pod my-postgres-1
```

CloudNativePG detects the missing primary and promotes a replica. Watch the `-rw` service switch to the new primary:

```sh
kubectl get svc my-postgres-rw -w
```

The application reconnects automatically through the service name — no code changes or restarts needed.
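Reconnection is per-connection, though: queries in flight during the promotion window can still fail with connection errors. A small retry wrapper absorbs that window. This is a sketch, not part of the original app, and the flaky pool is a stub so the example is self-contained:

```javascript
// Retry transient connection errors, such as those seen briefly during failover.
// Linear backoff between attempts; rethrows once retries are exhausted.
async function queryWithRetry(pool, sql, params, retries = 3, delayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await pool.query(sql, params);
    } catch (err) {
      if (attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, delayMs * (attempt + 1)));
    }
  }
}

// Stub pool that fails twice (the failover window), then succeeds.
let calls = 0;
const flakyPool = {
  async query(sql) {
    if (++calls <= 2) throw new Error('ECONNREFUSED');
    return { rows: [{ ok: true }], sql };
  },
};

queryWithRetry(flakyPool, 'SELECT 1').then((r) => console.log(r.rows[0].ok));
```

In production, retry only idempotent statements this way; blindly retrying writes can duplicate them if the original attempt committed before the connection dropped.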
Application Pod Failure
```sh
# Kill one app pod
kubectl delete pod node-app-xyz

# The Deployment controller creates a replacement immediately
kubectl get pods -w
```

The PDB prevents Kubernetes from evicting more than one pod at a time during planned operations. Traefik routes around the missing pod because its endpoint disappears from the Service.
Node Drain
```sh
# Drain a node (simulates maintenance)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
```

The PDB ensures Kubernetes evicts only one node-app pod at a time. CloudNativePG handles its own pod management and will reschedule database pods according to its replication rules.
What This Gets You
- Node.js: 3 replicas spread across nodes. The PDB prevents simultaneous downtime during upgrades or drains.
- PostgreSQL: Automatic failover with CloudNativePG. WAL-based streaming replication keeps replicas in sync. Read replicas distribute query load.
- Traefik: Routes only to pods that pass their readiness probe. Unhealthy pods get no traffic.
- Zero-downtime upgrades: Rolling updates create new pods before terminating old ones. The PDB ensures minimum availability throughout.
Further Reading
See Also
- Installing CloudNativePG — database setup and configuration
- Traefik Ingress — ingress controller installation and IngressRoute basics
- Development Workflow — environment promotion across overlays