Production Hardening

A k0s cluster running workloads needs authentication, authorization, network isolation, and resource constraints before you hand access to anyone else. This tutorial walks through each layer — who can connect, what they can do, which pods can talk to which, and how much CPU and memory each container gets.

Authentication: signed letters, not databases

Kubernetes has no user directory. It does not store usernames, passwords, or group memberships. Instead, it delegates identity to external systems and trusts whatever the authenticator asserts. There is no User resource, no group database, and no kubectl create user command.

The mental model is a signed letter. You hand someone a certificate — a letter signed with your cluster’s CA key. The API server cannot remember writing it, but it can verify the signature. Every time the holder authenticates, the server checks the signature, reads the identity fields, and trusts the contents. If you need to revoke the letter, you have a problem: there is no record to delete.

This is the same tradeoff between JWTs and session cookies in web authentication. A JWT is self-contained — the server verifies the signature and reads the claims with no database lookup. But you cannot revoke it until it expires. A session cookie points to server-side state that you can delete instantly, at the cost of maintaining that state. Kubernetes sits on the JWT side.

The API server reads identity from two certificate fields:

  • CN= maps to the username
  • O= maps to groups (repeatable for multiple groups)

# Generate a CSR with username "ryan" in groups "team-admins" and "ryan"
openssl req -new -newkey rsa:2048 -nodes \
  -keyout ryan.key \
  -subj "/CN=ryan/O=team-admins/O=ryan" \
  -out ryan.csr

k0s wraps this into a single command:

k0s kubeconfig create --groups "team-admins" ryan

This outputs a kubeconfig with the signed certificate embedded. The cluster does not store the group membership — it reads the O= fields from the cert on every request.
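You can see exactly what the API server reads by inspecting a certificate's subject. A throwaway self-signed cert is enough to demonstrate the field mapping (the paths and names here are illustrative, not part of the k0s setup):

```shell
# Self-sign a throwaway cert with the same subject layout as above.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=ryan/O=team-admins/O=ryan"

# Read back the identity fields the same way the API server does.
openssl x509 -in /tmp/demo.crt -noout -subject
# subject=CN = ryan, O = team-admins, O = ryan   (exact formatting varies by OpenSSL version)
```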

Revocation problem: Kubernetes does not support certificate revocation lists. A signed cert is valid until it expires. Your options are deleting the user’s RoleBindings (blocks authorization but not authentication) or rotating the entire cluster CA (invalidates every cert).

An OIDC provider issues short-lived JWTs with identity claims:

{
  "email": "ryan@vale.internal",
  "groups": ["team-admins", "platform-ops"],
  "exp": 1711500000
}

Tokens expire in minutes. To revoke access, remove the user from the identity provider — their next token refresh fails. This is a better revocation story than client certificates.
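The claims in a JWT are plain base64-encoded JSON that anyone can read; only verifying the signature requires key material. A quick sketch with a toy token (the claims mirror the example above; the signature is deliberately fake):

```shell
# Toy JWT: header.payload.signature. The fake signature is the point --
# decoding the claims needs no key; only verification does.
b64() { printf '%s' "$1" | base64 | tr -d '\n'; }
header=$(b64 '{"alg":"HS256","typ":"JWT"}')
payload=$(b64 '{"email":"ryan@vale.internal","groups":["team-admins","platform-ops"],"exp":1711500000}')
token="$header.$payload.fake-signature"

# Anyone can decode the middle segment.
printf '%s' "$token" | cut -d. -f2 | base64 -d
```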

For a k0s homelab, client certificates behind an SSH tunnel give you two independent kill switches. SSH key removal blocks network access instantly, without touching certificates. Binding deletion blocks authorization even if the cert remains valid. The revocation gap becomes theoretical when the API server is not publicly reachable.

Treat OIDC as a future enhancement for when you expose the API more broadly or migrate to a managed provider.

Authorization: deny by default

Kubernetes authorization is deny-by-default. An authenticated user with no matching bindings is rejected for everything except the API discovery endpoints (/api, /apis, /version): no pod listing, no secret reading, no namespace access.

The model has three parts:

Subjects (User, Group, ServiceAccount) → RoleBinding → ClusterRole or Role

A binding can have many subjects but only one roleRef. Bind to groups, not individual users — one binding per group covers everyone in it. Promotion and demotion mean moving between groups, not editing fine-grained RBAC.
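As a sketch of that shape, here is a single binding carrying both a group and an individual user (the names and namespace are illustrative; prefer the group subject alone):

```yaml
# One binding, many subjects, exactly one roleRef.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-members-deployer
  namespace: dev
subjects:
- kind: Group                # preferred: covers everyone in the group
  name: team-members
  apiGroup: rbac.authorization.k8s.io
- kind: User                 # possible, but creates per-person sprawl
  name: harvey
  apiGroup: rbac.authorization.k8s.io
roleRef:                     # singular; a second role means a second binding
  kind: ClusterRole
  name: deployer
  apiGroup: rbac.authorization.k8s.io
```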

namespace-admin — full CRUD on all workload resources:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-admin
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["*"]
  verbs: ["*"]

deployer — create, update, and delete deployments, services, configmaps, and secrets:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployer
rules:
- apiGroups: ["", "apps"]
  resources: ["deployments", "services", "configmaps", "secrets"]
  verbs: ["create", "update", "patch", "delete", "get", "list", "watch"]

viewer — read-only access:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: viewer
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]

Group          Dev              Staging          Production
team-admins    namespace-admin  namespace-admin  namespace-admin
team-members   deployer         deployer         viewer
contractors    deployer         viewer           (none)

A RoleBinding in a specific namespace connects a group to a ClusterRole. The ClusterRole defines the permissions once; the RoleBinding scopes them to a namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-admins-namespace-admin
  namespace: production
subjects:
- kind: Group
  name: team-admins
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io

Create one RoleBinding per group per namespace. The naming convention <subject>-<role> makes bindings read as “who gets what”:

team-admins-namespace-admin → team-admins get namespace-admin
team-members-deployer → team-members get deployer
contractors-viewer → contractors get viewer
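With nine group-namespace pairs in the table, a small script can stamp the bindings out mechanically. This is a sketch, not part of k0s: the gen_binding helper is hypothetical, and its output is meant to be piped to kubectl apply -f -.

```shell
# gen_binding GROUP ROLE NAMESPACE
# Prints a RoleBinding manifest named "<group>-<role>" per the convention.
gen_binding() {
  group=$1 role=$2 ns=$3
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${group}-${role}
  namespace: ${ns}
subjects:
- kind: Group
  name: ${group}
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: ${role}
  apiGroup: rbac.authorization.k8s.io
EOF
}

# One binding per group per namespace, straight from the table above.
gen_binding team-admins namespace-admin production
gen_binding contractors viewer staging
```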

Verify permissions with impersonation:

# What can team-admins do in production?
kubectl auth can-i --list \
  --as=ryan \
  --as-group=team-admins \
  --namespace=production

# What can contractors do in staging?
kubectl auth can-i --list \
  --as=harvey \
  --as-group=contractors \
  --namespace=staging

Network policies

Network policies control pod-to-pod and pod-to-external traffic. Without them, every pod in the cluster can reach every other pod. The pattern is always the same: start with a default deny, then punch holes.

The following five policies compose to cover most scenarios. Apply default deny to every namespace, then layer on the others as needed.

1. Default deny all

Blocks all ingress and egress in a namespace. DNS is exempted — without it, nothing resolves and everything breaks silently.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress: []
  egress:
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP

2. Allow same-namespace traffic

Lets pods within a namespace talk freely. Covers the common case of frontend, backend, and cache all living together:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}

3. Allow ingress from the ingress controller

Lets the ingress controller namespace reach your app namespace. Without this, Traefik cannot route traffic to your pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-traefik
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: traefik

4. Allow external egress

Lets pods reach third-party APIs. Includes DNS so name resolution works:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-external-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - port: 443
      protocol: TCP
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
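One caveat: 0.0.0.0/0 also matches in-cluster pod and service IPs, so this policy is more permissive than "external only" suggests. A tighter variant of the first egress rule uses ipBlock's except field to carve out the private ranges (a sketch; adjust the CIDRs to your cluster's pod and service networks):

```yaml
egress:
- to:
  - ipBlock:
      cidr: 0.0.0.0/0
      except:              # RFC 1918 ranges; adjust to your CIDRs
      - 10.0.0.0/8
      - 172.16.0.0/12
      - 192.168.0.0/16
  ports:
  - port: 443
    protocol: TCP
```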

5. Allow ingress on specific port from specific pod

Section titled “5. Allow ingress on specific port from specific pod”

The surgical option. Only pods with a matching label can reach a target pod on a specific port:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
      protocol: TCP

For more patterns, see the network policy recipes repository and the interactive policy editor.

Resource limits

Without resource limits, a misbehaving pod can consume all node resources and starve everything else. A LimitRange sets default requests and limits for every container in a namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      cpu: 200m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    type: Container

Any pod deployed to this namespace without explicit resource fields inherits these defaults. The defaultRequest is what the scheduler uses for placement decisions. The default (limit) is the ceiling — the kubelet kills a container that exceeds its memory limit and throttles CPU above the limit.

Set requests based on observed steady-state usage and limits based on peak usage. Start conservative and adjust after monitoring real workloads.
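A LimitRange caps individual containers, but nothing yet caps the namespace as a whole; a ResourceQuota closes that gap. The numbers below are illustrative, not values from the original setup; size them to your nodes:

```yaml
# Namespace-wide ceiling to pair with the per-container LimitRange.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
    pods: "20"
```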

CA rotation

If you need to invalidate every client certificate — a leaked key, a departing team member with a long-lived cert — rotate the cluster CA. For a two-machine homelab, this takes 10-15 minutes:

# 1. Stop k0s on all nodes
sudo k0s stop              # controller
ssh vidar sudo k0s stop    # worker

# 2. Delete the CA and SA keys on the controller
sudo rm /var/lib/k0s/pki/ca.key /var/lib/k0s/pki/ca.crt
sudo rm /var/lib/k0s/pki/sa.key /var/lib/k0s/pki/sa.pub

# 3. Restart k0s — it regenerates a new CA
sudo k0s start

# 4. Get a new admin kubeconfig
sudo k0s kubeconfig admin > ~/.kube/config

# 5. Generate a rejoin token for the worker
sudo k0s token create --role worker > /tmp/worker-token

# 6. On the worker: clear old certs and rejoin
ssh vidar sudo rm -rf /var/lib/k0s/kubelet.conf /var/lib/k0s/pki/
# Copy the token to the worker and rejoin

Every previously issued client certificate is now untrusted. Reissue kubeconfigs for your team with k0s kubeconfig create. Workloads stay on disk and come back up with the cluster.

With SSH as the access gate, you will rarely need this. SSH key removal is instant and sufficient. CA rotation is the nuclear option for when you believe a cert has been leaked and the API server is reachable without SSH.

What a production cluster needs:

  • RBAC configured — no default service account usage
  • Network policies in every namespace
  • Resource quotas and limit ranges
  • Non-root containers enforced
  • Read-only root filesystem where possible
  • Image pull policies set to Always
  • Secrets encrypted at rest (SOPS + age)
  • TLS everywhere (cert-manager)
  • Audit logging enabled
  • Runtime monitoring (Falco)