Production Hardening

A k0s cluster running workloads needs authentication, authorization, network isolation, and resource constraints before you hand access to anyone else. This tutorial walks through each layer — who can connect, what they can do, which pods can talk to which, and how much CPU and memory each container gets.

Authentication: signed letters, not databases

Kubernetes has no user directory. It does not store usernames, passwords, or group memberships. Instead, it delegates identity to external systems and trusts whatever the authenticator asserts. There is no User resource, no group database, and no kubectl create user command.

The mental model is a signed letter. You hand someone a certificate — a letter signed with your cluster’s CA key. The API server cannot remember writing it, but it can verify the signature. Every time the holder authenticates, the server checks the signature, reads the identity fields, and trusts the contents. If you need to revoke the letter, you have a problem: there is no record to delete.

This is the same tradeoff between JWTs and session cookies in web authentication. A JWT is self-contained — the server verifies the signature and reads the claims with no database lookup. But you cannot revoke it until it expires. A session cookie points to server-side state that you can delete instantly, at the cost of maintaining that state. Kubernetes sits on the JWT side.

The API server reads identity from two certificate fields:

  • CN= maps to the username
  • O= maps to groups (repeatable for multiple groups)

# Generate a CSR with username "ryan" in groups "team-admins" and "ryan"
openssl req -new -newkey rsa:2048 -nodes \
  -keyout ryan.key \
  -subj "/CN=ryan/O=team-admins/O=ryan" \
  -out ryan.csr

k0s wraps this into a single command:

k0s kubeconfig create --groups "team-admins" ryan

This outputs a kubeconfig with the signed certificate embedded. The cluster does not store the group membership — it reads the O= fields from the cert on every request.
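You can see exactly what the API server reads by inspecting a certificate's subject. A throwaway self-signed cert is enough to demonstrate the field mapping (the paths and names here are illustrative, not part of the k0s setup):

```shell
# Self-sign a throwaway cert with the same subject layout as above.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=ryan/O=team-admins/O=ryan"

# Read back the identity fields the same way the API server does.
openssl x509 -in /tmp/demo.crt -noout -subject
# subject=CN = ryan, O = team-admins, O = ryan   (exact formatting varies by OpenSSL version)
```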

Revocation problem: Kubernetes does not support certificate revocation lists. A signed cert is valid until it expires. Your options are deleting the user’s RoleBindings (blocks authorization but not authentication) or rotating the entire cluster CA (invalidates every cert).

An OIDC provider issues short-lived JWTs with identity claims:

{
  "email": "ryan@vale.internal",
  "groups": ["team-admins", "platform-ops"],
  "exp": 1711500000
}

Tokens expire in minutes. To revoke access, remove the user from the identity provider — their next token refresh fails. This is a better revocation story than client certificates.
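The claims in a JWT are plain base64-encoded JSON that anyone can read; only verifying the signature requires key material. A quick sketch with a toy token (the claims mirror the example above; the signature is deliberately fake):

```shell
# Toy JWT: header.payload.signature. The fake signature is the point --
# decoding the claims needs no key; only verification does.
b64() { printf '%s' "$1" | base64 | tr -d '\n'; }
header=$(b64 '{"alg":"HS256","typ":"JWT"}')
payload=$(b64 '{"email":"ryan@vale.internal","groups":["team-admins","platform-ops"],"exp":1711500000}')
token="$header.$payload.fake-signature"

# Anyone can decode the middle segment.
printf '%s' "$token" | cut -d. -f2 | base64 -d
```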

For a k0s homelab, client certificates behind an SSH tunnel give you two independent kill switches. SSH key removal blocks network access instantly, without touching certificates. Binding deletion blocks authorization even if the cert remains valid. The revocation gap becomes theoretical when the API server is not publicly reachable.

Treat OIDC as a future enhancement for when you expose the API more broadly or migrate to a managed provider.

Authorization: deny by default

Kubernetes authorization is deny-by-default. An authenticated user with no matching bindings is rejected for everything except the API discovery endpoints (/api, /apis, /version): no pod listing, no secret reading, no namespace access.

The model has three parts:

Subjects (User, Group, ServiceAccount) → RoleBinding → ClusterRole or Role

A binding can have many subjects but only one roleRef. Bind to groups, not individual users — one binding per group covers everyone in it. Promotion and demotion mean moving between groups, not editing fine-grained RBAC.
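As a sketch of that shape, here is a single binding carrying both a group and an individual user (the names and namespace are illustrative; prefer the group subject alone):

```yaml
# One binding, many subjects, exactly one roleRef.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-members-deployer
  namespace: dev
subjects:
- kind: Group                # preferred: covers everyone in the group
  name: team-members
  apiGroup: rbac.authorization.k8s.io
- kind: User                 # possible, but creates per-person sprawl
  name: harvey
  apiGroup: rbac.authorization.k8s.io
roleRef:                     # singular; a second role means a second binding
  kind: ClusterRole
  name: deployer
  apiGroup: rbac.authorization.k8s.io
```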

namespace-admin — full CRUD on all workload resources:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-admin
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["*"]
  verbs: ["*"]

deployer — create, update, and delete deployments, services, configmaps, and secrets:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployer
rules:
- apiGroups: ["", "apps"]
  resources: ["deployments", "services", "configmaps", "secrets"]
  verbs: ["create", "update", "patch", "delete", "get", "list", "watch"]

viewer — read-only access:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: viewer
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]

Group          Dev              Staging          Production
team-admins    namespace-admin  namespace-admin  namespace-admin
team-members   deployer         deployer         viewer
contractors    deployer         viewer           (none)

A RoleBinding in a specific namespace connects a group to a ClusterRole. The ClusterRole defines the permissions once; the RoleBinding scopes them to a namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-admins-namespace-admin
  namespace: production
subjects:
- kind: Group
  name: team-admins
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io

Create one RoleBinding per group per namespace. The naming convention <subject>-<role> makes bindings read as “who gets what”:

team-admins-namespace-admin → team-admins get namespace-admin
team-members-deployer → team-members get deployer
contractors-viewer → contractors get viewer
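With nine group-namespace pairs in the table, a small script can stamp the bindings out mechanically. This is a sketch, not part of k0s: the gen_binding helper is hypothetical, and its output is meant to be piped to kubectl apply -f -.

```shell
# gen_binding GROUP ROLE NAMESPACE
# Prints a RoleBinding manifest named "<group>-<role>" per the convention.
gen_binding() {
  group=$1 role=$2 ns=$3
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${group}-${role}
  namespace: ${ns}
subjects:
- kind: Group
  name: ${group}
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: ${role}
  apiGroup: rbac.authorization.k8s.io
EOF
}

# One binding per group per namespace, straight from the table above.
gen_binding team-admins namespace-admin production
gen_binding contractors viewer staging
```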

Verify permissions with impersonation:

# What can team-admins do in production?
kubectl auth can-i --list \
  --as=ryan \
  --as-group=team-admins \
  --namespace=production

# What can contractors do in staging?
kubectl auth can-i --list \
  --as=harvey \
  --as-group=contractors \
  --namespace=staging

Network policies

Network policies control pod-to-pod and pod-to-external traffic. Without them, every pod in the cluster can reach every other pod. The pattern is always the same: start with a default deny, then punch holes.

The following five policies compose to cover most scenarios. Apply default deny to every namespace, then layer on the others as needed.

1. Default deny all

Blocks all ingress and egress in a namespace. DNS is exempted — without it, nothing resolves and everything breaks silently.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress: []
  egress:
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP

2. Allow same-namespace traffic

Lets pods within a namespace talk freely. Covers the common case of frontend, backend, and cache all living together:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}

3. Allow ingress from the ingress controller

Lets the ingress controller namespace reach your app namespace. Without this, Traefik cannot route traffic to your pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-traefik
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: traefik

4. Allow external egress

Lets pods reach third-party APIs. Includes DNS so name resolution works:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-external-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - port: 443
      protocol: TCP
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
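One caveat: 0.0.0.0/0 also matches in-cluster pod and service IPs, so this policy is more permissive than "external only" suggests. A tighter variant of the first egress rule uses ipBlock's except field to carve out the private ranges (a sketch; adjust the CIDRs to your cluster's pod and service networks):

```yaml
egress:
- to:
  - ipBlock:
      cidr: 0.0.0.0/0
      except:              # RFC 1918 ranges; adjust to your CIDRs
      - 10.0.0.0/8
      - 172.16.0.0/12
      - 192.168.0.0/16
  ports:
  - port: 443
    protocol: TCP
```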

5. Allow ingress on specific port from specific pod

Section titled “5. Allow ingress on specific port from specific pod”

The surgical option. Only pods with a matching label can reach a target pod on a specific port:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
      protocol: TCP

For more patterns, see the network policy recipes repository and the interactive policy editor.

Resource limits

Without resource limits, a misbehaving pod can consume all node resources and starve everything else. A LimitRange sets default requests and limits for every container in a namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      cpu: 200m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    type: Container

Any pod deployed to this namespace without explicit resource fields inherits these defaults. The defaultRequest is what the scheduler uses for placement decisions. The default (limit) is the ceiling — the kubelet kills a container that exceeds its memory limit and throttles CPU above the limit.

Set requests based on observed steady-state usage and limits based on peak usage. Start conservative and adjust after monitoring real workloads.
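A LimitRange caps individual containers, but nothing yet caps the namespace as a whole; a ResourceQuota closes that gap. The numbers below are illustrative, not values from the original setup; size them to your nodes:

```yaml
# Namespace-wide ceiling to pair with the per-container LimitRange.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
    pods: "20"
```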

CA rotation

If you need to invalidate every client certificate — a leaked key, a departing team member with a long-lived cert — rotate the cluster CA. For a two-machine homelab, this takes 10-15 minutes:

# 1. Stop k0s on all nodes
sudo k0s stop              # controller
ssh vidar sudo k0s stop    # worker

# 2. Delete the CA and SA keys on the controller
sudo rm /var/lib/k0s/pki/ca.key /var/lib/k0s/pki/ca.crt
sudo rm /var/lib/k0s/pki/sa.key /var/lib/k0s/pki/sa.pub

# 3. Restart k0s — it regenerates a new CA
sudo k0s start

# 4. Get a new admin kubeconfig
sudo k0s kubeconfig admin > ~/.kube/config

# 5. Generate a rejoin token for the worker
sudo k0s token create --role worker > /tmp/worker-token

# 6. On the worker: clear old certs and rejoin
ssh vidar sudo rm -rf /var/lib/k0s/kubelet.conf /var/lib/k0s/pki/
# Copy the token to the worker and rejoin

Every previously issued client certificate is now untrusted. Reissue kubeconfigs for your team with k0s kubeconfig create. Workloads stay on disk and come back up with the cluster.

With SSH as the access gate, you will rarely need this. SSH key removal is instant and sufficient. CA rotation is the nuclear option for when you believe a cert has been leaked and the API server is reachable without SSH.

What a production cluster needs:

  • RBAC configured — no default service account usage
  • Network policies in every namespace
  • Resource quotas and limit ranges
  • Non-root containers enforced
  • Read-only root filesystem where possible
  • Image pull policies set to Always
  • Secrets encrypted at rest (SOPS + age)
  • TLS everywhere (cert-manager)
  • Audit logging enabled
  • Runtime monitoring (Falco)