Production Hardening
A k0s cluster running workloads needs authentication, authorization, network isolation, and resource constraints before you hand access to anyone else. This tutorial walks through each layer — who can connect, what they can do, which pods can talk to which, and how much CPU and memory each container gets.
Authentication: signed letters, not databases
Kubernetes has no user directory. It does not store usernames, passwords, or group memberships. Instead, it delegates identity to external systems and trusts whatever the authenticator asserts. There is no User resource, no group database, and no kubectl create user command.
The mental model is a signed letter. You hand someone a certificate — a letter signed with your cluster’s CA key. The API server cannot remember writing it, but it can verify the signature. Every time the holder authenticates, the server checks the signature, reads the identity fields, and trusts the contents. If you need to revoke the letter, you have a problem: there is no record to delete.
This is the same tradeoff as JWTs versus session cookies in web authentication. A JWT is self-contained — the server verifies the signature and reads the claims with no database lookup. But you cannot revoke it until it expires. A session cookie points to server-side state that you can delete instantly, at the cost of maintaining that state. Kubernetes sits on the JWT side.
Client certificates
The API server reads identity from two certificate fields:
- CN= maps to the username
- O= maps to groups (repeatable for multiple groups)
```shell
# Generate a CSR with username "ryan" in groups "team-admins" and "ryan"
openssl req -new -newkey rsa:2048 -nodes \
  -keyout ryan.key \
  -subj "/CN=ryan/O=team-admins/O=ryan" \
  -out ryan.csr
```

k0s wraps this into a single command:
```shell
k0s kubeconfig create --groups "team-admins" ryan
```

This outputs a kubeconfig with the signed certificate embedded. The cluster does not store the group membership — it reads the O= fields from the cert on every request.
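You can reproduce the trust flow locally with a throwaway CA to see exactly which fields the API server reads. This is a sketch: the demo-ca name and temp-dir layout are illustrative, and the throwaway CA merely stands in for the real cluster CA under /var/lib/k0s/pki/.

```shell
set -e
tmp=$(mktemp -d)

# Throwaway CA (stands in for the cluster CA; illustrative only)
openssl req -x509 -new -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=demo-ca" -keyout "$tmp/ca.key" -out "$tmp/ca.crt"

# CSR carrying the username (CN) and groups (O=, repeatable)
openssl req -new -newkey rsa:2048 -nodes \
  -subj "/CN=ryan/O=team-admins/O=ryan" \
  -keyout "$tmp/ryan.key" -out "$tmp/ryan.csr"

# Sign the CSR with the CA, then read back the identity fields --
# this subject line is everything the API server knows about the user
openssl x509 -req -in "$tmp/ryan.csr" -CA "$tmp/ca.crt" -CAkey "$tmp/ca.key" \
  -CAcreateserial -days 1 -out "$tmp/ryan.crt"
openssl x509 -in "$tmp/ryan.crt" -noout -subject
```

Note that the CA never records what it signed: verification is purely cryptographic, which is exactly why revocation is hard.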
Revocation problem: Kubernetes does not support certificate revocation lists. A signed cert is valid until it expires. Your options are deleting the user’s RoleBindings (blocks authorization but not authentication) or rotating the entire cluster CA (invalidates every cert).
OIDC (Dex, Keycloak)
An OIDC provider issues short-lived JWTs with identity claims:
```json
{
  "email": "ryan@vale.internal",
  "groups": ["team-admins", "platform-ops"],
  "exp": 1711500000
}
```

Tokens expire in minutes. To revoke access, remove the user from the identity provider — their next token refresh fails. This is a better revocation story than client certificates.
The pragmatic choice for a homelab
For a k0s homelab, client certificates behind an SSH tunnel give you two independent kill switches. SSH key removal blocks network access instantly, without touching certificates. Binding deletion blocks authorization even if the cert remains valid. The revocation gap becomes theoretical when the API server is not publicly reachable.
Treat OIDC as a future enhancement for when you expose the API more broadly or migrate to a managed provider.
Authorization

Kubernetes authorization is deny-by-default. An authenticated user with no matching bindings gets rejected on everything except API discovery endpoints (/api, /apis, /version). No pod listing, no secret reading, no namespace access.
The model has three parts:
Subjects (User, Group, ServiceAccount) → RoleBinding → ClusterRole or Role

A binding can have many subjects but only one roleRef. Bind to groups, not individual users — one binding per group covers everyone in it. Promotion and demotion mean moving between groups, not editing fine-grained RBAC.
Three roles
namespace-admin — full CRUD on all workload resources:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-admin
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["*"]
    verbs: ["*"]
```

deployer — create, update, and delete deployments, services, configmaps, and secrets:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployer
rules:
  - apiGroups: ["", "apps"]
    resources: ["deployments", "services", "configmaps", "secrets"]
    verbs: ["create", "update", "patch", "delete", "get", "list", "watch"]
```

viewer — read-only access:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: viewer
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
```

Three groups
| Group | Dev | Staging | Production |
|---|---|---|---|
| team-admins | namespace-admin | namespace-admin | namespace-admin |
| team-members | deployer | deployer | viewer |
| contractors | deployer | viewer | — |
Binding roles to groups
A RoleBinding in a specific namespace connects a group to a ClusterRole. The ClusterRole defines the permissions once; the RoleBinding scopes them to a namespace:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-admins-namespace-admin
  namespace: production
subjects:
  - kind: Group
    name: team-admins
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
```

Create one RoleBinding per group per namespace. The naming convention <subject>-<role> makes bindings read as “who gets what”:
- team-admins-namespace-admin → team-admins get namespace-admin
- team-members-deployer → team-members get deployer
- contractors-viewer → contractors get viewer

Verify permissions with impersonation:
```shell
# What can team-admins do in production?
kubectl auth can-i --list \
  --as=ryan \
  --as-group=team-admins \
  --namespace=production

# What can contractors do in staging?
kubectl auth can-i --list \
  --as=harvey \
  --as-group=contractors \
  --namespace=staging
```

Network Policies
Network policies control pod-to-pod and pod-to-external traffic. Without them, every pod in the cluster can reach every other pod. The pattern is always: start with a default deny, then punch holes.
The following five policies compose to cover most scenarios. Apply default deny to every namespace, then layer on the others as needed.
1. Default deny all
Blocks all ingress and egress in a namespace. DNS is exempted — without it, nothing resolves and everything breaks silently.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress: []
  egress:
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```

2. Allow ingress from same namespace
Lets pods within a namespace talk freely. Covers the common case of frontend, backend, and cache all living together:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
```

3. Allow ingress from Traefik
Lets the ingress controller namespace reach your app namespace. Without this, Traefik cannot route traffic to your pods:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-traefik
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik
```

4. Allow egress to external
Lets pods reach third-party APIs. Includes DNS so name resolution works:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-external-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
          protocol: TCP
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```

5. Allow ingress on specific port from specific pod
The surgical option. Only pods with a matching label can reach a target pod on a specific port:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
          protocol: TCP
```

For more patterns, see the network policy recipes repository and the interactive policy editor.
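One caveat on policy 4: cidr 0.0.0.0/0 also matches in-cluster and LAN addresses. A tighter variant, not part of the five above, uses ipBlock's except list to carve out the private ranges so the rule only reaches the internet. This is a sketch; adjust the excluded CIDRs to your actual pod, service, and LAN networks:

```yaml
# Sketch: external egress that excludes RFC 1918 private ranges.
# The except CIDRs below are assumptions -- match them to your network.
egress:
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
            - 10.0.0.0/8
            - 172.16.0.0/12
            - 192.168.0.0/16
    ports:
      - port: 443
        protocol: TCP
```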
Resource Limits and LimitRanges
Without resource limits, a misbehaving pod can consume all node resources and starve everything else. A LimitRange sets default requests and limits for every container in a namespace:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - default:
        cpu: 200m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      type: Container
```

Any pod deployed to this namespace without explicit resource fields inherits these defaults. The defaultRequest is what the scheduler uses for placement decisions. The default (limit) is the ceiling — the kubelet kills a container that exceeds its memory limit and throttles CPU above the limit.
Set requests based on observed steady-state usage and limits based on peak usage. Start conservative and adjust after monitoring real workloads.
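When a workload needs more than the namespace defaults, declare its resources explicitly; explicit values take precedence over LimitRange defaults. The manifest below is illustrative only (the my-app name, image, and sizes are placeholders):

```yaml
# Sketch: explicit resources override the LimitRange defaults.
# All names and values here are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: production
spec:
  containers:
    - name: app
      image: my-app:latest
      resources:
        requests:
          cpu: 250m        # scheduler reserves this for placement
          memory: 256Mi
        limits:
          cpu: 500m        # throttled above this
          memory: 512Mi    # OOM-killed above this
```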
CA Rotation (k0s)
If you need to invalidate every client certificate — a leaked key, a departing team member with a long-lived cert — rotate the cluster CA. For a two-machine homelab, this takes 10-15 minutes:
```shell
# 1. Stop k0s on all nodes
sudo k0s stop            # controller
ssh vidar sudo k0s stop  # worker

# 2. Delete the CA and SA keys on the controller
sudo rm /var/lib/k0s/pki/ca.key /var/lib/k0s/pki/ca.crt
sudo rm /var/lib/k0s/pki/sa.key /var/lib/k0s/pki/sa.pub

# 3. Restart k0s — it regenerates a new CA
sudo k0s start

# 4. Get a new admin kubeconfig
sudo k0s kubeconfig admin > ~/.kube/config

# 5. Generate a rejoin token for the worker
sudo k0s token create --role worker > /tmp/worker-token

# 6. On the worker: clear old certs and rejoin
ssh vidar sudo rm -rf /var/lib/k0s/kubelet.conf /var/lib/k0s/pki/
# Copy the token to the worker and rejoin
```

Every previously issued client certificate is now untrusted. Reissue kubeconfigs for your team with k0s kubeconfig create. Workloads stay on disk and come back up with the cluster.
With SSH as the access gate, you will rarely need this. SSH key removal is instant and sufficient. CA rotation is the nuclear option for when you believe a cert has been leaked and the API server is reachable without SSH.
Security Checklist
What a production cluster needs:
- RBAC configured — no default service account usage
- Network policies in every namespace
- Resource quotas and limit ranges
- Non-root containers enforced
- Read-only root filesystem where possible
- Image pull policies set to Always
- Secrets encrypted at rest (SOPS + age)
- TLS everywhere (cert-manager)
- Audit logging enabled
- Runtime monitoring (Falco)
Further reading
- Kubernetes RBAC documentation
- Kubernetes Network Policies
- k0s authentication configuration
- Network policy recipes
- Interactive network policy editor
Related tutorials
- Security Linting — automated checks with kube-linter, Polaris, and kubescape
- Secrets Management — SOPS and age encryption workflow
- Development Workflow — environment isolation across dev, staging, and production
- Traefik Ingress — TLS configuration with cert-manager
- Installing k0s — k0s-specific setup and configuration