Monitoring with Prometheus and Grafana

The monitoring stack runs three components: Prometheus collects and stores metrics, Grafana visualises them, and Alertmanager routes alerts. All three are deployed as a single Helm release from the prometheus-community chart repository, managed by Flux.

Everything lives under infrastructure/base/monitoring/:

infrastructure/base/monitoring/
├── kustomization.yaml
├── helmrepository.yaml
├── helmrelease.yaml
└── grafana-secret.yaml

The Kustomization lists all three manifests:

infrastructure/base/monitoring/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- grafana-secret.yaml
- helmrepository.yaml
- helmrelease.yaml

helmrepository.yaml points Flux at the upstream chart index:

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: flux-system
spec:
  interval: 1h0m0s
  url: https://prometheus-community.github.io/helm-charts

Flux refreshes this index every hour; when a new chart version appears, the helm-controller picks it up on its next reconciliation of the HelmRelease.

helmrelease.yaml pins chart version 82.15.0 and installs everything into the monitoring namespace:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus
  namespace: flux-system
spec:
  chart:
    spec:
      chart: kube-prometheus-stack
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
      version: 82.15.0
  interval: 5m0s
  releaseName: kube-prometheus
  storageNamespace: monitoring
  targetNamespace: monitoring
  values: ...

The storageNamespace and targetNamespace both point to monitoring. Flux reconciles the release every five minutes, applying any drift back to the declared state.

Grafana admin credentials live in grafana-secret.yaml, encrypted with SOPS and age. The file holds two fields — admin-user and admin-password — both encrypted at rest in Git. SOPS decrypts them during reconciliation using the cluster’s age key.

The HelmRelease references the secret by name rather than embedding plaintext values:

grafana:
  admin:
    existingSecret: grafana-admin

When Helm renders the chart, it reads credentials from the grafana-admin Secret in the monitoring namespace. The secret name in the encrypted file matches this reference.
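Before encryption, grafana-secret.yaml is an ordinary Kubernetes Secret. A minimal sketch of what it might contain prior to running SOPS (the credential values here are placeholders, not the real ones):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: monitoring
type: Opaque
stringData:
  admin-user: admin           # placeholder value
  admin-password: change-me   # placeholder value
```

SOPS encrypts only the values, leaving the keys and structure readable, so Git diffs stay meaningful while the credentials themselves remain ciphertext.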

Without persistence, Prometheus and Grafana lose all data on pod restart. The HelmRelease configures PVCs for all three components:

| Component    | Size   | Access mode   |
|--------------|--------|---------------|
| Prometheus   | 20 GiB | ReadWriteOnce |
| Grafana      | 10 GiB | ReadWriteOnce |
| Alertmanager | 2 GiB  | ReadWriteOnce |

ReadWriteOnce means one node mounts the volume at a time — appropriate for single-replica deployments.
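The HelmRelease above elides its values block, but in the kube-prometheus-stack chart the persistence settings for these sizes typically look like the following sketch (the keys are the chart's usual ones; treat exact paths as an assumption against your pinned chart version):

```yaml
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
grafana:
  persistence:
    enabled: true
    size: 10Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 2Gi
```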

A PVC always needs a concrete storage request; Kubernetes has no notion of an unbounded claim. If the release values omit a size, the chart's default applies.

The default storage class on this cluster uses reclaim policy Delete. If you remove the HelmRelease from Git, or run flux delete helmrelease, Flux uninstalls the release. That deletes the PVCs, and the Delete policy then destroys the backing volumes.

To protect a volume, patch its reclaim policy to Retain after Kubernetes binds it:

kubectl get pvc -n monitoring
kubectl get pv
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

With Retain, deleting the PVC moves the PV to Released state rather than destroying the disk. Recovery requires manually re-binding a new PVC to that volume.
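A Released PV still carries a claimRef to the deleted PVC, so it must be cleared first (for example, kubectl patch pv <pv-name> --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'). A new claim can then target the volume explicitly. A sketch with hypothetical names and the Prometheus size from the table above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-recovered   # hypothetical name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  # Bind to the specific Released PV rather than provisioning a new one.
  # storageClassName must also match the PV's class for binding to succeed.
  volumeName: pvc-1234              # hypothetical PV name
```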

The local overlay adds a Traefik IngressRoute that routes grafana.k8s.local to the Grafana service:

infrastructure/overlays/local/ingressroute-grafana.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.k8s.local`)
      kind: Rule
      services:
        - name: kube-prometheus-grafana
          port: 80
  tls: {}

The route uses the websecure entrypoint (port 443) with TLS enabled. Add grafana.k8s.local to your /etc/hosts file pointing at the cluster’s load balancer address, then open https://grafana.k8s.local.
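The hosts entry can be built like this; the address below is a placeholder, so substitute the external IP reported for the Traefik service in your cluster:

```shell
# Placeholder address: replace with your cluster's load balancer IP.
LB_IP=192.168.1.240
printf '%s grafana.k8s.local\n' "$LB_IP"
# To persist it, append the printed line to /etc/hosts (needs root):
#   printf '%s grafana.k8s.local\n' "$LB_IP" | sudo tee -a /etc/hosts
```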

If the IngressRoute is unavailable or you want direct access without DNS, forward the Grafana service port to your local machine:

kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80

Then open http://localhost:3000. Log in with the credentials stored in the grafana-admin secret.

To retrieve the password:

kubectl get secret -n monitoring grafana-admin \
-o jsonpath="{.data.admin-password}" | base64 -d; echo
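The base64 -d step is needed because Kubernetes stores Secret data base64-encoded, and jsonpath returns the raw encoded string. A self-contained illustration with a placeholder credential:

```shell
# Placeholder credential, not a real one.
plain='hunter2'
encoded=$(printf '%s' "$plain" | base64)   # what .data.admin-password holds
printf '%s' "$encoded" | base64 -d; echo   # recovers the plaintext
```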

After committing the manifests and pushing, trigger reconciliation and check status:

flux reconcile source git flux-system
flux get helmreleases -A
kubectl get pods -n monitoring

A healthy deployment shows the operator, Grafana, Prometheus, node-exporter, and kube-state-metrics pods all running. The Alertmanager pod may enter a crash loop on first deploy if the node’s inotify limit is too low — see the inotify troubleshooting guide for the fix.