Monitoring with Prometheus and Grafana

The monitoring stack runs three components: Prometheus collects and stores metrics, Grafana visualises them, and Alertmanager routes alerts. All three are deployed as a single Helm release from the prometheus-community chart repository, managed by Flux.

Everything lives under infrastructure/base/monitoring/:

infrastructure/base/monitoring/
├── kustomization.yaml
├── helmrepository.yaml
├── helmrelease.yaml
└── grafana-secret.yaml

The Kustomization lists all three manifests:

infrastructure/base/monitoring/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- grafana-secret.yaml
- helmrepository.yaml
- helmrelease.yaml

helmrepository.yaml points Flux at the upstream chart index:

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: flux-system
spec:
  interval: 1h0m0s
  url: https://prometheus-community.github.io/helm-charts

Flux refreshes this index every hour; when a new chart version appears, the helm-controller picks it up on its next reconciliation of the HelmRelease.

helmrelease.yaml pins chart version 82.15.0 and installs everything into the monitoring namespace:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus
  namespace: flux-system
spec:
  chart:
    spec:
      chart: kube-prometheus-stack
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
      version: 82.15.0
  interval: 5m0s
  releaseName: kube-prometheus
  storageNamespace: monitoring
  targetNamespace: monitoring
  values: ...

The storageNamespace and targetNamespace both point to monitoring. Flux reconciles the release every five minutes, applying any drift back to the declared state.

Grafana admin credentials live in grafana-secret.yaml, encrypted with SOPS and age. The file holds two fields — admin-user and admin-password — both encrypted at rest in Git. SOPS decrypts them during reconciliation using the cluster’s age key.

The HelmRelease references the secret by name rather than embedding plaintext values:

grafana:
  admin:
    existingSecret: grafana-admin

When Helm renders the chart, it reads credentials from the grafana-admin Secret in the monitoring namespace. The secret name in the encrypted file matches this reference.
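Before encryption, grafana-secret.yaml is an ordinary Kubernetes Secret. A minimal sketch of what it might contain prior to running SOPS (the credential values here are placeholders, not the real ones):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: monitoring
type: Opaque
stringData:
  admin-user: admin           # placeholder value
  admin-password: change-me   # placeholder value
```

SOPS encrypts only the values, leaving the keys and structure readable, so Git diffs stay meaningful while the credentials themselves remain ciphertext.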

Without persistence, Prometheus and Grafana lose all data on pod restart. The HelmRelease configures PVCs for all three components:

| Component    | Size   | Access mode   |
|--------------|--------|---------------|
| Prometheus   | 20 GiB | ReadWriteOnce |
| Grafana      | 10 GiB | ReadWriteOnce |
| Alertmanager | 2 GiB  | ReadWriteOnce |

ReadWriteOnce means one node mounts the volume at a time — appropriate for single-replica deployments.
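The HelmRelease above elides its values block, but in the kube-prometheus-stack chart the persistence settings for these sizes typically look like the following sketch (the keys are the chart's usual ones; treat exact paths as an assumption against your pinned chart version):

```yaml
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
grafana:
  persistence:
    enabled: true
    size: 10Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 2Gi
```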

A PVC always needs a concrete storage request; Kubernetes has no notion of an unbounded claim. If the release values omit a size, the chart's default applies.

The default storage class on this cluster uses reclaim policy Delete. If you remove the HelmRelease from Git, or run flux delete helmrelease, Flux uninstalls the release. That deletes the PVCs, and the Delete policy then destroys the backing volumes.

To protect a volume, patch its reclaim policy to Retain after Kubernetes binds it:

kubectl get pvc -n monitoring
kubectl get pv
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

With Retain, deleting the PVC moves the PV to Released state rather than destroying the disk. Recovery requires manually re-binding a new PVC to that volume.
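A Released PV still carries a claimRef to the deleted PVC, so it must be cleared first (for example, kubectl patch pv <pv-name> --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'). A new claim can then target the volume explicitly. A sketch with hypothetical names and the Prometheus size from the table above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-recovered   # hypothetical name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  # Bind to the specific Released PV rather than provisioning a new one.
  # storageClassName must also match the PV's class for binding to succeed.
  volumeName: pvc-1234              # hypothetical PV name
```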

The local overlay adds a Traefik IngressRoute that routes grafana.k8s.local to the Grafana service:

infrastructure/overlays/local/ingressroute-grafana.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.k8s.local`)
      kind: Rule
      services:
        - name: kube-prometheus-grafana
          port: 80
  tls: {}

The route uses the websecure entrypoint (port 443) with TLS enabled. Add grafana.k8s.local to your /etc/hosts file pointing at the cluster’s load balancer address, then open https://grafana.k8s.local.
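The hosts entry can be built like this; the address below is a placeholder, so substitute the external IP reported for the Traefik service in your cluster:

```shell
# Placeholder address: replace with your cluster's load balancer IP.
LB_IP=192.168.1.240
printf '%s grafana.k8s.local\n' "$LB_IP"
# To persist it, append the printed line to /etc/hosts (needs root):
#   printf '%s grafana.k8s.local\n' "$LB_IP" | sudo tee -a /etc/hosts
```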

If the IngressRoute is unavailable or you want direct access without DNS, forward the Grafana service port to your local machine:

kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80

Then open http://localhost:3000. Log in with the credentials stored in the grafana-admin secret.

To retrieve the password:

kubectl get secret -n monitoring grafana-admin \
-o jsonpath="{.data.admin-password}" | base64 -d; echo
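The base64 -d step is needed because Kubernetes stores Secret data base64-encoded, and jsonpath returns the raw encoded string. A self-contained illustration with a placeholder credential:

```shell
# Placeholder credential, not a real one.
plain='hunter2'
encoded=$(printf '%s' "$plain" | base64)   # what .data.admin-password holds
printf '%s' "$encoded" | base64 -d; echo   # recovers the plaintext
```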

After committing the manifests and pushing, trigger reconciliation and check status:

flux reconcile source git flux-system
flux get helmreleases -A
kubectl get pods -n monitoring

A healthy deployment shows the operator, Grafana, Prometheus, node-exporter, and kube-state-metrics pods all running. The Alertmanager pod may enter a crash loop on first deploy if the node’s inotify limit is too low — see the inotify troubleshooting guide for the fix.