Choosing a Cloud Provider
Decision heuristic
Start with DigitalOcean unless the client gives you a reason to move up. Those reasons are usually:
- Compliance requirements (HIPAA, PCI) — AWS or Azure
- Existing Microsoft identity (Entra ID) — Azure
- Heavy data/ML workloads (BigQuery) — GCP
Provider matrix
| Provider | Setup | Cost | Enterprise fit | Tradeoffs |
|---|---|---|---|---|
| DigitalOcean | Low | $ | Starter | Simple DOKS, managed DBs, flat pricing, great docs. No IAM policies, no compliance certs beyond SOC2, limited regions, no WAF/DDoS beyond basic. |
| AWS | High | $$$ | Full | Deepest service catalogue. EKS, RDS, CloudFront, IAM, compliance (HIPAA, PCI, FedRAMP). Billing is complex; egress fees add up. OpenTofu support is strongest here. |
| Google Cloud | Medium | $$ | Strong | Best managed K8s (GKE Autopilot). Strong networking, good BigQuery/AI story. Fewer services than AWS; sustained-use discounts help. IAM is project-scoped. |
| Azure | Medium | $$$ | Full | Best fit when clients use M365/Entra ID. AKS is solid, AD integration is unmatched. Compliance breadth rivals AWS. Portal UX is rough; naming is inconsistent. |
DigitalOcean is ideal for current client hosting — already running DOKS + Flux + Traefik. The lack of IAM policies won’t matter for most clients since they don’t care how the sausage is made. Where it bites: if a corporate client asks for an audit trail of who accessed what, or wants scoped access for their own IT team to view logs without touching production, DO can’t do that — it’s all-or-nothing API tokens. That’s the kind of thing that forces a move to AWS/Azure, not because you need it, but because their procurement or security team will ask for it on a checklist.
AWS has the most mature OpenTofu providers and community modules. The learning curve is the IAM + networking model, not the compute itself.
GKE Autopilot removes most node management overhead. If a client needs Kubernetes but you don’t want to manage nodes, GCP is the lowest-friction option.
Azure wins by default when the client’s org is already on Microsoft — SSO, AD groups, and licence bundling make the business case easy.
Module layout
Both active providers follow the same file structure under tofu/remote/:
- tofu/
  - remote/
    - shared.tfvars: values common to all providers (cluster_name, registry_name)
    - digitalocean/
      - main.tf: provisions VPC, container registry, and DOKS cluster
      - variables.tf: input schema (region, node size, autoscaling, maintenance window)
      - outputs.tf: cluster ID, endpoint, kubeconfig, registry endpoint
      - versions.tf: pins DO provider ~> 2.44, OpenTofu >= 1.6.0
      - terraform.tfvars: concrete values for this provider
    - google/
      - main.tf: provisions VPC, subnet with pod/service CIDRs, GKE cluster, node pool, Artifact Registry
      - variables.tf: input schema (project, region, zone, machine type, Spot VMs, networking CIDRs)
      - outputs.tf: cluster ID, endpoint, CA cert, network/subnet names, registry URL
      - versions.tf: pins Google provider ~> 6.0, OpenTofu >= 1.6.0
      - terraform.tfvars: concrete values for this provider
Shared values flow through variable precedence — see OpenTofu Variables for the full schema.
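As a rough illustration of the layout above, the DigitalOcean module's versions.tf and the declarations for the shared values might look like the sketch below. The version constraints and the names cluster_name and registry_name come from the layout; the variable types and descriptions are assumptions, and the real files may differ.

```hcl
# versions.tf (sketch): pins OpenTofu and the DigitalOcean provider
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.44"
    }
  }
}

# variables.tf (sketch): shared values supplied by ../shared.tfvars.
# The types and descriptions here are assumed, not taken from the real module.
variable "cluster_name" {
  type        = string
  description = "Cluster name, common to all providers"
}

variable "registry_name" {
  type        = string
  description = "Container registry name, common to all providers"
}
```

Each provider directory has to declare these variables itself, because a value passed with -var-file for an undeclared variable is not usable inside the module.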
Dry-run workflow
Both providers support tofu plan with free read-only API calls. Validate syntax first (no credentials required), then preview changes:

    tofu validate
    tofu plan -var-file=../shared.tfvars -out=tfplan

The saved plan file can later be applied exactly as previewed with tofu apply tfplan.
Credential setup
| Provider | What you need |
|---|---|
| DigitalOcean | Single API token (DIGITALOCEAN_TOKEN) |
| AWS | IAM user + policy, or SSO session |
| Google Cloud | Service account key, or ADC via gcloud auth application-default login |
| Azure | Service principal, or az login session |
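For the two active providers, the credentials in the table can stay out of the configuration entirely. The sketch below shows one way the provider blocks might look; the two blocks belong to separate module directories, and the variable names project, region, and zone are assumptions based on the inputs listed for the google module.

```hcl
# digitalocean/ (sketch): with no token argument, the provider reads the
# DIGITALOCEAN_TOKEN environment variable, so no secret is committed.
provider "digitalocean" {}

# google/ (sketch): with no credentials argument, the provider falls back to
# Application Default Credentials, for example those created by
# `gcloud auth application-default login`.
# The variable names below are hypothetical.
provider "google" {
  project = var.project
  region  = var.region
  zone    = var.zone
}
```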
DigitalOcean specifics
The digitalocean_kubernetes_versions data source resolves the latest Kubernetes version at plan time rather than relying on the "latest" slug, which the DO API only resolves at creation time. This prevents plan diffs where state and config disagree on the version string. See digitalocean/terraform-provider-digitalocean#997.
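A minimal sketch of that pattern follows, assuming hypothetical resource labels (current, this), hypothetical variable names (region, node_size), and a fixed node count chosen only for illustration. The actual main.tf also provisions a VPC and registry and may configure autoscaling instead.

```hcl
# Resolve the newest Kubernetes version DOKS offers at plan time.
data "digitalocean_kubernetes_versions" "current" {}

resource "digitalocean_kubernetes_cluster" "this" {
  name   = var.cluster_name
  region = var.region # hypothetical variable name

  # A concrete version slug is written to state, so later plans compare
  # like with like instead of "latest" against a resolved slug.
  version = data.digitalocean_kubernetes_versions.current.latest_version

  node_pool {
    name       = "default"
    size       = var.node_size # hypothetical variable name
    node_count = 2             # illustrative value only
  }
}
```

One consequence of this design is that a newly released Kubernetes version shows up as a planned change on the next tofu plan. The data source also accepts a version_prefix argument if the cluster should stay on a given minor release.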
Confirm allowable values with doctl before changing terraform.tfvars:
    doctl kubernetes options versions
    doctl kubernetes options sizes
    doctl kubernetes options regions

Google Cloud specifics
GKE uses a zonal cluster (location = zone) instead of a regional one. A zonal cluster gets a free control plane under the GKE free tier; regional clusters charge for it.
The default node pool is removed immediately and replaced by a separately managed google_container_node_pool. This avoids lifecycle issues where changes to node config force cluster recreation.
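The two points above (a zonal location and a separately managed node pool) can be sketched as follows. The resource labels, the variable names zone, machine_type, and use_spot, and the node count are assumptions for illustration; networking, CIDR ranges, and Artifact Registry from the real main.tf are omitted.

```hcl
resource "google_container_cluster" "this" {
  name     = var.cluster_name
  location = var.zone # a zone rather than a region makes the cluster zonal

  # GKE cannot create a cluster without a node pool, so start with the
  # smallest default pool and delete it straight after creation.
  remove_default_node_pool = true
  initial_node_count       = 1
}

# Node settings live here, so changing them replaces or updates the pool
# rather than forcing the cluster itself to be recreated.
resource "google_container_node_pool" "primary" {
  name       = "primary"
  cluster    = google_container_cluster.this.name
  location   = var.zone
  node_count = 1 # illustrative value only

  node_config {
    machine_type = var.machine_type # hypothetical variable name
    spot         = var.use_spot     # hypothetical variable name (bool)
  }
}
```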
Confirm allowable values with gcloud container before changing terraform.tfvars:
    gcloud container get-server-config --region=REGION
    gcloud compute machine-types list --zones=ZONE
    gcloud compute regions list

Cost estimation
| Provider | Approach |
|---|---|
| DigitalOcean | Infracost does not support DO resources. Estimate manually from the pricing page. After deployment, kb infra do clusters calculates monthly cost from live node data via doctl compute size list. |
| Google Cloud | Infracost parses the config locally: infracost breakdown --path . --terraform-var-file ../shared.tfvars. Free-tier zonal clusters are not costed; node pool compute is the main line item. See supported GCP resources. |
| AWS / Azure | Infracost supports both. Run the same infracost breakdown command against the module directory. |