
Choosing a Cloud Provider

Start with DigitalOcean unless the client gives you a reason to move up. Those reasons are usually:

  • Compliance requirements (HIPAA, PCI) — AWS or Azure
  • Existing Microsoft identity (Entra ID) — Azure
  • Heavy data/ML workloads (BigQuery) — GCP

| Provider | Setup | Cost | Enterprise fit | Tradeoffs |
| --- | --- | --- | --- | --- |
| DigitalOcean | Low | $ | Starter | Simple DOKS, managed DBs, flat pricing, great docs. No IAM policies, no compliance certs beyond SOC 2, limited regions, no WAF/DDoS beyond basic. |
| AWS | High | $$$ | Full | Deepest service catalogue: EKS, RDS, CloudFront, IAM, compliance (HIPAA, PCI, FedRAMP). Billing is complex; egress fees add up. OpenTofu support is strongest here. |
| Google Cloud | Medium | $$ | Strong | Best managed K8s (GKE Autopilot). Strong networking, good BigQuery/AI story. Fewer services than AWS; sustained-use discounts help. IAM is project-scoped. |
| Azure | Medium | $$$ | Full | Best fit when clients use M365/Entra ID. AKS is solid, AD integration is unmatched. Compliance breadth rivals AWS. Portal UX is rough; naming is inconsistent. |

DigitalOcean is ideal for current client hosting, which already runs DOKS + Flux + Traefik. The lack of IAM policies won't matter for most clients, since they don't care how the sausage is made. Where it bites: if a corporate client asks for an audit trail of who accessed what, or wants scoped access so their own IT team can view logs without touching production, DO can't do it; API tokens are all-or-nothing. That's the kind of requirement that forces a move to AWS or Azure, not because you need it technically, but because a procurement or security team will ask for it on a checklist.

AWS has the most mature OpenTofu providers and community modules. The learning curve is the IAM + networking model, not the compute itself.

GKE Autopilot removes most node management overhead. If a client needs Kubernetes but you don’t want to manage nodes, GCP is the lowest-friction option.

Azure wins by default when the client’s org is already on Microsoft — SSO, AD groups, and licence bundling make the business case easy.

Both active providers follow the same file structure under tofu/remote/:

  • tofu/
    • remote/
      • shared.tfvars: values common to all providers (cluster_name, registry_name)
      • digitalocean/
        • main.tf: provisions VPC, container registry, and DOKS cluster
        • variables.tf: input schema (region, node size, autoscaling, maintenance window)
        • outputs.tf: cluster ID, endpoint, kubeconfig, registry endpoint
        • versions.tf: pins DO provider ~> 2.44, OpenTofu >= 1.6.0
        • terraform.tfvars: concrete values for this provider
      • google/
        • main.tf: provisions VPC, subnet with pod/service CIDRs, GKE cluster, node pool, Artifact Registry
        • variables.tf: input schema (project, region, zone, machine type, Spot VMs, networking CIDRs)
        • outputs.tf: cluster ID, endpoint, CA cert, network/subnet names, registry URL
        • versions.tf: pins Google provider ~> 6.0, OpenTofu >= 1.6.0
        • terraform.tfvars: concrete values for this provider
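The versions.tf pins listed above would look roughly like this for the DigitalOcean module (a sketch using the standard registry source address, not the file verbatim):

```hcl
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.44"
    }
  }
}
```

The Google module's versions.tf is the same shape with `google = { source = "hashicorp/google", version = "~> 6.0" }` in place of the DigitalOcean entry.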

Shared values flow through variable precedence — see OpenTofu Variables for the full schema.
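As a sketch of how a shared value lands (the variable name comes from shared.tfvars above; the default is invented for illustration):

```hcl
variable "cluster_name" {
  type        = string
  description = "Shared by the cluster and the container registry"
  default     = "example" # illustrative; used only if nothing below sets it
}

# Increasing precedence when running `tofu plan`:
#   1. TF_VAR_cluster_name environment variable
#   2. terraform.tfvars (auto-loaded from the working directory)
#   3. -var-file=../shared.tfvars and -var flags, in command-line order
```

Overlapping keys are resolved by the highest-precedence source, so keep shared.tfvars and the per-provider terraform.tfvars disjoint.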

For both providers, tofu plan makes only read-only API calls, which cost nothing. Validate syntax first (no credentials required), then preview changes:

```sh
tofu validate
tofu plan -var-file=../shared.tfvars -out=tfplan
```

The saved plan file can later be applied exactly as previewed with tofu apply tfplan.

| Provider | What you need |
| --- | --- |
| DigitalOcean | Single API token (DIGITALOCEAN_TOKEN) |
| AWS | IAM user + policy, or SSO session |
| Google Cloud | Service account key, or ADC via gcloud auth application-default login |
| Azure | Service principal, or az login session |

The digitalocean_kubernetes_versions data source resolves the latest Kubernetes version at plan time rather than relying on the "latest" slug, which the DO API only resolves at creation time. This prevents plan diffs where state and config disagree on the version string. See digitalocean/terraform-provider-digitalocean#997.
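A minimal sketch of that pattern (resource and variable names are illustrative, not copied from the module):

```hcl
# Resolve the newest available DOKS version at plan time
data "digitalocean_kubernetes_versions" "current" {}

resource "digitalocean_kubernetes_cluster" "main" {
  name   = var.cluster_name
  region = var.region

  # Pin a concrete version string so state and config always agree
  version = data.digitalocean_kubernetes_versions.current.latest_version

  node_pool {
    name       = "default"
    size       = var.node_size
    node_count = 2
  }
}
```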

Confirm allowable values with doctl before changing terraform.tfvars:

```sh
doctl kubernetes options versions
doctl kubernetes options sizes
doctl kubernetes options regions
```

GKE uses a zonal cluster (location = zone) instead of regional. A zonal cluster gets a free control plane under the GKE free tier; regional clusters charge for it.

The default node pool is removed immediately and replaced by a separately managed google_container_node_pool. This avoids lifecycle issues where changes to node config force cluster recreation.
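In provider terms, the two points above combine roughly like this (a sketch with illustrative names, not the module verbatim):

```hcl
resource "google_container_cluster" "main" {
  name     = var.cluster_name
  location = var.zone # zonal cluster: control plane is free-tier eligible

  # GKE requires a default pool at creation; delete it immediately so
  # node config changes never force cluster recreation
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary" {
  name       = "primary"
  cluster    = google_container_cluster.main.name
  location   = var.zone
  node_count = 1

  node_config {
    machine_type = var.machine_type
    spot         = true # assuming Spot VMs per the variables above
  }
}
```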

Confirm allowable values with gcloud container before changing terraform.tfvars:

```sh
gcloud container get-server-config --region=REGION
gcloud compute machine-types list --zones=ZONE
gcloud compute regions list
```

| Provider | Approach |
| --- | --- |
| DigitalOcean | Infracost does not support DO resources. Estimate manually from the pricing page. After deployment, kb infra do clusters calculates monthly cost from live node data via doctl compute size list. |
| Google Cloud | Infracost parses the config locally: infracost breakdown --path . --terraform-var-file ../shared.tfvars. Free-tier zonal clusters are not costed; node pool compute is the main line item. See supported GCP resources. |
| AWS / Azure | Infracost supports both. Run the same infracost breakdown command against the module directory. |