Let's say you have a config repo for your infrastructure - an initial pass might look like this:
~/config$ tree
.
├── httpd.tf
├── influxdb.tf
├── postgres.tf
├── providers.tf
└── terraform.tfstate
In this environment, we have some web servers, a Postgres database, and an InfluxDB database for collecting metrics.
InfluxDB in this example consists of a persistent volume claim, a deployment, and a service in Kubernetes:
resource "kubernetes_service" "influxdb" {
wait_for_load_balancer = "false"
metadata {
name = "influxdb"
}
spec {
selector = {
app = "influxdb"
}
port {
port = 8086
target_port = 8086
}
type = "LoadBalancer"
external_ips = ["10.0.100.4"]
}
}
resource "kubernetes_deployment" "influxdb" {
metadata {
name = "influxdb"
}
spec {
replicas = 1
selector {
match_labels = {
app = "influxdb"
}
}
template {
metadata {
labels = {
app = "influxdb"
}
}
spec {
container {
image = "influxdb:2.4"
name = "influxdb"
volume_mount {
name = "influxdb-volume"
mount_path = "/var/lib/influxdb2"
}
}
volume {
name = "influxdb-volume"
persistent_volume_claim {
claim_name = "influxdb"
}
}
}
}
}
}
resource "kubernetes_persistent_volume_claim" "influxdb" {
metadata {
name = "influxdb"
}
spec {
access_modes = ["ReadWriteOnce"]
resources {
requests = {
storage = "64Gi"
}
}
storage_class_name = "rook-ceph-block"
}
}
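With the provider configuration already in providers.tf, standing all three resources up is a single operation:

~/config$ terraform plan     # preview the service, deployment, and PVC to be created
~/config$ terraform apply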
This alone is a good step toward accountability for our infrastructure, but as soon as it comes time to perform maintenance or upgrades on our services, the question quickly becomes “ok, so how do we actually reproduce our so-called ‘reproducible’ infrastructure?”. Let's say we want to rehearse an upgrade of our influxdb deployment from version 2.4 to 2.6 of the image, prior to affecting the production service.
The naive approach would of course be to copy-paste influxdb.tf, and change whatever we need in the new copy:
~/config$ tree
.
├── httpd.tf
├── influxdb-staging.tf
├── influxdb.tf
├── postgres.tf
├── providers.tf
└── terraform.tfstate
That works, right? Sure does; but sadly, in my professional experience, that sort of thinking leads to a mess:
~/config$ tree
.
├── httpd-qa-1.tf
├── httpd-qa.tf
├── httpd-staging.tf
├── httpd.tf
├── influxdb-dev-john.tf
├── influxdb-dev.tf
├── influxdb-prod.tf
├── influxdb-staging.tf
├── influxdb.tf
├── postgres-14.tf
├── postgres.tf
├── providers.tf
└── terraform.tfstate
This is the failed state of configuration management. We ultimately have not ‘solved’ the matter of complexity and unrestrained mutable state in our infrastructure; all we’ve done is move the problem.
So how do we do better? By creating reusable modules with parameters, allowing simultaneous deployment of any and all variations of our stack. For influxdb, that might look like this:
variable "context" {}
variable "ip" {}
variable "image" {}
variable "storage" {}
variable "port" {
default = 8086
}
variable "stack" {
default = "influxdb"
}
resource "kubernetes_service" "main" {
wait_for_load_balancer = "false"
metadata {
name = "${var.stack}-${var.context}"
}
spec {
selector = {
app = "${var.stack}-${var.context}"
}
port {
port = var.port
target_port = var.port
}
type = "LoadBalancer"
external_ips = ["${var.ip}"]
}
}
resource "kubernetes_deployment" "main" {
metadata {
name = "${var.stack}-${var.context}"
}
spec {
replicas = 1
selector {
match_labels = {
app = "${var.stack}-${var.context}"
}
}
template {
metadata {
labels = {
app = "${var.stack}-${var.context}"
}
}
spec {
container {
image = var.image
name = "${var.stack}-${var.context}"
volume_mount {
name = "${var.stack}-${var.context}"
mount_path = "/var/lib/influxdb2"
}
}
volume {
name = "${var.stack}-${var.context}"
persistent_volume_claim {
claim_name = "${var.stack}-${var.context}"
}
}
}
}
}
}
resource "kubernetes_persistent_volume_claim" "main" {
metadata {
name = "${var.stack}-${var.context}"
}
spec {
access_modes = ["ReadWriteOnce"]
resources {
requests = {
storage = var.storage
}
}
storage_class_name = "rook-ceph-block"
}
}
ip, image, and storage have been parameterized, and context has been added to provide a namespace tag.
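A side benefit of funneling everything through variables is that we can guard against bad parameters with Terraform's validation blocks (available since 0.13). A minimal sketch - assuming the only contexts we ever deploy are the ones listed here:

variable "context" {
  type = string

  validation {
    # reject any context outside our known set, at plan time
    condition     = contains(["prod", "staging", "dev"], var.context)
    error_message = "context must be one of: prod, staging, dev."
  }
}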
We've also put that module in its own git repo, which we can import into our primary infrastructure repo as a submodule:
$ mkdir modules
$ git -C modules/ submodule add [email protected]:nihr43/influxdb-tf.git
$ tree
.
├── httpd.tf
├── influxdb.tf
├── modules
│ └── influxdb-tf
│ ├── main.tf
│ ├── Makefile
│ └── README.md
├── postgres.tf
├── providers.tf
└── terraform.tfstate
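One operational note: anyone cloning the config repo fresh will need to pull the submodule contents before Terraform can read the module source - for example (the clone URL here is hypothetical):

$ git clone [email protected]:example/config.git
$ cd config
$ git submodule update --init    # fetches modules/influxdb-tf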
The role of influxdb.tf now is to define our varying instances of the stack:
module "influxdb-prod" {
source = "./modules/influxdb-tf"
context = "prod"
ip = "10.0.100.100"
image = "influxdb:2.4"
storage = "256Gi"
}
module "influxdb-staging" {
source = "./modules/influxdb-tf"
context = "staging"
ip = "10.0.100.101"
image = "influxdb:2.5"
storage = "8Gi"
}
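Note that whenever a module block is added or its source changes, terraform init must be re-run before plan or apply will succeed:

~/config$ terraform init     # installs ./modules/influxdb-tf for both instances
~/config$ terraform apply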
Want to test an upgrade? Go to influxdb.tf and bump the image version of the influxdb-staging instance. Screwed it up? terraform taint the deployment and rebuild it.
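For reference, resources inside a module instance are addressed through the module path, so recreating the staging deployment looks like this:

~/config$ terraform taint module.influxdb-staging.kubernetes_deployment.main
~/config$ terraform apply
# or equivalently, on terraform 0.15.2 and later:
~/config$ terraform apply -replace=module.influxdb-staging.kubernetes_deployment.main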
As an added benefit, with the base influxdb module in its own repo, development and testing of the module itself can happen independently of our config repo and infrastructure - perhaps even of our team.
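Taken a step further, Terraform can also pull the module straight from git rather than through a submodule - a sketch, with a hypothetical release tag:

module "influxdb-staging" {
  source  = "git::https://github.com/nihr43/influxdb-tf.git?ref=v1.0.0"
  context = "staging"
  ip      = "10.0.100.101"
  image   = "influxdb:2.5"
  storage = "8Gi"
}

The ref parameter lets each consumer pin a specific version of the module, so upstream development never surprises a downstream deployment.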