Provisioning compute instances with Chef from Terraform

The problem

When Terraform 0.13.4 was released, various provisioners, including the Chef provisioner, were deprecated with no replacement. They were later removed in Terraform 0.15.

Running Terraform with a Chef provisioner in place would produce a warning like this one:

Warning: The "chef" provisioner is deprecated

  on example.tf line 42, in resource "google_compute_instance" "example":
 42:   provisioner "chef" {

The "chef" provisioner is deprecated and will be removed from future versions of Terraform. Visit https://learn.hashicorp.com/collections/terraform/provision for alternatives to using provisioners that are a better fit for the Terraform workflow.

As we had hundreds of VMs already set up using this provisioner, we needed a drop-in replacement that would not disrupt our existing workflow and pipelines.

The solution

We decided to replace the Chef provisioner with a custom remote-exec provisioner. One of our requirements was that development machines and, above all, our CI setup would not need any of the Chef tooling installed locally. In “the old days”, one would have used knife bootstrap to provision a VM. We no longer wanted this cumbersome dependency and required our VMs to be able to fully bootstrap themselves.

We ended up writing a custom shell script that gets run through a remote-exec provisioner during the creation of the instance.

Hence we cooked up the following:

#! /bin/bash

set -o errtrace
set -o errexit
set -o nounset
set -o pipefail
set +o xtrace

# download and install the desired Chef version using the Chef Omnitruck install script
sudo curl --silent --show-error --fail --output /tmp/chef_install.sh https://omnitruck.chef.io/install.sh
# syntax-check the download on its own line so errexit aborts on a broken file
bash -n /tmp/chef_install.sh
sudo bash /tmp/chef_install.sh -v "${chef_client_version}"

# wipe any existing Chef configs (there shouldn't be any)
sudo rm -rf /etc/chef
sudo mkdir /etc/chef

# set up Chef client.rb
chef_client_config=$(cat <<CHEF_CLIENT_CONFIG
log_location    STDOUT
chef_server_url "https://example.com/organizations/example/"
node_name       "${node_name}"
ssl_verify_mode :verify_peer
validation_client_name "${validation_client_name}"
validation_key         "/etc/chef/validator.pem"
CHEF_CLIENT_CONFIG
)
echo "$chef_client_config" | sudo tee /etc/chef/client.rb > /dev/null

# set up the bootstrap JSON (quoted delimiter so the shell leaves "$" and backslashes in the JSON untouched)
first_boot_json=$(cat <<'FIRST_BOOT_JSON'
${jsonencode(attributes)}
FIRST_BOOT_JSON
)
echo "$first_boot_json" | sudo tee /etc/chef/first-boot.json > /dev/null

# set up the Chef validation key
validator_pem=$(cat <<VALIDATOR_PEM
${validation_key}
VALIDATOR_PEM
)
echo "$validator_pem" | sudo tee /etc/chef/validator.pem > /dev/null

# delete the node in case it exists already
sudo knife node show \
  --config /etc/chef/client.rb \
  --key /etc/chef/validator.pem \
  --config-option node_name=${validation_client_name} \
  ${node_name} > /dev/null 2>&1 \
&& sudo knife node delete \
  --config /etc/chef/client.rb \
  --key /etc/chef/validator.pem \
  --config-option node_name=${validation_client_name} \
  --yes \
  ${node_name}

# perform the initial Chef run (thus bootstrapping the node)
sudo chef-client --environment ${chef_environment} --json-attributes /etc/chef/first-boot.json

# clean up the Chef validation key and configs
sudo rm /etc/chef/validator.pem
sudo sed -i '/^validation_/d' /etc/chef/client.rb

files/provisioning/chef.sh.tftpl

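
One detail of the script is easy to miss: the node cleanup chains knife node show and knife node delete with && even though errexit is on. Bash does not abort on a failing command that sits before the final && of a list, so a node that does not exist simply skips the delete. Below is a minimal sketch of that behaviour; node_exists and delete_node are hypothetical stand-ins for the two knife calls, not real commands.

```shell
#!/bin/bash
set -o errexit
set -o nounset

deleted=""

# Hypothetical stand-ins for `knife node show` and `knife node delete`.
node_exists() { [ "$1" = "known-node" ]; }
delete_node() { deleted="$deleted$1 "; echo "deleted $1"; }

# Absent node: node_exists fails, but it is not the last command of the
# && list, so errexit does not abort and the delete is skipped.
node_exists "fresh-node" && delete_node "fresh-node"
echo "fresh-node: nothing to delete, script continues"

# Present node: both commands run.
node_exists "known-node" && delete_node "known-node"
reached_end="yes"
echo "done"
```

The flip side is that a failure of the last command in the list (here, the delete itself) does abort the script under errexit.
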
resource "google_compute_instance" "example" {
  name         = "example"
  project      = "example"
  machine_type = "n2-standard-2"
  zone         = "europe-west1-c"

  boot_disk {
    auto_delete = true

    initialize_params {
      size  = 10
      image = "ubuntu-os-cloud/ubuntu-2004-lts"
    }
  }

  network_interface {
    network = "default"

    access_config {
      # ephemeral IP
    }
  }

  provisioner "remote-exec" {
    inline = [templatefile("files/provisioning/chef.sh.tftpl", {
      validation_client_name = var.chef_user_name
      validation_key         = file(var.chef_user_key_path)
      chef_client_version    = var.chef_client_version
      chef_environment       = var.name
      node_name              = "example.com"
      attributes = {
        "run_list" = [
          "recipe[base]"
        ]
        "base" = {
          "example" = self.name
        }
      }
    })]

    connection {
      host        = self.network_interface[0].access_config[0].nat_ip
      type        = "ssh"
      user        = var.provisioner_connection_ssh_user
      private_key = file(var.provisioner_connection_ssh_private_key_path)
    }
  }
}

example.tf

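
Both file() calls in example.tf read from the machine that runs Terraform, so a missing key only surfaces once Terraform evaluates the configuration. A small pre-flight check in the pipeline can fail earlier and with a clearer message. This is only a sketch and not part of our setup; the helper name require_readable and the demo file are hypothetical.

```shell
#!/bin/bash
set -o errexit
set -o nounset

# Hypothetical helper: succeed only if every given path is a readable regular file.
require_readable() {
  local path status=0
  for path in "$@"; do
    if [ ! -f "$path" ] || [ ! -r "$path" ]; then
      echo "missing or unreadable: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Demo with a throwaway file standing in for a real key.
demo_dir=$(mktemp -d)
touch "$demo_dir/chef_user.pem"

if require_readable "$demo_dir/chef_user.pem"; then
  echo "keys present, safe to run terraform"
fi
```

In a real pipeline the arguments would be the same paths that are passed in as chef_user_key_path and provisioner_connection_ssh_private_key_path.
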
variable "chef_user_key_path" {
  description = "Location of the Chef user key to use for provisioning"
}

variable "chef_user_name" {
  description = "User name associated with the given Chef user key"
}

variable "chef_client_version" {
  description = "The version of Chef Infra Client"
}

variable "provisioner_connection_ssh_user" {
  description = "User used to open SSH connections when provisioning instances"
}

variable "provisioner_connection_ssh_private_key_path" {
  description = "Location of the private key to use when provisioning instances"
}

variables.tf
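
None of these variables has a default, so Terraform asks for them interactively unless they are supplied some other way. In a CI pipeline, one option is Terraform's TF_VAR_<name> environment variables. The sketch below uses placeholder values throughout; the user names, version, and paths are hypothetical and need to be replaced with your own.

```shell
#!/bin/bash
set -o errexit
set -o nounset

# Placeholder values; substitute your own user, version, and key locations.
export TF_VAR_chef_user_name="ci-provisioner"
export TF_VAR_chef_user_key_path="/etc/ci/keys/ci-provisioner.pem"
export TF_VAR_chef_client_version="17.10.3"
export TF_VAR_provisioner_connection_ssh_user="provisioner"
export TF_VAR_provisioner_connection_ssh_private_key_path="/etc/ci/keys/provisioner_ssh_key"

# List the variable names Terraform will pick up, without printing any values.
env | grep '^TF_VAR_' | cut -d= -f1 | sort

# Then run Terraform non-interactively, for example:
#   terraform plan -input=false
```

Only paths are exported here, never key contents, so the keys themselves stay out of the environment and the pipeline logs.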

The legacy

There are valid reasons why provisioning should not be part of Terraform’s job. We’re not only looking at moving the provisioning out of our Terraform code; we’re looking at eliminating the need to provision software with Chef at all.
To that end, we’re exploring the possibility of moving some of the provisioned software to a Kubernetes cluster or a managed service, to eliminate the overhead that comes with configuration management. In the same breath, we’re also looking at moving away from Chef as a whole, but that’ll be a story for another blog post.