Deploying a compute cluster in OpenStack via Terraform

Authors:

Overview
Questions:

What is Terraform?

In which situations is it good/bad?

How to use it for managing your VM cluster

Objectives:

Learn Terraform basics

Launch a VM with Terraform

Launch and tear down a cluster with Terraform

Time estimation: 60 minutes

Supporting Materials:

Slides

FAQs

Last modification: Oct 18, 2022

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License The GTN Framework is licensed under MIT

Overview

In this tutorial we will briefly cover what Terraform is and how you can leverage it for your needs. This will not make you an expert on Terraform but will give you the tools you need in order to maintain your cloud infrastructure as code.

This will be a very practical training with emphasis on looking at examples from modules and becoming self sufficient. This tutorial uses the OpenStack provider for Terraform. Other Cloud providers are available, but their invocation will be different from that which is described here.

Agenda

Overview

Cattle vs. Pets

What is Terraform?

What can be managed

What should be managed

Managing a Single VM

Keypair

Adding an Instance

Managing a Cluster of VMs

Configuration

cloud-init

Login / HTCondor / A Test Job

Infrastructure Graph

Tearing Everything Down

Destroy time provisioners

UseGalaxy.eu’s Terraform Usage

Cattle vs. Pets

For those of you with more experience as System Administrators, you may have heard the term “cattle vs. pets.”

Pets: These servers are managed by hand, with admins SSHing in to make changes directly on the machine, and no log of this. If a “pet” dies, everyone is sad.
Cattle: These servers are managed in bulk, you no longer own a beloved pet but hundreds of identical ones. You stop caring if an individual server lives or dies as you have many identical ones ready to take its place, and you can easily replace them.

What is Terraform?

Terraform is a tool you can use to “sync” your infrastructure (VMs in Clouds, DNS records, etc.) with code you have written in configuration files. This is known as infrastructure as code. Knowing that your infrastructure is exactly what you expect it to be can simplify your operations significantly. You can have confidence that if anything changes, any images crash or are accidentally deleted, that you can immediately re-build your infrastructure.

UseGalaxy.eu used Terraform to migrate easily between clouds, when our old cloud shut down and our new cloud was launched. We simply updated the credentials for connecting to the cloud, and ran Terraform. It recognised that none of our existing resources were there and recreated all of them. This made life incredibly easy for us, we knew that our infrastructure would be exactly correct relative to what it looked like in the previous cloud.

What can be managed

Terraform supports many providers, allowing you to easily manage resources no matter where they are located. Primarily this consists of resources like virtual machines and DNS records.

If you have VMs you are managing, whether for individual projects or for large scale infrastructure like UseGalaxy.eu, your work or research can be simpler and more reproducible if you are doing it with automation like Terraform.

What should be managed

Support for certain databases and other software is available, but whether or not this is a good idea depends heavily on your workflow. UseGalaxy.eu found that managing databases and database users was not a good fit with our workflow. We launch VMs with terraform and then provision them with ansible. To then have a second step where we re-connect and provision the database with terraform was an awkward workflow, so we mostly let Ansible manage these resources.

Some groups use Ansible or Bash scripts in order to launch VMs. This can be a better workflow for certain cases. Launching VMs that manage themselves or are automatically scaled by some external process is not a good fit. Terraform expects the resource to be there the next time, with the same ID, with the same values, and will alert you if it isn’t.

Managing a Single VM

We will start small, by managing a single VM in our cloud account. Make sure you have your OpenStack credentials available.

You can download the environment file with the credentials from the OpenStack dashboard.

Log in to the OpenStack dashboard, choose the project for which you want to download the OpenStack RC file, and click “Access & Security”.

Click “Download OpenStack RC File” and save the file.

Keypair

Terraform reads all files with the extension .tf in your current directory. Resources can be in a single file, or organised across several different files. We had the best experience by separating out logical groups of resources:

DNS entries in a single file
Security groups in separate .tf files
Keypairs in a single file
Servers/Instances in individual files.

Hands-on: Setting up Terraform
Install Terraform

Create a new directory and go into it.
Create a providers.tf file with the following contents:
provider "openstack" {}
This specifies the configuration for the OpenStack plugin. You can either specify the configuration in the plugin, or it will automatically load the values from the normal OpenStack environment variable names.
Run terraform init
Question

What did the output look like?
If this is not the first time you’ve run it, you will see Terraform download any missing providers:
Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "openstack" (1.10.0)...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.openstack: version = "~> 1.10"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

This is the minimum to get started with Terraform, defining which providers we will use and any parameters for them. We have two options now:

We can load the OpenStack credentials as environment variables
We can code the OpenStack credentials in the providers.tf file

We recommend the first option, as often Terraform plans are made publicly available in order to collaborate on them, and this prevents accidentally committing your credentials to the git repository where.

We will start by managing your SSH keypair for the cloud as this is an easy thing to add.

Comment: Your Operating System

If you are on Windows and do not know your public key, please skip to the “Adding an Instance” section, as you probably do not have the tools installed to do this, and we cannot document how to do it. Instead you can simply reference the key by name later. We will point out this location.

Hands-on: Keypairs
Find the public key you will use for connecting to the your new VM. It is usually known as id_rsa.pub
Comment: No public key

If you can find the private key file (possibly a cloud.pem file you downloaded earlier from OpenStack), then you can find the public key by running the command:
$ ssh-keygen -y -f /path/to/key.pem
Create a new file, main.tf with the following structure. In public_key, write the complete value of your public key that you found in the first step.
resource "openstack_compute_keypair_v2" "my-cloud-key" {
  name       = "my-key"
  public_key = "ssh-rsa AAAAB3Nz..."
}
Run terraform plan
Question

What does the output look like? Did it execute successfully?
If you have not sourced your OpenStack credentials file, you will see something like the following:
Error: Error running plan: 1 error(s) occurred:

* provider.openstack: One of 'auth_url' or 'cloud' must be specified
You should source your openstack credentials first. This is the file which has lines like:
export OS_AUTH_URL=https://...
export OS_PROJECT_ID=...
export OS_PROJECT_NAME="..."
export OS_USER_DOMAIN_NAME="Default"
You should run source /path/to/file.sh in your terminal, and then re-run terraform plan:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + openstack_compute_keypair_v2.my-cloud-key
      id:          <computed>
      fingerprint: <computed>
      name:        "my-key"
      private_key: <computed>
      public_key:  "ssh-rsa AAAAB...."
      region:      <computed>


Plan: 1 to add, 0 to change, 0 to destroy.
If you see this, then everything ran successfully

Let’s look at the output in detail:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + openstack_compute_keypair_v2.my-cloud-key
      id:          <computed>
      fingerprint: <computed>
      name:        "my-key"
      private_key: <computed>
      public_key:  "ssh-rsa AAAAB..."
      region:      <computed>


Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

Terraform informs us first about the different symbols used. Here it tells us that it will + create a resource. Sometimes it will - delete or ~ update in-place. Next it describes in detail what will be done and why. For creating a resource it does not give us much information, we will see more types of changes later.

Lastly it informs us that we did not save our plan. Terraform can maintain a concept of what the remote resource’s state looks like between your terraform plan and terraform apply steps. This is a more advanced feature and will not be covered today.

Hands-on: Applying our plan

Now that you’ve reviewed the plan and everything looks good, we’re ready to apply our changes.
Run terraform apply. Because we did not re-use our plan from the previous step, terraform will re-run the plan step first, and request your confirmation:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + openstack_compute_keypair_v2.my-cloud-key
      id:          <computed>
      fingerprint: <computed>
      name:        "my-key"
      private_key: <computed>
      public_key:  "" => "ssh-rsa AAAAB3..."
      region:      <computed>


Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:
Confirm that everything looks good and enter the value ‘yes’, and hit enter.
openstack_compute_keypair_v2.my-cloud-key: Creating...
  fingerprint: "" => "<computed>"
  name:        "" => "my-key"
  private_key: "" => "<computed>"
  public_key:  "" => "ssh-rsa AAAAB3..."
  region:      "" => "<computed>"
openstack_compute_keypair_v2.my-cloud-key: Creation complete after 3s (ID: my-key)

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Comment: "Key pair 'my-key' already exists"

If you see an error message that says:
{"conflictingRequest": {"message": "Key pair 'my-key' already exists.", "code": 409}}
Then you already have a keypair with this name. You should update the name = "my-key" to use a different name.

We should now have a keypair in our cloud!

Adding an Instance

We now have:

Terraform installed
The OpenStack plugin initialized
A keypair in OpenStack

Comment: Doing this training outside of a training event

If you are doing this training outside of an event, then you will likely need to do some additional steps specific to your Cloud:

Identify a small image flavor that you can use for your testing.

Upload this raw format image available from UseGalaxy.eu to your OpenStack

You will need to do both of these things before you can proceed.

You are now ready to launch an instance!

Hands-on: Launching an Instance

Open your main.tf in your text editor and add the following.

Warning: Correct image/flavor/network/security_group names

The documentation below notes some specific values for the image_name, flavor_name, security_groups, and network properties. These may not be correct for your training, instead your instructor will provide these values to you.
```
resource "openstack_compute_instance_v2" "test" {
  name            = "test-vm"
  image_name      = "denbi-centos7-j10-2e08aa4bfa33-master"
  flavor_name     = "m1.tiny"
  key_pair        = "${openstack_compute_keypair_v2.my-cloud-key.name}"
  security_groups = ["default"]

  network {
    name = "public"
  }
}
```
There are some important things to note, we reproduce the above configuration but with comments:

“openstack_compute_instance_v2” is the ‘type’ of a resource, ‘test’ is the name of this specific resource.
```
resource "openstack_compute_instance_v2" "test" {
```
This will become the server name
```
  name            = "test-vm"
```
The image which will be launched
```
  image_name      = "denbi-centos7-j10-2e08aa4bfa33-master"
```
Which flavor
```
  flavor_name     = "m1.tiny"
```
This uses Terraform’s knowledge of the openstack_compute_keypair_v2.my-cloud-key resource which you previously described. Then it accesses the .name attribute. This allow your to update the name at any time, and still have your resource definitions be correct.
```
  key_pair        = "${openstack_compute_keypair_v2.my-cloud-key.name}"
```
The security group to apply. This is a comma separated list, but default should work for most OpenStack clouds.
```
  security_groups = ["default"]
```
A network that is accessible to you.
```
  network {
    name = "public"
  }
```

Run terraform apply. Running the plan step is not necessary, it is just useful to see what changes will be applied without starting the apply process.

Note that first terraform “refreshes” the state of the remote resources that it already manages, before checking what changes need to be made.

$ terraform apply
openstack_compute_keypair_v2.my-cloud-key: Refreshing state... (ID: my-key)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + openstack_compute_instance_v2.test
      id:                         <computed>
      access_ip_v4:               <computed>
      access_ip_v6:               <computed>
      all_metadata.%:             <computed>
      availability_zone:          <computed>
      flavor_id:                  <computed>
      flavor_name:                "m1.tiny"
      force_delete:               "false"
      image_id:                   <computed>
      image_name:                 "denbi-centos7-j10-2e08aa4bfa33-master"
      key_pair:                   "my-key"
      name:                       "test-vm"
      network.#:                  "1"
      network.0.access_network:   "false"
      network.0.fixed_ip_v4:      <computed>
      network.0.fixed_ip_v6:      <computed>
      network.0.floating_ip:      <computed>
      network.0.mac:              <computed>
      network.0.name:             "public"
      network.0.port:             <computed>
      network.0.uuid:             <computed>
      power_state:                "active"
      region:                     <computed>
      security_groups.#:          "1"
      security_groups.3814588639: "default"
      stop_before_destroy:        "false"


Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:

If everything looks good, enter ‘yes’.

openstack_compute_instance_v2.test: Creating...
  access_ip_v4:               "" => "<computed>"
  access_ip_v6:               "" => "<computed>"
  all_metadata.%:             "" => "<computed>"
  availability_zone:          "" => "<computed>"
  flavor_id:                  "" => "<computed>"
  flavor_name:                "" => "m1.tiny"
  force_delete:               "" => "false"
  image_id:                   "" => "<computed>"
  image_name:                 "" => "denbi-centos7-j10-2e08aa4bfa33-master"
  key_pair:                   "" => "my-key"
  name:                       "" => "test-vm"
  network.#:                  "" => "1"
  network.0.access_network:   "" => "false"
  network.0.fixed_ip_v4:      "" => "<computed>"
  network.0.fixed_ip_v6:      "" => "<computed>"
  network.0.floating_ip:      "" => "<computed>"
  network.0.mac:              "" => "<computed>"
  network.0.name:             "" => "public"
  network.0.port:             "" => "<computed>"
  network.0.uuid:             "" => "<computed>"
  power_state:                "" => "active"
  region:                     "" => "<computed>"
  security_groups.#:          "" => "1"
  security_groups.3814588639: "" => "default"
  stop_before_destroy:        "" => "false"
openstack_compute_instance_v2.test: Still creating... (10s elapsed)
openstack_compute_instance_v2.test: Still creating... (20s elapsed)
openstack_compute_instance_v2.test: Creation complete after 30s (ID: 7a2ed5ba-0801-49b5-bf1b-9bf9cec733fa)

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

You now have a running instance! We do not know the IP address so we cannot login yet. You can obtain that from the OpenStack dashboard, or via the terraform show command

Hands-on: `terraform show` and logging in

Run terraform show

The values in the following output will not match yours.

openstack_compute_instance_v2.test:
  id = 7a2ed5ba-0801-49b5-bf1b-9bf9cec733fa
  access_ip_v4 = 192.52.32.231
  access_ip_v6 =
  all_metadata.% = 0
  availability_zone = nova
  flavor_id = a9a58bcf-29a3-47f4-b4c3-e71e40127833
  flavor_name = m1.tiny
  force_delete = false
  image_id = 922ae172-275b-4e6a-bb9d-c7ace52fc8d4
  image_name = denbi-centos7-j10-2e08aa4bfa33-master
  key_pair = my-key
  name = test-vm
  network.# = 1
  network.0.access_network = false
  network.0.fixed_ip_v4 = 192.52.32.231
  network.0.fixed_ip_v6 =
  network.0.floating_ip =
  network.0.mac = fa:16:3e:e6:cd:6c
  network.0.name = public
  network.0.port =
  network.0.uuid = 6a68b40c-356f-4d4c-95e9-e5c72855fb35
  power_state = active
  region = Freiburg
  security_groups.# = 1
  security_groups.3814588639 = default
  stop_before_destroy = false
openstack_compute_keypair_v2.my-cloud-key:
  id = my-key
  fingerprint = 4a:72:2c:7e:d2:b7:f2:c4:5a:9e:bb:ca:c7:8f:d7:a3
  name = my-key
  private_key =
  public_key = ssh-rsa AAAAB3...
  region = Freiburg

Here we can obtain the IP address. In the above output, it is 192.52.32.231

The authenticity of host '192.52.32.231 (192.52.32.231)' can't be established.
ECDSA key fingerprint is SHA256:oEHgotonwT3a47K0kLdsNVd/QqctBjDK7F829J0VnwQ.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.52.32.231' (ECDSA) to the list of known hosts.
      _        _   _ ____ _____
     | |      | \ | |  _ \_   _| ██      ██
   __| | ___  |  \| | |_) || |     ██████
  / _' |/ _ \ | . ' |  _ < | |     ██████
 | (_| |  __/_| |\  | |_) || |_  ██  ██  ██
  \__,_|\___(_)_| \_|____/_____| ██  ██  ██
====================================================
Hostname..........: test-vm.novalocal
Release...........: CentOS Linux release 7.5.1804 (Core)
Build.............: de.NBI Generic denbi-centos7-j10-2e08aa4bfa33-master
Build Date........: 2018-10-24T14:28:38Z
Current user......: centos
Load..............: 0.00 0.07 0.05 2/148 1644
Uptime............: up 5 minutes
====================================================
[centos@test-vm ~]$

With this, you’re done. You’ve launched a single VM with Terraform. The image we are using comes with:

HTCondor
Docker
Singularity
Conda
CVMFS connected to Galaxyproject’s biological databases and Singularity containers

This is very similar to the standard image that UseGalaxy.eu uses for all of its compute nodes.

Managing a Cluster of VMs

When you are done playing around with the individual VM, we’re ready to explore launching an entire cluster.

We will launch a VM to act as our HTCondor central manager
We will launch a VM to act as an NFS server for our “cluster”
We will launch two execution nodes to process jobs

We use a very similar setup for our production cloud cluster that powers UseGalaxy.eu’s compute and training infrastructure.

Configuration

We will start by setting up the new nodes and launching them as a test:

Hands-on: Adding configuration for other nodes
Update the name of the instance you already have, test to name = "central-manager". This will be our scheduler central manager.
resource "openstack_compute_instance_v2" "test" {
  name            = "central-manager"
  ...
}
Add the NFS server. It is mostly the same configuration you’ve done before, just with a different name and resource name.
resource "openstack_compute_instance_v2" "nfs" {
  name            = "nfs-server"
  image_name      = "denbi-centos7-j10-2e08aa4bfa33-master"
  flavor_name     = "m1.tiny"
  key_pair        = "${openstack_compute_keypair_v2.my-cloud-key.name}"
  security_groups = ["default"]

  network {
    name = "public"
  }
}
Lastly, we’ll add the execution nodes. Here we will launch two servers at once by using the count parameter. We can add count in the main portion of a Terraform resource definition and it will create that many copies of that same resource. In the case where we wish to be able to distinguish between them, we can use ${count.index} in the name or another field to distinguish them.
resource "openstack_compute_instance_v2" "exec" {
  name            = "exec-${count.index}"
  image_name      = "denbi-centos7-j10-2e08aa4bfa33-master"
  flavor_name     = "m1.tiny"
  key_pair        = "${openstack_compute_keypair_v2.my-cloud-key.name}"
  security_groups = ["default"]
  count           = 2

  network {
    name = "public"
  }
}
Run terraform apply

You will see something like the following:
openstack_compute_keypair_v2.my-cloud-key: Refreshing state... (ID: my-key)
openstack_compute_instance_v2.test: Refreshing state... (ID: 7a2ed5ba-0801-49b5-bf1b-9bf9cec733fa)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  + openstack_compute_instance_v2.exec[0]
      id:                         <computed>
      access_ip_v4:               <computed>
      access_ip_v6:               <computed>
      all_metadata.%:             <computed>
      availability_zone:          <computed>
      ...

  ~ openstack_compute_instance_v2.test
      name:                       "test-vm" => "central-manager"


Plan: 3 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:
This time, Terraform is able to modify the openstack_compute_instance_v2.test resource in place, it can just update the name of the VM. It will also create the other three requested resources.
Enter yes and terraform will make the requested changes
It should complete successfully
Question

What did the output look like?
openstack_compute_instance_v2.nfs: Creation complete after 2m55s (ID: 421f0899-f2c1-43c9-8bbe-9b0415c8f4f4)
openstack_compute_instance_v2.exec[0]: Creation complete after 2m57s (ID: 1ae02775-a0ff-47a5-907f-513190a8548e)
openstack_compute_instance_v2.exec[1]: Creation complete after 2m59s (ID: d339ebeb-4288-498c-9923-42fce32a3808)

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

`cloud-init`

These VMs that we have launched currently are all absolutely identical, except for the host name. There is no specialisation. They all do nothing. We will fix that and make these machines have individual roles by booting them with metadata attached to each instance.

cloud-init is used for this process, it allows for injecting files and executing commands as part of the boot process.

cloud-init allows for many more actions to be executed as well, you can read about them in the documentation.

The cloud-init configuration is a YAML file. Here we can see an example of how it is used, we have a YAML array at the top level. Inside, a hash of content, owner, path, and permissions.

#cloud-config
write_files:
- content: |
    CONDOR_HOST = 127.0.0.1
    ALLOW_WRITE = *
    ALLOW_READ = $(ALLOW_WRITE)
    ALLOW_NEGOTIATOR = $(ALLOW_WRITE)
    DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
    FILESYSTEM_DOMAIN = denbi
    UID_DOMAIN = denbi
    TRUST_UID_DOMAIN = True
    SOFT_UID_DOMAIN = True
  owner: root:root
  path: /etc/condor/condor_config.local
  permissions: '0644'

This will create a file with the value from content, owned by root/group root, in /etc/condor/condor_config.local. So let’s re-structure our main.tf file to have some configuration:

Hands-on: cloud-init

Edit your central-manager server and add a block like the following at the end.

user_data = <<-EOF
  #cloud-config
  write_files:
  - content: |
      CONDOR_HOST = localhost
      ALLOW_WRITE = *
      ALLOW_READ = $(ALLOW_WRITE)
      ALLOW_NEGOTIATOR = $(ALLOW_WRITE)
      DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
      FILESYSTEM_DOMAIN = terraform-training
      UID_DOMAIN = terraform-training
      TRUST_UID_DOMAIN = True
      SOFT_UID_DOMAIN = True
    owner: root:root
    path: /etc/condor/condor_config.local
    permissions: '0644'
  - content: |
      /data           /etc/auto.data          nfsvers=3
    owner: root:root
    path: /etc/auto.master.d/data.autofs
    permissions: '0644'
  - content: |
      share  -rw,hard,intr,nosuid,quota  ${openstack_compute_instance_v2.nfs.access_ip_v4}:/data/share
    owner: root:root
    path: /etc/auto.data
    permissions: '0644'
EOF

Near the end we see an interesting thing, ${openstack_compute_instance_v2.nfs.access_ip_v4}, this will template the user_data that is passed to cloud-init, with the IP address of the NFS server.

Edit your NFS server and add a user data block like:

user_data = <<-EOF
  #cloud-config
  write_files:
  - content: |
      /data/share *(rw,sync)
    owner: root:root
    path: /etc/exports
    permissions: '0644'
  runcmd:
   - [ mkdir, -p, /data/share ]
   - [ chown, "centos:centos", -R, /data/share ]
   - [ systemctl, enable, nfs-server ]
   - [ systemctl, start, nfs-server ]
   - [ exportfs, -avr ]
EOF

This will:

Make the /data/share directory
Set permissions on it
Launch the NFS service
Export /data/share to the network

Edit the exec node and add a user data block like:

user_data = <<-EOF
  #cloud-config
  write_files:
  - content: |
      CONDOR_HOST = ${openstack_compute_instance_v2.test.access_ip_v4}
      ALLOW_WRITE = *
      ALLOW_READ = $(ALLOW_WRITE)
      ALLOW_ADMINISTRATOR = *
      ALLOW_NEGOTIATOR = $(ALLOW_ADMINISTRATOR)
      ALLOW_CONFIG = $(ALLOW_ADMINISTRATOR)
      ALLOW_DAEMON = $(ALLOW_ADMINISTRATOR)
      ALLOW_OWNER = $(ALLOW_ADMINISTRATOR)
      ALLOW_CLIENT = *
      DAEMON_LIST = MASTER, SCHEDD, STARTD
      FILESYSTEM_DOMAIN = terraform-training
      UID_DOMAIN = terraform-training
      TRUST_UID_DOMAIN = True
      SOFT_UID_DOMAIN = True
      # run with partitionable slots
      CLAIM_PARTITIONABLE_LEFTOVERS = True
      NUM_SLOTS = 1
      NUM_SLOTS_TYPE_1 = 1
      SLOT_TYPE_1 = 100%
      SLOT_TYPE_1_PARTITIONABLE = True
      ALLOW_PSLOT_PREEMPTION = False
      STARTD.PROPORTIONAL_SWAP_ASSIGNMENT = True
    owner: root:root
    path: /etc/condor/condor_config.local
    permissions: '0644'
  - content: |
      /data           /etc/auto.data          nfsvers=3
    owner: root:root
    path: /etc/auto.master.d/data.autofs
    permissions: '0644'
  - content: |
      share  -rw,hard,intr,nosuid,quota  ${openstack_compute_instance_v2.nfs.access_ip_v4}:/data/share
    owner: root:root
    path: /etc/auto.data
    permissions: '0644'
EOF

Compare your final configuration against ours:

resource "openstack_compute_keypair_v2" "my-cloud-key" {
  name       = "my-key"
  public_key = "ssh-rsa AAAAB3..."
}

resource "openstack_compute_instance_v2" "test" {
  name            = "central-manager"
  image_name      = "denbi-centos7-j10-2e08aa4bfa33-master"
  flavor_name     = "m1.tiny"
  key_pair        = "${openstack_compute_keypair_v2.my-cloud-key.name}"
  security_groups = ["default"]

  network {
    name = "public"
  }

  user_data = <<-EOF
    #cloud-config
    write_files:
    - content: |
        CONDOR_HOST = localhost
        ALLOW_WRITE = *
        ALLOW_READ = $(ALLOW_WRITE)
        ALLOW_NEGOTIATOR = $(ALLOW_WRITE)
        DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
        FILESYSTEM_DOMAIN = terraform-training
        UID_DOMAIN = terraform-training
        TRUST_UID_DOMAIN = True
        SOFT_UID_DOMAIN = True
      owner: root:root
      path: /etc/condor/condor_config.local
      permissions: '0644'
    - content: |
        /data           /etc/auto.data          nfsvers=3
      owner: root:root
      path: /etc/auto.master.d/data.autofs
      permissions: '0644'
    - content: |
        share  -rw,hard,intr,nosuid,quota  ${openstack_compute_instance_v2.nfs.access_ip_v4}:/data/share
      owner: root:root
      path: /etc/auto.data
      permissions: '0644'
  EOF
}

resource "openstack_compute_instance_v2" "nfs" {
  name            = "nfs-server"
  image_name      = "denbi-centos7-j10-2e08aa4bfa33-master"
  flavor_name     = "m1.tiny"
  key_pair        = "${openstack_compute_keypair_v2.my-cloud-key.name}"
  security_groups = ["default"]

  network {
    name = "public"
  }

  user_data = <<-EOF
    #cloud-config
    write_files:
    - content: |
        /data/share *(rw,sync)
      owner: root:root
      path: /etc/exports
      permissions: '0644'
    runcmd:
     - [ mkdir, -p, /data/share ]
     - [ chown, "centos:centos", -R, /data/share ]
     - [ systemctl, enable, nfs-server ]
     - [ systemctl, start, nfs-server ]
     - [ exportfs, -avr ]
  EOF
}

resource "openstack_compute_instance_v2" "exec" {
  name            = "exec-${count.index}"
  image_name      = "denbi-centos7-j10-2e08aa4bfa33-master"
  flavor_name     = "m1.tiny"
  key_pair        = "${openstack_compute_keypair_v2.my-cloud-key.name}"
  security_groups = ["default"]
  count           = 2

  network {
    name = "public"
  }

  user_data = <<-EOF
    #cloud-config
    write_files:
    - content: |
        CONDOR_HOST = ${openstack_compute_instance_v2.test.access_ip_v4}
        ALLOW_WRITE = *
        ALLOW_READ = $(ALLOW_WRITE)
        ALLOW_ADMINISTRATOR = *
        ALLOW_NEGOTIATOR = $(ALLOW_ADMINISTRATOR)
        ALLOW_CONFIG = $(ALLOW_ADMINISTRATOR)
        ALLOW_DAEMON = $(ALLOW_ADMINISTRATOR)
        ALLOW_OWNER = $(ALLOW_ADMINISTRATOR)
        ALLOW_CLIENT = *
        DAEMON_LIST = MASTER, SCHEDD, STARTD
        FILESYSTEM_DOMAIN = terraform-training
        UID_DOMAIN = terraform-training
        TRUST_UID_DOMAIN = True
        SOFT_UID_DOMAIN = True
        # run with partitionable slots
        CLAIM_PARTITIONABLE_LEFTOVERS = True
        NUM_SLOTS = 1
        NUM_SLOTS_TYPE_1 = 1
        SLOT_TYPE_1 = 100%
        SLOT_TYPE_1_PARTITIONABLE = True
        ALLOW_PSLOT_PREEMPTION = False
        STARTD.PROPORTIONAL_SWAP_ASSIGNMENT = True
      owner: root:root
      path: /etc/condor/condor_config.local
      permissions: '0644'
    - content: |
        /data           /etc/auto.data          nfsvers=3
      owner: root:root
      path: /etc/auto.master.d/data.autofs
      permissions: '0644'
    - content: |
        share  -rw,hard,intr,nosuid,quota  ${openstack_compute_instance_v2.nfs.access_ip_v4}:/data/share
      owner: root:root
      path: /etc/auto.data
      permissions: '0644'
  EOF
}

Run terraform apply

It will look slightly different this time:

$ terraform apply
openstack_compute_keypair_v2.my-cloud-key: Refreshing state... (ID: my-key)
openstack_compute_instance_v2.nfs: Refreshing state... (ID: 421f0899-f2c1-43c9-8bbe-9b0415c8f4f4)
openstack_compute_instance_v2.test: Refreshing state... (ID: 7a2ed5ba-0801-49b5-bf1b-9bf9cec733fa)
openstack_compute_instance_v2.exec[0]: Refreshing state... (ID: 1ae02775-a0ff-47a5-907f-513190a8548e)
openstack_compute_instance_v2.exec[1]: Refreshing state... (ID: d339ebeb-4288-498c-9923-42fce32a3808)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

-/+ openstack_compute_instance_v2.exec[0] (new resource required)
      id:                         "1ae02775-a0ff-47a5-907f-513190a8548e" => <computed> (forces new resource)
      access_ip_v4:               "192.52.32.255" => <computed>
      ...
      user_data:                  "" => "07d27d50dbdf12a5647d9fd12a3510861d32203c" (forces new resource)

...
Plan: 4 to add, 0 to change, 4 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:

Here Terraform has detected that the userdata has changed. It cannot change this dynamically at runtime, only at boot time. So it decides that it must destroy and then replace that resource.

We now have a running cluster! Let’s log in

Hands-on: Cluster Usage
Now that we have a cluster in the cloud, let’s login. Look through the output of terraform show to find your central-manager server and its IP address.

ssh centos@<your manager ip>
Run condor_status which will show you the status of your cluster.

After your central manager VM booted, the executor nodes booted as well. Then, using the IP address of the central manager, the executors contacted that machine, and advertised their availability to run jobs.
Question

What does the output look like? How many executor nodes do you see?
[centos@central-manager share]$ condor_status
Name                   OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@exec-0.novalocal LINUX      X86_64 Unclaimed Idle      0.150  991  0+00:15:28
slot1@exec-1.novalocal LINUX      X86_64 Unclaimed Idle      0.000  991  0+00:15:32

                     Machines Owner Claimed Unclaimed Matched Preempting  Drain

        X86_64/LINUX        2     0       0         2       0          0      0

               Total        2     0       0         2       0          0      0
Change into the /data/share/
Create a file, test.job with the following contents:
Universe = vanilla
Executable = /data/share/test.sh
Log = test.$(process).log
Output = test.$(process).out
Error = test.$(process).err
request_cpus = 1
Queue 10
Create a file test.sh with the following:
#!/bin/bash
echo "$(hostname)"
sleep 1
Run chmod +x test.sh

Run condor_submit test.job

(Quickly) Run condor_q to see the queue status. You can invoke this repeatedly to watch the queue update.

Run cat *.out and you should see the hostnames where the jobs were run

Infrastructure Graph

Terraform has the ability to produce a graph showing the relationship of resources in your infrastructure. We will produce a graphic for our current terraform plan:

Hands-on: Infrastructure Graph
Run:
terraform graph | dot -Tpng > graph.png
You may need the graphviz package installed in order to produce the graph
Open up graph.png in an image viewer

Question

What did the output look like?

Figure 1: A simple graph showing the structure of the infrastructure in this lesson

This is a very simple resrouce graph:

A simple graph. — Figure 2: A simple graph showing the structure of the infrastructure in this lesson

The dependencies follow the arrows, the NFS depends on the keypair being setup. The test machine (central manager) depends on the NFS server being available (and having an IP), the exec nodes depend on the central manager server. This is a somewhat simplified view, there are more dependencies which are not shown for simplicity (e.g. the exec nodes depend on NFS)

If you have variables this can produce a more complex dependency graph:

A more complex graph showing variables. — Figure 3: A more complex graph showing variables

Once you develop complex infrastructure, these graphics become less useful.

Tearing Everything Down

Hands-on: Cleanup

Simple delete your main.tf file (or rename it without the .tf extension)

Run terraform apply. Terraform will see that the resource is no longer part of your code and remove it.

openstack_compute_keypair_v2.my-cloud-key: Refreshing state... (ID: my-key)
openstack_compute_instance_v2.nfs: Refreshing state... (ID: 0b58d613-48f2-49c4-9d8a-3623d17f1c93)
openstack_compute_instance_v2.test: Refreshing state... (ID: e433f772-6f17-4609-aa61-053f7602533f)
openstack_compute_instance_v2.exec[0]: Refreshing state... (ID: 034a4206-b0ad-4d0e-9c75-27d3966f99ac)
openstack_compute_instance_v2.exec[1]: Refreshing state... (ID: fe576776-397e-4f1f-a8e7-4a9203764567)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  - openstack_compute_instance_v2.exec[0]

  - openstack_compute_instance_v2.exec[1]

  - openstack_compute_instance_v2.nfs

  - openstack_compute_instance_v2.test

  - openstack_compute_keypair_v2.my-cloud-key


Plan: 0 to add, 0 to change, 5 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:

Destroy time provisioners

Sometimes, immediately terminating the VM is not ideal behaviour. Especially if you’re managing a cluster like HTCondor which can gracefully terminate jobs, you might want to permit that. Terraform provides ‘destroy-time provisioners’, code snippets that are run before the VM is destroyed. If they return successfully, destruction continues. If they exit with an error code, then destruction is halted. We can write a simple one for HTCondor like so:

set -ex
condor_drain $(hostname) || true;
my_ip=$(/sbin/ifconfig eth0 | grep 'inet ' | awk '{print $2}')
# Find slots which have an address associated with us.
slots=$(condor_status -autoformat Name State MyAddress | grep "<${my_ip}?9618" | wc -l)
# If there are more than one slots, leave it.
if (( slots > 1 )); then
    exit 1;
else
    # Otherwise, poweroff.
    /usr/sbin/condor_off -graceful
    exit 0;
fi

Which can be provided to your VM like:

provisioner "remote-exec" {
  when = "destroy"

  scripts = [
    "prepare-restart.sh",
  ]

  connection {
    type        = "ssh"
    user        = "centos"
    private_key = "${file("~/.ssh/id_rsa")}"
  }
}

Terraform will SSH in with those credentials, copy over the script in prepare-restart.sh, and execute it.

UseGalaxy.eu’s Terraform Usage

All of our virtual infrastructure is managed, publicly, with Terraform. We hope that this can inspire others and give people ideas of the sort of things they can accomplish with Terraform. If you have questions over the way we have done certain things, feel free to file an issue and ask us!

Key points

Terraform lets you develop and implement infrastructure-as-code within your organisation

It can drastically simplify management of large numbers of VMs

Frequently Asked Questions

Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Galaxy Server administration topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum

Feedback

Did you use this material as an instructor? Feel free to give us feedback on how it went.
Did you use this material as a learner or student? Click the form below to leave feedback.

Citing this Tutorial

Helena Rasche, 2022 Deploying a compute cluster in OpenStack via Terraform (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/admin/tutorials/terraform/tutorial.html Online; accessed TODAY
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{admin-terraform,
author = "Helena Rasche",
title = "Deploying a compute cluster in OpenStack via Terraform (Galaxy Training Materials)",
year = "2022",
month = "10",
day = "18"
url = "\url{https://training.galaxyproject.org/training-material/topics/admin/tutorials/terraform/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                   

Congratulations on successfully completing this tutorial!