A Month of Terraform

By: Jeremy W. Sherman. Published: 21 Nov 2020. Categories: experience-report. Tags: terraform aws cloud.

I took Heroku for granted, and a month into setting up my own infra, I now know how much it bought me.

A lot of my past work has been infrastructure-adjacent. I often find myself filling in the Build & Integration role - the person who gets continuous integration off the ground and keeps it actually continuing rather than falling flat on its face. But often I’ve just been building one of a constellation of services, so the core infrastructure was already there, or I’ve been targeting something like Heroku, where you basically pick your poison, git push, and Bob’s your uncle.

This time, I’m putting the pieces together using the AWS toolkit. And to smoosh them all together, I’m using Terraform, because heck if I’m going to be hand-writing YAML or JSON and praying it’s formatted right. Plus there’s more I want to orchestrate than just AWS, like, say, GitLab.

I don’t wanna talk about AWS just now. It reminds me of learning Foundation & Cocoa - you look at one piece, and it can do so much, and then you gotta put all those individually deep & complex pieces together to do more stuff. I figure if I put in the hours reading docs, learning what’s all there, and getting stabbed by the pointy bits, it’ll probably all come out fine in the end.

So, Terraform.

The Good

It mostly works!
When it doesn’t, it generally fails in a useful way, and then I can fix it and try again.
There are docs for most things.
Autoformatting works great.
Linting works pretty well.
Terraform: Up & Running is excellent, and Terragrunt makes it even easier. Huge thanks to their team for providing the duct tape we need. 🙌

The Not So Good

terraform-lsp is supposed to provide autocomplete, but it mostly doesn’t, in my experience. First it flipped its lid that I dared to have a repo with multiple root modules in it, so I just aimed VS Code at the folder with a single root module. Then the language server says it’s all hunky dory AFAICT, and yet it autocompletes nothing beyond bare language syntax. As a result, I’m manually referencing docs and writing stuff down and wasting tons of time that tools like autocomplete and integrated linting ought to be saving me from.
State files contain secrets in plaintext. (You might enjoy the six-year-old GitHub issue about the plaintext secrets problem.) You can mark outputs as sensitive, so they don’t get printed at the end of applying your infra spec, but run terraform show instead of terraform apply, and there they are, staring back at you. At least you can lock down and encrypt the S3 bucket holding the state.
- Pulumi’s secrets management is far more satisfying. But Pulumi is even more cutting-edge than v0.whatever Terraform, and I expect HashiCorp to keep TF running for a good while, while I’m not so confident in Pulumi, so I’m using TF. (HashiCorp of course would recommend Vault.)
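Here is a minimal sketch of the two mitigations mentioned above: a sensitive output and an encrypted S3 backend. The bucket name, key, and region are hypothetical placeholders, and the random_password resource (from the hashicorp/random provider) is only there to give the output something secret to point at.

```hcl
terraform {
  # Store state in S3 with server-side encryption turned on.
  # Bucket, key, and region are made-up values for illustration.
  backend "s3" {
    bucket  = "example-terraform-state"
    key     = "demo/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}

# A generated secret; its value is still written to the state file.
resource "random_password" "db" {
  length  = 32
  special = false
}

output "db_password" {
  value     = random_password.db.result
  sensitive = true # redacted in the apply summary, not in the state itself
}
```

The sensitive flag only changes what gets printed. Anyone who can read the state can still read the password, which is why locking down the bucket matters.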
Annoying asymmetries in the language about how you declare and reference things in slightly variant ways - I trip over these over and over as a beginner:
- You declare locals in a locals block, but you reference them as local.thing, not locals.thing.
- You declare a variable in a variable block, but you reference it as var.thing.
- You declare data sources as data "provider_thingy" "my_name_for_this_data", and then you have to access it as data.provider_thingy.my_name_for_this_data. (This is actually pretty darn consistent, at least. Though, like, why the quotes around the provider thingy?)
- You declare resources as resource "provider_thingy" "my_name". But you do NOT reference them as resource.provider_thingy.my_name. Nope, you just reference them as bare provider_thingy.my_name.
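To see all four declare-versus-reference pairs side by side, here is a small sketch. Every name in it (environment, name_prefix, logs, and so on) is hypothetical; only the reference syntax is the point.

```hcl
# Declared in a "variable" block, referenced as var.NAME.
variable "environment" {
  type    = string
  default = "staging"
}

# Declared in a "locals" block (plural), referenced as local.NAME (singular).
locals {
  name_prefix = "demo-${var.environment}"
}

# Declared as data "TYPE" "NAME", referenced as data.TYPE.NAME.
data "aws_caller_identity" "current" {}

# Declared as resource "TYPE" "NAME"...
resource "aws_s3_bucket" "logs" {
  bucket = "${local.name_prefix}-logs-${data.aws_caller_identity.current.account_id}"
}

output "logs_bucket_arn" {
  # ...but referenced as bare TYPE.NAME, with no "resource." prefix.
  value = aws_s3_bucket.logs.arn
}
```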
For that matter, there are other oddities as well. Pieces of syntax that seem like they should be orthogonal just aren’t. for_each stands out here:
- You can generate multiple resources by just dropping a for_each in the block: resource "provider_thing" "mine" {} becomes resource "provider_thing" "mine" { for_each = of_these }
- But nested argument blocks require conversion from like setting { namespace = "blah" } to dynamic "setting" { for_each = thingy; content { namespace = "blah" }}. Have fun looking that up a few times.
- And you can’t even use the for_each trick with module imports. It just isn’t supported. Sorry, sucks to be you.
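A sketch of the first two cases, assuming the AWS provider. The variable names, bucket names, and environment name are made up for illustration; the setting namespace shown is the one Elastic Beanstalk uses for environment variables.

```hcl
variable "bucket_names" {
  type    = set(string)
  default = ["assets", "logs"]
}

# Case 1: for_each directly in the resource block makes one bucket per name.
resource "aws_s3_bucket" "mine" {
  for_each = var.bucket_names
  bucket   = "example-${each.key}"
}

variable "solution_stack_name" {
  type = string # deliberately no default; pick a stack your region offers
}

variable "env_vars" {
  type    = map(string)
  default = { LOG_LEVEL = "info" }
}

# Case 2: repeating a nested block needs a dynamic block wrapping "content".
resource "aws_elastic_beanstalk_environment" "app" {
  name                = "example-env"
  application         = "example-app"
  solution_stack_name = var.solution_stack_name

  dynamic "setting" {
    for_each = var.env_vars
    content {
      namespace = "aws:elasticbeanstalk:application:environment"
      name      = setting.key   # the iterator is named after the block label
      value     = setting.value
    }
  }
}
```

Note that the resource-level form exposes each.key and each.value, while the dynamic block exposes setting.key and setting.value. As for the third case, Terraform 0.13 and later accept for_each on module blocks, so that limitation depends on which version you are running.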
Annoying gaps in the docs:
- Required vs optional parameters are not very clearly called out and are not at all segregated. So you get to play the game of “what is the minimal skeleton to declare this resource”. Actually running it a few times to see what you screwed up takes longer than just looking at the docs and puzzling it out, due to the lengthy iteration times in infra-land (see below).
- Types are not shown in the docs!!! All the outputs and arguments are typed. You have to declare those types. It’s right there in the code. But the docs don’t say what any of the types are. You just hit a type error at runtime. Fun fun!
- The HCL language is doc’d under the CLI tool, not in and of itself. It was really hard to actually find the docs since my first thought when I have syntax questions isn’t “let’s look at the docs for the tool.” It’d be like pulling up the manpage for GCC (carefully draw your triangle of art first) when you have a question about C syntax.
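For reference, this is roughly what declaring types looks like on the input side. The variable names are hypothetical, and this is only a sketch of the type constraint syntax, not of any particular provider's arguments.

```hcl
variable "instance_count" {
  type    = number
  default = 2
}

variable "tags" {
  type    = map(string)
  default = {}
}

# Structural types spell out each attribute and its type.
variable "alarm" {
  type = object({
    name      = string
    threshold = number
  })
}
```

Pass a string where alarm.threshold expects a number that can't be converted, and you find out when you run plan, not from the docs.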
Annoying asymmetries in the AWS provider:
- Missing links: Sometimes you get into a “can’t get there from here” situation. Like trying to find the zone ID for an Elastic Beanstalk environment’s CNAME so you can aim a Route 53 alias at it. (Hint, you need a completely different data source, aws_elastic_beanstalk_hosted_zone.)
- Irregular naming:
  - Sometimes something is zone_id, but other times it’s maybe just id.
  - Sometimes you can fish stuff out by arn, or maybe by id, or maybe it’s by name - good luck. Keep the docs close to hand.
  - (It’s totally possible this is inherited from the AWS APIs themselves, but the whole point of an abstraction layer is to make things better and more usable, dangit.)
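The Elastic Beanstalk alias case from the missing links item can be sketched like this. The variable names and the app.example.com record name are hypothetical; in a real config the CNAME would more likely come straight from an aws_elastic_beanstalk_environment resource's cname attribute than from a variable.

```hcl
variable "route53_zone_id" {
  type = string # the hosted zone you control, e.g. for example.com
}

variable "beanstalk_cname" {
  type = string # normally aws_elastic_beanstalk_environment.NAME.cname
}

# Looks up the hosted zone that Elastic Beanstalk uses in the current region.
data "aws_elastic_beanstalk_hosted_zone" "current" {}

resource "aws_route53_record" "app" {
  zone_id = var.route53_zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = var.beanstalk_cname
    zone_id                = data.aws_elastic_beanstalk_hosted_zone.current.id
    evaluate_target_health = false
  }
}
```

This also shows the naming irregularity in miniature: the record takes a zone_id, but the data source hands it back as plain id.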

The Different

Iteration times are way longer than with even mobile apps. Like, “you’re liable to task-switch while waiting to see plan output” longer.
Testing is a pain. I haven’t pulled in Terratest yet, because anyone maintaining this after me is unlikely to have Go experience, and my focus here isn’t building reusable infra anyway - it’s building this infra - so I’ve just been using bats and Bash shell scripts (with shellcheck, which is amazing) for some after-the-fact sanity checking using the AWS CLI. (Pro tip: Use the community-maintained fork bats-core rather than the no-longer-maintained sstephenson original.)
- Policy assertions feel like a different flavor of test, but the tooling here seems to be fairly immature, except perhaps if you’re targeting Kubernetes.

Summary

I expect I’ll get used to most of the rough edges of the syntax in another month. And Terraform is still v0, so hey, maybe some breaking changes will clear all this mess away. 🤞

I’m intentionally not getting sucked into hacking around the docs frustrations just now. Or even the very tempting open issue about silencing all the Terragrunt logspew.

I do plan to spend a bit of time trying to get autocomplete working for resource and data source types and their arguments/attributes from the language server, at least. That would be a huge help.

It still feels like magic to run a command and have infrastructure just…happen. You hit return, wait a bit, and suddenly servers are serving and domains are aliasing and a whole constellation of systems are interoperating. It kinda reminds me of the magic of home automation with blinkenlights, only without any of that messy “hardware” stuff to break on you.