A Month of Terraform
Categories: experience-report. Tags: terraform aws cloud.
I took Heroku for granted, and a month into setting up my own infra, I now know how much it bought me.
A lot of my past work has been infrastructure-adjacent. I often find myself filling in the Build & Integration role: the person who gets continuous integration off the ground and keeps it actually continuing rather than falling flat on its face. But often I’ve just been building one of a constellation of services, so the core infrastructure was already there, or I’ve been targeting something like Heroku, where you basically pick your poison, `git push`, and Bob’s your uncle.
This time, I’m putting the pieces together using the AWS toolkit. And to smoosh them all together, I’m using Terraform, because heck if I’m going to be hand-writing YAML or JSON and praying it’s formatted right. Plus there’s more I want to orchestrate than just AWS, like, say, GitLab.
I don’t wanna talk about AWS just now. It reminds me of learning Foundation & Cocoa - you look at one piece, and it can do so much, and then you gotta put all those individually deep & complex pieces together to do more stuff. I figure if I put in the hours reading docs, learning what’s all there, and getting stabbed by the pointy bits, it’ll probably all come out fine in the end.
So, Terraform.
The Good
- It mostly works!
- When it doesn’t, it generally fails in a useful way, and then I can fix it and try again.
- There are docs for most things.
- Autoformatting works great.
- Linting works pretty well.
- Terraform: Up & Running is excellent, and Terragrunt makes it even easier. Huge thanks to their team for providing the duct tape we need. 🙌
The Not So Good
- terraform-lsp is supposed to provide autocomplete, but it mostly doesn’t, in my experience. First it flipped its lid that I dared to have a repo with multiple root modules in it, so I just aimed VS Code at the folder with a single root module. Then the language server says it’s all hunky dory AFAICT, and yet it autocompletes nothing beyond bare language syntax. As a result, I’m manually referencing docs and writing stuff down and wasting tons of time that tools like autocomplete and integrated linting ought to be saving me from.
- State files contain secrets in plaintext. (You might enjoy the six-year-old GitHub issue about the plaintext secrets problem.) You can mark outputs as `sensitive`, so they don’t get printed at the end of applying your infra spec, but run `terraform show` instead of `terraform apply`, and there they are, staring back at you. At least you can lock down and encrypt the S3 bucket holding the state.
  - Pulumi’s secrets management is far more satisfying. But Pulumi is even more cutting-edge than v0.whatever Terraform. I expect HashiCorp to keep TF running for a good while, but I’m not so confident about Pulumi, so I’m using TF. (HashiCorp would of course recommend Vault.)
- Annoying asymmetries in the language about how you declare and reference things in slightly different ways. I trip over these again and again as a beginner:
  - You declare locals in a `locals` block, but you reference them as `local.thing`, not `locals.thing`.
  - You declare a variable in a `variable` block, but you reference it as `var.thing`.
  - You declare data sources as `data "provider_thingy" "my_name_for_this_data"`, and then you have to access them as `data.provider_thingy.my_name_for_this_data`. (This is actually pretty darn consistent, at least. Though, like, why the quotes around the provider thingy?)
  - You declare resources as `resource "provider_thingy" "my_name"`. But you do NOT reference them as `resource.provider_thingy.my_name`. Nope, you just reference them as bare `provider_thingy.my_name`.
- For that matter, there are other oddities as well. Pieces of syntax that seem like they should be orthogonal just aren’t. `for_each` stands out here:
  - You can generate multiple resources by just dropping a `for_each` in the block: `resource "provider_thing" "mine" {}` becomes `resource "provider_thing" "mine" { for_each = of_these }`.
  - But nested argument blocks require converting, like, `setting { namespace = "blah" }` into `dynamic "setting" { for_each = thingy; content { namespace = "blah" }}`. Have fun looking that up a few times.
  - And you can’t even use the `for_each` trick with module imports. It just isn’t supported. Sorry, sucks to be you.
- Annoying gaps in the docs:
  - Required vs. optional parameters are not very clearly called out and are not at all segregated. So you get to play the game of “what is the minimal skeleton to declare this resource?” Actually running it a few times to see what you screwed up takes longer than just looking at the docs and puzzling it out, due to the lengthy iteration times in infra-land (see below).
  - Types are not shown in the docs!!! All the outputs and arguments are typed. You have to declare those types. It’s right there in the code. But the docs don’t say what any of the types are. You just hit a type error at runtime. Fun fun!
  - HCL is documented under the CLI tool, not in and of itself. It was really hard to actually find the docs, since my first thought when I have syntax questions isn’t “let’s look at the docs for the tool.” It’d be like pulling up the manpage for GCC (carefully draw your triangle of art first) when you have a question about C syntax.
- Annoying asymmetries in the AWS provider:
  - Missing links: Sometimes you get into a “can’t get there from here” situation, like trying to find the zone ID for an Elastic Beanstalk environment’s CNAME so you can aim a Route 53 alias at it. (Hint: you need a completely different data source, `aws_elastic_beanstalk_hosted_zone`.)
  - Irregular naming:
    - Sometimes something is `zone_id`, but other times it’s maybe just `id`.
    - Sometimes you can fish stuff out by `arn`, or maybe by `id`, or maybe by `name`. Good luck. Keep the docs close to hand.
    - (It’s totally possible this is inherited from the AWS APIs themselves, but the whole point of an abstraction layer is to make things better and more usable, dangit.)
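To make the state-secrets gripe concrete, here’s a minimal sketch of the two mitigations it mentions. The first is an output marked `sensitive`. The second is state stored in an encrypted, locked S3 backend. The bucket, key, region, lock table, and variable names are all hypothetical placeholders. Neither setting keeps the value out of the state file. One only keeps it out of `apply` output, and the other only encrypts the state object at rest in S3.

```hcl
# Sketch only: bucket, key, region, table, and variable names are hypothetical.
terraform {
  backend "s3" {
    bucket         = "example-tf-state"       # hypothetical bucket; lock it down with IAM/bucket policy
    key            = "prod/terraform.tfstate" # path of the state object inside the bucket
    region         = "us-east-1"
    encrypt        = true                     # server-side encryption of the state object at rest
    dynamodb_table = "example-tf-locks"       # hypothetical table used for state locking
  }
}

variable "db_password" {
  type = string
}

output "db_password" {
  value     = var.db_password
  sensitive = true # redacted in plan/apply output, but still plaintext inside the state file
}
```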
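The declare-versus-reference asymmetries are easiest to see side by side. This sketch assumes the AWS provider is configured elsewhere. The names `env`, `name_prefix`, `current`, and `logs` are hypothetical. Each comment pairs a declaration form with the way you reference it.

```hcl
# Sketch: hypothetical names; assumes an AWS provider configured elsewhere.

# Declared in a `variable` block, referenced as var.env
variable "env" {
  type    = string
  default = "staging"
}

# Declared in a `locals` block (plural), referenced as local.name_prefix (singular)
locals {
  name_prefix = "myapp-${var.env}"
}

# Declared as data "<type>" "<name>", referenced as data.<type>.<name>
data "aws_caller_identity" "current" {}

# Declared as resource "<type>" "<name>", referenced as bare <type>.<name>
resource "aws_s3_bucket" "logs" {
  bucket = "${local.name_prefix}-logs-${data.aws_caller_identity.current.account_id}"
}

output "logs_bucket_arn" {
  value = aws_s3_bucket.logs.arn # not resource.aws_s3_bucket.logs.arn
}
```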
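Here’s the `for_each` asymmetry spelled out, as a sketch that assumes the AWS provider. Elastic Beanstalk’s `setting` blocks are a natural fit for the nested-block case. The variable names, bucket names, and option values are hypothetical. `solution_stack` is left as an input rather than guessing a real stack name.

```hcl
# Sketch: variable names, bucket/app/env names, and setting values are hypothetical.
variable "teams" {
  type    = set(string)
  default = ["alpha", "beta"]
}

# Resource level: just drop for_each into the block; each.key is the set element.
resource "aws_s3_bucket" "per_team" {
  for_each = var.teams
  bucket   = "example-artifacts-${each.key}"
}

variable "eb_settings" {
  type = map(object({ namespace = string, value = string }))
  default = {
    InstanceType = { namespace = "aws:autoscaling:launchconfiguration", value = "t3.micro" }
    MinSize      = { namespace = "aws:autoscaling:asg", value = "1" }
  }
}

variable "solution_stack" {
  type = string # supply a real solution stack name for your platform
}

# Nested block level: a plain `setting { ... }` has to become a `dynamic "setting"`
# block, with the original body moved into `content`.
resource "aws_elastic_beanstalk_environment" "app" {
  name                = "example-env"
  application         = "example-app"
  solution_stack_name = var.solution_stack

  dynamic "setting" {
    for_each = var.eb_settings
    content {
      namespace = setting.value.namespace # iterator is named after the block label
      name      = setting.key
      value     = setting.value.value
    }
  }
}
```

Inside a `dynamic` block the iterator is referenced by the block label (`setting.key`, `setting.value`) rather than as `each.key`/`each.value`. That’s one more thing to trip over.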
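The Elastic Beanstalk “missing link” looks like this as a sketch, aiming a Route 53 alias at an environment. The CNAME comes from the environment, but the zone ID has to come from the separate `aws_elastic_beanstalk_hosted_zone` data source. The domain, record name, and `eb_cname` variable are hypothetical. The sketch also assumes the environment lives in the provider’s configured region. The naming irregularity shows up too: one data source exposes `id` and the other `zone_id`.

```hcl
# Sketch: domain, record name, and the eb_cname variable are hypothetical.
variable "eb_cname" {
  type        = string
  description = "The environment's CNAME, e.g. aws_elastic_beanstalk_environment.<name>.cname"
}

# Elastic Beanstalk's hosted zone for the provider's current region; exposes `id`...
data "aws_elastic_beanstalk_hosted_zone" "current" {}

# ...while the Route 53 zone data source exposes `zone_id`.
data "aws_route53_zone" "main" {
  name = "example.com."
}

resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = var.eb_cname
    zone_id                = data.aws_elastic_beanstalk_hosted_zone.current.id
    evaluate_target_health = false
  }
}
```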
The Different
- Iteration times are way longer than with even mobile apps. Like, “you’re liable to task-switch while waiting to see plan output” longer.
- Testing is a pain. I haven’t pulled in Terratest yet, because anyone maintaining this after me is unlikely to have Go experience. My focus here isn’t building reusable infra anyway (it’s building this infra). So I’ve just been using `bats` and Bash shell scripts (with ShellCheck, which is amazing) for some after-the-fact sanity checking via the AWS CLI. (Pro tip: Use the community-maintained fork, `bats-core`, rather than the no-longer-maintained sstephenson original.)
- Policy assertions feel like a different flavor of test, but the tooling here seems fairly immature, except perhaps if you’re targeting Kubernetes.
Summary
I expect I’ll get used to most of the rough edges of the syntax in another month. And Terraform is still v0, so hey, maybe some breaking changes will clear all this mess away. 🤞
I’m intentionally not getting sucked into hacking around the docs frustrations just now. Or even the very tempting open issue about silencing all the Terragrunt logspew.
I do plan to spend a bit of time trying to get autocomplete working for resource and data source types and their arguments/attributes from the language server, at least. That would be a huge help.
It still feels like magic to run a command and have infrastructure just…happen. You hit return, wait a bit, and suddenly servers are serving and domains are aliasing and a whole constellation of systems are interoperating. It kinda reminds me of the magic of home automation with blinkenlights, only without any of that messy “hardware” stuff to break on you.