Here at Seeq, we are in the middle of a re-platforming effort that involves moving from our on-premises deployment environment to a containerized platform built on Kubernetes. I am leading a squad that is developing the new production platform. As part of the re-platforming effort, we knew we wanted to use Infrastructure as Code (IaC) to manage the cloud and Kubernetes resources, but we weren’t sure which tool we wanted to use.
We knew that whatever tool we picked had to meet a few requirements. First of all, we needed the tool to provision the actual underlying infrastructure for our new platform – including Virtual Private Clouds, subnets, Kubernetes clusters, etc. This led us away from configuration management tools like Chef, Puppet, or Ansible. Some of these tools do have support for provisioning infrastructure, but that wasn’t their originally intended purpose, so we did not consider them seriously.
Another feature we wanted from this tool was to define our infrastructure and resources in a declarative manner. Rather than writing code to directly change our infrastructure, we wanted to be able to write code that defines what the infrastructure should look like and let the tool determine the differences and make the necessary updates. Most IaC tools work this way, so this still left us with a number of tools to consider, including Terraform, Pulumi, Azure Resource Manager with Bicep, and Amazon Web Services Cloud Development Kit (AWS CDK).
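To make the declarative idea concrete, here is a minimal sketch in plain Python of what "determine the differences" means. It is not how Terraform or Pulumi are implemented; the `Plan` class, the `plan_changes` function, and the resource names and values are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """The changes needed to move from the actual state to the desired state."""
    create: dict = field(default_factory=dict)
    update: dict = field(default_factory=dict)
    delete: list = field(default_factory=list)


def plan_changes(desired: dict, actual: dict) -> Plan:
    """Compare two name -> spec mappings and return the required changes."""
    plan = Plan()
    for name, spec in desired.items():
        if name not in actual:
            plan.create[name] = spec      # declared but does not exist yet
        elif actual[name] != spec:
            plan.update[name] = spec      # exists, but its settings drifted
    # Anything that exists but is no longer declared should be removed.
    plan.delete = sorted(name for name in actual if name not in desired)
    return plan


# Hypothetical desired state: what the code says should exist.
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "subnet-a": {"cidr": "10.0.1.0/24"},
    "cluster": {"version": "1.29", "nodes": 3},
}

# Hypothetical actual state: what currently exists in the cloud account.
actual = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "cluster": {"version": "1.28", "nodes": 3},
    "old-subnet": {"cidr": "10.0.9.0/24"},
}

plan = plan_changes(desired, actual)
print("create:", sorted(plan.create))
print("update:", sorted(plan.update))
print("delete:", plan.delete)
```

The point is that we only ever edit `desired`. Working out which resources to create, update, or delete is the tool's job, which is what both Terraform and Pulumi do when they preview a change.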
Our last requirement narrowed down the list considerably: we needed to be able to provision infrastructure in both Azure and AWS using this tool. Since Bicep and the AWS CDK are cloud provider-specific, we found ourselves taking a close look at Terraform and Pulumi.
To begin our evaluation, we created prototype Kubernetes clusters in AWS on Elastic Kubernetes Service using both Terraform and Pulumi. We learned a great deal from this exercise. For example, one of the things we liked about Terraform is that it felt more mature than Pulumi – its documentation and community support seemed superior. On the other hand, we appreciated that Pulumi’s Python libraries for Azure, AWS, and Kubernetes are generated directly from API definitions, so you don’t have to wait for support of new features. We also loved Pulumi’s IDE integration. If you’re interested in more comparisons between the two, Pulumi has published a page that’s worth a look.
Ultimately, our experiences with both Terraform and Pulumi were positive, and it was a difficult decision between the two. We picked Pulumi mainly because it integrated well with code and tooling that we already had, since we planned to use Pulumi’s Python libraries, and we had a fair amount of tooling written in Python already. Also, we felt that Pulumi would allow for limitless possibilities for how we defined our infrastructure, since we’d have the full power of Python at our command. Unfortunately, there was one thing we didn’t really take into account when making this decision – or at least we didn’t give it enough weight. Keep reading for more detail on what we missed.
Over the next few months, a couple of developers on our squad created Pulumi stacks that worked for our use cases and were fairly well organized. There were some waves of refactors to rework some of the abstractions, but to those actively working on the infrastructure it felt like things were coming together well.
After this initial development, we decided to do a week of intensive work where we would bring in other team members. After that week, the other team members reported that they struggled to grasp the existing code and contribute their own changes. Their concern was that the Python abstractions and the way the Pulumi code was structured complicated things unnecessarily and made it difficult to get up to speed. Even those who had previous experience with Terraform (which has a very similar model to Pulumi) had trouble.
I believe the newly involved members of the team were correct in their assessment that Pulumi is inherently more complicated than Terraform. Terraform’s approach to IaC is intentionally limited, and these limitations make it simpler. For example, there is no way to define your own function in Terraform even though it has its own built-in functions. Pulumi, on the other hand, aims for the essentially unlimited power of conventional programming languages. You could, if you were sufficiently deranged, define all of your infrastructure in an Excel spreadsheet, and then use Python to parse the spreadsheet and provision a cloud resource for each row. Pulumi gives you flexibility at the cost of requiring a bit more cognitive load to read and understand the code.
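The spreadsheet scenario is easy to sketch, which is rather the point. The example below is plain Python and deliberately stops short of calling Pulumi: the CSV content, the `BucketSpec` class, and the `load_specs` function are hypothetical, and in a real Pulumi program the final loop would construct a provider resource for each row instead of printing.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketSpec:
    """One row of the (hypothetical) infrastructure spreadsheet."""
    name: str
    region: str
    versioned: bool


# Stand-in for a spreadsheet exported to CSV.
SHEET = """name,region,versioned
logs,us-west-2,true
backups,eu-west-1,false
"""


def load_specs(text: str) -> list[BucketSpec]:
    """Parse CSV text into one typed spec per row."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        BucketSpec(
            name=row["name"],
            region=row["region"],
            versioned=row["versioned"].strip().lower() == "true",
        )
        for row in rows
    ]


specs = load_specs(SHEET)
for spec in specs:
    # Assumption: a real Pulumi program would create a cloud resource here.
    print(f"would provision bucket {spec.name} in {spec.region}")
```

A reader now has to understand the spreadsheet layout, the parsing code, and the loop before they can tell which resources exist. That indirection is the extra cognitive load described above.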
I think this tradeoff hits on the main point that we failed to consider when doing our initial evaluation: our team’s experience. We have a mix of backgrounds on our squad. Some of us come from a product development and software engineering background, and some of us come from an operational, site reliability engineering background. For those of us who are used to writing code day in and day out, the additional cognitive load brought on by Pulumi was, for the most part, negated by our familiarity with conventional programming languages, software design patterns, etc. However, for those of us who have more of an operational background, the simplicity that Terraform provides – especially when reading Terraform code – is highly valuable, allowing for quicker ramp-up and helping to prevent mistakes.
The feedback we received led to a discussion where we reevaluated our options. We realized that we didn’t really need Pulumi’s flexibility for what we were doing or the things we were planning to do, so we switched to Terraform. To keep our Terraform code well factored, we could use Terragrunt. Since then, we haven’t needed anything beyond what Terraform and Helm charts provide. As we’ve gotten more experience with Terraform, we have found it very easy to understand and write. The cloud and Kubernetes resources we’re provisioning are complicated on their own, and the intentional simplicity of Terraform helps keep things manageable.
Even though our experience with Terraform has been positive, we still wanted to have a plan in case we ever ran into its limitations. If that ever happens, I think we’ll have a few options depending on exactly what it is we’re trying to do. The first thing we might reach for is writing a Terraform plugin. Just like Terraform’s providers for Helm releases, cloud resources, etc., you can create your own Terraform provider with its own custom logic and capabilities. If a Terraform plugin doesn’t seem like the right fit, we could look into the Cloud Development Kit for Terraform, which hasn’t had a stable release yet, but looks to fill the same space as Pulumi – defining infrastructure using familiar programming languages. Finally, as we’ve gotten more experienced with Kubernetes, we have realized that the Kubernetes operator pattern may be a good fit for some of the more advanced infrastructure configurations we may want to support. With each of these options, the members of our squad with software engineering backgrounds should be able to dig in and provide a solution for any limitations we may run into.
The bottom line is that both Terraform and Pulumi are excellent tools. They both work as advertised and allow you to define and manage your infrastructure in a clean manner. In my mind, the choice between them mostly comes down to what your team is comfortable with. If a team has a significant number of engineers who don’t have a lot of experience with conventional programming languages, then Terraform is probably a better choice because of its built-in limitations. However, if you’re looking to turn a team of software developers into DevOps engineers, then Pulumi would be a more natural fit. I think either tool can support any scenario that you may throw at it – whether through built-in support for general purpose programming languages in Pulumi’s case or via extensions like Terraform’s plugins or the Cloud Development Kit for Terraform. So how about you? Have you tried Pulumi or Terraform? If so, what was your experience like?