In my last blog post I introduced a stage which executed Checkov to my Terraform Azure DevOps pipeline, this scanned the Terraform configuration and stopped the deployment if there was an issue. I also added a stage which checks to see if there are any resources being destroyed.
Whats missing?
Both of these I thought should give some basic protection against problems caused by common configuration issues, which they did, but it didn’t take into account the end user, i.e. me, making a change which would dramatically increase the running costs of the deployment.
Enter Infracost
One day as I was skimming through Reddit and I noticed mention of Infracost (I can’t remember the post, sorry) - the description of the tool got my attention “Cloud cost estimates for Terraform in pull requests” peaked my interest and gave it ago locally.
Installing and registering Infracost locally
As I am macOS user installing Infracost locally was a Homebrew command away:
brew install infracost
Once installed you need to register for an API key, this can be done with a single command:
infracost register
It will ask you for your Name and Email address, once you enter these you should see something like the following output:
Tip
Please make a note of the API key, you will need it later.
$ infracost register
Please enter your name and email address to get an API key.
See our FAQ (https://www.infracost.io/docs/faq) for more details.
Name: Russ McKendrick
Email: [email protected]
Thank you Russ McKendrick!
Your API key is: IaMnOtREaLlyANapIK3y
Success: Your API key has been saved to /Users/russ.mckendrick/.config/infracost/credentials.yml
You can now run infracost breakdown --path=... and point to your Terraform directory or JSON/plan file.
That is all of the configuration you need to do, once installed you can try running the tool.
Running Infracost locally
Next up we need some Terraform to run it against, I have some test code which launches a Linux Virtual Machine in Azure so decided to use that.
Info
The Terraform code I am using can be found at the russmckendrick/terraform-vm-local-example Github repo.
To start with run the following:
terraform init
terraform plan -out tfplan.binary
terraform show -json tfplan.binary > plan.json
This will download all of the Terraform providers and create a Terraform Plan file and then convert it to JSON. Next up we can run Infracost against the JSON version of the Terraform Plan file u sing the following command:
infracost breakdown --path plan.json
As you can see from the output below (you may need to scroll right), the virtual machine using the SKU Standard_B1ms is going to cost $17.23 per month with an addition cost of around $1.69 for the disk operations:
$ infracost breakdown --path plan.json
Detected Terraform plan JSON file at plan.json
✔ Calculating monthly cost estimate
Project: russmckendrick/terraform-vm-local-example/plan.json
Name Monthly Qty Unit Monthly Cost
azurerm_linux_virtual_machine.main
├─ Instance usage (pay as you go, Standard_B1ms) 730 hours $17.23
└─ os_disk
├─ Storage (S4) 1 months $1.69
└─ Disk operations Monthly cost depends on usage: $0.0005 per 10k operations
OVERALL TOTAL $18.92
----------------------------------
To estimate usage-based resources use --usage-file, see https://infracost.io/usage-file
That’s a reasonable cost, so lets launch the Virtual Machine by running:
Warning
Warning! If you are following along running the commands below will incur cost.
terraform apply
Now that we have the Virtual Machine, lets increase the specification by updating the SKU to Standard_B4ms, this can be done in the terraform.tfvars
file in the repo. Once updated, generate a new plan file and run Infracost again:
terraform plan -out tfplan.binary
terraform show -json tfplan.binary > plan.json
infracost breakdown --path plan.json
You will notice that when you ran the terraform plan
command it checked against the Terraform Statefile, however as you can see from the output below …
$ infracost breakdown --path plan.json
Detected Terraform plan JSON file at plan.json
✔ Calculating monthly cost estimate
Project: russmckendrick/terraform-vm-local-example/plan.json
Name Monthly Qty Unit Monthly Cost
azurerm_linux_virtual_machine.main
├─ Instance usage (pay as you go, Standard_B4ms) 730 hours $137.97
└─ os_disk
├─ Storage (S4) 1 months $1.69
└─ Disk operations Monthly cost depends on usage: $0.0005 per 10k operations
OVERALL TOTAL $139.66
----------------------------------
To estimate usage-based resources use --usage-file, see https://infracost.io/usage-file
… all it shows is the new cost, wouldn’t it be good if you could figure out the difference? Well you can, just run the following command:
infracost diff --path plan.json
This time I got the output below:
$ infracost diff --path plan.json
Detected Terraform plan JSON file at plan.json
✔ Calculating monthly cost estimate
Project: russmckendrick/terraform-vm-local-example/plan.json
~ azurerm_linux_virtual_machine.main
+$121 ($18.92 -> $140)
- Instance usage (pay as you go, Standard_B1ms)
-$17.23
+ Instance usage (pay as you go, Standard_B4ms)
+$138
Monthly cost change for russmckendrick/terraform-vm-local-example/plan.json
Amount: +$121 ($18.92 -> $140)
Percent: +638%
----------------------------------
Key: ~ changed, + added, - removed
To estimate usage-based resources use --usage-file, see https://infracost.io/usage-file
… as you can, here we have an increase in cost of 638% - probably best that I don’t update the SKU !!!
Now lets look how this can be applied to the Azure DevOps pipeline, but not before I run the following to remove the Virtual Machine:
terraform destroy
The Pipeline
The stages of the pipeline are not changing too much, they are still the following
- Checkov Scan, there are no changes to this stage
- Terraform Validate, there are no changes to this stage
- Terraform Plan, there is where all of the changes are and we will be covering this stage in more detail below.
- Terraform Apply (Auto Approval), there are no changes to this stage
- Terraform Apply (Manual Approval), there are some minor changes to this stage, mostly around the wording
Additional Pipeline variables
There is an addition of a single variable at the top of the azure-pipeline.yml
file, this sets the cost_increase_alert_percentage
threshold - in my case I set this to 50%:
variables:
tf_version: "latest" # what version of terraform should be used
tf_state_rg: "rg-tfstate" # name of the resource group to create/use for the terraform state file
tz_state_location: "uksouth" # location of the resource group to create/use for the terraform state file
tf_state_sku: "Standard_RAGRS" # sku to use when creating the storeage account to create/use for the terraform state file
tf_state_sa_name: "tfstatesa20210606" # name of of the storage account to create/use for the terraform state file
tf_state_container_name: "tfstate" # name of of the container to create/use for the terraform state file
tf_environment: "dev" # enviroment name, used for the statefile name
cost_increase_alert_percentage: 50 # if the difference in costs is higher than x% then you will need to manually validate the deployment
The second variable which needs to be added contains the API key which you made a note of when the infracost register
command was ran locally. If you didn’t make a note then the configuration file created by the command also contains the API key, in my case was stored at /Users/russ.mckendrick/.config/infracost/credentials.yml
and make a note of the API key.
Open the pipeline in Azure DevOps, click Edit, then Variables and finally add a variable called INFRACOST_API_KEY
making sure that you tick the Keep this value secret box:
Now the two variables have been added lets look at the changes to the pipeline itself.
Stage - Terraform Plan
Before this stage contained the following tasks:
- “Run > terraform init”
- “Run > terraform plan”
- “Run > terraform show”
There are no changes to these three tasks, by the end of the them we are left with an idea of what Terraform is going to do and a Terraform Plan file is stored at $(System.DefaultWorkingDirectory)/terraform.tfplan
.
Task - Install > Infracost
The first of the two new tasks we are adding simply installs Infracost:
- bash: |
if [ -z "$(INFRACOST_API_KEY)" ]; then
echo "ℹ️ - No Infracost API Key has been detected - skipping task"
else
sudo apt-get update -qq && sudo apt-get -qq install bc curl git jq bc
curl -sL https://github.com/infracost/infracost/releases/latest/download/infracost-linux-amd64.tar.gz | tar xz -C /tmp
sudo mv /tmp/infracost-linux-amd64 /usr/bin/infracost
fi
name: "installinfracost"
displayName: "Install > Infrascost"
As you can see, there is a little logic in there which skips this step if the $(INFRACOST_API_KEY)
is not defined and we are just left with a message which looks like the following:
Once Infracost has been installed we can then run it.
Task - Run > Infracost
There is quite a bit of logic in the this task, here it is in its entirety:
- bash: |
if [ -z "$(INFRACOST_API_KEY)" ]; then
echo "ℹ️ - No Infracost API Key has been detected - skipping task"
else
mkdir $(System.DefaultWorkingDirectory)/output
terraform show -json $(System.DefaultWorkingDirectory)/terraform.tfplan > $(System.DefaultWorkingDirectory)/output/plan.json
infracost breakdown --format json --path $(System.DefaultWorkingDirectory)/output/plan.json > $(System.DefaultWorkingDirectory)/output/cost.json
past_total_monthly_cost=$(jq '[.projects[].pastBreakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
total_monthly_cost=$(jq '[.projects[].breakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
diff_cost=$(jq '[.projects[].diff.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
percentage_threshold=$(cost_increase_alert_percentage)
if [ $(echo "$past_total_monthly_cost > 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost > 0" | bc -l) = 1 ]; then
percent=$(echo "scale=6; $total_monthly_cost / $past_total_monthly_cost * 100 - 100" | bc)
fi
if [ $(echo "$past_total_monthly_cost <= 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost <= 0" | bc -l) = 1 ]; then
percent=0
fi
if [ -z "$percent" ]; then
echo "##vso[task.logissue type=warning]💰 - ℹ️ No previous cost data has been detected"
elif [ $(echo "$percent > $percentage_threshold" | bc -l) = 1 ]; then
echo "##vso[task.logissue type=warning]💰 - 📈 A $percent% increase in cost have be detected. Your monthly costs are increasing from \$$past_total_monthly_cost to \$$total_monthly_cost"
echo "##vso[task.setvariable variable=TERRAFORM_PLAN_HAS_DESTROY_CHANGES]true"
else
echo "##vso[task.logissue type=warning]💰 - 📉 An acceptable or no change in cost has been detected. Your new monthly costs are \$$total_monthly_cost from \$$past_total_monthly_cost"
fi
fi
env:
INFRACOST_API_KEY: $(INFRACOST_API_KEY)
name: "runinfracost"
displayName: "Run > Infrascost"
The first few of the steps in the task roughly follow what we ran locally:
- Check to see if
$(INFRACOST_API_KEY)
has been set - Create a folder called output
- Run
terraform show
using the plan file created by the Run > terraform plan task to save a JSON version of the plan - Take the JSON file created above and run
infracost breakdown
, this time outputting the results as a second JSON file
With the only difference being that rather outputting the the screen we are saving the results to a JSON file, once the file has been generated we can interact with it using eh jq
command, jq
is a lightweight and flexible command-line JSON processor.
First we get the value for the previous cost, if there was one, and assign it to the $past_total_monthly_cost
variable
past_total_monthly_cost=$(jq '[.projects[].pastBreakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
Them we get the value for the new cost, and assign it to the $total_monthly_cost
variable:
total_monthly_cost=$(jq '[.projects[].breakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
Next up, we get the difference in cost and set that as the $diff_cost
variable:
diff_cost=$(jq '[.projects[].diff.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
Not how that the difference in cost was aviable to us in the JSON output without the need for us to run the infracost diff
command.
Finally, we take the pipeline variable $(cost_increase_alert_percentage)
and set a local one called $percentage_threshold
:
percentage_threshold=$(cost_increase_alert_percentage)
The next part of the script …
if [ $(echo "$past_total_monthly_cost > 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost > 0" | bc -l) = 1 ]; then
percent=$(echo "scale=6; $total_monthly_cost / $past_total_monthly_cost * 100 - 100" | bc)
fi
… only runs if both $past_total_monthly_cost
and $total_monthly_cost
are greater than 0, what it does it set the percentage increase or decrease based on the data in the variables we have just set, this is then exported to the $percent
variable.
The next statement sets $percent
if there is no cost data:
if [ $(echo "$past_total_monthly_cost <= 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost <= 0" | bc -l) = 1 ]; then
percent=0
fi
Now we should have information to make a decision on what the Terraform should do, which should be one of three things;
- 1. Do nothing, there is no price data to output a message saying that and move on.
- 2. Check to see if
$percent
is higher than$percentage_threshold
, if so output a message and also set$TERRAFORM_PLAN_HAS_DESTROY_CHANGES
totrue
to trigger the manual review stage. - 3. If neither of the conditions above are met then assume that the cost increase with within
$percentage_threshold
, print a message.
This looks like the following:
if [ -z "$percent" ]; then
echo "##vso[task.logissue type=warning]💰 - ℹ️ No previous cost data has been detected"
elif [ $(echo "$percent > $percentage_threshold" | bc -l) = 1 ]; then
echo "##vso[task.logissue type=warning]💰 - 📈 A $percent% increase in cost have be detected. Your monthly costs are increasing from \$$past_total_monthly_cost to \$$total_monthly_cost"
echo "##vso[task.setvariable variable=TERRAFORM_PLAN_HAS_DESTROY_CHANGES]true"
else
echo "##vso[task.logissue type=warning]💰 - 📉 An acceptable or no change in cost has been detected. Your new monthly costs are \$$total_monthly_cost from \$$past_total_monthly_cost"
fi
The final part of the task closes the loop and also sets the content of $(INFRACOST_API_KEY)
as an environment variable called INFRACOST_API_KEY
which the infracost
checks when it is executed:
fi
env:
INFRACOST_API_KEY: $(INFRACOST_API_KEY)
name: "runinfracost"
displayName: "Run > Infrascost"
Task - Vars > Set Variables for next stage
The final task in this stage is not much different than before, just some of the wording has been tweaked to take into account we are now looking for cost as well a resources being destroyed:
- bash: |
if [ "$TERRAFORM_PLAN_HAS_CHANGES" = true ] && [ "$TERRAFORM_PLAN_HAS_DESTROY_CHANGES" = false ] ; then
echo "##vso[task.setvariable variable=HAS_CHANGES_ONLY;isOutput=true]true"
echo "##vso[task.logissue type=warning]👍 - Changes with no destroys detected, it is safe for the pipeline to proceed automatically"
fi
if [ "$TERRAFORM_PLAN_HAS_CHANGES" = true ] && [ "$TERRAFORM_PLAN_HAS_DESTROY_CHANGES" = true ] ; then
echo "##vso[task.setvariable variable=HAS_DESTROY_CHANGES;isOutput=true]true"
echo "##vso[task.logissue type=warning]⛔️ - Changes with Destroy or Cost increase, pipeline will require a manual approval to proceed"
fi
if [ "$TERRAFORM_PLAN_HAS_CHANGES" != true ] ; then
echo "##vso[task.logissue type=warning]ℹ️ - No changes detected, terraform apply will not run"
fi
name: "setvar"
displayName: "Vars > Set Variables for next stage"
There are also some tweaks to the rest of the pipeline, but nothing outside of changing some of the wording.
Running the Pipeline
Now that we have all of the bits together lets run the same Terraform code which launches a Linux virtual machine with the Standard_B1ms SKU.
Initial Run
When the pipeline is first run there are no existing costs so we get the following output:
As you can see, we have a message saying that “No previous cost data has been detected” and that Terraform as just run as expected as it is only adding resources.
Running again
Rerunning with the same SKU us the following:
As we already have an existing resource Infracost returns information on both the previous and new cost, which in our case was $18.91 - also not that as there are no changes Terraform does not attempt to apply any thing.
Updating the SKU and increasing costs
Now lets bump the SKU to Standard_B4ms:
As you can see, an cost increase of over 50% has been detected, over 630% in-fact from $18.91 to £139.66 per month, so the $HAS_DESTROY_CHANGES
has been set and the manual validation stage was executed.
Undo the change to the SKU
The final change is changing the SKU of the virtual machine back to Standard_B1ms:
The message this time shows that the costs have been reduced and we are OK with that, so the pipeline triggered the auto-approve stage and we didn’t have to step in and review the changes.
Summary
Now the pipeline described above does differ from the native CI/CD integration provided by Infracost which can be found here . Infracost’s own integration hooks into your repo and is triggered on a pull request - as I already had a pipeline built I decided to adapt their script a little so that it fitted my own needs.
With over 3 million prices listed covering the bulk of Microsoft Azure, Amazon Web Services, and Google Cloud Platform cloud services it should pick up the majority of common mistakes when it comes to incorrectly configuring a service using Terraform and hopefully stop you getting any nasty surprises at the end of the month.
They have also just updated the self-hosted version of the Cloud Pricing API meaning that you can connect to your own instance rather than registered to use their public end-point which is extremely useful if you have limited network access, see this blog post for more information.
The full code for the pipeline and Terraform scripts covered in this post can be found in the GitHub repo here .