One of the easiest ways of building resilience into a system running in AWS is to use an autoscaling group. Generally speaking, I use one for any service which is required to self-heal - even when aiming to maintain a steady number of instances, as is desirable when running servers for Consul and Nomad, as well as a whole host of other clustered systems. Unhealthy instances can simply be replaced, usually without operator intervention, and launch configurations can be used to simplify upgrading clustered software one instance at a time.
However, it is often useful to be able to easily track activity within the chat system of your choice. In this post, we’ll look at how to use Terraform to deploy an AWS Lambda function which posts a message in Slack whenever a scaling operation happens - regardless of whether it was caused by an operator in the AWS console, API-driven changes, or automatic scaling for health.
Configuring Autoscaling Notifications
One of the features of AWS Autoscaling is the ability to deliver notifications to an SNS topic whenever a scaling event happens. We can configure notifications for the following types of events:
- successful launch of a new instance
- failed launch of a new instance
- successful termination of a running instance
- failed termination of a running instance
- test notifications (more on these later)
In Terraform, the aws_autoscaling_notification resource is used to configure notification delivery. We need to specify the notification event types we are interested in, the names of the autoscaling groups whose events we want, and the ARN of the SNS topic the notifications should be delivered to.
First though, we’ll use the aws_sns_topic resource to configure the SNS topic for notifications to be delivered to:
resource "aws_sns_topic" "asg_slack_notify" {
  name = "SlackNotify-ASG"
  display_name = "Autoscaling Notifications to Slack"
}
Then we can configure notifications to be delivered to the topic we created:
resource "aws_autoscaling_notification" "slack_notify" {
  group_names = ["${var.asg_names}"]
  notifications = [
    "autoscaling:EC2_INSTANCE_LAUNCH",
    "autoscaling:EC2_INSTANCE_TERMINATE",
    "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
    "autoscaling:TEST_NOTIFICATION"
  ]
  topic_arn = "${aws_sns_topic.asg_slack_notify.arn}"
}
For now, we’re setting the autoscaling group names to the value of a variable named asg_names - we’ll look more at how that gets populated later, when we talk about the overall structure of this module.
Sending Notifications to Slack
Lambda Function
Now we have notifications being delivered, we can write a Lambda function to extract the important information and use the Slack Webhooks API to send messages into the channel of our choice. I’m using JavaScript for this, but in principle you could use any of the supported Lambda platforms.
var https = require('https');
var util = require('util');
exports.handler = function(event, context) {
  try {
    // SNS delivers the autoscaling notification as a JSON string
    var message = JSON.parse(event.Records[0].Sns.Message);
    // Configuration supplied through the function's environment
    var channel = process.env.SLACK_CHANNEL;
    var username = process.env.SLACK_USERNAME;
    var webhookId = process.env.SLACK_WEBHOOK;
    var eventType = message.Event;
    var autoScaleGroupName = message.AutoScalingGroupName;
    var description = message.Description;
    var cause = message.Cause;
    var slackMessage = [
      "*Event*: " + eventType,
      "*Description*: " + description,
      "*Cause*: " + cause,
    ].join("\n");
    var postData = {
      channel: channel,
      username: username,
      text: "*" + autoScaleGroupName + "*",
      attachments: [{ text: slackMessage, mrkdwn_in: ["text"] }]
    };
    var options = {
      method: 'POST',
      hostname: 'hooks.slack.com',
      port: 443,
      path: '/services/' + webhookId
    };
    var req = https.request(options, function(res) {
      res.setEncoding('utf8');
      // Drain the body so that 'end' fires, even if the response is empty
      res.on('data', function (chunk) {});
      // Finish the invocation once the whole response has been read
      res.on('end', function () {
        context.done(null);
      });
    });
    req.on('error', function(e) {
      // Log before failing so the message reaches CloudWatch Logs
      console.log('request error: ' + e.message);
      context.fail(e);
    });
    req.write(util.format("%j", postData));
    req.end();
  } catch (e) {
    context.fail(e);
  }
};
This is fairly self-explanatory code - the important things to note are the variables which must be set in the function’s environment: SLACK_CHANNEL, SLACK_USERNAME and SLACK_WEBHOOK.
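To make the message formatting easier to experiment with outside Lambda, the payload construction can be pulled into a pure function. The following is only a sketch: buildSlackPayload is a hypothetical helper rather than part of the function above, the sample notification values are made up, and the 'n/a' fallback assumes that some notifications (test notifications, for instance) may omit the Description and Cause fields.

```javascript
// Hypothetical helper: builds the Slack webhook payload from a parsed
// autoscaling notification and a map of environment variables.
function buildSlackPayload(message, env) {
  // Assumption: some notifications omit fields, so substitute a placeholder
  // instead of printing "undefined" in the channel.
  function field(value) {
    return value === undefined || value === null ? 'n/a' : value;
  }
  var slackMessage = [
    '*Event*: ' + field(message.Event),
    '*Description*: ' + field(message.Description),
    '*Cause*: ' + field(message.Cause)
  ].join('\n');
  return {
    channel: env.SLACK_CHANNEL,
    username: env.SLACK_USERNAME,
    text: '*' + field(message.AutoScalingGroupName) + '*',
    attachments: [{ text: slackMessage, mrkdwn_in: ['text'] }]
  };
}

// Illustrative notification; the values here are invented for the example.
var sampleMessage = {
  Event: 'autoscaling:EC2_INSTANCE_LAUNCH',
  AutoScalingGroupName: 'consul-servers',
  Description: 'Launching a new EC2 instance',
  Cause: 'Desired capacity differed from actual capacity.'
};

var payload = buildSlackPayload(sampleMessage, {
  SLACK_CHANNEL: '#ops-staging',
  SLACK_USERNAME: 'aws'
});
console.log(JSON.stringify(payload, null, 2));
```

In the handler, process.env would be passed as the second argument, and the returned object serialised into the request body as before.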
Packaging
Lambda requires the code that makes up a function to be packaged as a zip archive before it can be deployed. We can use Terraform’s archive_file data source to do this:
data "archive_file" "notify_js" {
  type = "zip"
  source_file = "../../lambda/asgSlackNotifications.js"
  output_path = "../../lambda/asgSlackNotifications.zip"
}
In this case we’re not using any third party NPM modules, so simply archiving the JavaScript file itself is sufficient.
Creating the Lambda Function
Next, we can use the aws_lambda_function resource to create the Lambda function itself, using the archive:
resource "aws_lambda_function" "slack_notify" {
  depends_on = ["data.archive_file.notify_js"]
  function_name = "asgSlackNotifications"
  description = "Send notifications to Slack when Autoscaling events occur"
  runtime = "nodejs4.3"
  handler = "asgSlackNotifications.handler"
  role = "${aws_iam_role.slack_notify.arn}"
  filename = "${data.archive_file.notify_js.output_path}"
  source_code_hash = "${base64sha256(file(data.archive_file.notify_js.output_path))}"
  environment {
    variables {
      SLACK_CHANNEL = "${var.channel}"
      SLACK_USERNAME = "${var.username}"
      SLACK_WEBHOOK = "${var.asg_hook_id}"
    }
  }
}
There are a few interesting points about this resource:
- The depends_on specification ensures that the archive file has finished being built prior to processing this resource. It consists of the Terraform type and the specified name of the resource.
- Assigning a hash of the source code archive ensures that we will appropriately update the Lambda function if the code package changes.
- The environment variables we called out in the code above are set in the environment block. A future improvement is to encrypt the ID of the webhook using KMS, and use the AWS kms:Decrypt operation in the Lambda function in order to obtain the value, so it is not available in plain text to an operator looking at the console.
- The handler must match the module name and function name in the source file, or invocations of the function will fail.
- We assign an IAM role to the function by ARN. We’ll look at the content of this next.
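The KMS improvement mentioned above might look something like the following sketch. It assumes the kms argument is a KMS client from the AWS SDK for JavaScript (v2), for example new (require('aws-sdk')).KMS(), and that the environment variable holds the base64-encoded ciphertext; decryptWebhookId and fakeKms are hypothetical names introduced here for illustration. The client is passed in as a parameter so the logic can be tried locally with a stand-in. Note that Buffer.from is not usable on very old Node.js versions, so older runtimes such as nodejs4.3 may need new Buffer(...) instead.

```javascript
// Sketch only: decrypts a base64-encoded ciphertext using a KMS client that
// exposes decrypt(params, callback), as the AWS SDK v2 KMS client does.
function decryptWebhookId(kms, encryptedBase64, callback) {
  var params = { CiphertextBlob: Buffer.from(encryptedBase64, 'base64') };
  kms.decrypt(params, function (err, data) {
    if (err) {
      return callback(err);
    }
    // Assumption: Plaintext is returned as a Buffer in Node.js
    callback(null, data.Plaintext.toString('utf8'));
  });
}

// Stand-in client for local experimentation: it "decrypts" by echoing the
// ciphertext bytes back, so no AWS access is required.
var fakeKms = {
  decrypt: function (params, cb) {
    cb(null, { Plaintext: params.CiphertextBlob });
  }
};

// Placeholder value, not a real webhook ID.
var encrypted = Buffer.from('T000/B000/XXXX', 'utf8').toString('base64');

decryptWebhookId(fakeKms, encrypted, function (err, webhookId) {
  if (err) {
    throw err;
  }
  console.log('webhook path: /services/' + webhookId);
});
```

If this approach were adopted, the function’s IAM role would also need to allow kms:Decrypt on the key used for encryption.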
Creating an IAM Role
In order to create the role associated with the Lambda function, we need a couple of resources and data sources:
- An aws_iam_role resource to create the role itself,
- An aws_iam_role_policy resource to add an inline policy to the role,
- An aws_iam_policy_document data source to specify the policy for which services may assume the role, and
- An aws_iam_policy_document data source to specify the actual text of the policy itself.
Using the aws_iam_policy_document data source in Terraform allows us to author policies using HCL rather than templating. Whether you choose to use this is something of a matter of preference, but we tend to find it substantially better than writing or templating JSON.
First, we’ll look at the data source for the policy for who can assume the role:
data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"
    actions = [
      "sts:AssumeRole",
    ]
    principals {
      type = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}
When reified at plan time, this will produce the following JSON policy text as the json attribute:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      }
    }
  ]
}
We can use the rendered JSON to create our role:
resource "aws_iam_role" "slack_notify" {
  name = "SlackNotifications"
  assume_role_policy = "${data.aws_iam_policy_document.assume_role.json}"
}
Next, we can write the text of the policy, which allows writing logs to CloudWatch.
data "aws_iam_policy_document" "slack_notify" {
  statement {
    sid = "CloudwatchLogs"
    effect = "Allow"
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:GetLogEvents",
      "logs:PutLogEvents"
    ]
    resources = ["arn:aws:logs:*:*:*"]
  }
}
Finally, we create an inline policy on the role, using the reified policy text:
resource "aws_iam_role_policy" "slack_notify" {
  name = "SlackNotifications"
  role = "${aws_iam_role.slack_notify.id}"
  policy = "${data.aws_iam_policy_document.slack_notify.json}"
}
Subscribing to the SNS topic
Before we can subscribe a Lambda function to an SNS topic, we must first grant the topic permission to invoke the function, by allowing the lambda:InvokeFunction action for the SNS service principal. We can use the aws_lambda_permission resource to do so:
resource "aws_lambda_permission" "with_sns" {
  statement_id = "AllowExecutionFromSNS"
  action = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.slack_notify.arn}"
  principal = "sns.amazonaws.com"
  source_arn = "${aws_sns_topic.asg_slack_notify.arn}"
}
Finally, we can create a subscription with an aws_sns_topic_subscription resource:
resource "aws_sns_topic_subscription" "lambda" {
  depends_on = ["aws_lambda_permission.with_sns"]
  topic_arn = "${aws_sns_topic.asg_slack_notify.arn}"
  protocol = "lambda"
  endpoint = "${aws_lambda_function.slack_notify.arn}"
}
This is the final piece of the configuration puzzle needed to provision all our cloud resources. Before we can plan or apply it though, we need to talk a bit about module arrangement and instantiation.
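As an aside, once the subscription exists, SNS invokes the function with an event envelope rather than the bare autoscaling notification, which is why the handler reads event.Records[0].Sns.Message. A slightly more defensive way of unwrapping that envelope is sketched below. extractNotifications is a hypothetical helper, the sample event is hand-written with an invented account ID, and only the fields the sketch touches are included.

```javascript
// Hypothetical helper: returns the parsed autoscaling notifications carried
// by an SNS-triggered Lambda event, skipping any record not sent by SNS.
function extractNotifications(event) {
  var records = (event && event.Records) || [];
  return records
    .filter(function (record) {
      return record.EventSource === 'aws:sns' && record.Sns;
    })
    .map(function (record) {
      // The notification itself arrives as a JSON string
      return JSON.parse(record.Sns.Message);
    });
}

// Hand-written sample envelope; real events carry additional fields.
var sampleEvent = {
  Records: [{
    EventSource: 'aws:sns',
    Sns: {
      TopicArn: 'arn:aws:sns:us-west-2:123456789012:SlackNotify-ASG',
      Message: JSON.stringify({
        Event: 'autoscaling:EC2_INSTANCE_TERMINATE',
        AutoScalingGroupName: 'nomad-servers'
      })
    }
  }]
};

extractNotifications(sampleEvent).forEach(function (message) {
  console.log(message.AutoScalingGroupName + ': ' + message.Event);
});
```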
The Composition Root Pattern
Many of the Terraform best practices discussed on the web today revolve around the idea of building an entire infrastructure with one command. I prefer a world of small, cohesive modules instead - where infrastructure is made up of many states representing individual components. I’ll go into the rationale for this shortly, but first let’s look at the layout of a component:
$ tree
.
├── README.md
└── terraform
    ├── environments
    │   ├── production
    │   │   └── main.tf
    │   └── staging
    │       └── main.tf
    ├── lambda
    │   └── asgSlackNotifications.js
    └── modules
        └── asg-notifications
            ├── iam.tf
            ├── interface.tf
            ├── lambda.tf
            └── notifications.tf
In this repository layout, we separate individual functional units into modules, and then use a composition root per individual environment - in this case staging and production.
We consider there to be a number of benefits to this approach versus the commonly seen terraform.tfvars-per-environment approach. They will be covered in a lot more depth in future posts, but the big reason for now is that composition roots which are Terraform configuration files rather than variables files can use data sources to obtain values to plug in to the modules, and additional modules can be composed on a per-environment basis as necessary.
Composition roots tend to have a pattern which includes the following elements:
- Provider instantiation
- Data sources to query module values
- Module instantiation
- Outputs to provide for use in other composition roots
Staging Environment
The composition root for our staging environment looks like this:
provider "aws" {
  region = "us-west-2"
}
data "aws_autoscaling_groups" "all" {}
module "asg_notifications" {
  source = "../../modules/asg-notifications"
  asg_names = "${data.aws_autoscaling_groups.all.names}"
  asg_hook_id = "<redacted>"
  channel = "#ops-staging"
  username = "aws"
}
Notice the hard-coded, environment-specific values such as the channel name, which would normally live within a .tfvars file. In this case we do not need to provide any outputs.
To replicate this in another environment, a separate composition root in a different subdirectory of environments would be used - for example environments/production/main.tf.
Planning and Applying
Now that the module and the composition root for our target environment are ready, we can run a plan and ensure all is as we expect. To do this, we use the terraform plan command, with the -out flag to ensure the plan is saved.
$ export AWS_ACCESS_KEY_ID=<redacted>
$ export AWS_SECRET_ACCESS_KEY=<redacted>
$ cd staging
$ terraform plan -out 001.plan
# Plan output removed for brevity
Plan: 10 to add, 0 to change, 0 to destroy.
Finally, we can apply the plan using terraform apply 001.plan to create the resources and start receiving notifications!
Summary
In this post we’ve seen a few things which will feature more heavily in future posts on this blog:
- Managing SNS topics and subscriptions, Lambda functions and permissions
- Configuring Autoscaling Notifications
- Using aws_iam_policy_document to write IAM policies in HCL rather than JSON
- The composition root pattern for Terraform.
If you want to run this for yourself, you’ll need Terraform version 0.8.5 (for the aws_autoscaling_groups data source). If you’re running in Terraform Enterprise and using the composition root pattern, be sure to set the TF_ATLAS_DIR environment variable to the root of the environment you are provisioning for.
In my next post, we’ll be looking at using Terraform to build a high quality reusable VPC module, configuring a VPC and all of its accoutrements such as flow logging, VPC endpoints, NAT and routing.