CloudWatch custom metrics
Wednesday, December 16th, 2015 05:40 pm GMT +2

In this post we’re going to investigate how to create custom metrics for the AWS CloudWatch service. In some cases the default set of AWS metrics already defined for you in the dashboard is not enough. Let’s say you want to monitor the number of active customers on your website, or you would like to know the number of failed requests to your API backend.

Whatever it is, CloudWatch will allow you to record any time-based data using AWS APIs.


Install AWS CLI

First off, in order to post data to CloudWatch you need to install and set up the command-line tools for working with AWS APIs:

$ pip install awscli

If you don’t know what pip is, head over to the pip installation manual.
Once the AWS CLI is installed, the aws command should be available in your command shell:

$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

aws help
aws: error: too few arguments

Before issuing real commands to AWS services you need to configure the CLI tools:

$ aws configure
AWS Access Key ID [****************KCEQ]:
AWS Secret Access Key [****************x++s]:
Default region name [us-west-2]:
Default output format [json]:

After successful configuration you should be able to use the command-line interface for AWS. For example, let’s list the public DNS names of all available EC2 instances:

$ aws ec2 describe-instances | grep "PublicDnsName" | awk '{$1=$1};1' | uniq
"PublicDnsName": "ec2-44-148-96-212.us-west-2.compute.amazonaws.com",
"PublicDnsName": "ec2-42-25-174-143.us-west-2.compute.amazonaws.com",
"PublicDnsName": "ec2-44-68-194-93.us-west-2.compute.amazonaws.com",
"PublicDnsName": "ec2-44-69-229-202.us-west-2.compute.amazonaws.com",

Choose a metric

Let’s say that we want to monitor application API status by periodically requesting a specific endpoint and checking the HTTP response code. In our example we’ll use the http://httpbin.org/status/200 URL as a demo.

We will consider an HTTP 200 response code as a service-alive event and treat it as 1, and any other response code (including a timeout) as a service-failure event, or 0.
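
The mapping just described can be sketched as a small shell function. This is only an illustration: the function name status_to_value is made up for this example and is not part of the script built later in the post. It relies on the fact that curl prints 000 for %{http_code} when no HTTP response was received at all.

```shell
#!/bin/bash
# Map an HTTP status code to the metric value described above:
# 200 -> 1 (service alive), anything else -> 0 (service failure).
# curl prints 000 when no response was received (timeout, DNS failure,
# refused connection), so that case lands in the failure branch too.
# status_to_value is a hypothetical helper name used only in this sketch.
status_to_value() {
    if [ "$1" = "200" ]; then
        echo 1
    else
        echo 0
    fi
}

# Example (not executed here):
#   CODE=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" http://httpbin.org/status/200)
#   status_to_value "$CODE"
```

Comparing the code as a string rather than with -eq keeps the function safe even if the argument is empty.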

The following bash code (service_mon.sh) will output the HTTP response code:

#!/bin/bash
# -s: silent, -o /dev/null: discard the body, -w: print only the status code
CODE=$(curl -k -s -o /dev/null -w "%{http_code}" http://httpbin.org/status/200)
echo "$CODE"

Run it and verify that the script outputs 200:

$ ./service_mon.sh
200

Choose metric name

We’re going to name our metric ServiceStatus and report it to CloudWatch using the cloudwatch subcommand. Modify the service_mon.sh script above as follows:

#!/bin/bash

# get response code
CODE=$(curl -k -s -o /dev/null -w "%{http_code}" http://httpbin.org/status/200)
if [ "$CODE" -eq 200 ] ; then
    # service is alive
    aws cloudwatch put-metric-data --metric-name ServiceStatus --namespace CL_AGENTS --value 1 --unit "Count"
else
    # any other code (curl prints 000 on timeout) counts as a failure
    aws cloudwatch put-metric-data --metric-name ServiceStatus --namespace CL_AGENTS --value 0 --unit "Count"
fi

Note CL_AGENTS in the command parameters: this is the namespace, a convenient group name for all your metrics. Typically you’d want to put your application name in there. In this example we’re specifying a unit for our data called Count, but this is not the only possible option. Here’s the full list of units:

[ Seconds, Microseconds, Milliseconds, Bytes, Kilobytes, Megabytes, Gigabytes, Terabytes, Bits, Kilobits,
Megabits, Gigabits, Terabits, Percent, Count, Bytes/Second, Kilobytes/Second, Megabytes/Second, Gigabytes/Second,
Terabytes/Second, Bits/Second, Kilobits/Second, Megabits/Second, Gigabits/Second, Terabits/Second, Count/Second, None ]
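
To illustrate a unit other than Count, here is a rough sketch that reports the request duration in Milliseconds. Several names are assumptions made for this example only: the metric name ServiceLatency, the helper functions, and the AWS_CMD variable, which exists purely so the command can be dry-run with echo. curl's %{time_total} prints the elapsed time in seconds.

```shell
#!/bin/bash
# Convert curl's %{time_total} (seconds, e.g. 0.250000) to whole milliseconds.
seconds_to_ms() {
    awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000 }'
}

# Report one latency datapoint using the Milliseconds unit.
# ServiceLatency is a hypothetical metric name.
# AWS_CMD is a hypothetical override for dry runs; it defaults to aws.
report_latency() {
    "${AWS_CMD:-aws}" cloudwatch put-metric-data \
        --namespace CL_AGENTS \
        --metric-name ServiceLatency \
        --unit Milliseconds \
        --value "$(seconds_to_ms "$1")"
}

# Example (not executed here):
#   ELAPSED=$(curl -s -o /dev/null -w "%{time_total}" http://httpbin.org/status/200)
#   report_latency "$ELAPSED"
```

Running AWS_CMD=echo report_latency 0.25 prints the command that would be sent instead of calling AWS, which is handy for checking the arguments first.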

 

Periodic script execution

Put the script we’ve written into crontab:

# m h  dom mon dow   command
*/5 * * * * /mnt/service_mon.sh

This will post the service status to CloudWatch every 5 minutes. After a few runs, the metric shows up as a graph in the web-based AWS console.
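
The reported datapoints can also be read back from the command line with the get-metric-statistics subcommand. The sketch below wraps it in a function; the function name and the AWS_CMD dry-run variable are assumptions made for this example, and the timestamps in the usage comment are placeholders you would replace with your own range.

```shell
#!/bin/bash
# Fetch 5-minute averages of ServiceStatus between two ISO 8601 timestamps.
# $1 = start time, $2 = end time.
# AWS_CMD is a hypothetical override for dry runs; it defaults to aws.
fetch_service_status() {
    "${AWS_CMD:-aws}" cloudwatch get-metric-statistics \
        --namespace CL_AGENTS \
        --metric-name ServiceStatus \
        --start-time "$1" \
        --end-time "$2" \
        --period 300 \
        --statistics Average
}

# Example (not executed here):
#   fetch_service_status 2015-12-16T12:00:00Z 2015-12-16T13:00:00Z
```

The period of 300 seconds matches the 5-minute cron interval, so each returned datapoint should correspond to a single run of the script.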

You may also report multiple values at the same time using a JSON file:

[
  {
    "MetricName": "ServiceStatus",
    "Timestamp": "Wednesday, June 12, 2013 8:28:20 PM",
    "Value": 1,
    "Unit": "Count"
  },
  {
    "MetricName": "ServiceStatus",
    "Timestamp": "Wednesday, June 12, 2013 8:30:20 PM",
    "Value": 0,
    "Unit": "Count"
  }
]

To do bulk reporting, save the JSON as metric.json and pass it to put-metric-data:

$ aws cloudwatch put-metric-data --namespace "CL_AGENTS" --metric-data file://metric.json
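
Writing such a file by hand gets tedious, so here is a hedged sketch of generating it from a script. The helper names are invented for this example, ServiceLatency is a hypothetical second metric, and the timestamps use ISO 8601 in UTC on the assumption that your CLI version accepts that format.

```shell
#!/bin/bash
# Print one metric datum as JSON, stamped with the current UTC time
# in ISO 8601 format. Arguments: metric name, value, unit.
metric_entry() {
    printf '{"MetricName": "%s", "Timestamp": "%s", "Value": %s, "Unit": "%s"}' \
        "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3"
}

# Assemble two entries into the JSON array passed to --metric-data.
# ServiceLatency is a hypothetical metric used only for illustration.
build_metric_file() {
    printf '[\n%s,\n%s\n]\n' \
        "$(metric_entry ServiceStatus 1 Count)" \
        "$(metric_entry ServiceLatency 250 Milliseconds)"
}

# Example (not executed here):
#   build_metric_file > metric.json
#   aws cloudwatch put-metric-data --namespace "CL_AGENTS" --metric-data file://metric.json
```

The values are hard-coded here to keep the sketch short; in practice they would come from the checks performed by the monitoring script.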

Set up alarms

Up until now our metric is almost useless, except that we get nice graphs in CloudWatch. To make it more useful, let’s add an alarm that fires if our API is down:

  • Select the ServiceStatus metric in the CL_AGENTS namespace we’ve just created
  • Click the Create Alarm button on the right side of the graph
  • In the subsequent popup, fill in the alarm details
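
The same alarm can be sketched from the command line with the put-metric-alarm subcommand instead of the console. Everything specific here is an assumption for illustration: the alarm name ServiceStatusDown, the threshold settings, the function name, and the AWS_CMD dry-run variable. The SNS topic ARN must be one you have created yourself.

```shell
#!/bin/bash
# Create an alarm that fires when the average ServiceStatus over one
# 5-minute period drops below 1, i.e. a failed check was reported.
# $1 is the ARN of an SNS topic to notify (supply your own).
# ServiceStatusDown is a hypothetical alarm name.
# AWS_CMD is a hypothetical override for dry runs; it defaults to aws.
create_service_alarm() {
    "${AWS_CMD:-aws}" cloudwatch put-metric-alarm \
        --alarm-name ServiceStatusDown \
        --namespace CL_AGENTS \
        --metric-name ServiceStatus \
        --statistic Average \
        --period 300 \
        --evaluation-periods 1 \
        --threshold 1 \
        --comparison-operator LessThanThreshold \
        --alarm-actions "$1"
}

# Example (not executed here):
#   create_service_alarm "<your SNS topic ARN>"
```

With one datapoint every 5 minutes, a single failed check is enough to trigger this alarm; raising --evaluation-periods would make it less sensitive to one-off failures.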

Gotchas

  • Keep in mind that, at the time of writing, CloudWatch stores your time-series data for only 2 weeks (14 days). If you want more history and insights for your data, you will probably be better off using a dedicated database such as InfluxDB
  • Make sure to set up the AWS CLI under the same user that your crontab runs as; otherwise the AWS configuration won’t be accessible to cron and your monitoring script will not work
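
For the cron gotcha, a simple guard can catch the problem early. The sketch below checks only the default file locations used by aws configure (~/.aws/credentials and ~/.aws/config); the function name is hypothetical, and credentials provided in other ways, such as environment variables or instance roles, are not detected.

```shell
#!/bin/bash
# Return success if the given home directory contains AWS CLI configuration
# in one of the default locations (~/.aws/credentials or ~/.aws/config).
# $1 is optional and defaults to the current user's $HOME.
# aws_config_present is a hypothetical helper name used only in this sketch.
aws_config_present() {
    home_dir="${1:-$HOME}"
    [ -f "$home_dir/.aws/credentials" ] || [ -f "$home_dir/.aws/config" ]
}

# Example guard at the top of service_mon.sh (not executed here):
#   aws_config_present || { echo "no AWS config for $(id -un)" >&2; exit 1; }
```

Because cron usually discards or mails the output, redirecting the script's stderr to a log file in the crontab entry makes such a message easier to find.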
