CloudWatch custom metrics
Wednesday, December 16th, 2015 05:40 pm GMT +2
In this post we’re going to investigate how to create custom metrics for the AWS CloudWatch service. In some cases the default set of AWS metrics that is already defined for you in the dashboard is not enough. Let’s say you want to monitor the number of active customers on your website, or you would like to know the number of failed requests to your API backend.
Whatever it is, CloudWatch will allow you to record any time-based data using AWS APIs.
Install AWS CLI
First off, in order to post data to CloudWatch you need to install and set up the command-line tools for working with AWS APIs:
$ pip install awscli
If you don’t know what pip is, head over to the pip installation manual.
Once the AWS CLI is installed, the aws command should be available in your command shell:
$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:
aws help
aws: error: too few arguments
Before issuing real commands to AWS services you need to configure the CLI tools:
$ aws configure
AWS Access Key ID [****************KCEQ]:
AWS Secret Access Key [****************x++s]:
Default region name [us-west-2]:
Default output format [json]:
After successful configuration you should be able to use the command-line interface for AWS. For example, let’s list the public DNS names of all available EC2 instances:
$ aws ec2 describe-instances | grep "PublicDnsName" | awk '{$1=$1};1' | uniq
"PublicDnsName": "ec2-44-148-96-212.us-west-2.compute.amazonaws.com",
"PublicDnsName": "ec2-42-25-174-143.us-west-2.compute.amazonaws.com",
"PublicDnsName": "ec2-44-68-194-93.us-west-2.compute.amazonaws.com",
"PublicDnsName": "ec2-44-69-229-202.us-west-2.compute.amazonaws.com",
Choose a metric
Let’s say that we want to monitor the status of an application API by periodically requesting a specific endpoint and checking the HTTP response code. In our example we’ll use the http://httpbin.org/status/200 URL for the demo.
We will consider an HTTP 200 response code a service alive event and treat it as 1, and any other response code (including a timeout) a service failure event, or 0.
The following bash code (service_mon.sh) will output the HTTP response code:
#!/bin/bash
CODE=`curl -k -s -o /dev/null -w "%{http_code}" http://httpbin.org/status/200`
echo $CODE
Run it and verify that the script outputs 200:
$ ./service_mon.sh
200
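The 1/0 mapping described above can be sketched as a small function. This is only an illustration: status_to_value and fetch_status are hypothetical helper names, and the --max-time 10 limit is an assumed value, added so that a hung request is eventually reported as a failure (curl prints 000 as the HTTP code when no response arrives).

```shell
#!/bin/bash
# Hypothetical helper: map an HTTP status code to the metric value.
# 200 -> 1 (service alive); anything else, including curl's "000"
# for timeouts and connection errors -> 0 (service failure).
status_to_value() {
    if [ "$1" = "200" ]; then
        echo 1
    else
        echo 0
    fi
}

# Hypothetical helper: fetch the HTTP status code for a URL.
# --max-time 10 is an assumed limit, not part of the original script.
fetch_status() {
    curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$1"
}

# Usage (requires network access):
#   status_to_value "$(fetch_status http://httpbin.org/status/200)"
```

Keeping the mapping in its own function makes it easy to check without touching the network or AWS.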
Choose metric name
We’re going to name our metric ServiceStatus and report it to CloudWatch using the cloudwatch subcommand. Modify the service_mon.sh script above to be the following:
#!/bin/bash
# get response code
CODE=$(curl -k -s -o /dev/null -w "%{http_code}" http://httpbin.org/status/200)
# report 1 when the service answered with HTTP 200, otherwise 0
if [ "$CODE" -eq 200 ] ; then
    aws cloudwatch put-metric-data --metric-name ServiceStatus --namespace CL_AGENTS --value 1 --unit "Count"
else
    aws cloudwatch put-metric-data --metric-name ServiceStatus --namespace CL_AGENTS --value 0 --unit "Count"
fi
Note CL_AGENTS in the command parameters: this is the namespace, a convenient group name for all your metrics; typically you’d want to put your application name in there. In this example we’re specifying a unit for our data called Count, but this is not the only possible option. Here’s the full list of units:
[ Seconds, Microseconds, Milliseconds, Bytes, Kilobytes, Megabytes, Gigabytes, Terabytes, Bits, Kilobits,
Megabits, Gigabits, Terabits, Percent, Count, Bytes/Second, Kilobytes/Second, Megabytes/Second, Gigabytes/Second,
Terabytes/Second, Bits/Second, Kilobits/Second, Megabits/Second, Gigabits/Second, Terabits/Second, Count/Second, None ]
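As one illustration of a unit other than Count, the sketch below reports how long the request took, in Milliseconds. Everything named here is an assumption made for the example: ResponseTime is a hypothetical metric name, seconds_to_ms and report_response_time are hypothetical helpers, and --max-time 10 is an arbitrary limit. It relies on curl's %{time_total} write-out variable, which prints the total request time in seconds.

```shell
#!/bin/bash
# Hypothetical helper: convert curl's time_total (seconds, e.g. "0.123")
# into whole milliseconds.
seconds_to_ms() {
    awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000 }'
}

# Hypothetical helper: measure the response time of a URL and report it.
# ResponseTime is an assumed metric name; CL_AGENTS is the namespace
# used in the script above.
report_response_time() {
    seconds=$(curl -s -o /dev/null --max-time 10 -w "%{time_total}" "$1")
    aws cloudwatch put-metric-data \
        --metric-name ResponseTime \
        --namespace CL_AGENTS \
        --value "$(seconds_to_ms "$seconds")" \
        --unit Milliseconds
}

# Usage (requires network access and configured AWS credentials):
#   report_response_time http://httpbin.org/status/200
```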
Periodic script execution
Put the script we’ve written into crontab:
# m h dom mon dow command
*/5 * * * * /mnt/service_mon.sh
This will post the service status to CloudWatch every 5 minutes, and the data points will then show up as a graph in the web-based AWS console.
You may also report multiple values at the same time using a JSON file:
[
{
"MetricName": "ServiceStatus",
"Timestamp": "Wednesday, June 12, 2013 8:28:20 PM",
"Value": 1,
"Unit": "Count"
},
{
"MetricName": "ServiceStatus",
"Timestamp": "Wednesday, June 12, 2013 8:30:20 PM",
"Value": 0,
"Unit": "Count"
}
]
To do bulk reporting use this format:
$ aws cloudwatch put-metric-data --namespace "CL_AGENTS" --metric-data file://metric.json
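A file like metric.json does not have to be written by hand. The following is a minimal sketch, assuming the checks have been collected as lines of "timestamp value" pairs. The helper names metric_entry and build_metric_json and the input file samples.txt are hypothetical, and the timestamps here use the ISO 8601 form, which the AWS CLI also accepts.

```shell
#!/bin/bash
# Hypothetical helper: print one metric datum as a JSON object.
# Arguments: metric name, timestamp, value, unit.
metric_entry() {
    printf '  {"MetricName": "%s", "Timestamp": "%s", "Value": %s, "Unit": "%s"}' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical helper: read "timestamp value" pairs from stdin and
# print them as a JSON array of ServiceStatus data points.
build_metric_json() {
    first=1
    printf '[\n'
    while read -r ts value; do
        # separate entries with a comma, but not before the first one
        [ "$first" -eq 1 ] || printf ',\n'
        first=0
        metric_entry ServiceStatus "$ts" "$value" Count
    done
    printf '\n]\n'
}

# Usage (samples.txt is a hypothetical file, one "timestamp value" per line):
#   build_metric_json < samples.txt > metric.json
# A current UTC timestamp can be produced with:
#   date -u +%Y-%m-%dT%H:%M:%SZ
```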
Set up alarms
Up until now our metric has been almost useless, except that we can see nice graphs in CloudWatch. To make it more useful, let’s add an alarm that fires if our API is down:
- Select the ServiceStatus metric in the CL_AGENTS namespace we’ve just created
- Click the Create Alarm button on the right side of the graph
- In the subsequent popup, select the alarm details
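The same kind of alarm can also be created from the command line with the put-metric-alarm subcommand. The sketch below is an assumption about reasonable settings, not the values from the console popup: the alarm name ServiceStatusDown, the helper create_service_alarm and the AWS_CMD override are all hypothetical, and the SNS topic passed as the argument must already exist.

```shell
#!/bin/bash
# AWS_CMD is a hypothetical override so the command can be dry-run with
# AWS_CMD=echo; by default the real aws binary is used.
AWS_CMD=${AWS_CMD:-aws}

# Hypothetical helper: alarm when the minimum ServiceStatus over one
# 5-minute period drops below 1, i.e. at least one check failed.
# $1 is the ARN of an existing SNS topic to notify.
create_service_alarm() {
    $AWS_CMD cloudwatch put-metric-alarm \
        --alarm-name ServiceStatusDown \
        --namespace CL_AGENTS \
        --metric-name ServiceStatus \
        --statistic Minimum \
        --period 300 \
        --evaluation-periods 1 \
        --threshold 1 \
        --comparison-operator LessThanThreshold \
        --alarm-actions "$1"
}

# Usage (requires configured AWS credentials; replace the placeholder ARN):
#   create_service_alarm "<your SNS topic ARN>"
```

The 300-second period matches the 5-minute cron schedule, so each period should contain one data point.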
Gotchas
- Keep in mind that, at the time of writing, CloudWatch stores your time-series data for only 2 weeks (14 days), so if you want more history and insight into your data you will probably be better off using a separate database such as InfluxDB
- Make sure to set up the AWS CLI under the same user that your crontab runs as, otherwise the AWS configuration won’t be accessible to cron and your monitoring script will not work
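The cron gotcha can be caught early with a couple of checks at the top of the monitoring script. This is a rough sketch under stated assumptions: require_cmd and has_aws_config are hypothetical helpers, and the second one only checks for the credentials file that aws configure writes under the user's home directory, so it will not notice credentials supplied in other ways.

```shell
#!/bin/bash
# Hypothetical helper: fail early when a required command is not on PATH.
# cron usually runs with a much shorter PATH than an interactive shell.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "missing command: $1" >&2
        return 1
    }
}

# Hypothetical helper: check that "aws configure" has been run for the
# user whose home directory is given (defaults to the current $HOME).
has_aws_config() {
    [ -f "${1:-$HOME}/.aws/credentials" ]
}

# Usage at the top of service_mon.sh:
#   require_cmd aws || exit 1
#   require_cmd curl || exit 1
#   has_aws_config || { echo "no AWS credentials for $USER" >&2; exit 1; }
```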