Operational Governance

January 7, 2025

CloudHealth recommends adding the following AWS Best Practice Policies for Operational Governance.

Identify Expiring Reserved Instances
Identify Expiring Savings Plans
Identify and Terminate Zombie Instances
Identify and Terminate Zombie Volumes
Identify and Delete Old Snapshots
Release Disassociated Elastic IPs
Instance Scheduling (Lights On/Lights Off)
Identify Older Generation Instances

Step 1 of 8

Identify Expiring Reserved Instances

Get notified about Reserved Instances (RIs) that expire in the next 60-90 days so that you have enough time to do a rightsizing analysis and determine which new ones to purchase.

Sample Expiring RI Policy: Expiring Reservations in 60 days, high alert. Expiring Reservations in 30 days, critical alert.
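
The two thresholds in the sample policy can be read as a simple severity ladder. The following is a minimal sketch of that logic, not CloudHealth's implementation; the function name `ri_alert_level` and the choice to ignore already-expired reservations are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# Thresholds taken from the sample policy above.
HIGH_ALERT_DAYS = 60
CRITICAL_ALERT_DAYS = 30


def ri_alert_level(expires_on: date, today: date) -> Optional[str]:
    """Return 'critical', 'high', or None for a reservation's end date.

    Hypothetical helper: it only mirrors the 60-day and 30-day thresholds.
    """
    days_left = (expires_on - today).days
    if days_left < 0:
        return None  # assumption: expired reservations are not alerted on
    if days_left <= CRITICAL_ALERT_DAYS:
        return "critical"
    if days_left <= HIGH_ALERT_DAYS:
        return "high"
    return None
```

Checking the tighter threshold first ensures that a reservation inside the 30-day window is reported as critical rather than high.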

Step 2 of 8

Identify Expiring Savings Plans

AWS Savings Plans are a discount mechanism in which users commit to a certain amount of AWS spend per hour for a term of 1 year or 3 years. Using the CloudHealth platform, you can create an alert to track Savings Plans that are about to expire. For example, you can set an alert 60-90 days in advance of your Savings Plans expiration date, so that you have enough time to make informed business decisions, such as whether and how to replace that Savings Plan.

Sample Expiring Savings Plan Policy: Expiring Savings Plans in 30 days, high alert.
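
Because a Savings Plan is an hourly commitment held for a fixed term, the total amount at stake when deciding on a replacement is simple arithmetic. The sketch below is illustrative only; the function name `total_commitment_usd` is hypothetical, and the calculation assumes 365-day years (leap days are ignored).

```python
HOURS_PER_YEAR = 24 * 365  # assumption: leap days are ignored


def total_commitment_usd(hourly_commitment_usd: float, term_years: int) -> float:
    """Return the total spend committed over the full term of a Savings Plan."""
    if term_years not in (1, 3):
        raise ValueError("Savings Plans terms are 1 year or 3 years")
    return hourly_commitment_usd * HOURS_PER_YEAR * term_years
```

For example, a commitment of 10 USD per hour comes to 87,600 USD over a 1-year term, which is the scale of the decision the expiration alert gives you time to prepare for.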

Step 3 of 8

Identify and Terminate Zombie Instances

These are running instances that are idle, most likely forgotten, and costing you money. Identify instances that have run with a daily average CPU utilization lower than 10% for 2 weeks in a row and Network I/O less than 5 MB for 4 or more days. To be more specific, isolate instances based on their instance type.

Example: C-type instances (compute intensive) that have a Maximum CPU less than 10% for the last 14 days are most likely to be running idle and are good candidates to be terminated.
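
The CPU and network criteria described above can be expressed as a check over per-day metric samples. This is a minimal sketch under stated assumptions: the function name and the input shape (one value per day, oldest first) are hypothetical, and the metrics would need to come from your own monitoring data.

```python
from typing import Sequence

CPU_THRESHOLD_PCT = 10.0    # daily average CPU below 10%
CPU_WINDOW_DAYS = 14        # for 2 weeks in a row
NETWORK_THRESHOLD_MB = 5.0  # network I/O below 5 MB
NETWORK_MIN_DAYS = 4        # on 4 or more days


def is_zombie_candidate(daily_avg_cpu_pct: Sequence[float],
                        daily_network_mb: Sequence[float]) -> bool:
    """Apply both criteria to per-day samples (oldest first)."""
    if len(daily_avg_cpu_pct) < CPU_WINDOW_DAYS:
        return False  # not enough history to judge
    recent_cpu = daily_avg_cpu_pct[-CPU_WINDOW_DAYS:]
    cpu_idle = all(pct < CPU_THRESHOLD_PCT for pct in recent_cpu)
    quiet_days = sum(1 for mb in daily_network_mb if mb < NETWORK_THRESHOLD_MB)
    return cpu_idle and quiet_days >= NETWORK_MIN_DAYS
```

Requiring every day in the window to be below the CPU threshold means a single busy day disqualifies the instance, which keeps the check conservative.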

Sample Zombie Instance Identifying Policy: This policy identifies potential zombie EC2 instances. It looks at specific instance types that are either compute optimized (e.g., C family) or storage and I/O optimized. Two rules make up this policy:

Rule 1: Identify C-class instance types that have a low average CPU %, stop them, and notify the IAM user (i.e., owner) of these instances.
Rule 2: Identify HS-class instances that have low average read and write operations, stop them, and notify the IAM user of these instances.

These two rules are evaluated separately. In addition, by leveraging CloudHealth Perspectives, you can run these rules against specific non-production environments.

Variant: Add different rules that capture other performance metrics such as network traffic.
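
To illustrate how two independently evaluated rules and an environment scope fit together, here is a small sketch. It is not how CloudHealth evaluates policies; the `Instance` record, the `environment` field (a stand-in for a Perspective group), and both numeric thresholds are assumptions, since the sample policy only says "low".

```python
from dataclasses import dataclass
from itertools import takewhile
from typing import Dict, Iterable, List, Set


@dataclass
class Instance:
    instance_id: str
    instance_type: str  # e.g. "c5.large" or "hs1.8xlarge"
    avg_cpu_pct: float
    avg_io_ops: float   # average read + write operations
    environment: str    # hypothetical stand-in for a Perspective group


def family_prefix(instance_type: str) -> str:
    """Return the leading letters of the type, e.g. 'c' for 'c5.large'."""
    family = instance_type.lower().split(".", 1)[0]
    return "".join(takewhile(str.isalpha, family))


def rule_low_cpu(inst: Instance, threshold_pct: float = 10.0) -> bool:
    # Rule 1: C-class instances with low average CPU (threshold is assumed).
    return family_prefix(inst.instance_type) == "c" and inst.avg_cpu_pct < threshold_pct


def rule_low_io(inst: Instance, threshold_ops: float = 100.0) -> bool:
    # Rule 2: HS-class instances with low read/write operations (threshold is assumed).
    return family_prefix(inst.instance_type) == "hs" and inst.avg_io_ops < threshold_ops


def evaluate(instances: Iterable[Instance], environments: Set[str]) -> Dict[str, List[str]]:
    """Evaluate each rule on its own, limited to the given environments."""
    scoped = [i for i in instances if i.environment in environments]
    return {
        "low_cpu_c_class": [i.instance_id for i in scoped if rule_low_cpu(i)],
        "low_io_hs_class": [i.instance_id for i in scoped if rule_low_io(i)],
    }
```

A further rule for another metric, such as network traffic, would be one more predicate and one more key in the result.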

Step 4 of 8

Identify and Terminate Zombie Volumes

These are EBS volumes that were launched with an instance but left unattached after the instance was terminated, costing you money.

Example: Identify volumes that have been unattached for more than 2 weeks and delete them after confirming that they do not contain critical data.

Sample Zombie Volume Identifying Policy: This policy identifies attached but potentially unused EBS volumes and terminates them.
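
The 2-week criterion from the example above can be sketched as a small predicate. The function name and the `detached_at` input are assumptions; the timestamp of the last detachment would have to come from your own inventory or audit data.

```python
from datetime import datetime, timedelta
from typing import Optional

UNATTACHED_THRESHOLD = timedelta(weeks=2)


def is_zombie_volume(detached_at: Optional[datetime], now: datetime) -> bool:
    """Return True if the volume has been unattached for more than 2 weeks.

    detached_at is None while the volume is still attached (assumed convention).
    """
    if detached_at is None:
        return False
    return now - detached_at > UNATTACHED_THRESHOLD
```

The check only flags candidates; confirming that a volume holds no critical data remains a manual step before deletion.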

Step 5 of 8

Identify and Delete Old Snapshots

These are old snapshots that have crossed a certain age threshold. Old snapshots can become a legal liability.

Example: Identify snapshots that are older than 6 months and delete them after confirming that they do not contain critical data.

Sample Old Snapshot Identifying Policy: This policy sends a notification when it identifies potential zombie EC2 snapshots that are older than 6 months.
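
An age threshold like this can be applied to a snapshot inventory as follows. This is a sketch only: the function name and the input mapping are hypothetical, and 6 months is approximated as 183 days.

```python
from datetime import date, timedelta
from typing import Dict, List

MAX_AGE = timedelta(days=183)  # assumption: 6 months approximated as 183 days


def old_snapshots(created_on_by_id: Dict[str, date], today: date) -> List[str]:
    """Return the IDs of snapshots older than the threshold, sorted by ID."""
    return sorted(
        snap_id
        for snap_id, created_on in created_on_by_id.items()
        if today - created_on > MAX_AGE
    )
```

The result is a list to review and notify on, in line with the sample policy, rather than an automatic deletion.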

Step 6 of 8

Release Disassociated Elastic IPs

Amazon charges for Elastic IPs that are not associated with a running instance. Sometimes when instances are terminated, the Elastic IP is not released, which results in an ongoing charge.

When you set up a policy to automatically release Elastic IP addresses to avoid incurring costs, an EIP is released only if it is attached to neither an instance nor an Elastic Network Interface (ENI), a virtual network adapter attached to an instance. A single instance can have multiple ENIs (the exact number depends on instance type and size), each with its own IPv4 EIP and private IPv4 address. These ENIs can be detached from one instance and attached to another.

For more information, see Elastic Network Interfaces in the AWS documentation.

If an ENI is removed from one instance and not reassigned to another, it will retain its EIP. When reporting on EIPs, this “floating” EIP will appear as unattached. If an EIP is attached to an orphaned ENI, it will not incur costs like it would if it were unattached from an instance normally. However, for reporting purposes it is best to ensure that you do not have any floating ENIs or EIPs.

Example: Identify and release EIPs that are no longer associated with a running instance for more than 1 week.

Sample Release Disassociated Elastic IP Identifying Policy: This policy identifies EIPs that have been disassociated for more than 1 week, sends a notification, and releases them.
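
The distinction drawn above between an EIP in use, an EIP left on a detached ENI, and a fully disassociated EIP can be sketched as a classification. The record type, field names, and status labels below are hypothetical and chosen only to illustrate the decision order.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RELEASE_AFTER = timedelta(weeks=1)


@dataclass
class ElasticIp:
    allocation_id: str
    instance_id: Optional[str]           # None if not attached to an instance
    network_interface_id: Optional[str]  # None if not attached to an ENI
    disassociated_at: Optional[datetime]


def classify(eip: ElasticIp, now: datetime) -> str:
    """Return 'in_use', 'floating', 'release', or 'keep' (labels are assumed)."""
    if eip.instance_id is not None:
        return "in_use"
    if eip.network_interface_id is not None:
        # Attached to an ENI that is not on an instance: report it, do not release it.
        return "floating"
    if eip.disassociated_at is not None and now - eip.disassociated_at > RELEASE_AFTER:
        return "release"
    return "keep"
```

Separating "floating" from "release" keeps the automatic action limited to EIPs attached to neither an instance nor an ENI, while still surfacing the floating ones for cleanup.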

Step 7 of 8

Instance Scheduling (Lights On/Lights Off)

Not all instances are in use 24x7x365, especially those outside of production. These instances can be periodically shut down to reduce cost.

Example: Stop EC2 instances in the development environment at 7 PM on Friday, and start them again at 6 AM on Monday.

Sample Lights On/Lights Off Policy: Turns off the development environment over the weekend.
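
The weekend schedule from the example can be written as a function of the current time. This is a minimal sketch; the function name is hypothetical, and it assumes the timestamp is already in the local time zone that the schedule refers to.

```python
from datetime import datetime

FRIDAY, SATURDAY, SUNDAY, MONDAY = 4, 5, 6, 0  # datetime.weekday() values
STOP_HOUR = 19   # 7 PM on Friday
START_HOUR = 6   # 6 AM on Monday


def should_be_running(moment: datetime) -> bool:
    """Return False between Friday 7 PM and Monday 6 AM, True otherwise."""
    day, hour = moment.weekday(), moment.hour
    if day in (SATURDAY, SUNDAY):
        return False
    if day == FRIDAY and hour >= STOP_HOUR:
        return False
    if day == MONDAY and hour < START_HOUR:
        return False
    return True
```

A scheduler could compare this desired state with each instance's actual state and stop or start only the instances that differ.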

Step 8 of 8

Identify Older Generation Instances

Upgrade older generation instances to the latest generation for reduced costs and improved performance.

Example: Identify legacy AWS instances (T1, M1, C1, CC2, M2, CR1, CG1, HI1, HS1) and notify the owner so they can upgrade.

Identify Older Generation Instance Policy: Looks for M1 and C1 instance types with more than 1 hour of runtime in a given month and sends a Jira notification.
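
Matching instances against the legacy families listed above is a matter of comparing the part of the instance type before the dot. The sketch below uses hypothetical function names and a hypothetical input shape of (instance type, owner, runtime hours) tuples; it only groups results by owner and does not send any notification.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Families taken from the example above.
LEGACY_FAMILIES = frozenset(
    {"t1", "m1", "c1", "cc2", "m2", "cr1", "cg1", "hi1", "hs1"}
)


def family_of(instance_type: str) -> str:
    """Return the family part of a type, e.g. 'm1' for 'm1.small'."""
    return instance_type.lower().split(".", 1)[0]


def is_legacy(instance_type: str) -> bool:
    return family_of(instance_type) in LEGACY_FAMILIES


def owners_to_notify(
    usage: Iterable[Tuple[str, str, float]],
    families: Tuple[str, ...] = ("m1", "c1"),
    min_hours: float = 1.0,
) -> Dict[str, List[str]]:
    """Group matching instance types by owner, as in the sample policy."""
    flagged: Dict[str, List[str]] = defaultdict(list)
    for instance_type, owner, runtime_hours in usage:
        if family_of(instance_type) in families and runtime_hours > min_hours:
            flagged[owner].append(instance_type)
    return dict(flagged)
```

Comparing the whole family string rather than a prefix avoids confusing, for example, `m1` with `m5` or `c1` with `cc2`.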