Logging and monitoring are complementary techniques with a common goal: helping ensure that a system consistently meets its performance baselines and security guidelines.
Logging refers to recording events and storing them in log files. Logs contain low-level details that give you visibility into how your application or system behaves under specific conditions. From a security standpoint, logs help security administrators spot red flags in their systems that are easily overlooked.
Monitoring is the process of collecting and analyzing data to help ensure optimal performance. Monitoring also helps detect unauthorized access and helps keep your services' usage aligned with your organization's security policies.
In this project, I created an Amazon CloudWatch alarm that goes off when an Amazon Elastic Compute Cloud (Amazon EC2) instance exceeds a specific central processing unit (CPU) utilization threshold. Using Amazon Simple Notification Service (Amazon SNS), I created a subscription that sends me an email when this alarm goes off. I then logged in to the EC2 instance and ran a stress test command that drove the instance's CPU utilization to 100 percent.
This test simulated a malicious actor gaining control of the EC2 instance and spiking the CPU. CPU spiking has various possible causes, one of which is malware.
In this task, I created an SNS topic and then subscribed to it with my email address.
Amazon SNS is a fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication.
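At this point, my SNS topic could send alerts to the email address associated with the SNS subscription. (SNS delivers to an email endpoint only after the recipient confirms the subscription through the link in the confirmation email.)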
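The console steps in this task roughly correspond to two AWS CLI calls. The sketch below is an illustrative equivalent, not what I ran during the project: it assumes the AWS CLI is installed and configured with credentials that allow sns:CreateTopic and sns:Subscribe, and the function name create_alert_topic is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an SNS topic and subscribe an email address to it.
# Assumes the AWS CLI is installed and configured (credentials + default Region).
# Usage: create_alert_topic TOPIC_NAME EMAIL
create_alert_topic() {
  local topic_name=$1 email=$2 topic_arn

  # create-topic returns the existing ARN if a topic with this name already exists.
  topic_arn=$(aws sns create-topic --name "$topic_name" \
    --query 'TopicArn' --output text) || return 1

  # Email subscriptions stay "pending confirmation" until the recipient
  # clicks the link in the confirmation email that SNS sends.
  aws sns subscribe --topic-arn "$topic_arn" \
    --protocol email --notification-endpoint "$email" >/dev/null || return 1

  # Print the ARN so it can be used later as an alarm action.
  echo "$topic_arn"
}
```

For example, create_alert_topic StressTestAlerts me@example.com prints the topic's ARN; the topic name and address here are placeholders.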
In this task, I viewed some metrics and logs stored in CloudWatch. I then created a CloudWatch alarm that sends a notification to my SNS topic, which emails me, when the Stress Test EC2 instance's CPU utilization rises above 60 percent.
CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), IT managers, and product owners. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, and optimize resource utilization. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. You get a unified view of operational health and gain visibility into your AWS resources and into applications and services running on AWS and on premises.
After an EC2 instance is created, it usually takes 5-10 minutes before its metrics start to appear in CloudWatch.
Selecting the CPUUtilization metric for the instance displayed its graph, which showed approximately 0 percent utilization because no load had been applied yet.
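The same graph data can be pulled from the command line. This is a hedged sketch that assumes a configured AWS CLI and GNU date; show_cpu_datapoints is a hypothetical name, and the 300-second period matches the 5-minute interval at which EC2 basic monitoring publishes metrics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print average CPUUtilization datapoints for one EC2
# instance over the last hour, in 5-minute periods, oldest first.
# Assumes the AWS CLI is configured and GNU date (for "date -d") is available.
# Usage: show_cpu_datapoints INSTANCE_ID
show_cpu_datapoints() {
  local instance_id=$1 start end
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  start=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)

  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$instance_id" \
    --start-time "$start" --end-time "$end" \
    --period 300 \
    --statistics Average \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average]' \
    --output text
}
```

Right after the instance is created, this may print nothing, for the same reason the console graph starts out empty.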
Next, I created a metric alarm. A metric alarm watches a single CloudWatch metric, or the result of a math expression based on CloudWatch metrics, and performs one or more actions based on the value of that metric or expression relative to a threshold over a number of time periods. In this project, the alarm's action sends a notification to the SNS topic that I created earlier.
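To make the "threshold over a number of time periods" idea concrete, here is a deliberately simplified model of how an alarm using the GreaterThanThreshold comparison could decide its state. It is not CloudWatch's actual implementation: it assumes every one of the last N datapoints must breach, and it ignores missing-data handling, which CloudWatch configures separately. The function name evaluate_alarm is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (NOT CloudWatch's implementation) of a metric alarm using
# GreaterThanThreshold where all of the last PERIODS datapoints must breach.
# Missing-data handling is ignored.
# Usage: evaluate_alarm THRESHOLD PERIODS DATAPOINT...   (oldest datapoint first)
evaluate_alarm() {
  local threshold=$1 periods=$2
  shift 2
  local points=("$@")
  local count=${#points[@]} breaching=0 i

  if (( count < periods )); then
    echo INSUFFICIENT_DATA
    return
  fi

  # Look only at the most recent PERIODS datapoints.
  for (( i = count - periods; i < count; i++ )); do
    # awk handles decimal comparison; exit status 0 means "value > threshold".
    if awk -v v="${points[i]}" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
      breaching=$(( breaching + 1 ))
    fi
  done

  if (( breaching == periods )); then echo ALARM; else echo OK; fi
}
```

For example, evaluate_alarm 60 1 0.4 12.5 99.8 prints ALARM because the most recent datapoint (99.8) is above 60, while evaluate_alarm 60 2 70 55 prints OK because only one of the last two datapoints breaches.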
At this point, I had viewed some Amazon EC2 metrics in CloudWatch and created a CloudWatch alarm that enters the In alarm state when CPU utilization exceeds the 60 percent threshold.
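For comparison, an alarm like this one can also be defined with a single put-metric-alarm call. This is a sketch under assumptions: the AWS CLI is configured, the function name create_cpu_alarm is hypothetical, TOPIC_ARN is the ARN of the SNS topic created earlier, and the 300-second period with one evaluation period is a guess rather than the exact values chosen in the console.

```shell
#!/usr/bin/env bash
# Hypothetical helper: alarm when an instance's average CPUUtilization is
# above 60 percent for one 5-minute period, and notify an SNS topic.
# Assumes the AWS CLI is configured; the period values are assumptions.
# Usage: create_cpu_alarm ALARM_NAME INSTANCE_ID TOPIC_ARN
create_cpu_alarm() {
  local alarm_name=$1 instance_id=$2 topic_arn=$3

  aws cloudwatch put-metric-alarm \
    --alarm-name "$alarm_name" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$instance_id" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 60 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn"
}
```

GreaterThanThreshold mirrors the "more than 60 percent" condition, so a datapoint of exactly 60 does not breach it.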
In this task, I logged in to the Stress Test EC2 instance and ran a command that drove its CPU utilization to 100 percent. This increase activated the CloudWatch alarm, which caused Amazon SNS to send a notification to the email address subscribed to the SNS topic.
I used the connection link to open a session on the Stress Test EC2 instance.
sudo stress --cpu 10 -v --timeout 400s
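This command starts 10 CPU-bound worker processes (--cpu 10) with verbose output (-v), which drives CPU utilization to 100 percent. After 400 seconds (--timeout 400s), the workers stop and CPU utilization falls back to near 0 percent.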
top
This command showed the live CPU usage.
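top reads its CPU figures from the Linux kernel's /proc/stat counters. As an illustration (not something I ran during the project), the sketch below computes a similar busy percentage from two snapshots of the aggregate cpu line. The function name cpu_busy_pct is hypothetical, and it treats idle plus iowait as idle time. This is the guest operating system's view; CloudWatch's CPUUtilization is measured from outside the instance, so the two numbers can differ slightly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of CPU time that was busy between two
# snapshots of the aggregate "cpu" line from /proc/stat (Linux only).
# Field layout: cpu user nice system idle iowait irq softirq steal [guest ...]
# Usage: cpu_busy_pct "PREVIOUS_LINE" "CURRENT_LINE"
cpu_busy_pct() {
  awk -v prev="$1" -v curr="$2" 'BEGIN {
    n = split(prev, p); split(curr, c)
    total = 0
    # Sum user..steal (fields 2-9); guest time is already counted in user.
    for (i = 2; i <= n && i <= 9; i++) total += c[i] - p[i]
    # Treat idle (field 5) and iowait (field 6) as not busy.
    idle = (c[5] - p[5]) + (c[6] - p[6])
    if (total <= 0) { print 0; exit }
    printf "%.0f\n", 100 * (total - idle) / total
  }'
}
```

On the instance, prev=$(head -n 1 /proc/stat); sleep 1; curr=$(head -n 1 /proc/stat); cpu_busy_pct "$prev" "$curr" should print a value near 100 while stress is running.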
It took a few minutes for the alarm status to change to In alarm and for an email to be sent.
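Rather than refreshing the console, the alarm state can be polled from the command line. The sketch assumes a configured AWS CLI; wait_for_alarm_state and its defaults of 20 attempts spaced 30 seconds apart are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll an alarm until it reaches the wanted state
# (ALARM, OK or INSUFFICIENT_DATA) or the attempts run out.
# Assumes the AWS CLI is configured.
# Usage: wait_for_alarm_state ALARM_NAME [STATE] [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_alarm_state() {
  local alarm_name=$1 wanted=${2:-ALARM} attempts=${3:-20} interval=${4:-30}
  local state i
  for (( i = 1; i <= attempts; i++ )); do
    state=$(aws cloudwatch describe-alarms --alarm-names "$alarm_name" \
      --query 'MetricAlarms[0].StateValue' --output text) || return 1
    echo "attempt $i: $state" >&2
    if [[ "$state" == "$wanted" ]]; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if (( i < attempts )); then
      sleep "$interval"
    fi
  done
  return 1
}
```

To check the email path without stressing the instance, aws cloudwatch set-alarm-state --alarm-name NAME --state-value ALARM --state-reason "Manual test" forces the alarm into the In alarm state temporarily; CloudWatch returns it to its actual state at the next evaluation.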
On the graph, I could see where CPUUtilization had increased above the 60 percent threshold.
At this point, I had run a command that loaded the EC2 instance to 100 percent CPU utilization for 400 seconds. This increase put the alarm into the In alarm state, and I confirmed the CPU utilization spike on the CloudWatch graph. I also received an email notification alerting me of the In alarm state.
In this task, I created a CloudWatch dashboard using the same CPUUtilization metrics that I had used throughout this project.
CloudWatch dashboards are customizable home pages in the CloudWatch console that you can use to monitor your resources in a single view. With CloudWatch dashboards, you can even monitor resources that are spread across different Regions. You can use CloudWatch dashboards to create customized views of the metrics and alarms for your AWS resources.
At this point, I had a dashboard that provides quick access to the CPUUtilization metric for the Stress Test instance.
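The dashboard can also be created with put-dashboard, which takes the whole dashboard definition as JSON. In this sketch, the function name create_cpu_dashboard, the widget size, and the title are assumptions; it assumes a configured AWS CLI and overwrites any existing dashboard with the same name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create (or overwrite) a dashboard with one line graph
# of an instance's average CPUUtilization.
# Assumes the AWS CLI is configured; arguments are supplied by the caller.
# Usage: create_cpu_dashboard DASHBOARD_NAME INSTANCE_ID REGION
create_cpu_dashboard() {
  local name=$1 instance_id=$2 region=$3 body
  # One metric widget; x, y, width and height are on the 24-column dashboard grid.
  body=$(printf '{"widgets":[{"type":"metric","x":0,"y":0,"width":12,"height":6,"properties":{"metrics":[["AWS/EC2","CPUUtilization","InstanceId","%s"]],"stat":"Average","period":300,"region":"%s","view":"timeSeries","title":"Stress Test CPUUtilization"}}]}' \
    "$instance_id" "$region")

  aws cloudwatch put-dashboard --dashboard-name "$name" --dashboard-body "$body"
}
```

Because the Region is stored per widget, the same approach can combine metrics from several Regions on one dashboard.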
In this project, I created a CloudWatch alarm that activated when the Stress Test instance exceeded a specific CPU utilization threshold. I created a subscription using Amazon SNS that sent an email to me if this alarm went off. I logged in to the EC2 instance and ran a stress test command that spiked the EC2 instance to 100 percent CPU utilization.
This test simulated what could happen if a malicious actor were to gain control of an EC2 instance and spike CPU utilization. CPU spiking has various possible causes, one of which is malware.