How to Fix High Latency Issues in AWS EC2 Instances (Step-by-Step Guide)

High latency in your AWS EC2 instance can kill application performance, frustrate users, and cost you money. Here’s how to diagnose it, trace it to the root cause, and fix it — layer by layer.

Why EC2 Latency Problems Are Rarely Simple

I once spent an afternoon convinced a network issue was causing latency on an EC2 instance. Turned out it was a misconfigured EBS volume hitting its IOPS limit. The application was fine — the storage was choking it.

That experience taught me one rule: never assume you know where EC2 latency is coming from until you actually measure it. The causes span CPU, memory, storage, network, region selection, and application code. Getting it wrong wastes hours.

This guide walks you through the full diagnostic and fix process, step by step, so you find the real problem — not the one that looks most obvious.

Understanding the Main Causes of High Latency in EC2

Before jumping into fixes, it helps to understand the five main places latency can originate in an EC2 setup:

CPU and memory saturation — When your instance runs out of compute resources, response times spike. This is especially common on T2/T3 burstable instances that exhaust their CPU credit balance.

Storage bottlenecks (EBS) — Underprovisioned IOPS or throughput on EBS volumes causes disk I/O queuing. Every read and write operation waits in line, and your application feels every millisecond.

Network throttling — EC2 instances have defined network bandwidth limits. When your workload exceeds those limits — even briefly — packets queue up or drop, directly increasing latency.

Wrong region or Availability Zone — If your users are in Europe and your instance runs in us-east-1, the physical distance alone adds 80–120ms of round-trip latency. The same issue applies to inter-service communication across regions or AZs.

Application and database issues — Slow database queries, unoptimized code, and missing caching layers create latency that no amount of infrastructure tuning will fix.

You may be dealing with one cause or several at once. The steps below help you isolate each one systematically.

Step 1: Measure and Locate the Latency with CloudWatch

You can’t fix what you can’t measure. The first step is always to establish baselines and identify exactly where the slowdown is occurring.

Set up CloudWatch monitoring on your EC2 instance:

Open the AWS Management Console and navigate to CloudWatch
Go to Metrics → All Metrics and enter your EC2 instance ID
Under EC2 → Per-Instance Metrics, select the following metrics to monitor:
- CPUUtilization — overall CPU usage
- NetworkIn and NetworkOut — traffic volume in and out
- DiskReadOps and DiskWriteOps (instance store I/O only; for EBS-backed Nitro instances, use EBSReadOps and EBSWriteOps instead)
- StatusCheckFailed — overall instance health

For EBS volumes specifically, check:

VolumeAvgReadLatency and VolumeAvgWriteLatency
VolumeQueueLength — a high queue length means I/O is backing up
VolumeIOPSExceededCheck — a value of 1 means your workload is demanding more IOPS than provisioned
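EBSIOBalance% and EBSByteBalance% (reported per instance under the AWS/EC2 namespace, not per volume): consistently low values mean the instance is draining its EBS burst bucket and is about to be throttled; for gp2 volumes, the per-volume equivalent is BurstBalance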
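The same metrics can be pulled from the command line, which is handy when comparing several volumes or instances. The following is a minimal sketch, assuming the AWS CLI is configured and GNU date is available; the function name metric_max and the volume ID in the usage comment are placeholders, not anything defined by AWS.

```bash
# metric_max NAMESPACE METRIC DIMENSION_NAME DIMENSION_VALUE [HOURS]
# Prints the highest 5-minute maximum of a CloudWatch metric over the
# last HOURS hours (default 3). Prints "None" if there are no datapoints.
# Assumes GNU date (standard on Amazon Linux and Ubuntu).
metric_max() (
  hours="${5:-3}"
  aws cloudwatch get-metric-statistics \
    --namespace "$1" --metric-name "$2" \
    --dimensions "Name=$3,Value=$4" \
    --start-time "$(date -u -d "$hours hours ago" +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 --statistics Maximum \
    --query 'max(Datapoints[].Maximum)' --output text
)

# Usage (hypothetical volume ID):
#   metric_max AWS/EBS VolumeQueueLength VolumeId vol-0123456789abcdef0 6
```

Swapping in AWS/EC2, CPUUtilization, and InstanceId gives the equivalent check for the instance itself.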

Enable detailed (1-minute) monitoring:

By default, CloudWatch collects EC2 metrics at 5-minute intervals. For latency troubleshooting, switch to 1-minute detail (this is billed as an extra). In the EC2 console, select your instance and choose Actions → Monitor and troubleshoot → Manage detailed monitoring.

For even finer granularity, install the CloudWatch Agent to get metrics at 1-second intervals — particularly useful for catching burst events that 1-minute averages miss entirely.

Test end-to-end latency with curl:

SSH into your instance and run this command to break down request timing:

bash

curl -kso /dev/null -w "\n===============\n\
| DNS lookup:    %{time_namelookup}\n\
| Connect:       %{time_connect}\n\
| App connect:   %{time_appconnect}\n\
| Pre-transfer:  %{time_pretransfer}\n\
| Start transfer:%{time_starttransfer}\n\
| Total:         %{time_total}\n\
| HTTP Code:     %{http_code}\n\
===============\n" https://yourdomain.com/

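This breaks total latency into individual stages: DNS resolution, TCP connection, TLS handshake, and server response. The timers are cumulative (each one counts from the start of the request), so compare consecutive values: the stage with the largest jump over the previous timer is where your problem lives. The -k flag skips TLS certificate verification, which is fine for timing but should not be carried into production scripts.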
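Because curl's timers are cumulative, it can help to convert them into per-stage durations before comparing. A minimal sketch of that subtraction follows; the function name stage_deltas and the sample timings are made up for illustration.

```bash
# stage_deltas DNS CONNECT APPCONNECT PRETRANSFER STARTTRANSFER TOTAL
# Converts curl's cumulative timers (seconds) into per-stage durations (ms).
stage_deltas() {
  awk -v dns="$1" -v con="$2" -v tls="$3" -v pre="$4" -v ttfb="$5" -v tot="$6" 'BEGIN {
    # time_appconnect is 0 for plain HTTP, so fall back to the connect time.
    if (tls == 0) tls = con
    printf "dns_ms=%.1f\n",      dns * 1000
    printf "tcp_ms=%.1f\n",      (con - dns) * 1000
    printf "tls_ms=%.1f\n",      (tls - con) * 1000
    printf "server_ms=%.1f\n",   (ttfb - pre) * 1000   # wait for first byte
    printf "download_ms=%.1f\n", (tot - ttfb) * 1000
  }'
}

# Example with made-up timings: the server wait (349 ms) dominates.
stage_deltas 0.012 0.045 0.130 0.131 0.480 0.495
```

To feed it live data, have curl print the six timers separated by spaces (-w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_pretransfer} %{time_starttransfer} %{time_total}') and pass the result as arguments.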

Step 2: Fix CPU and Memory Bottlenecks

If CPUUtilization is consistently above 80–90%, or if you’re on a T2/T3 burstable instance and performance degrades under load, CPU is likely contributing to your latency.

Check for CPU Credit Exhaustion on Burstable Instances

T2, T3, and T4g instances run on a CPU credit system. When you’re idle, you earn credits. When you burst, you spend them. Once credits are exhausted in Standard mode, your CPU is throttled back to the baseline, which is only 5–20% per vCPU on the smallest sizes. For a t3.micro, the baseline is just 10% per vCPU.

To check: In CloudWatch, look at the CPUCreditBalance metric for your instance. If it’s approaching zero during your high-latency periods, you’ve identified the problem.
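The credit arithmetic is simple enough to estimate how long a burst can last. One credit equals one vCPU running at 100% for one minute, so the net drain per hour is the number of vCPUs times the utilization above baseline times 60. The sketch below assumes a steady load in Standard mode; the function name hours_until_throttle is hypothetical, and the example uses a t3.micro (2 vCPUs, 10% baseline) starting from its maximum balance of 288 credits.

```bash
# hours_until_throttle BALANCE VCPUS UTIL_PCT BASELINE_PCT
# Estimates hours until the credit balance hits zero at a steady load.
hours_until_throttle() {
  awk -v bal="$1" -v n="$2" -v util="$3" -v base="$4" 'BEGIN {
    # One credit = one vCPU at 100% for one minute.
    net = n * (util - base) / 100 * 60   # net credits drained per hour
    if (net <= 0) print "never"          # at or below baseline: credits accrue
    else printf "%.1f\n", bal / net
  }'
}

hours_until_throttle 288 2 60 10   # prints 4.8
```

In other words, a fully charged t3.micro held at 60% CPU on both vCPUs would be throttled after a little under five hours.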

Fixes for CPU credit exhaustion:

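Enable Unlimited mode. T3 and T4g instances launch in Unlimited mode by default, but T2 instances (and any T3/T4g that has been switched to Standard) do not. Unlimited allows sustained bursting without throttling; surplus usage is billed at about $0.05 per vCPU-hour on Linux T2/T3 instances and $0.04 on T4g. Go to EC2 → Instances → Actions → Instance Settings → Change Credit Specification → Unlimited.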
Right-size to a compute-optimized instance. If your workload consistently needs more CPU than burstable instances provide, switch to a C5 or C6i instance, which offers dedicated CPU performance with no credit system.
Add more instances and use a load balancer to distribute CPU load across multiple nodes.
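The credit mode can also be inspected and changed from the AWS CLI. This is a small sketch, assuming the CLI is configured with permission to modify the instance; the function names are hypothetical wrappers and the instance ID in the usage comment is a placeholder.

```bash
# show_credit_mode INSTANCE_ID: prints "standard" or "unlimited".
show_credit_mode() {
  aws ec2 describe-instance-credit-specifications \
    --instance-ids "$1" \
    --query 'InstanceCreditSpecifications[0].CpuCredits' --output text
}

# set_unlimited INSTANCE_ID: switches a burstable instance to Unlimited mode.
# No stop/start is needed; surplus credits are billed if the burst is sustained.
set_unlimited() {
  aws ec2 modify-instance-credit-specification \
    --instance-credit-specifications "InstanceId=$1,CpuCredits=unlimited"
}

# Usage (hypothetical instance ID):
#   show_credit_mode i-0123456789abcdef0
#   set_unlimited    i-0123456789abcdef0
```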

Check for Memory Pressure

CloudWatch doesn’t report memory utilization by default. You need the CloudWatch Agent or SSH access to check it directly.

SSH into your instance and run:

bash

free -m

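Or install the CloudWatch Agent and configure it to report mem_used_percent. If memory usage is consistently above 85%, the OS may be swapping to disk, which massively increases latency. On instances with no swap configured (common on stock AMIs), the risk is instead that the kernel starts killing processes.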
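For a quick reading without the agent, the same figures can be derived from /proc/meminfo. The sketch below is one way to do it, using MemAvailable (which accounts for reclaimable cache) rather than MemFree; the function name mem_report is hypothetical.

```bash
# mem_report [MEMINFO_FILE]
# Prints used-memory and used-swap percentages from /proc/meminfo.
mem_report() {
  awk '
    /^MemTotal:/     { total  = $2 }
    /^MemAvailable:/ { avail  = $2 }
    /^SwapTotal:/    { stotal = $2 }
    /^SwapFree:/     { sfree  = $2 }
    END {
      printf "mem_used_pct=%.0f\n", (total - avail) / total * 100
      if (stotal > 0) printf "swap_used_pct=%.0f\n", (stotal - sfree) / stotal * 100
      else            print  "swap_used_pct=n/a"   # no swap configured
    }' "${1:-/proc/meminfo}"
}

# Usage on the instance:
#   mem_report
```

A non-trivial swap_used_pct alongside high mem_used_pct is the combination that usually shows up as latency.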

Fixes for memory pressure:

Upgrade to a larger instance type with more RAM
Optimize your application’s memory usage
Add swap space as a temporary measure, but treat it as a signal to right-size

Step 3: Fix EBS Storage Latency

EBS storage is one of the most common and underdiagnosed sources of latency on EC2 instances, particularly for database workloads.

Identify EBS Throttling

In CloudWatch, check:

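VolumeIOPSExceededCheck: if this metric shows 1, your application is requesting more IOPS than your volume is provisioned for
VolumeQueueLength: a value that stays high relative to the volume’s IOPS means I/O is queuing (as a rough guide for SSD volumes, about 1 outstanding request per 1,000 provisioned IOPS is healthy)
EBSIOBalance%: if this instance-level metric is consistently low (under 20%), you’re burning through the instance’s EBS burst balance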

Fixes for EBS Latency

Increase provisioned IOPS:

For gp3 volumes, the default is 3,000 IOPS — but you can provision up to 16,000 IOPS independently of storage size. If you’re on an older gp2 volume, migrating to gp3 gives you more IOPS at lower cost.

For the highest IOPS performance on database-heavy workloads, switch to io2 (Provisioned IOPS SSD) volumes, which support up to 64,000 IOPS per volume (up to 256,000 with io2 Block Express on supported instance types) and offer sub-millisecond latency.

To modify your volume: Go to EC2 → Volumes → Modify Volume and adjust the volume type and IOPS without stopping your instance.
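The same change can be scripted with the AWS CLI. This is a sketch under the assumption that the target is a gp3 volume; the function names and the IDs and numbers in the usage comment are placeholders, so pick IOPS and throughput values that match your workload.

```bash
# retune_gp3 VOLUME_ID IOPS THROUGHPUT_MIBPS
# Converts the volume to gp3 (if needed) and sets its IOPS and throughput.
retune_gp3() {
  aws ec2 modify-volume --volume-id "$1" \
    --volume-type gp3 --iops "$2" --throughput "$3"
}

# modification_state VOLUME_ID
# Prints the state (modifying, optimizing, completed, failed) and progress %.
modification_state() {
  aws ec2 describe-volumes-modifications --volume-ids "$1" \
    --query 'VolumesModifications[0].[ModificationState,Progress]' --output text
}

# Usage (hypothetical volume ID and values):
#   retune_gp3 vol-0123456789abcdef0 6000 250
#   modification_state vol-0123456789abcdef0
```

The volume stays usable throughout, but it only delivers the full new performance once the state reaches completed.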

Enable EBS Optimization on your instance:

EBS-optimized instances provide a dedicated network connection between the instance and EBS, eliminating contention with regular network traffic. Most modern instance types (C5, M5, R5 and newer) are EBS-optimized by default. For older instance types, you can enable it in the instance settings.

Handle snapshot-related latency:

New EBS volumes created from snapshots experience higher latency on first access to each block, as data is loaded from Amazon S3 in the background. If you create volumes from snapshots frequently, enable EBS Fast Snapshot Restore (FSR) to pre-warm the volume and eliminate initialization latency.

Step 4: Enable Enhanced Networking with ENA

If your network metrics show high packet-per-second (PPS) rates or you’re seeing latency spikes that correlate with network traffic bursts, your instance may need Enhanced Networking enabled.

What is Enhanced Networking?

AWS EC2 instances support the Elastic Network Adapter (ENA), a custom network driver that delivers up to 100 Gbps throughput, significantly reduced jitter, and lower latency compared to older virtual network drivers. All Nitro-based instances use ENA automatically. Older instance types may need it enabled manually.

Check if ENA is enabled:

SSH into your Linux instance and run:

bash

modinfo ena

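modinfo only confirms that the module is installed. To see which driver the interface is actually using, run the following (substitute your interface name, such as ens5, if it is not eth0):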

bash

ethtool -i eth0 | grep driver

If the driver shows vif (the Xen paravirtual interface) or ixgbevf (the older Intel virtual function) instead of ena, your instance is not using ENA and will benefit from upgrading.
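Interface names differ between AMIs, so a slightly more robust check looks up the interface that carries the default route first. The sketch below assumes iproute2 and ethtool are installed; the function names are hypothetical, and the last helper asks the EC2 API whether the enaSupport attribute is set on the instance.

```bash
# primary_iface: prints the interface that carries the default route.
primary_iface() {
  ip -o route show default |
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# iface_driver IFACE: prints the kernel driver bound to IFACE (e.g. "ena").
iface_driver() {
  ethtool -i "$1" | awk -F': ' '$1 == "driver" { print $2 }'
}

# ena_attribute INSTANCE_ID: prints True if enaSupport is set on the instance.
ena_attribute() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].EnaSupport' --output text
}

# Usage on the instance:
#   iface_driver "$(primary_iface)"
```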

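To enable ENA on an existing instance (the instance type must support ENA, and the operating system must already have the ena driver installed, which modinfo ena confirms):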

Stop the instance (not terminate — just stop)
Run the following AWS CLI command:

bash

aws ec2 modify-instance-attribute --instance-id i-xxxxxxxxxxxxxxxxx --ena-support

Start the instance
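The stop, modify, and start steps above can be chained so that each one runs only if the previous one succeeded. A minimal sketch, assuming the AWS CLI is configured and the ena driver is already present in the OS; enable_ena is a hypothetical wrapper name and the instance ID is a placeholder.

```bash
# enable_ena INSTANCE_ID
# Stops the instance, sets the enaSupport attribute, and starts it again.
# Each step runs only if the previous one succeeded.
enable_ena() {
  aws ec2 stop-instances --instance-ids "$1" &&
  aws ec2 wait instance-stopped --instance-ids "$1" &&
  aws ec2 modify-instance-attribute --instance-id "$1" --ena-support &&
  aws ec2 start-instances --instance-ids "$1"
}

# Usage (hypothetical instance ID; this causes downtime):
#   enable_ena i-0123456789abcdef0
```

The wait step matters: the attribute can only be changed once the instance has fully stopped.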

Modern instance families (C5, M5, R5, C6i, M6i, and all Graviton instances) have ENA enabled by default and don’t need this step.

Consider ENA Express for ultra-low latency:

For latency-sensitive workloads communicating between instances in the same subnet, ENA Express (available on Nitro-based instances) reduces tail latencies further by using SRD (Scalable Reliable Datagram) transport. Enable it through the network interface settings in the EC2 console.

Step 5: Use Placement Groups for Instance-to-Instance Latency

If your latency problem is between EC2 instances — for example, between your application server and your database instance — rather than between your instance and end users, placement groups can make a dramatic difference.

Cluster Placement Groups place instances as physically close to each other as possible within a single Availability Zone. Instances in the same cluster placement group get:

The lowest possible inter-instance latency
Higher per-flow throughput limits for TCP/IP traffic
Up to 10 Gbps for a single traffic flow between instances in the group (the limit is 5 Gbps outside one)

How to create a Cluster Placement Group:

In the EC2 console, go to Network & Security → Placement Groups
Click Create placement group
Give it a name, select Cluster as the strategy, and click Create
When launching new instances, select this placement group under Advanced details

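Note: Cluster placement groups work within a single AZ and concentrate instances on nearby hardware, which raises the risk of correlated failure. Spread Placement Groups do the opposite: they place each instance on distinct hardware (and can span AZs) for fault isolation, but they do not reduce latency.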
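The console steps above have a direct CLI equivalent. This is a sketch only: the function names are hypothetical, and the group name, AMI ID, and instance type in the usage comment are placeholders. Launching all instances in one request, with one instance type, gives the best chance of getting capacity in the group.

```bash
# create_cluster_group NAME: creates a cluster placement group.
create_cluster_group() {
  aws ec2 create-placement-group --group-name "$1" --strategy cluster
}

# launch_in_group GROUP AMI_ID INSTANCE_TYPE COUNT
# Launches COUNT instances into the placement group in a single request.
launch_in_group() {
  aws ec2 run-instances --image-id "$2" --instance-type "$3" \
    --count "$4" --placement "GroupName=$1"
}

# Usage (hypothetical names and IDs):
#   create_cluster_group app-db-cluster
#   launch_in_group app-db-cluster ami-0123456789abcdef0 c6i.large 2
```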

Step 6: Choose the Right AWS Region and Availability Zone

Physical distance between your EC2 instance and your users adds unavoidable latency: roughly 1 ms of round-trip time per 100 km of fiber path (light covers about 200 km per millisecond in fiber), plus network hop overhead. If your instance is in us-east-1 and your primary users are in Southeast Asia, you’re adding 200–300ms of inherent latency that no configuration change will eliminate.
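The distance rule of thumb can be turned into a quick lower-bound estimate. The sketch below assumes light travels about 200 km per millisecond in fiber and ignores routing detours and hop overhead, so real round trips will be higher; the function name fiber_rtt_ms and the path lengths are illustrative.

```bash
# fiber_rtt_ms PATH_KM
# Best-case round-trip time in ms over a fiber path of PATH_KM kilometres.
fiber_rtt_ms() {
  awk -v km="$1" 'BEGIN {
    speed_km_per_ms = 200            # light in fiber, roughly two-thirds of c
    printf "%.0f\n", 2 * km / speed_km_per_ms
  }'
}

fiber_rtt_ms 6000    # prints 60
fiber_rtt_ms 15000   # prints 150
```

If a measured round trip is far above this floor, the extra time is coming from routing, congestion, or the endpoints rather than from distance.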

Diagnose region-related latency:

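Use a browser-based AWS region latency test (these are third-party tools, not an AWS service), or measure from your target user geography to a public endpoint in each candidate region to find the fastest one for your user base. Plain ping only works where the endpoint answers ICMP, so timing a TCP connection is often more reliable.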
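One way to compare regions from a given location is to time the TCP connect to a public regional endpoint. The sketch below uses the regional DynamoDB endpoints purely as convenient, always-on targets (an assumption, not an AWS recommendation) and subtracts DNS lookup time so that only the network round trip is ranked; the function names are hypothetical.

```bash
# connect_ms URL
# Prints the TCP connect time in ms, excluding DNS lookup. Prints nothing
# if the connection fails or times out.
connect_ms() {
  curl -so /dev/null --max-time 5 -w '%{time_namelookup} %{time_connect}\n' "$1" |
    awk 'NF == 2 && $2 > 0 { printf "%.0f\n", ($2 - $1) * 1000 }'
}

# rank_regions REGION...
# Prints "<ms> <region>" lines, fastest first. Unreachable regions are skipped.
rank_regions() {
  for region in "$@"; do
    ms="$(connect_ms "https://dynamodb.$region.amazonaws.com/")"
    if [ -n "$ms" ]; then printf '%s %s\n' "$ms" "$region"; fi
  done | sort -n
}

# Usage, run from a machine near your users:
#   rank_regions us-east-1 eu-west-1 ap-southeast-1
```

A single sample is noisy, so repeat the run a few times before drawing conclusions.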

Solutions:

Deploy to the nearest AWS region to your users. For global audiences, deploy to multiple regions.
Use Amazon CloudFront to cache static assets and content at AWS edge locations worldwide. CloudFront has over 600 edge locations — requests hit the nearest one and get served locally rather than routing all the way back to your EC2 instance.
Use AWS Global Accelerator for dynamic content and APIs. It routes traffic from users to the nearest AWS edge location, then uses AWS’s private global network (not the public internet) to reach your EC2 instance. AWS cites performance improvements of up to 60% for internet traffic routed this way.
Consider AWS Local Zones for workloads requiring single-digit millisecond latency for specific metro areas.

Step 7: Optimize Your Application and Database Layer

Infrastructure changes only go so far. If the latency source is inside your application — slow queries, blocking I/O, inefficient algorithms — no amount of EC2 tuning will fully resolve it.

Database query optimization:

Identify slow queries using your database’s slow query log or Amazon CloudWatch Application Signals
Add indexes to frequently queried columns — an unindexed full-table scan on a large table can add hundreds of milliseconds
Use Amazon ElastiCache (Redis or Memcached) to cache frequently accessed query results and reduce repeated database hits

Application-level fixes:

Use asynchronous processing for non-blocking I/O — don’t let one slow operation hold up the rest of the request
Enable connection pooling for your database connections — opening a new connection for every request is expensive
Move batch processing and background jobs to AWS Lambda or dedicated worker instances, freeing your EC2 instance’s resources for user-facing requests

Diagnose at the application level with AWS X-Ray:

AWS X-Ray traces requests across your entire application stack — EC2, Lambda, databases, and downstream services — and shows you exactly where time is being spent. For complex architectures, it’s the fastest way to find application-level bottlenecks without manual profiling.

Step 8: Check and Optimize Your VPC Network Configuration

Network configuration inside your VPC can silently add latency that’s easy to miss.

Check your Security Groups and NACLs:

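Security Groups and Network ACLs are evaluated by the VPC infrastructure, and in practice the rule count itself rarely adds measurable per-packet latency. Sprawling rule sets cause indirect problems, though: they hide misconfigurations, and because security groups track connections, an instance that exceeds its connection-tracking allowance starts dropping packets (visible as conntrack_allowance_exceeded in ethtool -S on ENA instances). Audit and simplify your rules where possible, and keep the rules list short and specific.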

Minimize cross-AZ traffic:

Data transfer between Availability Zones within the same region crosses physically separate data centers, typically adding around 1–2ms per round trip and incurring data transfer charges. Keep tightly coupled components (your app servers and database) in the same AZ for latency-sensitive operations, and weigh that against the availability you give up by concentrating them.

Run MTR to diagnose packet loss on the path:

MTR (My Traceroute) combines traceroute and ping to show you latency and packet loss at each hop between two endpoints.

Run from your EC2 instance:

bash

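sudo apt-get install -y mtr   # Ubuntu/Debian (run only the line for your distro)
sudo yum install -y mtr       # Amazon Linux
mtr -n --report your-target-ip   # -n skips DNS lookups; --report prints a summary and exits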

Look for any hop where latency spikes significantly or where packet loss appears. Compare results from both directions (client to instance, instance to client) to isolate whether the issue is on the ingress or egress path.
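On long paths it can be easier to filter the report than to read it by eye. The sketch below parses the standard mtr --report layout (hop, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev) and prints only hops at or above a loss threshold; the function name lossy_hops and the 5% threshold in the usage comment are assumptions.

```bash
# lossy_hops [THRESHOLD_PCT]
# Reads `mtr -n --report` output on stdin and prints the hops whose packet
# loss is at or above THRESHOLD_PCT (default 1).
lossy_hops() {
  awk -v limit="${1:-1}" '
    $1 ~ /^[0-9]+\.\|--$/ {            # hop lines look like "3.|-- host ..."
      loss = $3; sub(/%/, "", loss)
      if (loss + 0 >= limit)
        printf "hop %d %s loss=%s%% avg=%sms\n", $1, $2, loss, $6
    }'
}

# Usage:
#   mtr -n --report your-target-ip | lossy_hops 5
```

Loss that appears at an intermediate hop but not at the final destination is usually that router rate-limiting its ICMP replies, not real packet loss, so give the most weight to loss that persists through to the last hop.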

Step 9: Set Up CloudWatch Alarms for Proactive Monitoring

Once you’ve fixed the immediate issue, put alarms in place so you’re notified before users notice the next problem.

Recommended CloudWatch alarms for EC2 latency:

CPUCreditBalance — Alert when it drops below 20 for T-series instances
CPUUtilization — Alert when above 80% for 5 consecutive minutes
VolumeAvgReadLatency / VolumeAvgWriteLatency — Alert when I/O latency exceeds your acceptable threshold
VolumeIOPSExceededCheck — Alert when value reaches 1 (IOPS exceeded)
NetworkIn / NetworkOut — Alert when approaching your instance’s bandwidth limit

To create an alarm:

In CloudWatch, go to Alarms → All Alarms → Create Alarm
Select the metric and instance
Set the threshold, period, and evaluation conditions
Add an SNS notification to alert your team by email, or in Slack through a chat integration subscribed to the topic
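The console steps above can also be scripted, which makes it easier to apply the same alarm to every instance. The sketch below creates the CPUCreditBalance alarm from the list of recommended alarms, using the threshold of 20 suggested there; the function name credit_alarm, the alarm naming scheme, and the IDs in the usage comment are placeholders, and the SNS topic is assumed to exist already.

```bash
# credit_alarm INSTANCE_ID SNS_TOPIC_ARN
# Alarms when the lowest CPUCreditBalance in a 5-minute period drops below 20.
credit_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "low-cpu-credits-$1" \
    --namespace AWS/EC2 --metric-name CPUCreditBalance \
    --dimensions "Name=InstanceId,Value=$1" \
    --statistic Minimum --period 300 --evaluation-periods 1 \
    --threshold 20 --comparison-operator LessThanThreshold \
    --alarm-actions "$2"
}

# Usage (hypothetical instance ID and topic ARN):
#   credit_alarm i-0123456789abcdef0 arn:aws:sns:us-east-1:123456789012:ops-alerts
```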

Proactive alerting turns latency from a reactive firefight into a manageable operational metric.

Quick Diagnostic Reference

Symptom	Most Likely Cause	First Step to Check
Latency spikes during peak traffic	CPU saturation or credit exhaustion	CloudWatch CPUUtilization + CPUCreditBalance
Slow disk reads/writes	EBS IOPS throttling	VolumeIOPSExceededCheck, VolumeQueueLength
High base latency from users	Wrong region or no CDN	Test with ping from user location
Latency between instances	No placement group, cross-AZ traffic	Compare AZ placement; ping or MTR between the instances
Intermittent spikes without pattern	Network throttling, PPS limits	ENA driver, ethtool -S allowance-exceeded counters (e.g. pps_allowance_exceeded)
Slow queries, high CPU + low traffic	Application/DB bottleneck	AWS X-Ray, database slow query log
Latency after volume restore	Snapshot initialization	Enable EBS Fast Snapshot Restore

FAQ: Fixing High Latency in AWS EC2

What is the most common cause of high latency in EC2?

The most common causes are CPU credit exhaustion on burstable T-series instances, EBS volume IOPS throttling, and instances deployed in a region far from the end users. Start by checking CloudWatch metrics for CPU utilization and EBS queue length before looking elsewhere.

How do I know if my EC2 instance’s EBS volume is the latency source?

Check the VolumeQueueLength, VolumeIOPSExceededCheck, and VolumeAvgReadLatency / VolumeAvgWriteLatency metrics in CloudWatch. A queue length that stays high relative to the volume’s provisioned IOPS, or an exceeded IOPS check value of 1, strongly suggests that storage is the bottleneck.

What’s the difference between Enhanced Networking and a Placement Group?

Enhanced Networking (ENA) improves the network driver on your instance for higher throughput and lower latency to any destination. A Placement Group physically places instances close together in AWS infrastructure for the lowest possible latency between those specific instances. For maximum inter-instance performance, use both together.

My T3 instance has very high latency under load but normal CPU utilization — what’s happening?

This is often CPU credit exhaustion on an instance running in Standard mode. Once the credit balance is depleted, the instance is held at its baseline, so CPUUtilization can look modest (it flat-lines near the baseline percentage) while requests queue up behind it. Check the CPUCreditBalance metric specifically. If it’s near zero, switch to Unlimited mode or right-size to a fixed-performance instance type like M5 or C5.

Can CloudFront reduce EC2 latency for API responses?

CloudFront caches static content and can accelerate some dynamic content, but it’s not a substitute for placing your EC2 instance in the right region. For dynamic, non-cacheable API responses, AWS Global Accelerator is the better tool — it routes requests over AWS’s private backbone rather than the public internet.

How long does it take to see improvement after enabling ENA or changing EBS volume type?

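ENA changes take effect as soon as the instance starts again. EBS volume type and IOPS modifications are applied live and don’t require a reboot. The new settings usually begin to take effect within minutes, although the volume can stay in the optimizing state for hours on large volumes, and full performance is only reached once the modification completes. If you migrated from gp2 to gp3 with increased IOPS, the improvement should show up in your CloudWatch metrics as the modification progresses.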

Do I need to stop my instance to change the EBS volume type or IOPS?

No. EBS volume modifications (including volume type, IOPS, and throughput) can be applied while the instance is running, with no downtime. The change can take anywhere from a few minutes to a few hours to complete, depending on the volume. Go to EC2 → Volumes → select your volume → Modify Volume.

Editor’s Opinion

Honestly, EC2 latency troubleshooting is one of those things that looks overwhelming at first, but once you know the layers it becomes quite methodical. For me the biggest surprise was how often it turns out to be EBS IOPS rather than CPU or network, I think because it’s the least visible thing. The CloudWatch alarms tip is one I wish I’d set up earlier; it would have saved me a lot of “why is everything slow” moments. If you’re on a T3 instance, check your CPU credit balance first. Seriously, it’s almost always that.