How to Fix Memory Leaks in Node.js Applications Under High Traffic

If you’ve watched your Node.js process climb steadily in memory usage on a dashboard until it gets OOM-killed right at peak traffic, you already know this isn’t a theoretical problem. I spent a genuinely frustrating week chasing a memory leak that a Node.js app kept hitting in production under high traffic, and the fix wasn’t where I expected it to be at all. So let’s get into what actually causes this, and which fixes are worth your time versus which ones just feel productive.

And before anyone asks: no, just adding --max-old-space-size is not a fix. It’s a slightly bigger bucket that still overflows eventually.

Quick Answer

If you need the short version first:

Most “memory leaks” under high traffic are actually unbounded caches, growing closures, or event listener accumulation — not classic leaks in the C++ sense.
--max-old-space-size only delays the crash; it doesn’t fix anything.
The most reliable diagnostic is comparing two heap snapshots taken a few minutes apart under load, not at idle.
The most common real culprit, from what I’ve seen, is something holding references in a closure or array that never gets cleared — often inside request middleware.
Restarting on a schedule (PM2 cron restart) is a common band-aid that works fine for low-stakes apps and badly for anything with long-lived connections or in-memory state.

Why Node.js Memory Leaks Show Up Specifically Under High Traffic

Here’s the thing that trips people up: the leak is usually present at low traffic too. It just grows slowly enough that the process gets restarted (by a deploy, say) long before the heap fills, so nobody notices. High traffic doesn’t create the leak; it multiplies the per-request growth, so memory that used to take weeks to fill now fills in hours, and the extra allocation pressure makes GC run more often while reclaiming less each time.

A few specific causes show up over and over:

Event listeners that never get removed. Every time you attach a listener inside a request handler without removing it (req.on('data', ...) inside a loop, or a per-request listener on a shared EventEmitter), you accumulate listeners. Node.js will eventually print a MaxListenersExceededWarning (“Possible EventEmitter memory leak detected”) once a single event on one emitter has more than 10 listeners by default, but by then you’re already in trouble. Under low traffic this grows slowly enough to ignore. Under high traffic it compounds fast.
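Here is a minimal sketch of the shared-emitter case. It assumes a hypothetical process-wide `bus` emitter and a hypothetical `onDone` hook that stands in for the response finishing. The leaky handler adds one listener per request. The fixed handler keeps a reference to its listener so it can remove it afterwards.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical shared, process-wide emitter (e.g. a config-reload bus).
const bus = new EventEmitter();

// Leaky: every request adds a listener that is never removed, and the
// closure also keeps `req` alive for as long as the listener exists.
function handleRequestLeaky(req) {
  bus.on('config-changed', () => {
    req.staleConfig = true;
  });
}

// Fixed: keep a reference to the listener and remove it when the request ends.
// `onDone` is a hypothetical hook; in a real server it would be res.on('close', ...).
function handleRequestFixed(req, onDone) {
  const onChange = () => {
    req.staleConfig = true;
  };
  bus.on('config-changed', onChange);
  onDone(() => bus.off('config-changed', onChange));
}
```

In a real http or Express handler, `res.on('close', cleanup)` is a reasonable place for the removal. The response's 'close' event fires both after a normal finish and when the connection is terminated early, so aborted requests get cleaned up too.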

Unbounded in-memory caches. A Map or plain object used as a cache with no eviction policy and no TTL is one of the most common patterns I see in actual production code. It works fine in dev, works fine in staging with synthetic traffic, and then under real traffic it just keeps growing because nobody ever calls .delete() on anything.

Closures capturing more than they need. This one’s sneaky. A function defined inside a request handler that references a large object from its enclosing scope keeps that object alive as long as the closure exists — and if that closure gets stored somewhere (a callback registered for later, a promise that never resolves), the object never gets collected.
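To make the closure case concrete, here is a hedged sketch built around a hypothetical queue of deferred callbacks. The first version captures the whole `payload`. The second copies out only the field it needs. One caveat: V8 shares a single closure context per scope, so if any other closure created in the same function references `payload`, it stays reachable through all of them. The copy only helps if it is the only thing captured.

```javascript
// Hypothetical queue of callbacks that run later (e.g. a deferred audit log).
const pendingCallbacks = [];
const auditLog = [];

// Leaky shape: the callback closes over `payload`, so the whole request body
// (possibly megabytes) stays reachable until the callback runs, or forever if
// it never does.
function scheduleAuditLeaky(payload) {
  pendingCallbacks.push(() => auditLog.push(payload.id));
}

// Tighter shape: copy out only what the callback needs. `payload` is not
// captured by any closure here, so it can be collected once this returns.
function scheduleAuditTight(payload) {
  const id = payload.id;
  pendingCallbacks.push(() => auditLog.push(id));
}
```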

Global variables and module-level state that accumulate. Something as simple as pushing to a module-level array for “logging” or “metrics” and forgetting to clear it.
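If a module-level array really is needed, a cap keeps it bounded no matter how much traffic arrives. This is a minimal sketch, assuming a hypothetical "recent requests" buffer and a hypothetical limit of 100 entries.

```javascript
// Hypothetical "recent requests" buffer for a debug endpoint, capped so it
// can never hold more than MAX_RECENT entries regardless of traffic.
const MAX_RECENT = 100;
const recentRequests = [];

function recordRequest(meta) {
  recentRequests.push(meta);
  if (recentRequests.length > MAX_RECENT) {
    recentRequests.shift(); // drop the oldest entry
  }
}
```

Storing small, plain metadata objects (a path, a status code, a timestamp) rather than whole `req` objects also keeps each entry cheap.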

A fifth cause people genuinely don’t expect: Buffer and Stream handling on large request/response bodies. If you’re not properly draining or destroying streams (especially on error paths), buffered data can sit in memory waiting for a callback that’s never coming because the connection already dropped.
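One way to avoid hand-written error-path cleanup is `stream.pipeline`. It destroys every stream in the chain when any of them errors or closes early. The sketch below uses the promise version from `node:stream/promises`. `source` and `destination` are placeholders for something like an incoming request body and a file or upstream request.

```javascript
const { pipeline } = require('node:stream/promises');

// pipeline() destroys all streams in the chain if any one fails, so buffered
// chunks are released instead of waiting on a callback that will never fire.
async function forwardBody(source, destination) {
  try {
    await pipeline(source, destination);
    return 'ok';
  } catch (err) {
    // Every stream is already destroyed here; just report the failure.
    return `failed: ${err.message}`;
  }
}
```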

Common Scenarios Where This Bites You

This shows up differently depending on what kind of app you’re running.

For Express/Fastify REST APIs, the leak is usually in middleware — something attached globally that accumulates per-request state and never tears it down.

For WebSocket servers (Socket.IO, ws), it’s almost always listeners or connection objects that aren’t cleaned up on disconnect. A client that disconnects ungracefully (mobile network drop, browser tab close) doesn’t always fire a clean close event, and if your cleanup logic only runs on that event, you’ve got orphaned references piling up.
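A hedged sketch of cleanup that does not rely on a clean close event alone. It assumes an EventEmitter-like `socket` (as ws and Socket.IO sockets are), a hypothetical `connections` registry, and a simple idle timer that stands in for a proper ping/pong heartbeat. Cleanup runs exactly once, whichever of 'close', 'error', or the idle timeout happens first.

```javascript
// Hypothetical registry of live connections, keyed by connection id.
const connections = new Map();

// Registers a connection and makes sure its cleanup runs exactly once,
// whichever happens first: a clean 'close', an 'error', or `idleMs` of silence.
function trackConnection(id, socket, idleMs = 30_000) {
  let cleaned = false;
  let idleTimer;

  const cleanup = () => {
    if (cleaned) return;
    cleaned = true;
    clearTimeout(idleTimer);
    connections.delete(id);
  };

  const resetIdle = () => {
    if (cleaned) return;
    clearTimeout(idleTimer);
    idleTimer = setTimeout(cleanup, idleMs);
    idleTimer.unref(); // the timer alone shouldn't keep the process alive
  };

  connections.set(id, socket);
  socket.on('close', cleanup);
  socket.on('error', cleanup);
  socket.on('message', resetIdle);
  resetIdle();
  return cleanup;
}
```

With the ws library you would typically also call `socket.terminate()` when the idle timer fires, so the underlying TCP socket is torn down along with the registry entry.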

For apps behind a load balancer doing health checks, I’ve seen leaks that only show up because the health check endpoint itself accumulates state — ironic, but it happens, especially if someone added logging or metrics middleware to “every route” without excluding the health check path.
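Excluding the health check path can be as small as a wrapper around the offending middleware. This is a sketch, assuming hypothetical `/healthz` and `/readyz` paths and Express-style `(req, res, next)` middleware. It falls back to parsing `req.url` when `req.path` (an Express addition) is not present.

```javascript
// Hypothetical health-check paths; adjust to whatever your load balancer hits.
const HEALTH_PATHS = new Set(['/healthz', '/readyz']);

// Wraps any (req, res, next) middleware so it is skipped for health checks.
function skipHealthChecks(middleware) {
  return function (req, res, next) {
    const path = req.path ?? new URL(req.url, 'http://localhost').pathname;
    if (HEALTH_PATHS.has(path)) return next();
    return middleware(req, res, next);
  };
}
```

Usage would look like `app.use(skipHealthChecks(requestMetrics))`, where `requestMetrics` is whatever per-route logging or metrics middleware you already have.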

For serverless / Lambda-style Node.js, this is less about classic leaks and more about container reuse — state from one invocation bleeding into the next because something was declared outside the handler function assuming it’d be reset each time. It isn’t, by design.
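The container-reuse trap looks roughly like this. It is a sketch with a hypothetical event shape (a `Records` array of `{ id }` objects). Anything declared at module scope is evaluated once per container and shared by every warm invocation.

```javascript
// Module scope: evaluated once per container, reused by every warm invocation.
const seenIdsLeaky = new Set();

// Leaky: this Set grows for the lifetime of the container.
async function handlerLeaky(event) {
  for (const record of event.Records) seenIdsLeaky.add(record.id);
  return { processed: seenIdsLeaky.size };
}

// Fixed: per-invocation state lives inside the handler, so it becomes
// garbage once the invocation returns.
async function handler(event) {
  const seenIds = new Set();
  for (const record of event.Records) seenIds.add(record.id);
  return { processed: seenIds.size };
}
```

Module scope is still the right place for things you *want* reused across invocations, such as a database client. The rule of thumb is that nothing per-request should live there.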

Memory Leak Causes vs. Fixes

Cause	Typical Symptom	Fix	Notes
Unbounded cache (Map/object)	Steady linear memory growth, never plateaus	Add TTL + max size (LRU)	Most common cause in my experience
Event listener accumulation	“MaxListenersExceededWarning” in logs	Always pair `.on()` with `.off()`/`.removeListener()`	Easy to miss in WebSocket code
Closures holding large scope	Hard to spot in profiler, looks like “everything” is retained	Avoid capturing large objects in long-lived callbacks	Often requires manual code review, not just tooling
Uncleared timers and intervals	Memory grows in steps matching the interval frequency	Clear timers/intervals on every cleanup path	Cleanup is often forgotten on error and early-return paths

That table’s missing some causes — buffer/stream issues didn’t fit cleanly into a row without overstating the fix — but those four cover most of what I run into.
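For the timers row, one pattern that covers error and early-return paths is clearing the timer in a `finally` block. This is a minimal sketch of a timeout wrapper. The name `withTimeout` is hypothetical, not a Node.js API.

```javascript
// Runs `work` with a timeout and guarantees the timer is cleared on every exit
// path (success, rejection, or a synchronous throw) via `finally`.
async function withTimeout(work, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work(), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```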

Step-by-Step Fixes

Step 1: Confirm it’s actually a leak, not just normal GC sawtooth.
Healthy Node.js memory usage looks like a sawtooth — it climbs, GC runs, it drops, repeat. A leak looks like the floor of that sawtooth rising over time, not just the peaks. If you’re only looking at peak memory on a dashboard, you’ll misdiagnose this constantly. Graph the minimum memory point per GC cycle, not the max.
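If your dashboard can’t plot the floor directly, you can approximate it in-process. This sketch takes the minimum `heapUsed` within fixed windows of samples. The assumption is that each window spans several GC cycles and therefore contains at least one post-GC low point. The window size and interval are hypothetical defaults, and `onFloor` is wherever you send the number (a metrics gauge, a log line).

```javascript
// Minimum of each fixed-size window of samples: a rough proxy for the sawtooth floor.
function windowFloors(samples, windowSize = 60) {
  const floors = [];
  for (let i = 0; i < samples.length; i += windowSize) {
    floors.push(Math.min(...samples.slice(i, i + windowSize)));
  }
  return floors;
}

// Live version: samples heapUsed every `intervalMs` and reports each completed
// window's minimum to `onFloor`. Returns a function that stops sampling.
function startFloorSampler(onFloor, intervalMs = 1000, windowSize = 60) {
  let batch = [];
  const timer = setInterval(() => {
    batch.push(process.memoryUsage().heapUsed);
    if (batch.length === windowSize) {
      onFloor(Math.min(...batch));
      batch = [];
    }
  }, intervalMs);
  timer.unref(); // sampling alone shouldn't keep the process alive
  return () => clearInterval(timer);
}
```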

Step 2: Take heap snapshots under actual load, not idle.
Use node --inspect and Chrome DevTools. In production, use v8.writeHeapSnapshot() (built into Node.js since 11.13) or the heapdump package, and be careful: taking a snapshot blocks the event loop for the entire dump, which can take seconds on a large heap and can cause its own problems under high traffic. Take one snapshot, generate some load, wait a few minutes, and take another. Compare them in the DevTools “Comparison” view and look for object types whose count keeps growing between snapshots, not just large objects in a single snapshot.
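Here is a small helper around the built-in `v8.writeHeapSnapshot()`, so the before/after pair doesn’t need an extra dependency. It is a sketch; the label and output directory are hypothetical. The call is synchronous, and the Node.js docs note that generating a snapshot can need about twice the heap’s size in memory. Take it on an instance that can afford to stall.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Writes a .heapsnapshot file and returns its path. Blocks the event loop
// for the duration of the dump.
function takeSnapshot(label, dir = os.tmpdir()) {
  const file = path.join(dir, `${label}-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}
```

A typical sequence is `takeSnapshot('before')`, run load for a few minutes, then `takeSnapshot('after')`. Load both files in the DevTools Memory tab and switch to the Comparison view. Node.js also has a `--heapsnapshot-signal` flag (e.g. `--heapsnapshot-signal=SIGUSR2`) if you’d rather trigger dumps externally.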

Step 3: Check process.memoryUsage() breakdown, not just RSS.
RSS tells you total memory but not where it’s going. Look at heapUsed versus external — a growing external value often points to Buffers or native bindings, not regular JS objects, which sends you down a completely different debugging path.
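A quick way to see the breakdown is to log `process.memoryUsage()` in MiB so the fields can be compared side by side. In this sketch, `heapUsed`/`heapTotal` are the V8 heap. `external` is memory bound to JS objects but held outside the heap, such as Buffers and native bindings. `arrayBuffers` is the ArrayBuffer/Buffer portion, which is also counted inside `external`.

```javascript
// Returns process.memoryUsage() converted to MiB (one decimal place).
function memoryBreakdown() {
  const toMiB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  const { rss, heapTotal, heapUsed, external, arrayBuffers } = process.memoryUsage();
  return {
    rss: toMiB(rss),
    heapTotal: toMiB(heapTotal),
    heapUsed: toMiB(heapUsed),
    external: toMiB(external),
    arrayBuffers: toMiB(arrayBuffers),
  };
}
```

Logging `JSON.stringify(memoryBreakdown())` on an interval during a load test is usually enough to tell which of these fields is the one climbing.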

Step 4: Audit every .on() call against a matching cleanup.
Grep your codebase for .on( and check each hit by hand: is there a matching .off() or .removeListener() on every exit path, or should it have been a .once()? This is tedious. It’s also one of the highest-hit-rate things to check.
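To narrow down which files to read first, a rough script can count registrations against removals per file. This is a regex heuristic, not a parser. It miscounts calls in comments and strings, and it can’t tell which emitter a call targets. Treat its output as a reading list, not as proof of a leak. The function names are hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Counts listener registrations, one-shot registrations, and removals in a source string.
function countListenerCalls(source) {
  const count = (re) => (source.match(re) || []).length;
  return {
    on: count(/\.(on|addListener|addEventListener)\(/g),
    once: count(/\.once\(/g),
    off: count(/\.(off|removeListener|removeEventListener|removeAllListeners)\(/g),
  };
}

// Walks `dir` (skipping node_modules and dotfiles) and reports JS/TS files
// with more registrations than removals.
function auditDir(dir, results = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      auditDir(full, results);
    } else if (/\.(c|m)?(j|t)s$/.test(entry.name)) {
      const counts = countListenerCalls(fs.readFileSync(full, 'utf8'));
      if (counts.on > counts.off) results.push({ file: full, ...counts });
    }
  }
  return results;
}
```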

Step 5: Add bounds to every cache.
If you’re using a plain object or Map as a cache, replace it with something that has a max size and TTL — lru-cache is the common choice, but the specific library matters less than just having any eviction policy at all.
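To show what “any eviction policy at all” means in practice, here is a minimal LRU cache with a max size and a per-entry TTL. It is built on the fact that a `Map` iterates in insertion order. This is an illustrative sketch with a hypothetical class name. A maintained library such as lru-cache handles more edge cases.

```javascript
class BoundedCache {
  constructor({ max = 1000, ttlMs = 60_000 } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.map = new Map(); // key -> { value, expiresAt }, oldest first
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.map.delete(key); // expired: evict lazily on read
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    // Evict least recently used entries (first in iteration order) past `max`.
    while (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

Expired entries that are never read again are only removed when size-based eviction reaches them. That’s acceptable here because `max` still bounds the total.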

Step 6: Profile under realistic concurrency, not single-request testing.
A leak that grows by 2KB per request is invisible in manual testing and catastrophic at 500 requests/second. Use something like autocannon or k6 to generate sustained load while you’re watching the heap.
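The arithmetic behind “invisible in manual testing, catastrophic in production” is worth running once. This sketch uses the 2KB per request at 500 requests/second figures from the paragraph above (treating 2KB as 2 KiB), plus a hypothetical 2 GiB of remaining heap headroom.

```javascript
// How fast does a per-request leak grow, and how long until it uses up the headroom?
function leakBudget(bytesPerRequest, requestsPerSecond, headroomBytes) {
  const bytesPerSecond = bytesPerRequest * requestsPerSecond;
  return {
    mibPerHour: (bytesPerSecond * 3600) / 2 ** 20,
    minutesToExhaust: headroomBytes / bytesPerSecond / 60,
  };
}

// 2 KiB per request at 500 req/s with a hypothetical 2 GiB of headroom:
// roughly 3,516 MiB of growth per hour, and the headroom is gone in about 35 minutes.
console.log(leakBudget(2048, 500, 2 * 2 ** 30));
```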

What Actually Worked For Me

So the honest version of this story isn’t clean.

I’d inherited an Express API that was getting OOM-killed roughly every 6-8 hours under production traffic, and my first instinct — like most people’s — was to assume it was a database connection pool issue, since that’s the classic Node.js horror story everyone’s heard about. I spent two days auditing every Sequelize query for unclosed connections. Nothing. Connection pool metrics looked completely normal the entire time.

Then I tried the standard heap snapshot comparison approach, and it pointed at a huge number of retained closures, but the DevTools retainer tree was genuinely confusing to read — it kept showing the leak rooted in something related to our request logging middleware, which made no sense to me at first because the middleware looked completely stateless.

That’s not entirely accurate, actually — let me back up. It looked stateless because the leak wasn’t in the middleware function itself, it was in an array we were pushing request metadata into for a “recent requests” debug endpoint someone added months earlier and then forgot existed. Nobody had ever called .shift() or capped its length. It just grew. Forever. Under low traffic it took weeks to matter. Under the high-traffic launch we’d just had, it took hours.

I found it less through clean systematic debugging and more because a teammate, half-remembering a similar issue from a previous job, said “hey, check if anyone added some kind of debug array somewhere” — and that one offhand comment is what made me grep for .push( across the middleware folder instead of continuing to stare at the retainer tree. Not exactly rigorous. But it worked.

Advanced Fixes and Edge Cases

Using --inspect with production traffic safely. Running the inspector in production is generally discouraged for security reasons: anyone who can reach the debug port can execute code in your process. If you need to do it temporarily, bind it to localhost only (--inspect=127.0.0.1:9229, which is also the default; never 0.0.0.0) and tunnel via SSH rather than exposing the port. Don’t leave it running longer than you need it.
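If restarting with the flag isn’t an option, the built-in `node:inspector` module can open the inspector at runtime. This sketch binds it to loopback and closes it automatically after a timeout so it can’t be forgotten. The port, TTL, and SSH host are hypothetical. `inspector.open()` throws if an inspector is already active, for example if the process was started with --inspect.

```javascript
const inspector = require('node:inspector');

// Opens the inspector on 127.0.0.1 only and closes it after `ttlMs`.
// Reach it from your machine through an SSH tunnel, e.g.
//   ssh -N -L 9229:127.0.0.1:9229 user@your-server   (hypothetical host)
// then open chrome://inspect locally.
function openInspectorTemporarily(port = 9229, ttlMs = 10 * 60_000) {
  inspector.open(port, '127.0.0.1');
  const timer = setTimeout(() => inspector.close(), ttlMs);
  timer.unref(); // don't keep the process alive just for this timer
  return inspector.url(); // ws://127.0.0.1:<port>/<id>
}
```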

Native module / addon leaks are a different beast entirely. If external memory in process.memoryUsage() keeps climbing while heapUsed stays flat, you’re probably looking at a native addon (image processing libraries, certain database drivers) not releasing memory on the C++ side. JS heap snapshots won’t show you this at all — you need tools like valgrind or, more practically, just testing with the native dependency removed to isolate it.
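The heapUsed-versus-external check can be automated by comparing two `process.memoryUsage()` samples taken some time apart. This is a sketch; the 50 MiB threshold is a hypothetical value you’d tune to your own noise level.

```javascript
// Compares two process.memoryUsage() samples and says where the growth is.
function classifyGrowth(before, after, thresholdBytes = 50 * 2 ** 20) {
  const heapDelta = after.heapUsed - before.heapUsed;
  const externalDelta = after.external - before.external;
  if (externalDelta > thresholdBytes && heapDelta < thresholdBytes) {
    // Outside the JS heap: look at Buffers and native addons, not heap snapshots.
    return 'external';
  }
  if (heapDelta > thresholdBytes) {
    // JS objects: heap snapshot comparison is the right tool.
    return 'js-heap';
  }
  return 'no-significant-growth';
}
```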

Worker threads and cluster mode complicate diagnosis. If you’re running Node.js in cluster mode (PM2 cluster, or cluster module directly), each worker process has its own heap, and a leak that seems “moderate” in aggregate dashboards might be severe in one specific worker that’s getting uneven traffic distribution from your load balancer. Check per-process memory, not just aggregate.
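Per-process visibility mostly comes down to tagging every memory sample with the process identity, so each worker shows up as its own series instead of being averaged away. This is a sketch; where you ship the sample is up to your metrics setup.

```javascript
const cluster = require('node:cluster');

// One memory sample tagged with pid and cluster role.
function perProcessMemorySample() {
  const { heapUsed, rss, external } = process.memoryUsage();
  return {
    pid: process.pid,
    role: cluster.isWorker ? `worker-${cluster.worker.id}` : 'primary',
    heapUsed,
    rss,
    external,
    at: new Date().toISOString(),
  };
}
```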

Keep-alive connections holding more than expected. Not commonly mentioned, but HTTP Agent keep-alive pooling can retain more open sockets than you’d expect under sustained high concurrency, and each one holds buffers. This applies both to the http module’s defaults (its global agent has enabled keep-alive by default since Node.js 19) and to libraries like axios that build on http.Agent. Check agent.sockets and agent.freeSockets if you’re making a lot of outbound requests per incoming request.
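A sketch of an explicitly bounded keep-alive agent, plus a helper that counts what the pool is holding. `maxSockets` caps concurrent sockets per origin and `maxFreeSockets` caps idle sockets kept open per origin. The specific limits here are hypothetical starting points.

```javascript
const http = require('node:http');

// Bounded keep-alive agent for outbound calls.
const agent = new http.Agent({
  keepAlive: true,
  maxSockets: 50,
  maxFreeSockets: 10,
});

// agent.sockets and agent.freeSockets map origin keys to arrays of sockets;
// summing them shows how many connections the pool is currently holding.
function agentSocketCounts(a = agent) {
  const total = (pool) => Object.values(pool).reduce((n, list) => n + list.length, 0);
  return { active: total(a.sockets), idle: total(a.freeSockets) };
}
```

Pass it per request with `http.request(url, { agent })`, or to axios through its `httpAgent` option. Logging `agentSocketCounts()` alongside memory makes it easy to see whether socket counts climb with the leak.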

Prevention Tips

Cap every in-memory cache or buffer-like structure with a max size, no exceptions, even for “temporary debug” features — those are exactly the ones that get forgotten.
Set up alerting on the floor of memory usage trending upward, not just on absolute thresholds.
Run load tests with heap snapshot comparisons as part of pre-release testing, not just after a production incident forces you to.
Be skeptical of any code that attaches event listeners inside a function that runs per-request — always trace whether there’s a matching removal.
PM2’s scheduled restart is a real mitigation for genuinely unfixable third-party leaks, but treat it as a stopgap, not a strategy — it masks the problem and can cause its own issues if you’ve got long-lived WebSocket connections that get dropped mid-restart.
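For the “alert on the floor trending upward” tip, one simple signal is the least-squares slope of the last N floor values. A single noisy reading won’t trigger it, but a sustained climb will. This is a sketch; the window length and alert threshold are hypothetical.

```javascript
// Least-squares slope of a series of memory floors, in bytes per sample.
function floorSlope(floors) {
  const n = floors.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = floors.reduce((sum, y) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  floors.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}
```

For example, alerting when `floorSlope(lastTwelveFloors)` stays above 1 MiB per window for a few consecutive checks is a reasonable first threshold to tune from (both the window count and the threshold are illustrative).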

Frequently Asked Questions

Does increasing --max-old-space-size fix a memory leak?
No. It just gives the leak more room before crashing, which usually means you crash less often but harder, and at a worse time.

Will switching to a different framework (Fastify instead of Express) fix this?
Not on its own. The leak is almost always in application code or a specific dependency, not the framework’s core handling.

Why does my staging environment never show this, only production?
Traffic volume and duration. Most leaks need either real concurrency or real uptime (days, not minutes) to become visible, and synthetic staging traffic rarely matches either.

Is this a V8 garbage collector problem?
Almost never. V8’s GC does what it’s supposed to do — it can only collect objects that aren’t referenced anymore. If something’s still referenced (even accidentally), GC is working correctly by not collecting it. The bug is in the code holding the reference, not in V8.

Should I just restart the process automatically when memory gets high?
It works as a stopgap. It does not work as a fix, and your mileage may vary depending on how much in-memory state you’d lose on restart — for stateless APIs it’s mostly fine, for anything with active WebSocket sessions it’s disruptive.

Editor’s Opinion

ok so the real lesson here, for me anyway, is that “memory leak” almost never means what it sounds like in Node.js. it’s not some mysterious C++ pointer issue; it’s usually just an array or cache someone forgot existed. check the boring stuff first: debug endpoints, logging middleware, anything someone added “temporarily.” that’s where it usually is. heap snapshots are great, but don’t skip the dumb grep-for-push step; it’s faster.