Debugging Production Issues: The Node.js Playbook
For .NET engineers who know: Attaching the Visual Studio debugger to a running process, capturing memory dumps with ProcDump or dotMemory, reading Application Insights live metrics, and using structured logs in Azure Monitor.
You’ll learn: How production debugging works in a Node.js stack — a fundamentally different model from the attach-and-step approach you’re used to.
Time: 15-20 min read
The .NET Way (What You Already Know)
In .NET production debugging, you have several overlapping tools. When something goes wrong, you can attach the Visual Studio debugger to the running IIS or Kestrel process, set breakpoints, and inspect live state — often without restarting. For deeper issues, you capture a memory dump with ProcDump, pull it into WinDbg or Visual Studio, and examine heap objects, thread stacks, and GC roots. Application Insights gives you distributed traces, dependency call timing, live metrics, and a query language (KQL) for slicing logs by request, exception type, or custom dimension.
The core assumption in .NET production debugging is that you can reconstruct program state after the fact. A memory dump is a snapshot of the entire heap. A call stack tells you exactly which thread was doing what. The CLR’s type system survives into the dump — you can ask WinDbg “show me every Order object on the heap” and it works.
That assumption does not transfer to Node.js.
The Node.js Way
Why the Debugging Model Is Different
Node.js executes JavaScript on a single thread inside the V8 engine. There is no concept of “attaching to the process and pausing execution” in production — doing so would block all request handling for every user. There is no first-class memory-dump workflow comparable to ProcDump, and no built-in equivalent to Application Insights that ships with the runtime.
What Node.js production debugging gives you instead:
- Structured logs — your primary source of truth for what happened
- Error tracking with stack traces — Sentry captures exceptions with source-mapped stacks
- Heap snapshots — V8’s heap profiler, accessed via --inspect or programmatic APIs
- CPU profiles — V8’s sampling profiler for identifying hot code paths
- Process metrics — memory usage, event loop lag, handle counts
The mental shift: in .NET, you debug by reconstructing state. In Node.js, you debug by reading what the code logged before it broke. This is closer to how you approach distributed systems debugging — trace IDs, log correlation, structured events — than traditional debugger usage.
Sentry: Your Application Insights Equivalent
Sentry is the primary error tracking tool. Install it early; retrofitting it later is painful.
pnpm add @sentry/node @sentry/profiling-node
// src/instrument.ts — load this FIRST, before any other imports
import * as Sentry from '@sentry/node';
import { nodeProfilingIntegration } from '@sentry/profiling-node';
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  release: process.env.RENDER_GIT_COMMIT, // Render sets this automatically
  integrations: [
    nodeProfilingIntegration(),
  ],
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  profilesSampleRate: 1.0,
});
// src/main.ts — instrument.ts must be the very first import
import './instrument';
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  await app.listen(process.env.PORT ?? 3000);
}
bootstrap();
In NestJS, add the Sentry exception filter to capture unhandled errors:
// src/filters/sentry-exception.filter.ts
import { Catch, ArgumentsHost, HttpException } from '@nestjs/common';
import { BaseExceptionFilter } from '@nestjs/core';
import * as Sentry from '@sentry/node';
@Catch()
export class SentryExceptionFilter extends BaseExceptionFilter {
  catch(exception: unknown, host: ArgumentsHost) {
    // Only capture unexpected errors; don't report 4xx HTTP exceptions
    if (!(exception instanceof HttpException)) {
      Sentry.captureException(exception);
    }
    super.catch(exception, host);
  }
}
// src/main.ts (inside bootstrap, after NestFactory.create)
// HttpAdapterHost is imported from '@nestjs/core'
const { httpAdapter } = app.get(HttpAdapterHost);
app.useGlobalFilters(new SentryExceptionFilter(httpAdapter));
Sentry Error Triage Workflow
When an alert fires or a user reports an error:
1. Open the Sentry issue. The issue page groups all occurrences of the same error by stack trace fingerprint — equivalent to Application Insights’ “exceptions by problem ID.”

2. Read the breadcrumbs. Sentry automatically records HTTP requests, console logs, and database queries leading up to the error. This is your timeline of what the process was doing.

3. Check the source-mapped stack trace. If source maps are configured correctly, you see TypeScript line numbers, not compiled JavaScript. If you see minified code (at e (index.js:1:4823)), source maps are not uploaded — fix this before debugging further.

4. Check the user context and tags. Sentry shows which user triggered the error, what request they made, and what environment variables your code attached. Set meaningful context early:
// In your auth middleware or guard
Sentry.setUser({ id: user.id, email: user.email });
Sentry.setTag('tenant', user.tenantId);
5. Check for similar events. The “Events” tab shows every occurrence. Look for patterns: does it happen only for specific users, specific inputs, or specific times of day?

6. Mark it as “In Progress” and assign it before you start fixing it.
Source Maps in Production
Without source maps, Sentry shows minified stack traces that are unreadable. There are two approaches.
Option A: Upload source maps during CI/CD (recommended)
# Install the Sentry CLI
pnpm add -D @sentry/cli
# In your CI/CD pipeline, after building:
npx sentry-cli releases new "$RENDER_GIT_COMMIT"
npx sentry-cli releases files "$RENDER_GIT_COMMIT" upload-sourcemaps ./dist
npx sentry-cli releases finalize "$RENDER_GIT_COMMIT"
Add this to your GitHub Actions workflow, not your production startup. Source maps contain your full source code — never expose them to the public.
Option B: Sentry Vite/webpack plugin (build-time upload)
// vite.config.ts (for Vite-based frontends)
import { defineConfig } from 'vite';
import { sentryVitePlugin } from '@sentry/vite-plugin';

export default defineConfig({
  plugins: [
    sentryVitePlugin({
      authToken: process.env.SENTRY_AUTH_TOKEN,
      org: 'your-org',
      project: 'your-project',
    }),
  ],
  build: {
    sourcemap: true, // required
  },
});
For NestJS (built with tsc or webpack), configure source maps in tsconfig.json:
{
  "compilerOptions": {
    "sourceMap": true,
    "inlineSources": true
  }
}
Reading Node.js Stack Traces
A Node.js stack trace reads the same way as a CLR stack trace: the error type and message appear first, then frames from innermost to outermost:
TypeError: Cannot read properties of undefined (reading 'id')
at OrderService.getOrderTotal (order.service.ts:47:24)
at OrdersController.getTotal (orders.controller.ts:23:38)
at Layer.handle [as handle_request] (express/lib/router/layer.js:95:5)
at next (express/lib/router/route.js:137:13)
...
Frame format: at [function] ([file]:[line]:[column]). The frames without recognizable file names (express/lib/router/layer.js) are framework internals — equivalent to ASP.NET Core’s middleware pipeline frames that you learn to skip past.
The key frames are the first ones with your application code. In the example above: order.service.ts:47 is where the error originated. orders.controller.ts:23 is what called it. Everything below those two frames is the framework plumbing.
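That skip-the-plumbing habit can be automated. Here is an illustrative helper (not part of any library): it keeps only frames that resolve to your own TypeScript files, under the assumption that source maps map frames back to .ts paths as in the trace above.

```typescript
// Illustrative helper (not a library API): keep only the frames that
// point at your own TypeScript files, dropping framework internals.
// Assumes source-mapped traces where app frames end in .ts:line:column.
function appFrames(stack: string): string[] {
  return stack
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('at ') && /\.ts:\d+:\d+\)?$/.test(line));
}
```

Run over the example trace above, it keeps only the order.service.ts and orders.controller.ts frames.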
Async stack traces are the Node.js-specific challenge. In synchronous .NET code, the call stack is linear. In async Node.js code, each awaited call resumes from the microtask queue, which historically lost the calling context. Node.js 12+ preserves async stack traces by default (V8’s “zero-cost async stack traces”), but the output can be verbose:
at async OrderService.getOrderTotal (order.service.ts:47:24)
at async OrdersController.getTotal (orders.controller.ts:23:38)
If you see a stack trace that terminates at the async boundary and doesn’t show what called the async function, the caller did not await properly — a classic Node.js footgun.
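A minimal sketch of that footgun (function names here are hypothetical): returning a promise from inside try/catch without await pops the frame before the rejection happens, so the local catch never runs and the frame disappears from the async trace.

```typescript
// Hypothetical downstream call that rejects.
async function fetchTotal(): Promise<number> {
  throw new Error('boom');
}

// BUG: without `await`, this frame pops before the rejection fires.
// The catch never runs, and buggyCaller vanishes from the async trace.
async function buggyCaller(): Promise<number> {
  try {
    return fetchTotal(); // missing await
  } catch {
    return -1; // unreachable
  }
}

// FIX: `return await` keeps the frame alive long enough to catch
// locally and to appear in the async stack trace.
async function fixedCaller(): Promise<number> {
  try {
    return await fetchTotal();
  } catch {
    return -1;
  }
}
```

The rejection from buggyCaller escapes to its caller; fixedCaller handles it locally and returns -1.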
Structured Log Analysis
Use a structured logger from the start. pino is the standard in NestJS projects.
pnpm add pino pino-pretty nestjs-pino
// src/app.module.ts
import { LoggerModule } from 'nestjs-pino';
@Module({
  imports: [
    LoggerModule.forRoot({
      pinoHttp: {
        level: process.env.NODE_ENV === 'production' ? 'info' : 'debug',
        transport: process.env.NODE_ENV !== 'production'
          ? { target: 'pino-pretty' }
          : undefined, // In production, output raw JSON for log aggregators
        serializers: {
          req: (req) => ({ method: req.method, url: req.url, id: req.id }),
          res: (res) => ({ statusCode: res.statusCode }),
        },
      },
    }),
  ],
})
export class AppModule {}
Production logs output as JSON, one object per line. Each log entry has at minimum:
- level — numeric level (10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal)
- time — Unix timestamp in milliseconds
- msg — the log message
- pid — process ID
- Any custom fields you added
When triaging an incident, correlate logs using request IDs:
// Log meaningful context alongside the message
this.logger.log({
  msg: 'Order created',
  orderId: order.id,
  userId: user.id,
  durationMs: Date.now() - startTime,
});
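The correlation pattern generalizes by binding the request ID once per request so every subsequent line carries it. This library-free sketch mimics what pino's child() does (the names here are illustrative, not pino's API):

```typescript
// Sketch of the child-logger idea behind pino's child(): bind shared
// fields (like a request ID) once; every line logged through the child
// carries them automatically.
function childLogger(bound: Record<string, unknown>) {
  return {
    info(fields: Record<string, unknown>): string {
      // One JSON object per line, like pino's production output
      return JSON.stringify({ level: 30, time: Date.now(), ...bound, ...fields });
    },
  };
}

const reqLog = childLogger({ requestId: 'req-42' });
const line = JSON.parse(reqLog.info({ msg: 'Order created', orderId: 7 }));
// line.requestId === 'req-42' without repeating it at each call site
```

With nestjs-pino, the req serializer shown above plays this role: the request id is attached to every log line for you.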
Render Log Viewer
Render’s log viewer (your hosting platform) provides a web interface for live and historical logs. Key features:
- Live tail — equivalent to kubectl logs -f or Azure’s Log Stream
- Filter by text — basic grep-style filtering; for complex queries, export to your log aggregator
- Log retention — Render retains logs for 7 days on free plans, longer on paid plans
For production incidents, use the Render log viewer to:
- Confirm the error is occurring (not a Sentry reporting issue)
- Correlate timestamps with deployment events
- Check for patterns before the error: unusual traffic volume, repeated retry storms
For serious analysis beyond basic text search, pipe logs to a proper aggregator. Render supports log drains (forwarding to Datadog, Papertrail, etc.). For simpler setups, the Render CLI can tail and export logs:
render logs --service YOUR_SERVICE_ID --tail
Memory Leak Detection
The symptoms: Node.js process memory grows steadily over hours or days. The application becomes slower, then starts throwing out-of-memory errors, or the hosting platform restarts it.
The cause: In Node.js, the most common memory leaks are:
- Closures holding references to large objects longer than expected
- Event listeners added but never removed
- Caches with no eviction policy
- Database connection pools not being released
- Global state accumulating data over time
Detection step 1: Monitor memory trend
Node.js exposes process.memoryUsage():
// Scheduled health check or metrics endpoint
const usage = process.memoryUsage();
this.logger.log({
  msg: 'Memory usage',
  heapUsed: Math.round(usage.heapUsed / 1024 / 1024) + 'MB',
  heapTotal: Math.round(usage.heapTotal / 1024 / 1024) + 'MB',
  rss: Math.round(usage.rss / 1024 / 1024) + 'MB',
  external: Math.round(usage.external / 1024 / 1024) + 'MB',
});
If heapUsed grows linearly over time without plateauing, you have a leak.
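“Grows without plateauing” can be checked mechanically. A rough heuristic sketch (the sample count is an arbitrary choice, not a standard): keep the last few heapUsed readings and flag when every sample exceeds the previous one.

```typescript
// Rough leak heuristic: true when heapUsed rose across every recent
// sample. Real leaks survive GC, so sample at long intervals
// (minutes, not seconds) to avoid flagging normal heap churn.
function isSteadyGrowth(heapUsedSamples: number[], minSamples = 5): boolean {
  if (heapUsedSamples.length < minSamples) return false;
  return heapUsedSamples.every((v, i) => i === 0 || v > heapUsedSamples[i - 1]);
}
```

Feed it the heapUsed values logged above; a true result is the cue to start taking heap snapshots in staging.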
Detection step 2: Heap snapshot in staging
Never run --inspect in production (it opens a debugging port, is a security risk, and has performance overhead). Use staging:
# Start Node.js with the inspector open
node --inspect dist/main.js
# Or with NestJS:
pnpm start:debug
Open Chrome DevTools (chrome://inspect), connect to the Node.js process, go to the Memory tab, and take heap snapshots. Compare two snapshots taken minutes apart. The “Comparison” view shows which object types grew between snapshots — the same workflow as dotMemory’s “Object Sets” comparison.
Detection step 3: Programmatic heap snapshot
import { writeHeapSnapshot } from 'v8';
import { existsSync, mkdirSync } from 'fs';
// Get and ForbiddenException come from @nestjs/common

// Trigger via HTTP endpoint (only in staging, never in production)
@Get('debug/heap-snapshot')
async takeHeapSnapshot() {
  if (process.env.NODE_ENV === 'production') {
    throw new ForbiddenException();
  }
  const dir = '/tmp/heapdumps';
  if (!existsSync(dir)) mkdirSync(dir);
  // writeHeapSnapshot takes a filename, not a directory
  const filename = writeHeapSnapshot(`${dir}/heap-${Date.now()}.heapsnapshot`);
  return { filename };
}
Open the .heapsnapshot file in Chrome DevTools Memory tab.
CPU Profiling
If the application is slow without obvious memory growth, CPU profiling identifies hot code paths. The approach is similar to dotTrace’s sampling profiler.
import { Session } from 'inspector';
import { writeFileSync } from 'fs';

// Staging-only endpoint
@Get('debug/cpu-profile')
async takeCpuProfile(@Query('durationMs') durationMs = '5000') {
  if (process.env.NODE_ENV === 'production') {
    throw new ForbiddenException();
  }
  const session = new Session();
  session.connect();
  await new Promise<void>((resolve, reject) => {
    session.post('Profiler.enable', () => {
      session.post('Profiler.start', () => {
        setTimeout(() => {
          // Callback receives (err, params); check err before reading the profile
          session.post('Profiler.stop', (err, params) => {
            if (err) return reject(err);
            writeFileSync('/tmp/cpu-profile.cpuprofile', JSON.stringify(params.profile));
            session.disconnect();
            resolve();
          });
        }, parseInt(durationMs, 10));
      });
    });
  });
  return { filename: '/tmp/cpu-profile.cpuprofile' };
}
Load the .cpuprofile file in Chrome DevTools Performance tab. The flame chart shows which functions consumed the most CPU time. Functions wider in the flame chart are hotter.
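If you'd rather triage from a script than from DevTools, the .cpuprofile JSON can be read directly. A sketch under the assumption that each node carries a callFrame and a hitCount (the shape of the V8 profile returned by Profiler.stop):

```typescript
// Minimal .cpuprofile triage: rank functions by sample hit count.
// Assumes the V8 profile shape: a nodes[] array whose entries carry
// callFrame metadata and an optional hitCount.
type ProfileNode = {
  id: number;
  callFrame: { functionName: string; url: string };
  hitCount?: number;
};

function hottestFunctions(profile: { nodes: ProfileNode[] }, top = 3): string[] {
  return [...profile.nodes]
    .sort((a, b) => (b.hitCount ?? 0) - (a.hitCount ?? 0))
    .slice(0, top)
    .map((n) => n.callFrame.functionName || '(anonymous)');
}
```

This gives a quick "top offenders" list; the flame chart is still the better tool for understanding call relationships.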
Common Node.js Production Issues
Event loop blocking
This is the Node.js-specific issue with no direct .NET equivalent. If your code runs a synchronous operation that takes more than a few milliseconds, all other requests queue behind it. Common culprits:
- JSON.parse() on very large payloads (megabytes)
- Synchronous crypto operations
- fs.readFileSync() in request handlers
- Complex regular expressions on large strings (ReDoS)
- Tight loops over large arrays
Detect event loop lag:
import { monitorEventLoopDelay } from 'perf_hooks';
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

setInterval(() => {
  const lagMs = histogram.mean / 1e6; // Convert nanoseconds to milliseconds
  if (lagMs > 100) {
    this.logger.warn({ msg: 'Event loop lag detected', lagMs });
  }
  histogram.reset();
}, 10_000);
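To see why even small synchronous stalls matter, a toy demonstration (never ship a busy-wait): a synchronous loop delays even a 0 ms timer by its full duration, because timers only fire once the event loop regains control.

```typescript
// Toy demonstration only: a synchronous busy-wait. While it runs,
// no timers, I/O callbacks, or HTTP requests are serviced, because
// there is only one event-loop thread.
function blockEventLoop(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU; the event loop cannot run
  }
}

// A 0 ms timer scheduled before the block fires only after it ends.
const scheduled = Date.now();
setTimeout(() => {
  console.log(`timer lag: ${Date.now() - scheduled}ms`); // roughly the block duration
}, 0);
blockEventLoop(50);
```

The same effect is what monitorEventLoopDelay measures in aggregate across all such stalls.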
Unhandled promise rejections
In Node.js 15+, an unhandled promise rejection crashes the process; older versions merely printed a warning and carried on. Both behaviors are bad.
// In main.ts — catch what everything else misses
process.on('unhandledRejection', (reason, promise) => {
  logger.error({ msg: 'Unhandled promise rejection', reason });
  Sentry.captureException(reason);
  // Don't crash immediately — give Sentry time to flush
  setTimeout(() => process.exit(1), 1000);
});

process.on('uncaughtException', (error) => {
  logger.fatal({ msg: 'Uncaught exception', error });
  Sentry.captureException(error);
  setTimeout(() => process.exit(1), 1000);
});
Memory leaks from closures
// Leak: each request registers an event listener that holds a closure
// referencing the response object — it is never removed
app.get('/events', (req, res) => {
  const handler = (data) => res.write(data); // closure over res
  emitter.on('data', handler);
  // Bug: emitter.off('data', handler) is never called when req closes
});

// Fix: clean up listeners when the connection closes
app.get('/events', (req, res) => {
  const handler = (data) => res.write(data);
  emitter.on('data', handler);
  req.on('close', () => emitter.off('data', handler)); // cleanup
});
Connection pool exhaustion
Prisma, pg, and most database clients maintain a connection pool. If queries hold connections longer than the timeout, new queries queue indefinitely.
// Check pool status in health endpoint
@Get('health')
async health() {
  const result = await this.prisma.$queryRaw`SELECT 1 as alive`;
  return { status: 'ok', db: result };
}
If the health check times out but the process is otherwise running, suspect pool exhaustion. Check for long-running transactions or missing await on Prisma calls (an un-awaited query or transaction leaves work in flight with nothing observing it, and can hold a connection far longer than intended).
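One hedge for the probe itself (an illustrative helper, not a Prisma API): race the query against a short deadline so a starved pool reports unhealthy quickly instead of hanging until the platform kills the request.

```typescript
// Race any promise against a deadline. If the pool is exhausted, the
// probe query never gets a connection and the timeout wins, so the
// health check fails fast instead of hanging.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    if (timer) clearTimeout(timer);
  }
}
```

In the health endpoint above, this would look like `await withTimeout(this.prisma.$queryRaw\`SELECT 1 as alive\`, 2000)`.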
Key Differences
| .NET Approach | Node.js Approach |
|---|---|
| Attach debugger to live process | Read logs and Sentry; use staging for interactive debug |
| Memory dump + WinDbg/dotMemory | Heap snapshot via V8 inspector in staging |
| Thread-per-request model | Single-threaded; event loop lag is the unique failure mode |
| Application Insights built-in | Sentry + structured logs (pino) assembled manually |
| CLR type information in dumps | Heap snapshots show object shapes but not TypeScript types |
| Thread stacks show concurrent work | Single thread; “concurrent” issues are async sequencing bugs |
| ConfigureAwait(false) matters | No thread switching; all code runs on the event loop thread |
| Background threads for CPU work | CPU work blocks the event loop; offload to worker threads or external processes |
Gotchas for .NET Engineers
Gotcha 1: Source maps must be explicitly configured or Sentry is useless.
In .NET, Application Insights records stack traces against your PDB symbols automatically. In Node.js, Sentry receives minified JavaScript stack traces. Until you configure source map upload in your CI/CD pipeline, every Sentry error will show at e (index.js:1:4823). Set this up before your first deployment, not after the first production incident.
Gotcha 2: --inspect in production is a security hole.
The Node.js --inspect flag opens a WebSocket debugging port. If that port is exposed — even accidentally — an attacker can execute arbitrary code in your process. Never pass --inspect to production Node.js. Use it only locally or in isolated staging environments with network-level access controls. If you see --inspect in a production Dockerfile or a hosting platform’s start command, treat it as a critical security finding.
Gotcha 3: Synchronous operations in async code block every user.
A .NET developer’s instinct is that synchronous work only blocks the current thread. In Node.js, synchronous work blocks the event loop, which is the only thread handling all requests. JSON.parse() on a 10MB payload, fs.readFileSync() in a route handler, or a CPU-intensive loop in a service method will stall every in-flight request until it completes. The fix is to either move the work to a Worker thread (Node.js built-in) or break it into smaller async chunks.
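A sketch of the “smaller async chunks” option (the chunk size here is an arbitrary illustration): do a slice of work, then yield via setImmediate so queued requests and timers can interleave between slices.

```typescript
// Cooperative chunking: sum 0..n-1 without monopolizing the event loop.
// After each slice, setImmediate yields one event-loop turn so pending
// I/O callbacks and timers get to run.
async function sumInChunks(n: number, chunkSize = 1_000_000): Promise<number> {
  let total = 0;
  for (let start = 0; start < n; start += chunkSize) {
    const end = Math.min(start + chunkSize, n);
    for (let i = start; i < end; i++) {
      total += i;
    }
    // Yield between slices; other requests interleave here
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
  return total;
}
```

The result is identical to a tight loop, but in-flight requests keep flowing. For genuinely heavy or long-running computation, a Worker thread is still the better fit.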
Gotcha 4: Unhandled promise rejections are silent data loss in older code.
In pre-Node.js 15 codebases, a rejected promise without a .catch() handler silently discards the error. If you see code like someAsyncFunction() without await or .catch(), and that code can reject, you have silent failures. Always await async calls in request handlers. Always add process.on('unhandledRejection') as a safety net.
Gotcha 5: Memory profiling requires staging access, not production.
In .NET, you can often take a memory dump from a production server (pausing one process on a multi-instance deployment). In Node.js on Render, you do not have direct process access. Plan your debugging workflow before you need it: have a staging environment with --inspect accessible, have a process for reproducing production traffic in staging, and have heap snapshot endpoints (protected) deployed in staging at all times.
Hands-On Exercise
Set up the full observability stack for a NestJS application and trigger a real error through each layer.
Part 1: Sentry integration
- Create a free Sentry account at sentry.io, create a Node.js project, and copy the DSN.
- Install @sentry/node and initialize it as the first import in main.ts.
- Add the SentryExceptionFilter as shown above.
- Throw a deliberate error in a route handler: throw new Error('Test Sentry integration').
- Call the endpoint. Verify the error appears in Sentry with a readable stack trace.
- If the stack trace shows minified code, configure sourceMap: true in tsconfig.json and redeploy.
Part 2: Structured logging
- Install pino and nestjs-pino.
- Configure LoggerModule with JSON output for production and pretty output for development.
- Replace all console.log calls in one service with this.logger.log({ msg: '...', ...context }).
- Run the application and verify log output is structured JSON.
Part 3: Memory monitoring
- Add a /health endpoint that returns process.memoryUsage() formatted as MB.
- Add a memory warning log that fires when heapUsed exceeds 80% of heapTotal.
- Write a test that calls the endpoint repeatedly in a loop and watches for memory growth.
Part 4: Event loop monitoring
- Add monitorEventLoopDelay from perf_hooks as shown above.
- Write a route that performs a CPU-intensive synchronous operation (e.g., summing 10 million numbers in a loop).
- Call that route while also calling a fast route in parallel. Observe that the fast route is delayed.
- Log the event loop lag before and after the slow call.
Quick Reference
Observability Stack
| Need | Tool | .NET Equivalent |
|---|---|---|
| Error tracking | Sentry | Application Insights Exceptions |
| Structured logs | pino + nestjs-pino | Serilog + Application Insights |
| Log viewer (hosted) | Render log viewer | Azure Log Stream |
| Heap profiling | V8 inspector (Chrome DevTools) | dotMemory / WinDbg |
| CPU profiling | V8 profiler (.cpuprofile) | dotTrace |
| Event loop lag | perf_hooks.monitorEventLoopDelay | No direct equivalent |
| Process metrics | process.memoryUsage() | Azure Monitor / Kusto |
Incident Response Runbook
1. CHECK SENTRY
- Is there a new error group with rising occurrences?
- Is the stack trace readable (source maps working)?
- What are the breadcrumbs showing?
2. CHECK RENDER LOGS
- Is the process restarting? (look for process exit logs)
- Is there a burst of errors at a specific timestamp?
- Does the error correlate with a deployment?
3. CHECK MEMORY
GET /health -> heapUsed
If growing: take heap snapshot in staging, compare object counts
4. CHECK EVENT LOOP
- Is response latency elevated across ALL endpoints? (event loop blocking)
- Is it only specific endpoints? (slow query, external API)
5. ROLLBACK IF NEEDED
Render dashboard -> Deploys -> Rollback to previous deploy
This takes ~60 seconds; do it immediately if a deploy caused the issue
6. DOCUMENT
- What was the error?
- What was the root cause?
- What was the fix?
- Add to runbook if it could recur
Common Production Issues and Fixes
| Symptom | Likely Cause | Investigation |
|---|---|---|
| All endpoints slow simultaneously | Event loop blocking | Check event loop lag metric; look for sync operations in hot paths |
| Memory grows without bound | Closure leak / listener leak | Heap snapshot comparison in staging |
| Process crashes with OOM | Memory leak reached limit | Heap snapshot before crash; check Render restart logs |
| UnhandledPromiseRejection in logs | Missing await somewhere | Search codebase for async calls without await; add global handler |
| 500 errors on specific endpoint | Uncaught exception in handler | Check Sentry for that endpoint; verify error handling |
| DB queries timing out | Pool exhaustion | Check for missing await on Prisma; look for long transactions |
| Sentry shows minified traces | Source maps not uploaded | Configure source map upload in CI; verify sourceMap: true in tsconfig |
Key Commands
# Run with inspector open (staging only)
node --inspect dist/main.js
# Take a heap snapshot to a file programmatically
node -e "const v8 = require('v8'); v8.writeHeapSnapshot();"
# Check Render logs via CLI
render logs --service SERVICE_ID --tail
# Trigger a V8 heap snapshot from a running process — requires starting
# Node.js with --heapsnapshot-signal=SIGUSR2
kill -USR2 <PID>
Sentry Configuration Checklist
- SENTRY_DSN set in environment variables
- instrument.ts imported before all other imports in main.ts
- SentryExceptionFilter registered globally
- Source maps configured (sourceMap: true, inlineSources: true in tsconfig)
- Source maps uploaded in CI/CD pipeline (not at runtime)
- release set to git commit SHA for version tracking
- environment set to production/staging/development
- tracesSampleRate reduced in production (0.05–0.1 for high-traffic services)
Further Reading
- Sentry Node.js SDK Documentation — initialization, filtering, performance
- Node.js Diagnostics Guide — official guide to heap snapshots, CPU profiles, and flame graphs
- Clinic.js — automated Node.js performance diagnosis tool; useful for identifying event loop issues
- pino Documentation — structured logging for Node.js
- V8 Inspector Protocol — the protocol underlying Chrome DevTools’ Node.js debugging