Semgrep: Static Analysis and Custom Rules

For .NET engineers who know: Roslyn Analyzers, DiagnosticAnalyzer, diagnostic descriptors, code fix providers, and the process of publishing custom analyzer NuGet packages

You’ll learn: How Semgrep provides the same capability (pattern-based static analysis with custom rules) without requiring you to compile anything, using YAML rules that match AST patterns across TypeScript, React, and Node.js code

Time: 15-20 min read

The .NET Way (What You Already Know)

Roslyn Analyzers are the extensibility point for custom static analysis in C#. You write a class that implements DiagnosticAnalyzer, register the syntax node types you want to inspect, walk the syntax tree using Roslyn’s symbol model, and emit Diagnostic instances when your rule fires. The analyzer ships as a NuGet package that gets loaded into the compiler. When a violation occurs, the IDE shows a squiggly line; in CI, the build fails with a diagnostic error.

Writing a custom Roslyn Analyzer is powerful and precise, but has real friction. You need a separate C# project, you work against Roslyn’s symbol API (which has a learning curve), and distributing the analyzer requires a NuGet package. For enforcing team conventions (“always use our Result<T> type instead of raw exceptions in service methods”), the overhead often outweighs the benefit.

// A minimal Roslyn Analyzer: considerable boilerplate for a simple rule
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class NoRawStringConnectionStringAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new(
        id: "SV001",
        title: "Connection string should not be a raw string literal",
        messageFormat: "Use IConfiguration to read the connection string instead of hardcoding it",
        category: "Security",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true
    );

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
        ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        // Standard analyzer setup: skip generated code and allow parallel execution
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        // Invoke AnalyzeStringLiteral for every string literal in the compilation
        context.RegisterSyntaxNodeAction(
            AnalyzeStringLiteral,
            SyntaxKind.StringLiteralExpression
        );
    }

    private static void AnalyzeStringLiteral(SyntaxNodeAnalysisContext context)
    {
        var literal = (LiteralExpressionSyntax)context.Node;
        var value = literal.Token.ValueText;
        if (value.Contains("Server=") || value.Contains("Data Source="))
        {
            context.ReportDiagnostic(Diagnostic.Create(Rule, literal.GetLocation()));
        }
    }
}

That is a lot of infrastructure for “don’t hardcode connection strings.” Semgrep lets you write the equivalent rule in under a dozen lines of YAML.

The Semgrep Way

Semgrep is a static analysis tool that matches code patterns using a syntax-aware AST matching engine. Instead of walking an AST in code, you write patterns that look like the code you want to match, with metavariables ($X, $FUNC) as placeholders. The engine handles the language parsing.

The key mental model shift: Semgrep rules are data (YAML), not code. They are readable, version-controlled alongside your application, and require no compilation or packaging.

Installing Semgrep

# Install via pip (the primary distribution channel)
pip3 install semgrep

# Or via Homebrew on macOS
brew install semgrep

# Verify installation
semgrep --version

# No authentication needed for open-source rules or local rules
# For Semgrep's managed service (semgrep.dev), authenticate:
semgrep login

Running Your First Scan

# Run Semgrep against the current directory using community rules
# The auto ruleset selects rules appropriate for the detected languages
semgrep --config=auto .

# Run a specific ruleset from the registry
semgrep --config=p/typescript .
semgrep --config=p/react .
semgrep --config=p/nodejs .
semgrep --config=p/owasp-top-ten .

# Run your own rules file
semgrep --config=.semgrep/rules.yml .

# Run all rules in a directory
semgrep --config=.semgrep/ .

# Fail with exit code 1 if any findings exist (for CI)
semgrep --config=.semgrep/ --error .

Rule Structure

A Semgrep rule has five required fields: an id, a message, a severity, a languages list, and a pattern (or pattern combination). Everything else is optional metadata.

# .semgrep/rules.yml
rules:
  - id: no-hardcoded-connection-string
    # pattern-regex scans the raw source text, so one regex covers quoted
    # strings and template literals alike.
    pattern-regex: '(Server|Data Source)=[^;]+;'
    message: >
      Hardcoded connection string detected. Read database credentials from
      environment variables via process.env or the config service.
    languages: [typescript, javascript]
    severity: ERROR

  - id: no-console-log-in-services
    pattern: console.log($...ARGS)
    # A rule takes a single paths: key; include and exclude both live under it.
    paths:
      include:
        - 'src/*/services/**'
        - 'src/services/**'
      exclude:
        - '**/*.spec.ts'
        - '**/*.test.ts'
    message: >
      Use the injected Logger service instead of console.log in service classes.
      console.log bypasses structured logging and log level configuration.
    languages: [typescript]
    severity: WARNING
    # $...ARGS carries the original arguments into the replacement.
    fix: this.logger.log($...ARGS)

Pattern Syntax

Semgrep’s pattern language is designed to look like the code it matches. The key constructs:

| Syntax | Meaning | Example |
| --- | --- | --- |
| $X | Metavariable: matches any expression | $X.password |
| $...ARGS | Spread metavariable: matches zero or more args | fn($...ARGS) |
| ... | Ellipsis: matches zero or more statements | try { ... } catch { ... } |
| pattern | Single pattern to match | Direct match |
| patterns | All patterns must match (AND) | Multiple conditions |
| pattern-either | Any pattern must match (OR) | Alternatives |
| pattern-not | Exclude matches of this pattern | Negation |
| pattern-inside | Match only inside this context | Scope restriction |
| pattern-not-inside | Match only outside this context | Scope exclusion |
| metavariable-regex | Constrain a metavariable to a regex | $VAR where name matches pattern |

rules:
  # Flags a query call whose argument was built from interpolated or
  # concatenated input earlier in the same scope. patterns: is the AND
  # combinator, so every entry must hold for the same $DB.query(...) call.
  - id: no-raw-sql-with-user-input
    patterns:
      - pattern: $DB.query($QUERY)
      - pattern-either:
          - pattern-inside: |
              $QUERY = `...${$INPUT}...`;
              ...
          - pattern-inside: |
              $QUERY = $A + $INPUT;
              ...
    message: Potential SQL injection. Use parameterized queries.
    languages: [typescript, javascript]
    severity: ERROR

  # Matches dangerouslySetInnerHTML unless the value goes through DOMPurify.
  # pattern-not is only valid inside patterns:, and a pattern containing
  # "key: value" needs a block scalar (|) to stay valid YAML.
  - id: no-dangerous-innerhtml
    patterns:
      - pattern: |
          <$EL dangerouslySetInnerHTML={{ __html: $X }} />
      - pattern-not: |
          <$EL dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(...) }} />
    message: >
      dangerouslySetInnerHTML with unsanitized content is an XSS vector.
      Wrap with DOMPurify.sanitize() or use a safer alternative.
    languages: [typescript, javascript]
    severity: ERROR

  # Matches JWT verification without algorithm specification, with or
  # without an options argument
  - id: jwt-verify-without-algorithm
    patterns:
      - pattern: jwt.verify($TOKEN, $SECRET, ...)
      - pattern-not: |
          jwt.verify($TOKEN, $SECRET, { ..., algorithms: [...], ... }, ...)
    message: >
      JWT verification without specifying algorithms allows algorithm confusion
      attacks. Explicitly specify: { algorithms: ['HS256'] }
    languages: [typescript, javascript]
    severity: ERROR
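
To make the target of the SQL rule concrete, here is a minimal sketch of the code shape it is meant to flag next to a compliant alternative. The `ParameterizedQuery` interface, both function names, and the `$1` placeholder style (as used by PostgreSQL drivers) are assumptions made for illustration; they are not part of any rule or library shown above.

```typescript
// Hypothetical query shape: SQL text plus a separate list of parameter values.
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// The shape the rule targets: user input is concatenated into the SQL text,
// so a crafted email can change the structure of the query.
function unsafeFindUserQuery(email: string): string {
  return "SELECT * FROM users WHERE email = '" + email + "'";
}

// Compliant shape: the SQL text is a constant and the input travels
// separately, so the driver never interprets it as SQL.
function findUserQuery(email: string): ParameterizedQuery {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}
```

Passing the result of `findUserQuery` to a driver keeps the query text free of concatenation, so the `$A + $INPUT` branch of the rule has nothing to match.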

Community Rules for TypeScript and Node.js

Semgrep’s registry at semgrep.dev has hundreds of vetted rules for the JS/TS ecosystem. The most useful rulesets for our stack:

# TypeScript-specific rules (type assertion abuse, unused vars, etc.)
semgrep --config=p/typescript .

# React security and best practices
semgrep --config=p/react .

# Node.js security (command injection, path traversal, etc.)
semgrep --config=p/nodejs .

# Express.js specific patterns
semgrep --config=p/express .

# JWT security (registry counterparts to the custom JWT rule in the Pattern Syntax section)
semgrep --config=p/jwt .

# OWASP Top 10 mapped rules
semgrep --config=p/owasp-top-ten .

# Hardcoded secrets (complements Snyk's dependency scanning)
semgrep --config=p/secrets .

Review the rules in a ruleset before adopting them. Each rule has a page on semgrep.dev with examples of what it matches and the rationale. Treat the registry as a starting point, not a complete solution — our custom rules enforce conventions the community rulesets cannot know about.

Writing Custom Rules for Team Conventions

This is where Semgrep earns its keep. Roslyn Analyzers enforce C# conventions that are too expensive to review manually in PRs. The same applies here.

Convention: Always use the injected LoggerService, never console.log

- id: use-logger-service-not-console
  pattern-either:
    - pattern: console.log($...ARGS)
    - pattern: console.warn($...ARGS)
    - pattern: console.error($...ARGS)
    - pattern: console.debug($...ARGS)
  paths:
    include:
      - 'src/**'
    exclude:
      - '**/*.spec.ts'
      - '**/*.test.ts'
      - 'src/instrument.ts'   # Sentry init before logger is available
  message: >
    Use the NestJS Logger service instead of console.*. Inject Logger with
    `private readonly logger = new Logger(MyService.name)` and use
    this.logger.log(), this.logger.warn(), this.logger.error().
  languages: [typescript]
  severity: WARNING
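
As a sketch of what compliant code looks like under this convention, the service below logs through an injected-style logger rather than `console.*`. The `Logger` class here is a small stand-in written for this example so that it runs without NestJS; in the real codebase it would be the `Logger` from `@nestjs/common`. `OrderService`, `placeOrder`, and `recordedLogs` are hypothetical names.

```typescript
// Stand-in for the NestJS Logger so the sketch runs without the framework.
// It records messages in memory instead of writing to a transport.
class Logger {
  readonly entries: string[] = [];
  private readonly context: string;

  constructor(context: string) {
    this.context = context;
  }

  log(message: string): void {
    this.entries.push(`[${this.context}] LOG ${message}`);
  }

  warn(message: string): void {
    this.entries.push(`[${this.context}] WARN ${message}`);
  }
}

class OrderService {
  private readonly logger = new Logger(OrderService.name);

  placeOrder(orderId: string): string {
    // The rule would flag this line if it were uncommented:
    //   console.log(`placing order ${orderId}`);
    this.logger.log(`placing order ${orderId}`);
    return orderId;
  }

  // Exposed only so the sketch can be checked without a log transport.
  recordedLogs(): readonly string[] {
    return this.logger.entries;
  }
}
```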

Convention: Never use `as any` type assertions outside test files

- id: no-any-type-assertion
  pattern-either:
    - pattern: $X as any
    - pattern: <any>$X
  paths:
    include:
      - 'src/**'
    exclude:
      - '**/*.spec.ts'
      - '**/*.test.ts'
  message: >
    Avoid `as any` — it disables TypeScript's type safety at this call site.
    Use a specific type, `as unknown`, or a type guard instead.
  languages: [typescript]
  severity: WARNING
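
The rule message suggests a type guard as the replacement for `as any`. A minimal sketch of that approach follows; the `UserPayload` shape and both function names are hypothetical and exist only to illustrate the technique.

```typescript
// Hypothetical response shape, used only for illustration.
interface UserPayload {
  id: number;
  email: string;
}

// Type guard: narrows `unknown` to UserPayload by checking the fields at runtime.
function isUserPayload(value: unknown): value is UserPayload {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return typeof candidate["id"] === "number" && typeof candidate["email"] === "string";
}

// Parses JSON without `as any`: the result stays `unknown` until the guard passes.
function parseUserPayload(raw: string): UserPayload {
  const parsed: unknown = JSON.parse(raw);
  if (!isUserPayload(parsed)) {
    throw new Error("Unexpected payload shape");
  }
  return parsed;
}
```

The single `as Record<string, unknown>` assertion inside the guard is not matched by the rule, since the rule only targets `as any` and `<any>`.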

Convention: Zod must validate external input at API boundaries

- id: missing-zod-validation-on-request-body
  patterns:
    - pattern: |
        @Post(...)
        async $METHOD(@Body() $DTO: $TYPE) {
          ...
        }
    - pattern-not: |
        @Post(...)
        async $METHOD(@Body() $DTO: $TYPE) {
          const $PARSED = $SCHEMA.parse($DTO);
          ...
        }
  message: >
    POST handler receives a body parameter without Zod schema validation.
    Add `const parsed = CreateXxxSchema.parse(dto)` before using the body,
    or apply the ZodValidationPipe globally.
  languages: [typescript]
  severity: WARNING
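
The handler shape that satisfies the `pattern-not` branch (parse first, then use the parsed value) can be sketched as follows. To keep the example runnable without dependencies, `Schema` is a hand-written stand-in that mimics the `parse()` method of a Zod schema, and the NestJS decorators are omitted; `CreateOrderDto`, `CreateOrderSchema`, and `createOrder` are hypothetical names.

```typescript
// Minimal stand-in for a Zod schema: parse() returns the validated value or throws.
interface Schema<T> {
  parse(input: unknown): T;
}

interface CreateOrderDto {
  sku: string;
  quantity: number;
}

const CreateOrderSchema: Schema<CreateOrderDto> = {
  parse(input: unknown): CreateOrderDto {
    if (typeof input !== "object" || input === null) {
      throw new Error("body must be an object");
    }
    const body = input as Record<string, unknown>;
    const sku = body["sku"];
    const quantity = body["quantity"];
    if (typeof sku !== "string" || sku.length === 0) {
      throw new Error("sku must be a non-empty string");
    }
    if (typeof quantity !== "number" || !Number.isInteger(quantity) || quantity < 1) {
      throw new Error("quantity must be a positive integer");
    }
    return { sku, quantity };
  },
};

// Handler body in the shape the rule accepts: the first statement parses the
// incoming body, and only the parsed value is used afterwards.
function createOrder(dto: unknown): string {
  const parsed = CreateOrderSchema.parse(dto);
  return `${parsed.quantity} x ${parsed.sku}`;
}
```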

Convention: Never call process.exit() in application code

- id: no-process-exit-in-application-code
  pattern: process.exit($CODE)
  paths:
    include:
      - 'src/**'
    exclude:
      - 'src/main.ts'   # Acceptable only at the top-level bootstrap
  message: >
    Do not call process.exit() in application code — it prevents NestJS from
    running shutdown hooks and can leave database connections open.
    Throw an exception and let the global exception filter handle it.
  languages: [typescript]
  severity: ERROR

Ignoring False Positives

Add an inline comment to suppress a specific rule at a specific line:

// nosemgrep: no-any-type-assertion
const response = await fetch(url) as any;

// Or suppress all rules at this line:
// nosemgrep
const legacyData = JSON.parse(raw) as any;

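nosemgrep comments apply to a single line; there is no file-level form. To exclude a whole file from one rule, list it under that rule's paths.exclude (as use-logger-service-not-console does for src/instrument.ts). To exclude it from every rule, add it to a .semgrepignore file at the repository root:

# .semgrepignore
# Sentry initialization shim that runs before the NestJS Logger is available.
src/instrument.ts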

False positives in rule design should be addressed in the rule itself, not suppressed at call sites. If you are suppressing the same rule frequently, the rule is too broad.

CI Integration

# .github/workflows/semgrep.yml
name: Semgrep

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  semgrep:
    name: Static Analysis
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep

    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep with community and custom rules
        # --error exits 1 on findings, which blocks the PR merge.
        # --sarif with --output writes a SARIF report for the upload step.
        # Comments cannot follow a trailing backslash, so they live up here.
        run: |
          semgrep \
            --config=p/typescript \
            --config=p/nodejs \
            --config=p/react \
            --config=.semgrep/ \
            --error \
            --sarif \
            --output=semgrep.sarif \
            src/

      - name: Upload SARIF to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif

Uploading SARIF to GitHub Security means findings appear in the Security → Code Scanning tab and as inline annotations on the diff in PRs, which is the Semgrep equivalent of SonarCloud’s PR decoration.

Performance Considerations

Semgrep is fast on small codebases but can be slow on large ones because it analyzes every file. Tune performance with:

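# Skip files that do not need analysis
semgrep --config=.semgrep/ --exclude='**/*.min.js' --exclude='**/vendor/**' .

# Scan only the files changed on this branch (in CI; requires git history for origin/main)
# --diff-filter=d leaves out deleted files, which no longer exist on disk
semgrep --config=.semgrep/ $(git diff --name-only --diff-filter=d origin/main...HEAD | grep '\.ts$')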

# Limit rule set to only high-confidence rules for PR checks
# Run the full set in scheduled jobs, not on every push

For large monorepos, split the Semgrep job to run in parallel by package directory.
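
One way to plan that split is to bucket the changed files by package and start one Semgrep job per bucket. The helper below is a sketch under the assumption that packages live under a `packages/<name>/` directory; the function name and that layout are hypothetical, so adjust the root to match your repository.

```typescript
// Groups repository-relative paths by their package directory
// (for example "packages/api"). Files outside the root are ignored.
function groupByPackage(files: string[], root: string = "packages"): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const file of files) {
    const parts = file.split("/");
    // Expect at least <root>/<package>/<file>
    if (parts.length < 3 || parts[0] !== root) {
      continue;
    }
    const pkg = `${root}/${parts[1]}`;
    const bucket = groups.get(pkg) ?? [];
    bucket.push(file);
    groups.set(pkg, bucket);
  }
  return groups;
}
```

Each key of the returned map can then become the target directory of one parallel job, for example as an entry in a CI matrix.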

Key Differences

| Concern | Roslyn Analyzers | Semgrep | Notes |
| --- | --- | --- | --- |
| Rule language | C# code (DiagnosticAnalyzer) | YAML pattern rules | Semgrep is dramatically simpler to write |
| Compilation required | Yes: analyzer is a .NET assembly | No: YAML is interpreted | Semgrep rules can be edited and tested immediately |
| Pattern matching | Roslyn syntax/symbol API | AST-aware pattern language | Different API, similar power |
| Distribution | NuGet package | YAML files in repository | Semgrep rules are version-controlled in your repo |
| Language support | C# / VB.NET only | 30+ languages | Same tool scans TS, Python, Go, etc. |
| IDE integration | Native VS/Rider integration | VS Code extension, JetBrains plugin | Roslyn is more deeply integrated |
| Rule sharing | NuGet package | Semgrep registry (semgrep.dev) | Different distribution models |
| Fix suggestions | Code fix providers (CodeFixProvider) | fix: field in rule YAML | Semgrep fixes are simpler but functional |
| Community rules | Built into .NET analyzers | Semgrep registry | Both have mature community rulesets |
| Cost | Free (OSS) | Free CLI, paid for team management | Free tier is sufficient for most teams |

Gotchas for .NET Engineers

Gotcha 1: Pattern Matching Is Syntactic, Not Semantic

Roslyn Analyzers work at the semantic level. You can query the symbol table, resolve types, check if a method’s return type implements an interface, and traverse the full call graph. This lets you write rules like “warn if a method that returns IDisposable is not disposed in all control flow paths.”

Semgrep works at the syntax/AST level. It matches code patterns, not semantic relationships. This means you cannot ask “is this variable of type X?” or “does this method ever reach a code path that calls Y?”, at least not with plain pattern rules. Taint mode can follow data from a source to a sink, within a single function in the free engine and across functions and files in the Pro engine.

Design your rules around what you can see syntactically:

# This works: the rule needs nothing beyond the syntax of the call
- id: no-setTimeout-in-angular-or-react-components
  pattern: setTimeout($CALLBACK, $DELAY)
  paths:
    include:
      - 'src/components/**'
      - 'src/pages/**'
  message: Use useEffect cleanup or NgZone.runOutsideAngular instead
  languages: [typescript]
  severity: WARNING

# This does NOT work as intended: it requires semantic knowledge of types.
# Semgrep cannot determine whether $SERVICE is an HttpService instance, so the
# pattern matches every .get() call, including Map.get() and config.get().
- id: no-http-in-component  # TOO BROAD
  pattern: $SERVICE.get($URL)

Gotcha 2: YAML Indentation Errors Produce Confusing Failures

Semgrep YAML rules are indentation-sensitive. A misaligned pattern-not or a pattern-either block with inconsistent indentation will either silently do the wrong thing or fail with a cryptic parse error.

Always validate rules before running them in CI:

# Validate rule syntax
semgrep --validate --config=.semgrep/rules.yml

# Test rules against fixture files (the right way to develop rules)
# See: semgrep --test

Use Semgrep’s Playground at semgrep.dev/playground to write and test rules interactively before committing them. It shows you exactly what code each pattern matches, which is faster than the edit-run-check cycle locally.

Gotcha 3: Community Rules Have False Positives — Review Before Adopting

The Semgrep registry has excellent rules, but “excellent” in the context of a community ruleset means “works for most teams most of the time.” Some rules will fire on patterns in your codebase that are deliberate and correct. Running semgrep --config=p/nodejs --error on an existing project without reviewing each rule first will cause spurious CI failures.

The correct workflow:

  1. Run semgrep --config=p/nodejs (without --error) and review all findings
  2. For each finding: decide if it is a real issue or a false positive
  3. For real issues: fix them
  4. For false positives: either suppress inline with // nosemgrep or exclude the rule from your configuration
  5. Once clean, add --error to the CI invocation

# Run a ruleset and output JSON for systematic review
semgrep --config=p/nodejs --json > findings.json

# Review the rules that fired (to exclude the noisy ones)
jq '[.results[].check_id] | unique' findings.json
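
The jq query lists which rules fired; for triage it also helps to know how often each one fired. The sketch below counts findings per rule from an already-parsed report. It relies only on the `results[].check_id` and `path` fields of the JSON output; the interface and function names are made up for this example.

```typescript
// Subset of Semgrep's JSON output that this sketch relies on.
interface SemgrepResult {
  check_id: string;
  path: string;
}

interface SemgrepReport {
  results: SemgrepResult[];
}

// Counts findings per rule id, noisiest rule first (ties broken alphabetically).
function countFindingsByRule(report: SemgrepReport): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const result of report.results) {
    counts.set(result.check_id, (counts.get(result.check_id) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}
```

Feeding it the parsed contents of findings.json gives a ranked list, which makes the rules worth reviewing or excluding first easy to spot.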

Gotcha 4: Semgrep Does Not Replace ESLint or TypeScript Compiler Checks

Semgrep, ESLint, and the TypeScript compiler check different things. They are complementary, not redundant:

  • TypeScript compiler: Type errors, structural type violations, undefined variables
  • ESLint: Style conventions, common antipatterns, React hooks rules, accessibility
  • Semgrep: Security patterns, team-specific conventions, one rule format across every language in the repository

The common mistake is trying to enforce style rules (import ordering, naming conventions) with Semgrep when ESLint has better support for those. Semgrep shines for rules that are awkward to express in ESLint without writing a plugin: path-scoped conventions, combinations of positive and negative conditions, and security-specific patterns where the rule needs to be opaque to the developer (to prevent cargo-culting around it). Cross-file patterns are not on that list for the free engine, which matches one file at a time.

Gotcha 5: Taint Tracking Requires Pro Tier for Full Effectiveness

Semgrep’s free tier has excellent pattern matching, and it includes taint mode (mode: taint) for tracing untrusted input to an unsafe sink within a single function. Following that data across function and file boundaries requires the Pro (paid) engine. Taint analysis is what lets Semgrep catch:

// User input from the request body flows to a shell command without sanitization
const filename = req.body.filename;
exec(`ls ${filename}`);  // Command injection: taint flows from req.body to exec

Because this flow stays inside one function, a free-tier taint rule can catch it. If the filename passes through a helper in another module before it reaches exec, only the Pro engine follows it; in the free tier you fall back to a pattern that catches exec( + template literal without knowing whether the interpolated value is user input. For the most impactful security rules, cross-file taint analysis is worth the cost. For convention enforcement, the free tier is sufficient.
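
On the code side, the usual remedy for this finding is to validate the input and avoid the shell altogether. The sketch below shows an allow-list validator of the kind a taint rule would typically treat as a sanitizer, plus a helper that builds an argument vector instead of a shell string. Both function names are hypothetical; the returned `file` and `args` are shaped so they could be passed to Node's `execFile`, which this example deliberately does not call.

```typescript
// Allow-list: letters, digits, dot, underscore, hyphen. No path separators,
// whitespace, or shell metacharacters can pass.
const SAFE_FILENAME = /^[A-Za-z0-9._-]+$/;

// Returns the input as a string if it is a safe file name, otherwise throws.
function assertSafeFilename(input: unknown): string {
  if (typeof input !== "string" || !SAFE_FILENAME.test(input) || input.charAt(0) === ".") {
    throw new Error("Invalid filename");
  }
  return input;
}

// Builds an argument vector rather than a shell command string, so the
// file name is never interpreted by a shell. "--" ends option parsing.
function buildListCommand(filename: unknown): { file: string; args: string[] } {
  return { file: "ls", args: ["--", assertSafeFilename(filename)] };
}
```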

Hands-On Exercise

Write and deploy three custom Semgrep rules for your project’s conventions.

  1. Create a .semgrep/ directory at your repository root and add a rules.yml file.

  2. Write a rule that prevents console.log in service files (under src/*/services/ or similar) and allows it in test files. Test it against your codebase with semgrep --config=.semgrep/rules.yml src/.

  3. Write a rule that catches missing await on async function calls where the result is discarded. Hint: start from pattern: $X($...ARGS) and narrow it with pattern-not and pattern-inside until only un-awaited calls remain.

  4. Write a rule that enforces that all NestJS route handlers using @Body() include a validation step. (Use the missing-zod-validation-on-request-body rule above as the starting point.)

  5. Add the Semgrep GitHub Actions workflow. Run it on a PR branch and verify findings appear as annotations on the diff.

  6. Browse the Semgrep registry at semgrep.dev/r and find three rules from p/nodejs or p/typescript that apply to your codebase. Add them to your CI config.

  7. Introduce a deliberate violation of one of your custom rules (add a console.log to a service), push to a branch, open a PR, and verify the CI job fails with a clear message referencing the rule.

Quick Reference

| Task | Command |
| --- | --- |
| Run with community rules | semgrep --config=auto . |
| Run specific ruleset | semgrep --config=p/typescript . |
| Run custom rules | semgrep --config=.semgrep/ . |
| Fail CI on findings | semgrep --config=.semgrep/ --error . |
| Output SARIF | semgrep --sarif --output=results.sarif . |
| Validate rule syntax | semgrep --validate --config=.semgrep/rules.yml |
| Test rules against fixtures | semgrep --test .semgrep/ |
| Suppress at a line | // nosemgrep: rule-id |
| Suppress all rules at line | // nosemgrep |

Minimal Rule Template

rules:
  - id: rule-id-in-kebab-case
    pattern: |
      the_pattern_to_match(...)
    message: >
      What is wrong and how to fix it. One paragraph.
    languages: [typescript]
    severity: ERROR   # ERROR | WARNING | INFO
    paths:
      include:
        - 'src/**'
      exclude:
        - '**/*.spec.ts'
        - '**/*.test.ts'

Pattern Combination Reference

# AND — all must match
patterns:
  - pattern: A
  - pattern-not: B

# OR — any must match
pattern-either:
  - pattern: A
  - pattern: B

# Context restriction
patterns:
  - pattern: dangerous_call()
  - pattern-inside: |
      function $FUNC(...) { ... }

Severity Levels and CI Behavior

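--error makes Semgrep exit 1 on any finding, whatever its severity. To block merges on ERROR findings only, also pass --severity=ERROR in the blocking run so that lower severities are not reported there.

| Severity | Exit code with --error | PR impact |
| --- | --- | --- |
| ERROR | 1 | Blocks merge |
| WARNING | 1 (0 when the run is filtered with --severity=ERROR) | Blocks merge unless filtered out |
| INFO | 1 (0 when the run is filtered with --severity=ERROR) | Blocks merge unless filtered out |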

Further Reading