Semgrep: Static Analysis and Custom Rules
For .NET engineers who know: Roslyn Analyzers, DiagnosticAnalyzer, diagnostic descriptors, code fix providers, and the process of publishing custom analyzer NuGet packages
You’ll learn: How Semgrep provides the same capability (pattern-based static analysis with custom rules) without requiring you to compile anything, using YAML rules that match AST patterns across TypeScript, React, and Node.js code
Time: 15-20 min read
The .NET Way (What You Already Know)
Roslyn Analyzers are the extensibility point for custom static analysis in C#. You write a class that implements DiagnosticAnalyzer, register the syntax node types you want to inspect, walk the syntax tree using Roslyn’s symbol model, and emit Diagnostic instances when your rule fires. The analyzer ships as a NuGet package that gets loaded into the compiler. When a violation occurs, the IDE shows a squiggly line; in CI, the build fails with a diagnostic error.
Writing a custom Roslyn Analyzer is powerful and precise, but has real friction. You need a separate C# project, you work against Roslyn’s symbol API (which has a learning curve), and distributing the analyzer requires a NuGet package. For enforcing team conventions (“always use our Result<T> type instead of raw exceptions in service methods”), the overhead often outweighs the benefit.
// A minimal Roslyn Analyzer: considerable boilerplate for a simple rule
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class NoRawStringConnectionStringAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new(
        id: "SV001",
        title: "Connection string should not be a raw string literal",
        messageFormat: "Use IConfiguration to read the connection string instead of hardcoding it",
        category: "Security",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true
    );

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
        ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        // Skip generated code and let the compiler run this analyzer concurrently
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxNodeAction(
            AnalyzeStringLiteral,
            SyntaxKind.StringLiteralExpression
        );
    }

    private static void AnalyzeStringLiteral(SyntaxNodeAnalysisContext context)
    {
        var literal = (LiteralExpressionSyntax)context.Node;
        var value = literal.Token.ValueText;
        // Flag literals that look like SQL Server connection strings
        if (value.Contains("Server=") || value.Contains("Data Source="))
        {
            context.ReportDiagnostic(Diagnostic.Create(Rule, literal.GetLocation()));
        }
    }
}
That is a lot of infrastructure for “don’t hardcode connection strings.” Semgrep lets you write the equivalent rule in about a dozen lines of YAML.
The Semgrep Way
Semgrep is a static analysis tool that matches code patterns using a syntax-aware AST matching engine. Instead of walking an AST in code, you write patterns that look like the code you want to match, with metavariables ($X, $FUNC) as placeholders. The engine handles the language parsing.
The key mental model shift: Semgrep rules are data (YAML), not code. They are readable, version-controlled alongside your application, and require no compilation or packaging.
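To make the metavariable idea concrete, here is a toy sketch in TypeScript. It is not how Semgrep is implemented (Semgrep matches parsed syntax trees, and the `matchTokens` helper is a hypothetical name invented for this illustration); it only shows the binding rule that a metavariable used twice in a pattern must match the same code both times.

```typescript
// Toy illustration only: real Semgrep matches parsed syntax trees, not token lists.
type Bindings = Record<string, string>;

// Returns the metavariable bindings if `code` matches `pattern`, otherwise null.
function matchTokens(pattern: string[], code: string[]): Bindings | null {
  if (pattern.length !== code.length) return null;
  const bindings: Bindings = {};
  for (let i = 0; i < pattern.length; i++) {
    const p = pattern[i];
    const c = code[i];
    if (p === undefined || c === undefined) return null;
    if (p.charAt(0) === "$") {
      const bound = bindings[p];
      if (bound === undefined) {
        bindings[p] = c; // first occurrence: bind the metavariable
      } else if (bound !== c) {
        return null; // a repeated metavariable must match the same text
      }
    } else if (p !== c) {
      return null; // literal tokens must match exactly
    }
  }
  return bindings;
}

// The pattern "$X === $X" only matches when both sides are the same expression.
const same = matchTokens(["$X", "===", "$X"], ["user.id", "===", "user.id"]);
const different = matchTokens(["$X", "===", "$X"], ["user.id", "===", "admin.id"]);
```

Here `same` holds the binding of `$X` to `user.id`, while `different` is `null` because the two occurrences of `$X` disagree.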
Installing Semgrep
# Install via pip (the primary distribution channel)
pip3 install semgrep
# Or via Homebrew on macOS
brew install semgrep
# Verify installation
semgrep --version
# No authentication needed for open-source rules or local rules
# For Semgrep's managed service (semgrep.dev), authenticate:
semgrep login
Running Your First Scan
# Run Semgrep against the current directory using community rules
# The auto ruleset selects rules appropriate for the detected languages
semgrep --config=auto .
# Run a specific ruleset from the registry
semgrep --config=p/typescript .
semgrep --config=p/react .
semgrep --config=p/nodejs .
semgrep --config=p/owasp-top-ten .
# Run your own rules file
semgrep --config=.semgrep/rules.yml .
# Run all rules in a directory
semgrep --config=.semgrep/ .
# Fail with exit code 1 if any findings exist (for CI)
semgrep --config=.semgrep/ --error .
Rule Structure
A Semgrep rule has five required fields: an id, a pattern (or pattern combination), a message, a languages list, and a severity. Everything else is optional metadata.
# .semgrep/rules.yml
rules:
  - id: no-hardcoded-connection-string
    patterns:
      # $STR binds any expression; the regex sees the source text including
      # quotes, so it keeps only string or template literals that contain
      # a connection-string key
      - pattern: $STR
      - metavariable-regex:
          metavariable: $STR
          regex: ^["'`].*(Server|Data Source)=
    message: >
      Hardcoded connection string detected. Read database credentials from
      environment variables via process.env or the config service.
    languages: [typescript, javascript]
    severity: ERROR

  - id: no-console-log-in-services
    pattern: console.log($...ARGS)
    # include and exclude live under a single paths key
    paths:
      include:
        - 'src/*/services/**'
        - 'src/services/**'
      exclude:
        - '**/*.spec.ts'
        - '**/*.test.ts'
    message: >
      Use the injected Logger service instead of console.log in service classes.
      console.log bypasses structured logging and log level configuration.
    languages: [typescript]
    severity: WARNING
    # $...ARGS carries the original arguments into the replacement
    fix: this.logger.log($...ARGS)
Pattern Syntax
Semgrep’s pattern language is designed to look like the code it matches. The key constructs:
| Syntax | Meaning | Example |
|---|---|---|
| $X | Metavariable: matches any expression | $X.password |
| $...ARGS | Ellipsis metavariable: matches zero or more arguments and binds them | fn($...ARGS) |
| ... | Ellipsis: matches zero or more arguments, statements, or other items | try { ... } catch { ... } |
| pattern | Single pattern to match | Direct match |
| patterns | All patterns must match (AND) | Multiple conditions |
| pattern-either | Any pattern must match (OR) | Alternatives |
| pattern-not | Exclude matches of this pattern | Negation |
| pattern-inside | Match only inside this context | Scope restriction |
| pattern-not-inside | Match only outside this context | Scope exclusion |
| metavariable-regex | Constrain a metavariable to a regex | $VAR where name matches pattern |
rules:
  # Matches a query call whose SQL text is built inline from another value.
  # patterns: is the AND combinator, so every child must match the same call;
  # pattern-either: (OR) lists the risky ways of building the string.
  - id: no-raw-sql-with-user-input
    patterns:
      - pattern: $DB.query($QUERY, ...)
      - pattern-either:
          - pattern: $DB.query(`...${$INPUT}...`, ...)
          - pattern: $DB.query($A + $INPUT, ...)
    message: Potential SQL injection. Use parameterized queries.
    languages: [typescript, javascript]
    severity: ERROR

  # Matches dangerouslySetInnerHTML with any non-sanitized value
  - id: no-dangerous-innerhtml
    patterns:
      - pattern: |
          <$EL dangerouslySetInnerHTML={{ __html: $X }} />
      - pattern-not: |
          <$EL dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(...) }} />
    message: >
      dangerouslySetInnerHTML with unsanitized content is an XSS vector.
      Wrap with DOMPurify.sanitize() or use a safer alternative.
    languages: [typescript, javascript]
    severity: ERROR

  # Matches JWT verification without algorithm specification
  - id: jwt-verify-without-algorithm
    patterns:
      - pattern: jwt.verify($TOKEN, $SECRET, ...)
      - pattern-not: |
          jwt.verify($TOKEN, $SECRET, {..., algorithms: [...], ...}, ...)
    message: >
      JWT verification without specifying algorithms allows algorithm confusion
      attacks. Explicitly specify: { algorithms: ['HS256'] }
    languages: [typescript, javascript]
    severity: ERROR
Community Rules for TypeScript and Node.js
Semgrep’s registry at semgrep.dev has hundreds of vetted rules for the JS/TS ecosystem. The most useful rulesets for our stack:
# TypeScript-specific rules (type assertion abuse, unused vars, etc.)
semgrep --config=p/typescript .
# React security and best practices
semgrep --config=p/react .
# Node.js security (command injection, path traversal, etc.)
semgrep --config=p/nodejs .
# Express.js specific patterns
semgrep --config=p/express .
# JWT security (registry counterparts to the jwt-verify-without-algorithm rule above)
semgrep --config=p/jwt .
# OWASP Top 10 mapped rules
semgrep --config=p/owasp-top-ten .
# Supply chain / secrets (complements Snyk)
semgrep --config=p/secrets .
Review the rules in a ruleset before adopting them. Each rule has a page on semgrep.dev with examples of what it matches and the rationale. Treat the registry as a starting point, not a complete solution — our custom rules enforce conventions the community rulesets cannot know about.
Writing Custom Rules for Team Conventions
This is where Semgrep earns its keep. Roslyn Analyzers enforce C# conventions that are too expensive to review manually in PRs. The same applies here.
Convention: Always use the injected LoggerService, never console.log
- id: use-logger-service-not-console
  pattern-either:
    - pattern: console.log($...ARGS)
    - pattern: console.warn($...ARGS)
    - pattern: console.error($...ARGS)
    - pattern: console.debug($...ARGS)
  paths:
    include:
      - 'src/**'
    exclude:
      - '**/*.spec.ts'
      - '**/*.test.ts'
      - 'src/instrument.ts' # Sentry init before logger is available
  message: >
    Use the NestJS Logger service instead of console.*. Inject Logger with
    `private readonly logger = new Logger(MyService.name)` and use
    this.logger.log(), this.logger.warn(), this.logger.error().
  languages: [typescript]
  severity: WARNING
Convention: Never use `as any` type assertions outside test files
- id: no-any-type-assertion
  pattern-either:
    - pattern: $X as any
    - pattern: <any>$X
  paths:
    include:
      - 'src/**'
    exclude:
      - '**/*.spec.ts'
      - '**/*.test.ts'
  message: >
    Avoid `as any`: it disables TypeScript's type safety for this expression.
    Use a specific type, `as unknown`, or a type guard instead.
  languages: [typescript]
  severity: WARNING
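The rule's message recommends a type guard as the replacement for `as any`. A minimal sketch of what that can look like follows; the `User` shape and the `isUser` and `parseUser` helpers are hypothetical names chosen for illustration, not part of any codebase described here.

```typescript
// Hypothetical shape used only for this example.
interface User {
  id: number;
  email: string;
}

// Type guard: narrows `unknown` to `User` after checking the fields at runtime.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["id"] === "number" && typeof candidate["email"] === "string";
}

// Parses JSON as `unknown` and only returns it once the guard has passed.
function parseUser(raw: string): User {
  const data: unknown = JSON.parse(raw);
  if (!isUser(data)) {
    throw new Error("Payload is not a User");
  }
  return data; // typed as User here, with no `as any`
}
```

The one remaining assertion (`as Record<string, unknown>`) is narrow and comes after a runtime check, which is the trade this convention is aiming for.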
Convention: Zod must validate external input at API boundaries
- id: missing-zod-validation-on-request-body
  patterns:
    - pattern: |
        @Post(...)
        async $METHOD(@Body() $DTO: $TYPE) {
          ...
        }
    # The leading ... lets the parse call appear anywhere in the body,
    # not only as the first statement
    - pattern-not: |
        @Post(...)
        async $METHOD(@Body() $DTO: $TYPE) {
          ...
          const $PARSED = $SCHEMA.parse($DTO);
          ...
        }
  message: >
    POST handler receives a body parameter without Zod schema validation.
    Add `const parsed = CreateXxxSchema.parse(dto)` before using the body,
    or apply the ZodValidationPipe globally.
  languages: [typescript]
  severity: WARNING
Convention: Never call process.exit() in application code
- id: no-process-exit-in-application-code
  pattern: process.exit($CODE)
  paths:
    include:
      - 'src/**'
    exclude:
      - 'src/main.ts' # Acceptable only at the top-level bootstrap
  message: >
    Do not call process.exit() in application code. It prevents NestJS from
    running shutdown hooks and can leave database connections open.
    Throw an exception and let the global exception filter handle it.
  languages: [typescript]
  severity: ERROR
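The message above tells developers to throw instead of exiting. A small sketch of that replacement, assuming a hypothetical `FatalConfigError` class and `requireSetting` helper (neither is a NestJS API), might look like this:

```typescript
// Hypothetical error type; a global exception filter could recognize it by name.
class FatalConfigError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FatalConfigError";
  }
}

// Instead of logging and calling process.exit(1), signal the failure by throwing.
// The caller (or the framework's exception handling) decides how to shut down.
function requireSetting(env: Record<string, string | undefined>, key: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    throw new FatalConfigError("Missing required setting: " + key);
  }
  return value;
}
```

Because the function throws rather than terminating the process, shutdown hooks still get a chance to run, and the behavior is easy to cover in a unit test.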
Ignoring False Positives
Add an inline comment to suppress a specific rule at a specific line:
// nosemgrep: no-any-type-assertion
const response = await fetch(url) as any;
// Or suppress all rules at this line:
// nosemgrep
const legacyData = JSON.parse(raw) as any;
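A nosemgrep comment only covers a finding on its own line or the line directly below it; there is no file-level comment. To skip a whole file, add it to the rule's paths exclude list (as with src/instrument.ts above) or to a .semgrepignore file at the repository root:
# .semgrepignore
# Reason: Sentry initialization shim that runs before the NestJS Logger is available.
src/instrument.ts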
False positives in rule design should be addressed in the rule itself, not suppressed at call sites. If you are suppressing the same rule frequently, the rule is too broad.
CI Integration
# .github/workflows/semgrep.yml
name: Semgrep
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
jobs:
  semgrep:
    name: Static Analysis
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write # Needed to upload SARIF results
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep with community and custom rules
        # --error exits 1 on findings (blocks PR merge).
        # --sarif writes SARIF output for GitHub integration.
        # Comments cannot follow a trailing backslash, so they live here.
        run: |
          semgrep \
            --config=p/typescript \
            --config=p/nodejs \
            --config=p/react \
            --config=.semgrep/ \
            --error \
            --sarif \
            --output=semgrep.sarif \
            src/
      - name: Upload SARIF to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif
Uploading SARIF to GitHub Security means findings appear in the Security → Code Scanning tab and as inline annotations on the diff in PRs, which is the Semgrep equivalent of SonarCloud’s PR decoration.
Performance Considerations
Semgrep is fast on small codebases but can be slow on large ones because it analyzes every file. Tune performance with:
# Skip files that do not need analysis
semgrep --config=.semgrep/ --exclude='**/*.min.js' --exclude='**/vendor/**' .
# Run only rules relevant to changed files (in CI — requires git diff)
semgrep --config=.semgrep/ $(git diff --name-only origin/main...HEAD | grep '\.ts$')
# Limit rule set to only high-confidence rules for PR checks
# Run the full set in scheduled jobs, not on every push
For large monorepos, split the Semgrep job to run in parallel by package directory.
Key Differences
| Concern | Roslyn Analyzers | Semgrep | Notes |
|---|---|---|---|
| Rule language | C# code (DiagnosticAnalyzer) | YAML pattern rules | Semgrep is dramatically simpler to write |
| Compilation required | Yes — analyzer is a .NET assembly | No — YAML is interpreted | Semgrep rules can be edited and tested immediately |
| Pattern matching | Roslyn syntax/symbol API | AST-aware pattern language | Semgrep is syntactic; Roslyn also exposes full semantic information (see Gotcha 1) |
| Distribution | NuGet package | YAML files in repository | Semgrep rules are version-controlled in your repo |
| Language support | C# / VB.NET only | 30+ languages | Same tool scans TS, Python, Go, etc. |
| IDE integration | Native VS/Rider integration | VS Code extension, JetBrains plugin | Roslyn is more deeply integrated |
| Rule sharing | NuGet package | Semgrep registry (semgrep.dev) | Different distribution models |
| Fix suggestions | Code fix providers (CodeFixProvider) | fix: field in rule YAML | Semgrep fixes are simpler but functional |
| Community rules | Built into .NET analyzers | Semgrep registry | Both have mature community rulesets |
| Cost | Free (OSS) | Free CLI, paid for team management | Free tier is sufficient for most teams |
Gotchas for .NET Engineers
Gotcha 1: Pattern Matching Is Syntactic, Not Semantic
Roslyn Analyzers work at the semantic level. You can query the symbol table, resolve types, check if a method’s return type implements an interface, and traverse the full call graph. This lets you write rules like “warn if a method that returns IDisposable is not disposed in all control flow paths.”
Semgrep works at the syntax/AST level. It matches code patterns, not semantic relationships. This means a pattern rule cannot ask “is this variable of type X?” or “does this method ever reach a code path that calls Y?” Taint mode (mode: taint) can follow data within a single function, and following it across functions or files needs the Pro engine.
Design your rules around what you can see syntactically:
# This works: pattern matching on syntax
- id: no-setTimeout-in-angular-or-react-components
  pattern: setTimeout($CALLBACK, $DELAY)
  paths:
    include:
      - 'src/components/**'
      - 'src/pages/**'
  message: Use useEffect cleanup or NgZone.runOutsideAngular instead

# This does NOT do what you want: it requires semantic knowledge of types
# (Semgrep cannot determine whether $SERVICE is an HttpService instance)
- id: no-http-in-component # Too broad: matches every .get() call on any object
  pattern: $SERVICE.get($URL)
  # Would need to know that $SERVICE is of type HttpService
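To see why the second rule over-matches, consider the following sketch. `FakeHttpService` and `LookupCache` are hypothetical stand-ins written for this example, not real library classes. Both calls at the bottom have the syntactic shape `$SERVICE.get($URL)`, so a purely syntactic pattern cannot tell them apart.

```typescript
// Hypothetical stand-in for an HTTP client.
class FakeHttpService {
  get(url: string): string {
    return "GET " + url;
  }
}

// Hypothetical in-memory cache with the same method name.
class LookupCache {
  private entries: Record<string, string> = { "/users": "cached" };
  get(key: string): string | undefined {
    return this.entries[key];
  }
}

const http = new FakeHttpService();
const cache = new LookupCache();

// A pattern like $SERVICE.get($URL) reports both lines below.
// Only the first is an HTTP call; the second is a harmless cache lookup.
const fromNetwork = http.get("/users");
const fromCache = cache.get("/users");
```

A Roslyn analyzer could resolve the receiver's type and report only the first call. With a syntactic rule, the usual workaround is to narrow the pattern to a naming convention (for example `this.httpService.get(...)`) and accept that it depends on the convention being followed.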
Gotcha 2: YAML Indentation Errors Produce Confusing Failures
Semgrep YAML rules are indentation-sensitive. A misaligned pattern-not or a pattern-either block with inconsistent indentation will either silently do the wrong thing or fail with a cryptic parse error.
Always validate rules before running them in CI:
# Validate rule syntax
semgrep --validate --config=.semgrep/rules.yml
# Test rules against fixture files (the right way to develop rules)
# See: semgrep --test
Use Semgrep’s Playground at semgrep.dev/playground to write and test rules interactively before committing them. It shows you exactly what code each pattern matches, which is faster than the edit-run-check cycle locally.
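For the `semgrep --test` workflow mentioned above, fixtures are ordinary source files annotated with comments that say which lines a rule should and should not flag. The sketch below assumes the fixture sits next to the rule file with a matching base name (for example `.semgrep/rules.ts` beside `.semgrep/rules.yml`) and that the rule's paths filter does not exclude it; both are assumptions to check against the Semgrep testing documentation for your version.

```typescript
// Hypothetical fixture file for the no-any-type-assertion rule.
const raw = '{"name": "legacy"}';

// ruleid: no-any-type-assertion
const flagged = JSON.parse(raw) as any;

// ok: no-any-type-assertion
const allowed: unknown = JSON.parse(raw);
```

A `ruleid:` comment marks the next line as an expected finding, and an `ok:` comment marks the next line as code the rule must not report. The test run fails if the actual findings disagree with the annotations.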
Gotcha 3: Community Rules Have False Positives — Review Before Adopting
The Semgrep registry has excellent rules, but “excellent” in the context of a community ruleset means “works for most teams most of the time.” Some rules will fire on patterns in your codebase that are deliberate and correct. Running semgrep --config=p/nodejs --error on an existing project without reviewing each rule first will cause spurious CI failures.
The correct workflow:
- Run semgrep --config=p/nodejs (without --error) and review all findings
- For each finding: decide if it is a real issue or a false positive
- For real issues: fix them
- For false positives: either suppress inline with // nosemgrep or exclude the rule from your configuration
- Once clean, add --error to the CI invocation
# Run a ruleset and output JSON for systematic review
semgrep --config=p/nodejs --json > findings.json
# Review the rules that fired (to exclude the noisy ones)
cat findings.json | jq '[.results[].check_id] | unique'
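If you prefer to triage in TypeScript rather than jq, the same review can be sketched as a small function over the parsed JSON. It relies only on the `results[].check_id` field used in the jq command above; the `countByRule` name and the trimmed-down interfaces are illustrative assumptions, and the real report carries many more fields.

```typescript
// Only the fields this sketch needs; the real report has more.
interface Finding {
  check_id: string;
}
interface SemgrepReport {
  results: Finding[];
}

// Counts findings per rule and sorts the noisiest rules first.
function countByRule(report: SemgrepReport): Array<[string, number]> {
  const counts: Record<string, number> = {};
  for (const finding of report.results) {
    counts[finding.check_id] = (counts[finding.check_id] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((id): [string, number] => [id, counts[id] ?? 0])
    .sort((a, b) => b[1] - a[1]);
}

// Example input standing in for JSON.parse of findings.json.
const sampleReport: SemgrepReport = {
  results: [
    { check_id: "rule-a" },
    { check_id: "rule-b" },
    { check_id: "rule-a" },
  ],
};
const summary = countByRule(sampleReport);
```

Sorting by count puts the rules most worth reviewing (or excluding) at the top of the list, which is usually where the false positives cluster.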
Gotcha 4: Semgrep Does Not Replace ESLint or TypeScript Compiler Checks
Semgrep, ESLint, and the TypeScript compiler check different things. They are complementary, not redundant:
- TypeScript compiler: Type errors, structural type violations, undefined variables
- ESLint: Style conventions, common antipatterns, React hooks rules, accessibility
- Semgrep: Security patterns, team-specific conventions, structural patterns that span several statements
The common mistake is trying to enforce style rules (import ordering, naming conventions) with Semgrep when ESLint has better support for those. Semgrep shines for rules that are awkward to express in ESLint without writing a custom plugin: complex condition combinations, patterns that depend on surrounding context, and security-specific patterns. Cross-file analysis is possible too, but it needs the Pro engine.
Gotcha 5: Taint Tracking Requires Pro Tier for Full Effectiveness
Semgrep’s free tier has excellent pattern matching and a taint mode (mode: taint) that traces untrusted input to an unsafe sink within a single function. Tracing that flow across functions and files requires the Pro (paid) engine. Taint analysis is what lets Semgrep catch:
// User input from the request body flows to a shell command without sanitization
const filename = req.body.filename;
exec(`ls ${filename}`); // Command injection — taint flows from req.body to exec
With a plain pattern rule, you can catch exec( + template literal, but you cannot tell whether the template variable contains user input. Free-tier taint rules cover flows like the one above that stay inside one function; once the input passes through helper functions or other files, you need Pro. For the most impactful security rules, that is worth the cost. For convention enforcement, the free tier is sufficient.
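Whichever tier reports the finding, the fix for the snippet above is the same: validate the input and avoid building a shell string. A minimal sketch follows, assuming a hypothetical `buildLsArgs` helper and an allow-list that suits plain file names; the resulting array is meant to be passed to Node's `child_process.execFile("ls", args)`, which runs the program without a shell.

```typescript
// Allow-list: letters, digits, dot, underscore, hyphen; must not start with "." or "-".
const SAFE_NAME = /^[A-Za-z0-9_][A-Za-z0-9._-]*$/;

// Returns an argument array for `ls`, or throws if the name is not allow-listed.
function buildLsArgs(filename: string): string[] {
  if (!SAFE_NAME.test(filename)) {
    throw new Error("Rejected filename: " + filename);
  }
  // "--" ends option parsing, so the value is always treated as a path.
  return ["--", filename];
}
```

Because the value travels as a separate argument and never through a shell, characters such as `;` or `$()` have no special meaning even if the allow-list were looser.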
Hands-On Exercise
Write and deploy three custom Semgrep rules for your project’s conventions.
- Create a .semgrep/ directory at your repository root and add a rules.yml file.
- Write a rule that prevents console.log in service files (under src/*/services/ or similar) and allows it in test files. Test it against your codebase with semgrep --config=.semgrep/rules.yml src/.
- Write a rule that catches missing await on async function calls where the result is discarded. Hint: start from pattern: $X($...ARGS) and combine it with context about async calls.
- Write a rule that enforces that all NestJS route handlers using @Body() include a validation step. (See the example in the article for the starting point.)
- Add the Semgrep GitHub Actions workflow. Run it on a PR branch and verify findings appear as annotations on the diff.
- Browse the Semgrep registry at semgrep.dev/r and find three rules from p/nodejs or p/typescript that apply to your codebase. Add them to your CI config.
- Introduce a deliberate violation of one of your custom rules (add a console.log to a service), push to a branch, open a PR, and verify the CI job fails with a clear message referencing the rule.
Quick Reference
| Task | Command |
|---|---|
| Run with community rules | semgrep --config=auto . |
| Run specific ruleset | semgrep --config=p/typescript . |
| Run custom rules | semgrep --config=.semgrep/ . |
| Fail CI on findings | semgrep --config=.semgrep/ --error . |
| Output SARIF | semgrep --sarif --output=results.sarif . |
| Validate rule syntax | semgrep --validate --config=.semgrep/rules.yml |
| Test rules against fixtures | semgrep --test .semgrep/ |
| Suppress at a line | // nosemgrep: rule-id |
| Suppress all rules at line | // nosemgrep |
Minimal Rule Template
rules:
  - id: rule-id-in-kebab-case
    pattern: |
      the_pattern_to_match(...)
    message: >
      What is wrong and how to fix it. One paragraph.
    languages: [typescript]
    severity: ERROR # ERROR | WARNING | INFO
    paths:
      include:
        - 'src/**'
      exclude:
        - '**/*.spec.ts'
        - '**/*.test.ts'
Pattern Combination Reference
# AND: all must match
patterns:
  - pattern: A
  - pattern-not: B

# OR: any must match
pattern-either:
  - pattern: A
  - pattern: B

# Context restriction
patterns:
  - pattern: dangerous_call()
  - pattern-inside: |
      function $FUNC(...) { ... }
Severity Levels and CI Behavior
| Severity | Exit Code | PR Impact |
|---|---|---|
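| ERROR | 1 (with --error) | Blocks merge |
| WARNING | 1 (with --error); 0 if the run also passes --severity=ERROR | Blocks merge unless filtered out by --severity |
| INFO | 1 (with --error); 0 if the run also passes --severity=ERROR | Blocks merge unless filtered out by --severity |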
Further Reading
- Semgrep Documentation — Complete reference for rule syntax, pattern language, and CLI options
- Semgrep Rule Registry — Browse and search community rules by language and category
- Semgrep Playground — Write and test rules interactively against sample code
- Writing Custom Rules Tutorial — Official guide to pattern syntax, metavariables, and advanced matching