Getting Started with SAST and Semgrep CLI

Apr 12, 2025 - 00:53

Securing software is difficult and not always top of mind when developing an application. A security engineer at a large bank once told our team that, by his calculation, even if all development stopped it would still take them over 100 years to work through their vulnerability backlog using traditional Static Application Security Testing (SAST) tools. Semgrep addresses this problem by understanding the semantics of code and cutting through the noise.

What is Semgrep?

Semgrep is used to find application security vulnerabilities through enforced guardrails and coding standards. Semgrep Community Edition is a fast, open-source static code analysis engine at the heart of Semgrep's services. While a common tool like grep can search with regular expressions to match strings, Semgrep understands the semantics of source code and can identify patterns and data flow, which helps reduce false positives.

For example, a grep search for 2 would match every line that contains that character and return many false positives. A Semgrep rule can match pattern expressions more precisely, including variations such as x = 1; y = x + 1, where the value 2 never appears literally in the source.
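The difference between textual and semantic matching can be illustrated with a small Python sketch. This is not how Semgrep is implemented; it is a toy that uses the standard ast module to follow simple integer constants through assignments. The sample source and the function names text_matches, evaluate, and constant_assignments are made up for this illustration.

```python
import ast

# Hypothetical sample source used only for this illustration.
SOURCE = """\
x = 1
y = x + 1
port = 8002
label = "route 2"
"""


def text_matches(source, needle):
    """Return line numbers whose raw text contains needle, as a plain text search would."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if needle in line]


def evaluate(expr, known):
    """Evaluate integer literals, known names, and additions; return None otherwise."""
    if isinstance(expr, ast.Constant):
        value = expr.value
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        return None
    if isinstance(expr, ast.Name):
        return known.get(expr.id)
    if isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Add):
        left = evaluate(expr.left, known)
        right = evaluate(expr.right, known)
        if left is not None and right is not None:
            return left + right
    return None


def constant_assignments(source, target):
    """Return line numbers of top-level assignments whose value evaluates to target."""
    known = {}
    hits = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        name = node.targets[0]
        if not isinstance(name, ast.Name):
            continue
        value = evaluate(node.value, known)
        if value is None:
            # The name no longer holds a value we can track.
            known.pop(name.id, None)
            continue
        known[name.id] = value
        if value == target:
            hits.append(node.lineno)
    return hits


print(text_matches(SOURCE, "2"))        # lines 3 and 4: matches inside 8002 and a string
print(constant_assignments(SOURCE, 2))  # line 2: y = x + 1 evaluates to 2
```

The text search flags a port number and a string literal, while the syntax-aware pass finds the one assignment that actually produces the value 2.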

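A few reasons security-conscious teams have added Semgrep to their development pipeline:

- Support for 30+ programming languages
- Simple rule syntax that allows for customization and extensibility without DSLs, managing abstract syntax trees, or regex wrangling
- Can run locally in a command line interface (CLI), integrated with your favorite integrated development environment (IDE), as a source control pre-commit hook, in continuous integration and delivery (CI/CD) pipelines, or as a managed platform service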

Semgrep rules exist to help find everything from logic errors and code smells to security vulnerabilities such as SQL injection, cross-site scripting, and leaked secrets, all by analyzing the source code itself.

[Figure: Semgrep Architecture and DevEx Overview]

Installation

A typical first step is to look at findings for an individual file using the CLI.

Installation for macOS:

brew install semgrep

Installation for Linux/BSD/macOS:

python3 -m pip install semgrep

Semgrep can also be run from a Docker container, for example on Windows:

docker run -it -v "${PWD}:/src" semgrep/semgrep

To test that the installation was successful and that semgrep can be found on your PATH, try:

semgrep -h

Run a First Scan

All CLI processing is done locally on your computer or in your build environment; your code is not uploaded to a service for analysis.

To try this for yourself, you can use your own existing project or start with a simple test project.

$ ls
foo.py
bar.js

For example, foo.py contains this code:

import sys
import subprocess

input = " ".join(sys.argv[1:])

print("Python is easy")
subprocess.run(input, shell=True)

Of course, you would never write code like this, but is that true for everybody on your team?

From the root directory of your project, run:

semgrep scan --config auto

The CLI will pull down rules from the rule registry to test your source code.

The output may look similar to this:

Scanning 2 files (only git-tracked) with:

✔ Semgrep OSS
  ✔ Basic security coverage for first-party code vulnerabilities.

✔ Semgrep Code (SAST)
  ✔ Find and fix vulnerabilities in the code you write with advanced scanning and expert security rules.

✘ Semgrep Supply Chain (SCA)
  ✘ Find and fix the reachable vulnerabilities in your OSS dependencies.

Supply Chain (SCA) and Secrets rules are only available when you sign up at semgrep.dev. When using the free Community Edition, you'll only have access to a subset of the total rules. Here are our results:

┌────────────────┐
│ 1 Code Finding │
└────────────────┘

    foo.py
   ❯❯❱ python.lang.security.audit.subprocess-shell-true.subprocess-shell-true
          Found 'subprocess' function 'run' with 'shell=True'. This is dangerous because this call will spawn
          the command using a shell process. Doing so propagates current shell settings and variables, which 
          makes it much easier for a malicious actor to execute commands. Use 'shell=False' instead.         
          Details: https://sg.run/J92w                                                                       

           ▶▶┆ Autofix ▶ False
            7┆ subprocess.run(input, shell=True)

The finding was triggered by the rule python.lang.security.audit.subprocess-shell-true.subprocess-shell-true, which you can learn more about in the Rule Registry.
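One way to act on this finding, sketched under assumptions: pass the command as an argument list and leave the shell out entirely, as the rule message suggests. The function name run_without_shell is made up for this example, and the sketch only removes shell interpretation of the input. It does not decide which commands a user should be allowed to run.

```python
import subprocess
import sys


def run_without_shell(argv):
    """Run a command given as a list of arguments, with no shell in between.

    Because no shell parses the input, characters such as ';' or '|' are
    passed to the program as literal text instead of being interpreted.
    """
    if not argv:
        raise ValueError("no command given")
    # shell=False is the default; it is spelled out here to mirror the rule message.
    return subprocess.run(argv, shell=False, capture_output=True, text=True, check=False)


# The second argument contains shell metacharacters, but it reaches the
# child process as one literal string and is simply printed back.
result = run_without_shell(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "hello; echo injected"]
)
print(result.stdout, end="")
```

In the foo.py example, this would mean calling run_without_shell(sys.argv[1:]) instead of joining the arguments into one string and handing it to a shell.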

You can create custom rules or even run one-off checks, such as finding any debug output you forgot to remove.

$ semgrep -e 'console.log(...)' --lang=js ./bar.js

┌────────────────┐
│ 1 Code Finding │
└────────────────┘

    bar.js
            1337┆ console.log("DEBUG: remove this later");
┌──────────────┐
│ Scan Summary │
└──────────────┘

Ran 1 rule on 1 file: 1 finding.            
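The scans shown so far print human-readable output. For scripted use, such as a build step, the CLI can also emit JSON with the --json flag. The following is a minimal sketch of consuming that output from Python. It assumes semgrep is on your PATH and that each entry in the report's "results" list carries "check_id", "path", and "start.line" fields; verify those field names against the output of your installed version. The function names and the sample report are made up for this illustration.

```python
import json
import subprocess


def summarize_findings(report):
    """Turn a parsed JSON report into 'path:line rule-id' strings.

    The field names used here are assumptions about the report layout.
    """
    summary = []
    for finding in report.get("results", []):
        path = finding.get("path", "?")
        line = finding.get("start", {}).get("line", "?")
        rule_id = finding.get("check_id", "?")
        summary.append(f"{path}:{line} {rule_id}")
    return summary


def scan(target="."):
    """Run a scan on target and return the parsed JSON report (needs semgrep installed)."""
    completed = subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--json", target],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(completed.stdout)


# Hand-written sample mirroring the finding shown earlier, so the
# summary logic can be tried without running a scan.
SAMPLE_REPORT = {
    "results": [
        {
            "check_id": "python.lang.security.audit.subprocess-shell-true.subprocess-shell-true",
            "path": "foo.py",
            "start": {"line": 7},
        }
    ]
}

for entry in summarize_findings(SAMPLE_REPORT):
    print(entry)
```

Calling summarize_findings(scan(".")) from the project root would then list real findings in the same format.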

These were quick Python and JavaScript examples, but Semgrep tools have the motto "yes we scan", with support for: Apex, Bash, C, C++, C#, Clojure, Dart, Dockerfile, Elixir, HTML, Go, Java, JavaScript, JSX, JSON, Julia, Jsonnet, Kotlin, Lisp, Lua, OCaml, PHP, Python, R, Ruby, Rust, Scala, Scheme, Solidity, Swift, Terraform, TypeScript, TSX, YAML, XML, etc.

Scaling Development Team Workflows

The point of all this is that, while we may try to be security conscious when developing software, there are lots of gotchas to know about. When collaborating with other software developers, it can be difficult to know whether everybody on the team is equally well versed in security best practices for every language in use.

For more complex projects, you also need to be able to find cross-file issues, detect supply chain attacks, and prevent secrets from leaking before they are committed. For these use cases, you can find more information in the Semgrep Docs, or you can join a webinar or book a demo to learn more about setting up more complex team workflows.