Unraveling Tree-Sitter Queries: Your Guide to Code Analysis Magic


May 8, 2025 - 18:55

Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a first-of-its-kind tool that automatically indexes API endpoints across all your repositories. LiveAPI helps you discover, understand, and use APIs in large tech infrastructures with ease.

Tree-Sitter is a parser generator tool that creates fast, incremental parsers for programming languages. Its query mechanism lets developers extract specific patterns from code’s abstract syntax tree (AST). This post dives into how Tree-Sitter queries work, why they’re useful, and how you can wield them for tasks like code analysis, linting, or building editor features. Expect practical examples, clear explanations, and a focus on making this powerful tool approachable.

What Is Tree-Sitter’s Query Mechanism?

Tree-Sitter parses code into a syntax tree representing the code’s structure (strictly a concrete syntax tree, though this post follows common usage and calls it an AST). The query mechanism is a way to search and match patterns in this tree. Think of it as a super-powered regex for code structure, letting you find things like function declarations, variable assignments, or syntax errors that the parser has marked with ERROR nodes.

Queries are written in a Lisp-like S-expression syntax and executed against the AST to return matching nodes. They’re fast and precise, and the query syntax is the same for every language (only the node type names, which come from each grammar, differ), making them well suited to tools like Neovim, VS Code, or custom linters.

Why it matters: Unlike string-based searches, queries understand code’s structure, so you can target syntactic elements accurately.
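To make the contrast concrete, here is a small sketch of what goes wrong with a purely string-based search. The sample source and the regex are made up for illustration; the point is only that a regex cannot tell a comment or a string literal from real code.

```javascript
// A naive, string-based attempt to list function names in a source file.
const source = `
// function oldHelper() was removed in v2
const msg = "function fake() {}";
function greet() {}
`;

// The regex has no notion of comments or string literals,
// so it reports every textual occurrence of "function <name>".
const regexNames = [...source.matchAll(/function\s+([A-Za-z_$][\w$]*)/g)]
  .map(m => m[1]);

console.log(regexNames);
// [ 'oldHelper', 'fake', 'greet' ]
```

Only greet is an actual function declaration here. A structural query that targets function_declaration nodes would skip the comment and the string, because in the tree they are different kinds of nodes.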

Tree-Sitter Documentation

How Queries Are Structured

A Tree-Sitter query is a sequence of patterns written in a parenthesized syntax. Each pattern describes a node type or structure in the AST. You can also use captures to name parts of the match for later use.

Here’s a basic query example for JavaScript:

; Query to find all function declarations
(function_declaration
  name: (identifier) @function.name)

Key components:

  • (function_declaration ...): Matches nodes of type function_declaration.
  • name: (identifier): Requires that the node’s name field be an identifier.
  • @function.name: Captures the identifier as function.name for processing.

Output: When run on JavaScript code, this query captures the names of all function declarations, like myFunction in function myFunction() {}.

Queries can include wildcards, alternations, or predicates for more complex matching. We’ll see more examples later.
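If it helps to have a mental model, the pattern above can be imitated with a toy matcher over a hand-written tree. This is only an illustration of the idea (node type, named field, capture). It is not how Tree-Sitter is implemented, and the tree shape, the pattern object, and runToyQuery are all hypothetical names invented for this sketch.

```javascript
// Hand-written stand-in for a tiny syntax tree (NOT produced by Tree-Sitter).
const toyTree = {
  type: 'program',
  children: [
    {
      type: 'function_declaration',
      fields: { name: { type: 'identifier', text: 'myFunction' } },
      children: [],
    },
    { type: 'expression_statement', children: [] },
  ],
};

// Hypothetical pattern object mirroring:
//   (function_declaration name: (identifier) @function.name)
const toyPattern = {
  type: 'function_declaration',
  field: 'name',
  fieldType: 'identifier',
  capture: 'function.name',
};

// Walk the whole tree and record a capture wherever the pattern fits.
function runToyQuery(node, pattern, results = []) {
  const child = node.fields && node.fields[pattern.field];
  if (node.type === pattern.type && child && child.type === pattern.fieldType) {
    results.push({ name: pattern.capture, text: child.text });
  }
  for (const c of node.children || []) {
    runToyQuery(c, pattern, results);
  }
  return results;
}

const toyCaptures = runToyQuery(toyTree, toyPattern);
console.log(toyCaptures);
// [ { name: 'function.name', text: 'myFunction' } ]
```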

Setting Up Tree-Sitter for Queries

To use queries, you need a Tree-Sitter parser for your target language and a way to run queries. Most developers interact with Tree-Sitter via a library in languages like Rust, JavaScript, or Lua (e.g., in Neovim).

Here’s how to set up Tree-Sitter in Node.js to query JavaScript code:

const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');

// Initialize parser
const parser = new Parser();
parser.setLanguage(JavaScript);

// Sample code to parse
const code = `
function greet() {
  console.log("Hello!");
}
`;

// Parse code into AST
const tree = parser.parse(code);

// Run a query
const query = new Parser.Query(
  JavaScript,
  '(function_declaration name: (identifier) @function.name)'
);
const matches = query.matches(tree.rootNode);

// Output results
matches.forEach(match => {
  match.captures.forEach(capture => {
    console.log(`Found function: ${capture.node.text}`);
  });
});

// Output:
// Found function: greet

Steps:

  1. Install tree-sitter and tree-sitter-javascript via npm (npm install tree-sitter tree-sitter-javascript).
  2. Initialize a parser with the JavaScript grammar.
  3. Parse code into an AST.
  4. Create and run a query to extract matches.

The same setup works for any language with a Tree-Sitter grammar: swap in that language’s grammar package and adjust the node types in the query to match.
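Once a query has more than one capture, the nested forEach loops get noisy. A small helper that groups captured text by capture name can tidy that up. The sketch below assumes the result shape used in the example above (an array of matches, each with a captures array of { name, node } entries); groupCapturesByName is a hypothetical helper, and the sample data is hand-written so the snippet runs without the native module.

```javascript
// Group captured node text by capture name.
function groupCapturesByName(matches) {
  const grouped = new Map();
  for (const match of matches) {
    for (const { name, node } of match.captures) {
      if (!grouped.has(name)) grouped.set(name, []);
      grouped.get(name).push(node.text);
    }
  }
  return grouped;
}

// Stand-in data shaped like the matches from the example above.
const sampleMatches = [
  { pattern: 0, captures: [{ name: 'function.name', node: { text: 'greet' } }] },
  { pattern: 0, captures: [{ name: 'function.name', node: { text: 'farewell' } }] },
];

const grouped = groupCapturesByName(sampleMatches);
console.log(grouped.get('function.name'));
// [ 'greet', 'farewell' ]
```

In real code you would pass query.matches(tree.rootNode) instead of sampleMatches.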

Matching Simple Patterns

Let’s start with a simple query to find all variable declarations in JavaScript (let, const, var). This is useful for linting or analyzing variable usage.

const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');

const parser = new Parser();
parser.setLanguage(JavaScript);

const code = `
let x = 10;
const y = 20;
var z = 30;
`;

const tree = parser.parse(code);

const query = new Parser.Query(
  JavaScript,
  '(variable_declarator name: (identifier) @var.name)'
);
const matches = query.matches(tree.rootNode);

matches.forEach(match => {
  match.captures.forEach(capture => {
    console.log(`Variable: ${capture.node.text}`);
  });
});

// Output:
// Variable: x
// Variable: y
// Variable: z

What’s happening:

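  • The query (variable_declarator name: (identifier) @var.name) matches variable_declarator nodes, which appear inside let, const, and var declarations alike, and captures their name field. Declarations that destructure (e.g., const { a } = obj) have a pattern rather than an identifier in that field, so this query skips them.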
  • Each match corresponds to a variable like x, y, or z.

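This pattern is a good starting point for static analysis tasks. On its own it only lists declarations; to check for unused variables, for example, you would also need to collect where each name is referenced.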
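As a rough sketch of that second step, suppose you have the declared names from the query above plus a list of every identifier occurrence in the file (for instance from a broad (identifier) @id query). Counting occurrences then gives a crude unused-variable check. The input arrays below are hand-written sample data, findUnused is a hypothetical helper, and the approach deliberately ignores scopes and shadowing.

```javascript
// Names a declaration query would capture (sample data).
const declared = ['x', 'y', 'z'];

// Every identifier occurrence in the file, declarations included (sample data).
const allIdentifiers = ['x', 'y', 'z', 'console', 'x', 'z'];

function findUnused(declaredNames, identifiers) {
  const counts = new Map();
  for (const id of identifiers) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  // A name seen only once appears solely in its own declaration.
  return declaredNames.filter(name => (counts.get(name) || 0) <= 1);
}

const unused = findUnused(declared, allIdentifiers);
console.log(unused);
// [ 'y' ]
```

A production linter would resolve scopes properly; this only shows how two simple queries can be combined.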

Advanced Pattern Matching with Predicates

Queries can use predicates to add conditions to matches. For example, you might want to find function declarations with a specific name. Predicates like #eq? let you filter based on node text.

Here’s an example to find functions named greet:

const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');

const parser = new Parser();
parser.setLanguage(JavaScript);

const code = `
function greet() {}
function farewell() {}
`;

const tree = parser.parse(code);

const query = new Parser.Query(
  JavaScript,
  `
  (function_declaration
    name: (identifier) @function.name
    (#eq? @function.name "greet"))
  `
);
const matches = query.matches(tree.rootNode);

matches.forEach(match => {
  match.captures.forEach(capture => {
    console.log(`Matched function: ${capture.node.text}`);
  });
});

// Output:
// Matched function: greet

Key points:

  • (#eq? @function.name "greet") ensures the captured function.name equals greet.
  • Predicates make queries highly specific, ideal for targeted analysis.

Tree-Sitter Query Predicates
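One detail worth knowing: the core Tree-Sitter library does the structural matching, while text predicates such as #eq? are evaluated by the higher-level binding as a filter over those matches. The sketch below imitates that filtering step on hand-written data. applyEq is a hypothetical helper for illustration, not part of any Tree-Sitter API.

```javascript
// Keep only matches where the named capture's text equals the expected string,
// which is roughly what (#eq? @capture "text") asks for.
function applyEq(matches, captureName, expected) {
  return matches.filter(match =>
    match.captures.some(c => c.name === captureName && c.node.text === expected)
  );
}

// Stand-in for the structural matches of (function_declaration name: (identifier) @function.name).
const structuralMatches = [
  { captures: [{ name: 'function.name', node: { text: 'greet' } }] },
  { captures: [{ name: 'function.name', node: { text: 'farewell' } }] },
];

const kept = applyEq(structuralMatches, 'function.name', 'greet');
console.log(kept.map(m => m.captures[0].node.text));
// [ 'greet' ]
```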

Combining Patterns for Complex Queries

You can combine patterns to match nested or alternative structures. Let’s find all function calls in JavaScript where the function is console.log.

const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');

const parser = new Parser();
parser.setLanguage(JavaScript);

const code = `
console.log("Hello");
console.error("Error");
`;

const tree = parser.parse(code);

const query = new Parser.Query(
  JavaScript,
  `
  (call_expression
    function: (member_expression
      object: (identifier) @object
      property: (property_identifier) @property
      (#eq? @object "console")
      (#eq? @property "log")))
  `
);
const matches = query.matches(tree.rootNode);

matches.forEach(match => {
  console.log(`Found console.log call at line ${match.captures[0].node.startPosition.row + 1}`);
});

// Output (the template literal starts with a newline, so the call sits on line 2):
// Found console.log call at line 2

Breakdown:

  • (call_expression ...) matches function calls.
  • (member_expression ...) ensures the function is a property access like console.log.
  • The two #eq? predicates restrict the match to the object console and the property log, so console.error is ignored.

This is useful for linting rules, like banning console.log in production code.
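To turn matches like these into linter-style output, you can map each match to a file:line:column message using the node’s startPosition (row and column are zero-based). This is a minimal sketch: toDiagnostics, the file name, and the message text are made up, and the sample match is hand-written to mirror the shape used above.

```javascript
// Convert query matches into "file:line:column message" strings.
function toDiagnostics(matches, fileName, message) {
  return matches.map(match => {
    // Use the first capture's position as the location of the problem.
    const { row, column } = match.captures[0].node.startPosition;
    return `${fileName}:${row + 1}:${column + 1} ${message}`;
  });
}

// Stand-in for one console.log match (sample data).
const consoleLogMatches = [
  {
    captures: [
      { name: 'object', node: { text: 'console', startPosition: { row: 1, column: 0 } } },
      { name: 'property', node: { text: 'log', startPosition: { row: 1, column: 8 } } },
    ],
  },
];

const diagnostics = toDiagnostics(consoleLogMatches, 'app.js', 'Unexpected console.log');
console.log(diagnostics);
// [ 'app.js:2:1 Unexpected console.log' ]
```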

Practical Use Cases and Examples

Tree-Sitter queries shine in real-world scenarios. Here are some use cases with examples:

Use Case | Query Example | Purpose
Find unused variables | (variable_declarator name: (identifier) @var.name) | List declared variables, the first step in spotting ones that are never used
Detect deprecated APIs | (call_expression function: (identifier) @func (#eq? @func "oldAPI")) | Flag deprecated function calls
Enforce naming conventions | (function_declaration name: (identifier) @name (#match? @name "^[a-z][a-zA-Z]*$")) | Match function names that follow camelCase

Example: Enforce camelCase function names

const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');

const parser = new Parser();
parser.setLanguage(JavaScript);

const code = `
function myFunction() {}
function BadFunction() {}
`;

const tree = parser.parse(code);

const query = new Parser.Query(
  JavaScript,
  `
  (function_declaration
    name: (identifier) @function.name
    (#match? @function.name "^[a-z][a-zA-Z]*$"))
  `
);
const matches = query.matches(tree.rootNode);

matches.forEach(match => {
  match.captures.forEach(capture => {
    console.log(`Valid function name: ${capture.node.text}`);
  });
});

// Output:
// Valid function name: myFunction

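This query uses #match? to keep only function names that start with a lowercase letter and contain nothing but letters, so myFunction matches and BadFunction does not. Note that it reports the names that pass the check, and that the regex is only a rough approximation of camelCase, since it also rejects digits.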
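Before putting a regex into a #match? predicate, it can be worth trying it in plain JavaScript first. The sketch below assumes the binding evaluates #match? with JavaScript-compatible regex semantics; the name list is invented for illustration.

```javascript
// The same pattern used in the query above.
const camelCase = /^[a-z][a-zA-Z]*$/;

// A relaxed variant that also allows digits after the first letter.
const camelCaseWithDigits = /^[a-z][a-zA-Z0-9]*$/;

const names = ['myFunction', 'BadFunction', 'snake_case', 'parseV2'];

const report = names.map(name => ({
  name,
  strict: camelCase.test(name),
  relaxed: camelCaseWithDigits.test(name),
}));

console.log(report);
// myFunction passes both, BadFunction and snake_case fail both,
// parseV2 fails the strict pattern but passes the relaxed one.
```

For a lint rule you usually want the violations rather than the valid names; swapping #match? for #not-match? in the query flips the result.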

Tips for Writing Effective Queries

Here are practical tips to make your Tree-Sitter queries robust:

  • Understand the AST: Use Tree-Sitter’s playground to inspect the AST for your language. This helps you write accurate node types.
  • Start simple: Begin with basic patterns and add complexity gradually.
  • Use captures wisely: Name captures clearly (e.g., @function.name) for readable code.
  • Test incrementally: Run queries on small code snippets to verify matches.
  • Leverage predicates: Use #eq?, #match?, or custom predicates for precise filtering.

Example: Inspecting AST

To see the AST for JavaScript code, use the Tree-Sitter CLI:

tree-sitter parse example.js

This outputs the AST, helping you identify node types like function_declaration or call_expression.

Tree-Sitter Playground

Where to Go from Here

Tree-Sitter’s query mechanism is a game-changer for code analysis, but it’s just the start. You can integrate queries into editor plugins (e.g., Neovim’s Treesitter), build custom linters, or create code metrics tools. Start by experimenting with simple queries on your codebase, then explore advanced features like incremental parsing for real-time analysis.

Try combining queries with other tools, like ESLint for JavaScript or RuboCop for Ruby, to enhance their structural analysis capabilities. If you’re building a tool, check out Tree-Sitter’s bindings for Rust, Python, or Lua to embed queries in your app.

The key is to play around and iterate. Write queries, test them, and refine based on what you learn. With practice, you’ll be slicing through ASTs like a pro, making your dev tools smarter and your workflows smoother.