Digging Deeper into Code with Tree-Sitter: How to Query Your Syntax Tree


May 14, 2025 - 01:58

Hello everyone! I'm Shailendra.
In my last post — 'Tree-Sitter: From Code to Syntax-Tree' — I talked about how we can use Tree-sitter to generate a syntax tree for a given language using its grammar.

In this post, we’ll go one step further and see how Tree-sitter queries can pick out specific parts of that syntax tree, so you can understand and analyze your code.

Queries are one of Tree-sitter’s most useful features for getting deeper insights into your source code. For example, you can write a query to list all the functions present in a repo's source code, and that's exactly what we’ll explore in this post.

Just like in the last post, I’ll use GoLang to demonstrate the use cases. If you’re not familiar with how to use Tree-sitter in GoLang, please check the previous post for installation steps and the basic code setup.

Our Use Case

Now, let’s get into our use case. As mentioned, I’ll demonstrate how to list all the functions from a given source file.

Here’s the GoLang code we’ll be analyzing:

package main

import "fmt"

func add(num1 int32, num2 int32) int32 {
    return num1 + num2
}

func subtract(num1, num2 int32) int32 {
    return num1 - num2
}

func calculator(operator string, num1, num2 int32) int32 {
    switch operator {
    case "+":
        return add(num1, num2)
    case "-":
        return subtract(num1, num2)
    default:
        fmt.Errorf("Unknown operator")
    }

    return -1
}

Parse Tree Output

Let’s quickly look at the parse tree Tree-sitter produces for this code:

(source_file
  (package_clause
    (package_identifier))

  (import_declaration
    (import_spec
      path: (interpreted_string_literal)))

  (function_declaration
    name: (identifier)
    parameters: (parameter_list
      (parameter_declaration
        name: (identifier)
        type: (type_identifier))
      (parameter_declaration
        name: (identifier)
        type: (type_identifier)))
    result: (type_identifier)
    body: (block
      (return_statement
        (expression_list
          (binary_expression
            left: (identifier)
            right: (identifier))))))

  (function_declaration
    name: (identifier)
    parameters: (parameter_list
      (parameter_declaration
        name: (identifier)
        name: (identifier)
        type: (type_identifier)))
    result: (type_identifier)
    body: (block
      (return_statement
        (expression_list
          (binary_expression
            left: (identifier)
            right: (identifier))))))

  (function_declaration
    name: (identifier)
    parameters: (parameter_list
      (parameter_declaration
        name: (identifier)
        type: (type_identifier))
      (parameter_declaration
        name: (identifier)
        name: (identifier)
        type: (type_identifier)))
    result: (type_identifier)
    body: (block
      (expression_switch_statement
        value: (identifier)

        (expression_case
          value: (expression_list
            (interpreted_string_literal))
          (return_statement
            (expression_list
              (call_expression
                function: (identifier)
                arguments: (argument_list
                  (identifier)
                  (identifier))))))

        (expression_case
          value: (expression_list
            (interpreted_string_literal))
          (return_statement
            (expression_list
              (call_expression
                function: (identifier)
                arguments: (argument_list
                  (identifier)
                  (identifier))))))

        (default_case
          (expression_statement
            (call_expression
              function: (selector_expression
                operand: (identifier)
                field: (field_identifier))
              arguments: (argument_list
                (interpreted_string_literal))))))

      (return_statement
        (expression_list
          (unary_expression
            operand: (int_literal))))))
)

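Notice how the parameters of the add and subtract functions are represented differently: add has two parameter_declaration nodes with one name each, while subtract has a single parameter_declaration with two name fields. That’s because Go allows you to group parameters that share a type, and Tree-sitter reflects that grouping in the syntax tree.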
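
As a quick cross-check that this grouping comes from Go's grammar and not from Tree-sitter itself, here is a minimal sketch using only Go's standard go/parser and go/ast packages (not Tree-sitter). The helper name namesPerDeclaration and the trimmed-down source string are my own, for illustration only.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a trimmed copy of the add and subtract functions from the post.
const src = `package main

func add(num1 int32, num2 int32) int32 { return num1 + num2 }

func subtract(num1, num2 int32) int32 { return num1 - num2 }
`

// namesPerDeclaration (hypothetical helper) returns, for each function,
// how many names each parameter declaration carries.
func namesPerDeclaration(source string) (map[string][]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", source, 0)
	if err != nil {
		return nil, err
	}
	result := make(map[string][]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, vars, and other declarations
		}
		counts := []int{}
		for _, field := range fn.Type.Params.List {
			// One Field per declaration; grouped names share a Field.
			counts = append(counts, len(field.Names))
		}
		result[fn.Name.Name] = counts
	}
	return result, nil
}

func main() {
	counts, err := namesPerDeclaration(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("add:", counts["add"])           // two declarations, one name each
	fmt.Println("subtract:", counts["subtract"]) // one declaration, two names
}
```

The standard parser groups the names the same way, which matches the two name fields under a single parameter_declaration in the Tree-sitter output.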

Printing the Syntax Tree

Here’s the main code that prints the above Tree-sitter output:

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    tree_sitter "github.com/smacker/go-tree-sitter"
    "github.com/smacker/go-tree-sitter/golang"
)

func main() {
    parser := tree_sitter.NewParser()
    parser.SetLanguage(golang.GetLanguage())

    data, err := os.ReadFile("example.go")
    if err != nil {
        log.Fatal(err)
    }
    tree, err := parser.ParseCtx(context.Background(), nil, data)
    if err != nil {
        log.Fatal(err)
    }

    root := tree.RootNode()
    fmt.Println("Root type:", root.Type())
    fmt.Println("Tree:\n", root.String())
}

Note: The sample code we are analyzing (the add, subtract, and calculator functions shown earlier) is saved in the example.go file, which this program reads.

Querying Function Names

Now, let’s try to capture all the function names in the source. To do this, we’ll provide a query that matches all the function names in the syntax tree.

A function node looks like this:

(function_declaration
    name: (identifier)
    parameters: (parameter_list)
    result: (type_identifier)
    body: (block)
)

To extract just the function name, match the name field and tag it with a capture, written as @ followed by a name of your choice:

(function_declaration
    name: (identifier) @function-name
)

Code for this query:

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    tree_sitter "github.com/smacker/go-tree-sitter"
    "github.com/smacker/go-tree-sitter/golang"
)

// functionDeclarationsQuery captures the name of every function declaration.
const functionDeclarationsQuery = `
    (function_declaration
        name: (identifier) @function-name
    )
`

func main() {
    parser := tree_sitter.NewParser()
    parser.SetLanguage(golang.GetLanguage())

    // Read and parse the file we want to analyze.
    data, err := os.ReadFile("example.go")
    if err != nil {
        log.Fatal(err)
    }
    tree, err := parser.ParseCtx(context.Background(), nil, data)
    if err != nil {
        log.Fatal(err)
    }

    // Compile the query against the Go grammar.
    query, err := tree_sitter.NewQuery([]byte(functionDeclarationsQuery), golang.GetLanguage())
    if err != nil {
        log.Fatal(err)
    }

    // Run the query over the whole tree, starting at the root node.
    cursor := tree_sitter.NewQueryCursor()
    cursor.Exec(query, tree.RootNode())

    // Each match holds the captures for one function_declaration.
    for {
        match, more := cursor.NextMatch()
        if !more {
            break
        }

        for _, capture := range match.Captures {
            // Content returns the source text covered by the captured node.
            node := capture.Node
            fmt.Println("Found function:", node.Content(data))
        }
    }
}

Output:

$ go run main.go 
Found function: add
Found function: subtract
Found function: calculator

Querying Parameters as Well

To print the parameters as well, update the query like this:

(function_declaration
    name: (identifier) @function-name
    parameters: (parameter_list) @parameter-list
)

We just added this:
parameters: (parameter_list) @parameter-list

Output (with the print loop labelling each capture by its name):

$ go run main.go 
Found function: add
Parameters list: (num1 int32, num2 int32)
Found function: subtract
Parameters list: (num1, num2 int32)
Found function: calculator
Parameters list: (operator string, num1, num2 int32)
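
The updated print loop is not shown above, so here is one way the labelling step could look. This is a minimal sketch that uses only the standard library: the capture struct and formatCaptures function are hypothetical stand-ins, not part of go-tree-sitter. In the real loop, I would expect Name to come from query.CaptureNameForId(capture.Index) and Text from capture.Node.Content(data), assuming the smacker/go-tree-sitter bindings used earlier.

```go
package main

import "fmt"

// capture is a hypothetical stand-in for one query capture.
// Name is the capture name from the query (without the leading '@');
// Text is the source text of the captured node.
type capture struct {
	Name string
	Text string
}

// formatCaptures turns the captures of one match into printable lines,
// choosing the label from the capture name.
func formatCaptures(captures []capture) []string {
	lines := make([]string, 0, len(captures))
	for _, c := range captures {
		switch c.Name {
		case "function-name":
			lines = append(lines, "Found function: "+c.Text)
		case "parameter-list":
			lines = append(lines, "Parameters list: "+c.Text)
		default:
			// Keep unknown capture names visible instead of dropping them.
			lines = append(lines, c.Name+": "+c.Text)
		}
	}
	return lines
}

func main() {
	// In the real program these values would be built from match.Captures.
	match := []capture{
		{Name: "function-name", Text: "subtract"},
		{Name: "parameter-list", Text: "(num1, num2 int32)"},
	}
	for _, line := range formatCaptures(match) {
		fmt.Println(line)
	}
}
```

Branching on the capture name keeps the loop correct even if you later add more captures to the query or reorder them.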

Final Thoughts

Hope this helps clarify how we can use queries in Tree-sitter to analyze different parts of the code. I encourage you to explore more by experimenting with different queries and observing the output. That’s the best way to understand how Tree-sitter represents various parts of source code using syntax tree nodes.