Digging Deeper into Code with Tree-Sitter: How to Query Your Syntax Tree

Hello everyone! I'm Shailendra.
In my last post, 'Tree-Sitter: From Code to Syntax-Tree', I talked about how we can use Tree-sitter to generate a syntax tree for a given language using its grammar.
In this post, we’ll go one step further and see how Tree-sitter queries can match patterns in that syntax tree, so you can pick out and analyze specific parts of the code.
This is another very interesting feature of Tree-sitter that can help you get deeper insights into your source code. For example, you can write a query to list all the functions present in a source file. And that's exactly what we’ll explore in this post.
Just like the last post, I’ll use Go for the examples. If you’re not familiar with using Tree-sitter from Go, please check the previous post for installation steps and the basic code setup.
Our Use Case
Now, let’s get into our use case. As mentioned, I’ll demonstrate how to list all the functions from a given source file.
Here’s the Go code we’ll be analyzing:
package main

import "fmt"

func add(num1 int32, num2 int32) int32 {
	return num1 + num2
}

func subtract(num1, num2 int32) int32 {
	return num1 - num2
}

func calculator(operator string, num1, num2 int32) int32 {
	switch operator {
	case "+":
		return add(num1, num2)
	case "-":
		return subtract(num1, num2)
	default:
		// Println rather than Errorf: Errorf only builds an error value, which was being discarded.
		fmt.Println("Unknown operator")
	}
	return -1
}
Parse Tree Output
Let’s quickly look at how the parse tree would look for this code:
(source_file
  (package_clause
    (package_identifier))
  (import_declaration
    (import_spec
      path: (interpreted_string_literal)))
  (function_declaration
    name: (identifier)
    parameters: (parameter_list
      (parameter_declaration
        name: (identifier)
        type: (type_identifier))
      (parameter_declaration
        name: (identifier)
        type: (type_identifier)))
    result: (type_identifier)
    body: (block
      (return_statement
        (expression_list
          (binary_expression
            left: (identifier)
            right: (identifier))))))
  (function_declaration
    name: (identifier)
    parameters: (parameter_list
      (parameter_declaration
        name: (identifier)
        name: (identifier)
        type: (type_identifier)))
    result: (type_identifier)
    body: (block
      (return_statement
        (expression_list
          (binary_expression
            left: (identifier)
            right: (identifier))))))
  (function_declaration
    name: (identifier)
    parameters: (parameter_list
      (parameter_declaration
        name: (identifier)
        type: (type_identifier))
      (parameter_declaration
        name: (identifier)
        name: (identifier)
        type: (type_identifier)))
    result: (type_identifier)
    body: (block
      (expression_switch_statement
        value: (identifier)
        (expression_case
          value: (expression_list
            (interpreted_string_literal))
          (return_statement
            (expression_list
              (call_expression
                function: (identifier)
                arguments: (argument_list
                  (identifier)
                  (identifier))))))
        (expression_case
          value: (expression_list
            (interpreted_string_literal))
          (return_statement
            (expression_list
              (call_expression
                function: (identifier)
                arguments: (argument_list
                  (identifier)
                  (identifier))))))
        (default_case
          (expression_statement
            (call_expression
              function: (selector_expression
                operand: (identifier)
                field: (field_identifier))
              arguments: (argument_list
                (interpreted_string_literal))))))
      (return_statement
        (expression_list
          (unary_expression
            operand: (int_literal))))))
)
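Notice how the parameters of add and subtract are represented differently. add declares each parameter with its own type, so it gets two parameter_declaration nodes; subtract groups num1 and num2 under a single type, so it gets one parameter_declaration with two name fields. Go allows you to group parameters that share a type, and Tree-sitter reflects that nicely in the syntax tree.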
Printing the Syntax Tree
Here’s the main code that prints the above Tree-sitter output:
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	tree_sitter "github.com/smacker/go-tree-sitter"
	"github.com/smacker/go-tree-sitter/golang"
)

func main() {
	// Create a parser and point it at the Go grammar.
	parser := tree_sitter.NewParser()
	parser.SetLanguage(golang.GetLanguage())

	// Read the file we want to analyze.
	data, err := os.ReadFile("example.go")
	if err != nil {
		log.Fatal(err)
	}

	// Parse the source; nil means there is no previous tree to reuse.
	tree, err := parser.ParseCtx(context.Background(), nil, data)
	if err != nil {
		log.Fatal(err)
	}

	// Print the root node type and the whole tree as an S-expression.
	root := tree.RootNode()
	fmt.Println("Root type:", root.Type())
	fmt.Println("Tree:\n", root.String())
}
Note: The Go code we’re analyzing (shown under "Our Use Case") is saved as example.go, which is the file this program reads.
Querying Function Names
Now, let’s try to capture all the function names in the source. To do this, we’ll provide a query that matches all the function names in the syntax tree.
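A function node looks like this (with child nodes omitted for brevity):
(function_declaration
  name: (identifier)
  parameters: (parameter_list)
  result: (type_identifier)
  body: (block)
)
To extract just the function name, we match on the name field and attach a capture, @function-name, to it. Captures are how a query hands matched nodes back to our code:
(function_declaration
  name: (identifier) @function-name
)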
Code for this query:
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	tree_sitter "github.com/smacker/go-tree-sitter"
	"github.com/smacker/go-tree-sitter/golang"
)

// functionDeclarationsQuery captures the name of every function declaration.
const functionDeclarationsQuery = `
(function_declaration
  name: (identifier) @function-name
)
`

func main() {
	parser := tree_sitter.NewParser()
	parser.SetLanguage(golang.GetLanguage())

	data, err := os.ReadFile("example.go")
	if err != nil {
		log.Fatal(err)
	}

	tree, err := parser.ParseCtx(context.Background(), nil, data)
	if err != nil {
		log.Fatal(err)
	}

	// Compile the query against the Go grammar.
	query, err := tree_sitter.NewQuery([]byte(functionDeclarationsQuery), golang.GetLanguage())
	if err != nil {
		log.Fatal(err)
	}

	// Run the query over the whole tree, starting at the root node.
	cursor := tree_sitter.NewQueryCursor()
	cursor.Exec(query, tree.RootNode())

	// Walk the matches; each match holds the nodes captured by the query.
	for {
		match, more := cursor.NextMatch()
		if !more {
			break
		}
		for _, capture := range match.Captures {
			node := capture.Node
			// Content slices the original source bytes covered by the node.
			fmt.Println("Found function:", node.Content(data))
		}
	}
}
Output:
$ go run main.go
Found function: add
Found function: subtract
Found function: calculator
Querying Parameters as Well
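To print the parameters as well, update the query like this:
(function_declaration
  name: (identifier) @function-name
  parameters: (parameter_list) @parameter-list
)
We just added this:
parameters: (parameter_list) @parameter-list
Each match now carries two captures, so the printing loop also has to pick its label ("Found function:" or "Parameters list:") based on which capture it is handling.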
Output:
$ go run main.go
Found function: add
Parameters list: (num1 int32, num2 int32)
Found function: subtract
Parameters list: (num1, num2 int32)
Found function: calculator
Parameters list: (operator string, num1, num2 int32)
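The loop from the previous section prints every capture with the same "Found function:" label, so getting the output above needs one small change: look up the capture's name and choose the label from it. Here is a minimal sketch of that labeling step. describeCapture is a hypothetical helper, and the sketch is kept free of Tree-sitter imports so it runs on its own. Inside the real loop, I'd expect the two arguments to come from query.CaptureNameForId(capture.Index) and capture.Node.Content(data); that is my reading of the smacker/go-tree-sitter API, so please verify it against the version you have installed.

```go
package main

import "fmt"

// describeCapture (hypothetical helper) turns a capture name and the source
// text of the captured node into one line of output.
func describeCapture(captureName, text string) string {
	switch captureName {
	case "function-name":
		return "Found function: " + text
	case "parameter-list":
		return "Parameters list: " + text
	default:
		// Unknown captures are still printed, labeled with their own name.
		return captureName + ": " + text
	}
}

func main() {
	// Hard-coded stand-ins for the two captures the query yields for add.
	// In the real loop they would be read from match.Captures instead.
	captures := []struct{ name, text string }{
		{"function-name", "add"},
		{"parameter-list", "(num1 int32, num2 int32)"},
	}
	for _, c := range captures {
		fmt.Println(describeCapture(c.name, c.text))
	}
}
```

Keying the label on the capture name rather than on the capture's position keeps the loop correct if you later add or reorder captures in the query.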
Final Thoughts
Hope this helps clarify how we can use queries in Tree-sitter to analyze different parts of the code. I encourage you to explore more by experimenting with different queries and observing the output. That’s the best way to understand how Tree-sitter represents various parts of source code using syntax tree nodes.