Tinkering with Tree-Sitter Using Go

May 7, 2025 - 18:55
Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a first-of-its-kind tool that automatically indexes API endpoints across all your repositories. LiveAPI helps you discover, understand, and use APIs in large tech infrastructures with ease.

Tree-Sitter is a parser generator and incremental parsing library that builds syntax trees for code in many languages. Strictly speaking these are concrete syntax trees, but I’ll call them abstract syntax trees (ASTs) throughout, as most people do. It’s fast, incremental, and widely used in tools like Neovim and GitHub’s code search. If you’re a Go developer curious about code analysis, Tree-Sitter is a great tool to experiment with. In this post, I’ll walk you through setting up Tree-Sitter in Go, parsing JavaScript code, and exploring the resulting AST. We’ll dive into practical examples, break down the output, and uncover patterns to make sense of it all.

This isn’t a formal lecture—just me sharing my experiments, complete with code you can run and outputs you can expect. Let’s get started.

Setting Up Tree-Sitter in Your Go Project

To tinker with Tree-Sitter, you need a Go project and the Tree-Sitter library. Let’s set up a minimal project to parse some JavaScript code.

First, create a new directory and initialize it as a Go module:

mkdir gg && cd gg
go mod init github.com/yourusername/gg

Next, you’ll need the Tree-Sitter Go bindings and a language grammar (we’ll use JavaScript for this example). Install them with:

go get github.com/tree-sitter/go-tree-sitter
go get github.com/tree-sitter/tree-sitter-javascript/bindings/go

These packages give you the core Tree-Sitter library and the JavaScript grammar. The Go bindings wrap the C library through cgo, so you’ll also need a C compiler (such as gcc or clang) on your machine. Beyond that, setup is a breeze.

Now, create a file called main.go with the following code to test the setup:

package main

import (
    "fmt"

    tree_sitter "github.com/tree-sitter/go-tree-sitter"
    tree_sitter_javascript "github.com/tree-sitter/tree-sitter-javascript/bindings/go"
)

func main() {
    code := []byte("const foo = 1 + 2")

    parser := tree_sitter.NewParser()
    defer parser.Close()
    parser.SetLanguage(tree_sitter.NewLanguage(tree_sitter_javascript.Language()))

    tree := parser.Parse(code, nil)
    defer tree.Close()

    root := tree.RootNode()
    fmt.Println(root.ToSexp())
}

// Output:
// (program (lexical_declaration (variable_declarator name: (identifier) value: (binary_expression left: (number) right: (number)))))

Run it with:

go run main.go

This code parses a simple JavaScript snippet (const foo = 1 + 2) and prints the AST in S-expression format. The output is a nested structure representing the code’s syntax. Don’t worry about the S-expression yet—we’ll decode it soon.

Key points:

  • Initialize a Go module to manage dependencies.
  • Install Tree-Sitter and a language grammar (like JavaScript).
  • Use defer to clean up parser and tree resources.

Parsing Code and Understanding the AST

Now that we have a working setup, let’s dig into what the parser does. The code above takes a JavaScript snippet and turns it into an AST. The root.ToSexp() method gives us a string representation of the tree, but it’s dense. Let’s format it for clarity:

(
  program (
    lexical_declaration (
      variable_declarator
        name: (identifier)
        value: (
          binary_expression
            left: (number)
            right: (number)
        )
    )
  )
)

This structure represents the JavaScript code const foo = 1 + 2. Here’s how it breaks down:

Node Type           | Description                | Example in Code
--------------------|----------------------------|----------------
program             | Root of the AST            | Entire snippet
lexical_declaration | A const or let declaration | const foo = ...
variable_declarator | A variable and its value   | foo = 1 + 2
identifier          | A variable name            | foo
binary_expression   | An operation like addition | 1 + 2
number              | A numeric literal          | 1 or 2

The AST tells us the hierarchical structure of the code. For example, variable_declarator has two children: name (the identifier foo) and value (the expression 1 + 2). The binary_expression node breaks down 1 + 2 into left (1) and right (2).

Key points:

  • ASTs are hierarchical: Each node represents a syntax construct.
  • S-expressions are compact but need formatting to read easily.
  • Nodes have types like identifier or number that map to code elements.
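Formatting S-expressions by hand gets tedious once the snippets grow. Here is a minimal sketch of a helper that indents the string returned by ToSexp(). The name indentSexp is my own and not part of Tree-Sitter, it uses only the standard library, and it assumes balanced parentheses and node names without spaces. The layout is a bit more compact than my hand-formatted version: each node starts on its own line, and a labeled node stays on the same line as its label.

```go
package main

import (
	"fmt"
	"strings"
)

// indentSexp is a hypothetical helper (not part of Tree-Sitter). It breaks an
// S-expression onto multiple lines, indenting each node by its nesting depth.
// It assumes well-formed input such as the string ToSexp() returns.
func indentSexp(sexp string) string {
	var b strings.Builder
	depth := 0
	for i := 0; i < len(sexp); i++ {
		c := sexp[i]
		switch {
		case c == '(':
			depth++
			b.WriteByte(c)
		case c == ')':
			depth--
			b.WriteByte(c)
		case c == ' ' && i > 0 && sexp[i-1] == ':':
			// A label such as "name:" stays on the same line as its node.
			b.WriteByte(c)
		case c == ' ':
			// Any other space separates siblings: start a new indented line.
			b.WriteByte('\n')
			b.WriteString(strings.Repeat("  ", depth))
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	sexp := "(program (lexical_declaration (variable_declarator name: (identifier) value: (binary_expression left: (number) right: (number)))))"
	fmt.Println(indentSexp(sexp))
	// Prints:
	// (program
	//   (lexical_declaration
	//     (variable_declarator
	//       name: (identifier)
	//       value: (binary_expression
	//         left: (number)
	//         right: (number)))))
}
```

In a real program you would pass root.ToSexp() to indentSexp instead of a hard-coded string.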

Decoding S-Expression Patterns

The S-expression output can feel cryptic, but it follows clear patterns. Let’s analyze the output to make it less intimidating.

From the formatted S-expression, I noticed two recurring patterns:

  1. Parameter-Definition Pattern (parameter_name: definition):

    • When you see a colon (:), the left side is a parameter name (Tree-Sitter calls it a field name), and the right side is the node that fills it.
    • Example: name: (identifier) means the name parameter is filled by an identifier node (in our case, foo).
  2. Node-Children Pattern ((name ...)):

    • Each pair of parentheses is one node: the first word inside is the node's type, and everything after it, up to the matching closing parenthesis, is its children.
    • Example: (binary_expression left: (number) right: (number)) is a binary_expression node with two children, left and right.

To verify these patterns, let’s parse a slightly more complex JavaScript snippet. Update main.go to:

package main

import (
    "fmt"

    tree_sitter "github.com/tree-sitter/go-tree-sitter"
    tree_sitter_javascript "github.com/tree-sitter/tree-sitter-javascript/bindings/go"
)

func main() {
    code := []byte("function add(a, b) { return a + b; }")

    parser := tree_sitter.NewParser()
    defer parser.Close()
    parser.SetLanguage(tree_sitter.NewLanguage(tree_sitter_javascript.Language()))

    tree := parser.Parse(code, nil)
    defer tree.Close()

    root := tree.RootNode()
    fmt.Println(root.ToSexp())
}

// Output:
// (program (function_declaration name: (identifier) parameters: (formal_parameters (identifier) (identifier)) body: (statement_block (return_statement (binary_expression left: (identifier) right: (identifier))))))

Run it with go run main.go and format the output:

(
  program (
    function_declaration
      name: (identifier)
      parameters: (
        formal_parameters
          (identifier)
          (identifier)
      )
      body: (
        statement_block
          (
            return_statement
              (
                binary_expression
                  left: (identifier)
                  right: (identifier)
              )
          )
      )
  )
)

This represents function add(a, b) { return a + b; }. Notice the patterns:

  • name: (identifier) for the function name (add).
  • parameters: (formal_parameters ...) for the parameter list (a, b).
  • body: (statement_block ...) for the function body.

These patterns make S-expressions predictable once you get the hang of them.

Key points:

  • Colons separate parameters from definitions.
  • Parentheses group a node’s children.
  • Practice with different code snippets to spot patterns.
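The two patterns are regular enough to decode mechanically. As a rough sketch, the standard-library-only code below reads an S-expression string back into a small tree and records the label in front of each node. The sexpNode type and parseSexp function are hypothetical names of mine, not Tree-Sitter APIs, and the parser assumes the clean output you get from an error-free parse.

```go
package main

import (
	"fmt"
	"strings"
)

// sexpNode is a hypothetical in-memory form of one node in ToSexp() output.
type sexpNode struct {
	Field    string // label before the colon, "" if the node has none
	Kind     string // node type, e.g. "identifier"
	Children []*sexpNode
}

// parseSexp reads S-expression text in the shape Tree-Sitter prints for
// error-free trees. It is a sketch: malformed input is not reported.
func parseSexp(s string) *sexpNode {
	// Pad parentheses so strings.Fields yields "(", ")" and words as tokens.
	s = strings.ReplaceAll(s, "(", " ( ")
	s = strings.ReplaceAll(s, ")", " ) ")
	tokens := strings.Fields(s)

	var root *sexpNode
	var stack []*sexpNode
	field := ""
	for i := 0; i < len(tokens); i++ {
		tok := tokens[i]
		switch {
		case tok == "(":
			// Node-children pattern: the word after "(" is the node type.
			n := &sexpNode{Field: field}
			field = ""
			if i+1 < len(tokens) {
				n.Kind = tokens[i+1]
				i++
			}
			if len(stack) > 0 {
				parent := stack[len(stack)-1]
				parent.Children = append(parent.Children, n)
			} else {
				root = n
			}
			stack = append(stack, n)
		case tok == ")":
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		case strings.HasSuffix(tok, ":"):
			// Colon pattern: "name:" labels the node that follows.
			field = strings.TrimSuffix(tok, ":")
		}
	}
	return root
}

func main() {
	root := parseSexp("(program (function_declaration name: (identifier) parameters: (formal_parameters (identifier) (identifier)) body: (statement_block (return_statement (binary_expression left: (identifier) right: (identifier))))))")
	fn := root.Children[0]
	for _, child := range fn.Children {
		fmt.Printf("%s -> %s (children: %d)\n", child.Field, child.Kind, len(child.Children))
	}
	// Prints:
	// name -> identifier (children: 0)
	// parameters -> formal_parameters (children: 2)
	// body -> statement_block (children: 1)
}
```

This is only for building intuition about the text format. In real code you would work with the Node values Tree-Sitter gives you rather than re-parsing the string.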

Walking the AST for Fun and Profit

Parsing is cool, but the real power comes from traversing the AST to extract information. Let’s modify our code to walk the tree and print node types and their content. This is useful for tasks like code analysis or linting.

Here’s an example that traverses the AST and prints each node’s type and text:

package main

import (
    "fmt"
    "strings"

    tree_sitter "github.com/tree-sitter/go-tree-sitter"
    tree_sitter_javascript "github.com/tree-sitter/tree-sitter-javascript/bindings/go"
)

func main() {
    code := []byte("const foo = 1 + 2")

    parser := tree_sitter.NewParser()
    defer parser.Close()
    parser.SetLanguage(tree_sitter.NewLanguage(tree_sitter_javascript.Language()))

    tree := parser.Parse(code, nil)
    defer tree.Close()

    root := tree.RootNode()
    traverse(root, code, 0)
}

// traverse prints each named node's type and source text, indented by depth.
func traverse(node *tree_sitter.Node, code []byte, depth int) {
    indent := strings.Repeat("  ", depth)
    nodeType := node.Kind() // the official bindings expose the node type as Kind()
    nodeText := string(code[node.StartByte():node.EndByte()])
    fmt.Printf("%s%s: %s\n", indent, nodeType, nodeText)

    // NamedChildCount and NamedChild work with uint indexes.
    for i := uint(0); i < node.NamedChildCount(); i++ {
        child := node.NamedChild(i)
        traverse(child, code, depth+1)
    }
}

// Output:
// program: const foo = 1 + 2
//   lexical_declaration: const foo = 1 + 2
//     variable_declarator: foo = 1 + 2
//       identifier: foo
//       binary_expression: 1 + 2
//         number: 1
//         number: 2

Run it with go run main.go. This code uses a recursive traverse function to visit each named node, printing its type and the corresponding code snippet. The depth parameter adds indentation for readability.

This traversal is handy for understanding the AST’s structure or extracting specific nodes (e.g., finding all identifier nodes for variable names).

Key points:

  • Use NamedChild to iterate over significant nodes (ignoring punctuation like =).
  • Access node text with StartByte and EndByte.
  • Traversal is recursive, so its call depth grows with how deeply the code is nested; for very deeply nested input, an explicit stack or a tree cursor avoids deep recursion.

Where to Go Next with Tree-Sitter and Go

Tree-Sitter’s power lies in its flexibility. Here are some ideas to keep tinkering:

  • Try other languages: Tree-Sitter supports many grammars (e.g., Python, Rust). Install their bindings (like tree-sitter-python) and swap the language in SetLanguage.
  • Build a linter: Traverse the AST to enforce coding rules, like checking for unused variables.
  • Integrate with tools: Use Tree-Sitter in a CLI tool or editor plugin for real-time code analysis.
  • Explore queries: Tree-Sitter’s query language lets you search ASTs for patterns (e.g., find all function declarations). Check the Tree-Sitter docs for details.

To experiment further, try parsing larger codebases or combining Tree-Sitter with other Go libraries. For example, you could pair it with go-git to analyze code in repositories.

Key points:

  • Tree-Sitter is versatile for parsing any language with a grammar.
  • AST traversal enables linting, refactoring, or code metrics.
  • Keep experimenting with small projects to master Tree-Sitter’s API.

This journey into Tree-Sitter with Go has been a fun way to understand code parsing. The examples here are just the start—play with different snippets, explore the AST, and build something cool. If you hit snags, the Tree-Sitter community and docs are great resources. Happy coding!