Decoding Tree-sitter Playground Output For Fun

May 9, 2025 - 18:56

Tree-sitter is a powerful parser generator that turns your code into a structured tree, and its Playground (try here) lets you see that tree in action.

But when you paste code into the Playground and get a wall of output like module [0, 0] - [2, 0], it can feel like deciphering alien hieroglyphs.

Let’s make sense of it with a simple Python example, break it down step-by-step, and build intuition for what’s happening under the hood. This guide is for developers who want to grok Tree-sitter’s output without drowning in jargon.

We’ll use this Python input:

print("hello world")
print("bye world")

And its Tree-sitter Playground output:

module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 20]
    call [0, 0] - [0, 20]
      function: identifier [0, 0] - [0, 5]
      arguments: argument_list [0, 5] - [0, 20]
        string [0, 6] - [0, 19]
          string_start [0, 6] - [0, 7]
          string_content [0, 7] - [0, 18]
          string_end [0, 18] - [0, 19]
  expression_statement [1, 0] - [1, 18]
    call [1, 0] - [1, 18]
      function: identifier [1, 0] - [1, 5]
      arguments: argument_list [1, 5] - [1, 18]
        string [1, 6] - [1, 17]
          string_start [1, 6] - [1, 7]
          string_content [1, 7] - [1, 16]
          string_end [1, 16] - [1, 17]

Let’s dive in and unpack this output to make it less intimidating and more actionable.

What’s a Tree-sitter Parse Tree, Anyway?

Tree-sitter breaks your code into a syntax tree, where every piece—functions, strings, even quotes—becomes a node. The Playground shows this tree as a text-based hierarchy. Each line in the output represents a node, with:

  • Node type: Like module, call, or string.
  • Range: [start_line, start_column] - [end_line, end_column], showing where the node begins and ends in the code.
  • Indentation: Indicates parent-child relationships. More indent means a child node.

Think of it like a file explorer: the module is the root folder, expression_statement is a subfolder, and string_content is a file deep inside. Our goal is to map this output back to the Python code and understand why it’s structured this way.

For our example, the top-level module node contains two expression_statement nodes (one for each print line). Each expression_statement has a call node, which breaks down into the function name (identifier) and arguments (argument_list). This hierarchy is the key to interpreting the output.
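The indentation rule is mechanical enough to code up. Below is a minimal sketch that rebuilds the tree from the Playground's text output. It assumes the line format shown above (indentation, an optional field: prefix, the node type, then the range). The names Node and parse_playground are hypothetical, made up for this illustration. They are not part of Tree-sitter's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches one Playground line, e.g. "      arguments: argument_list [0, 5] - [0, 20]"
LINE_RE = re.compile(
    r"^(?P<indent> *)(?:(?P<field>\w+): )?(?P<type>\S+) "
    r"\[(?P<sr>\d+), (?P<sc>\d+)\] - \[(?P<er>\d+), (?P<ec>\d+)\]$"
)


@dataclass
class Node:
    type: str
    start: tuple[int, int]  # (row, column), zero-based
    end: tuple[int, int]    # exclusive end point
    field_name: str | None = None  # e.g. "function" or "arguments"
    children: list[Node] = field(default_factory=list)


def parse_playground(text: str) -> Node:
    """Rebuild the tree: a line's parent is the nearest earlier line indented less."""
    root: Node | None = None
    path: list[tuple[int, Node]] = []  # (indent, node) from the root down to the last line
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        m = LINE_RE.match(raw.rstrip())
        if m is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        node = Node(
            type=m["type"],
            start=(int(m["sr"]), int(m["sc"])),
            end=(int(m["er"]), int(m["ec"])),
            field_name=m["field"],
        )
        indent = len(m["indent"])
        # Climb back up until the top of the path is shallower than this line.
        while path and path[-1][0] >= indent:
            path.pop()
        if path:
            path[-1][1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("found a second top-level node")
        path.append((indent, node))
    if root is None:
        raise ValueError("no nodes found")
    return root


OUTPUT = """\
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 20]
    call [0, 0] - [0, 20]
      function: identifier [0, 0] - [0, 5]
      arguments: argument_list [0, 5] - [0, 20]
        string [0, 6] - [0, 19]
          string_start [0, 6] - [0, 7]
          string_content [0, 7] - [0, 18]
          string_end [0, 18] - [0, 19]
  expression_statement [1, 0] - [1, 18]
    call [1, 0] - [1, 18]
      function: identifier [1, 0] - [1, 5]
      arguments: argument_list [1, 5] - [1, 18]
        string [1, 6] - [1, 17]
          string_start [1, 6] - [1, 7]
          string_content [1, 7] - [1, 16]
          string_end [1, 16] - [1, 17]
"""

tree = parse_playground(OUTPUT)
first_call = tree.children[0].children[0]
for child in first_call.children:
    print(child.field_name, child.type, child.start, child.end)
# function identifier (0, 0) (0, 5)
# arguments argument_list (0, 5) (0, 20)
```

A child is simply the next line indented deeper than its parent. So a stack of the current ancestors is all the parser needs.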

Decoding the Node Ranges: Line and Column Magic

Every node comes with a range like [0, 0] - [0, 20]. Here’s how to read it:

  • First pair [line, column]: Where the node starts.
  • Second pair [line, column]: Where the node ends (exclusive, meaning “up to but not including”).
  • Lines and columns are zero-based. Line 0 is the first line, column 0 is the first character.
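To check a range against the source, you can slice the text yourself. text_for_range is a hypothetical helper written for this article, not a function from a Tree-sitter binding. It encodes one caveat: Tree-sitter measures columns in bytes, not characters. That only matters once a line contains non-ASCII text.

```python
def text_for_range(source: str, start: tuple[int, int], end: tuple[int, int]) -> str:
    """Return the text a Tree-sitter range covers.

    Rows and columns are zero-based and the end point is exclusive. Tree-sitter
    counts columns in bytes, so lines are sliced as UTF-8 bytes; for ASCII-only
    code such as this example, byte and character columns are the same.
    """
    lines = source.encode("utf-8").split(b"\n")
    (start_row, start_col), (end_row, end_col) = start, end
    if start_row == end_row:
        chunk = lines[start_row][start_col:end_col]
    else:
        # First line from start_col, any full lines in between, last line up to end_col.
        parts = [lines[start_row][start_col:], *lines[start_row + 1:end_row], lines[end_row][:end_col]]
        chunk = b"\n".join(parts)
    return chunk.decode("utf-8")


SOURCE = 'print("hello world")\nprint("bye world")\n'

print(text_for_range(SOURCE, (0, 0), (0, 5)))            # print
print(text_for_range(SOURCE, (0, 7), (0, 18)))           # hello world
print(text_for_range(SOURCE, (1, 6), (1, 17)))           # "bye world"
print(text_for_range(SOURCE, (0, 0), (2, 0)) == SOURCE)  # True: module covers the whole file
```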

Let’s map the first expression_statement from our output:

expression_statement [0, 0] - [0, 20]

This covers print("hello world"). Count the characters:

# Line 0: print("hello world")
#         01234567890123456789
# Length: 20 characters
  • Starts at [0, 0] (beginning of the line).
  • Ends at [0, 20] (just after the closing parenthesis).
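Counting by hand gets tedious. This tiny sketch prints a column ruler under each line. column_ruler is a hypothetical helper. It counts characters, which match Tree-sitter's byte columns only for ASCII lines like these.

```python
def column_ruler(line: str) -> str:
    """One digit per character: the zero-based column index modulo 10."""
    return "".join(str(col % 10) for col in range(len(line)))


for row, line in enumerate(['print("hello world")', 'print("bye world")']):
    print(f"# Line {row}: {line}")
    print(f"#         {column_ruler(line)}")  # padded so digits sit under the code
    print(f"# Length: {len(line)} characters")
```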

The child node call [0, 0] - [0, 20] spans the same range because the entire expression is a function call. But its children get more specific:

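  • function: identifier [0, 0] - [0, 5]: The name print (columns 0 through 4, so the exclusive end is 5). In Python 3, print is a built-in function, not a keyword, so it parses as an ordinary identifier.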
  • arguments: argument_list [0, 5] - [0, 20]: From the opening ( to the closing ).
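These ranges can also be read as containment: a parent's range always encloses its children's ranges. Tree-sitter orders points by row, then column, and Python's tuple comparison does the same. That lets a short sketch check the relationships described above. The contains function is hypothetical.

```python
Point = tuple[int, int]      # (row, column), zero-based
Range = tuple[Point, Point]  # (start, end), end exclusive


def contains(outer: Range, inner: Range) -> bool:
    """True if inner lies entirely within outer.

    Python compares tuples lexicographically (row first, then column),
    which matches how Tree-sitter orders points.
    """
    (outer_start, outer_end), (inner_start, inner_end) = outer, inner
    return outer_start <= inner_start and inner_end <= outer_end


call = ((0, 0), (0, 20))
identifier = ((0, 0), (0, 5))
argument_list = ((0, 5), (0, 20))
string = ((0, 6), (0, 19))

print(contains(call, identifier), contains(call, argument_list))  # True True
print(contains(argument_list, string))                            # True
print(contains(identifier, argument_list))                        # False
# Siblings touch but do not overlap: one ends exactly where the next begins.
print(identifier[1] == argument_list[0])                          # True
```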

Here’s a table to visualize the first call node’s breakdown:

Node Type        Range              Code Snippet
call             [0, 0] - [0, 20]   print("hello world")
identifier       [0, 0] - [0, 5]    print
argument_list    [0, 5] - [0, 20]   ("hello world")
string           [0, 6] - [0, 19]   "hello world"
string_start     [0, 6] - [0, 7]    "
string_content   [0, 7] - [0, 18]   hello world
string_end       [0, 18] - [0, 19]  "

This table shows how Tree-sitter slices the code into precise pieces. Try this in the Playground yourself to see how ranges shift with different code.

Why So Many String Nodes? Understanding Granularity

Notice how the string "hello world" is split into string, string_start, string_content, and string_end? This granularity is Tree-sitter’s strength. It doesn’t just see "hello world" as one blob—it breaks it into:

  • string: The entire thing, including quotes.
  • string_start: The opening quote.
  • string_content: The actual text.
  • string_end: The closing quote.

Why? Because tools using Tree-sitter (like code editors or linters) might need to manipulate specific parts. For example, a syntax highlighter could style the quotes differently from the content.
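Here is a toy illustration of that idea. The sketch styles the three string parts of the first line differently, using only their column ranges. The marker strings and the highlight_line function are hypothetical stand-ins for an editor's styling. Real Tree-sitter highlighters are usually driven by queries rather than hand-written spans.

```python
# Hypothetical markers standing in for real editor styles.
STYLES = {
    "string_start": ("<quote>", "</quote>"),
    "string_content": ("<text>", "</text>"),
    "string_end": ("<quote>", "</quote>"),
}


def highlight_line(line: str, spans: list[tuple[str, int, int]]) -> str:
    """Wrap each (node_type, start_col, end_col) span of one line in its markers.

    Spans must not overlap; columns are zero-based with an exclusive end.
    Columns are treated as character offsets, which is only safe for ASCII lines.
    """
    pieces: list[str] = []
    pos = 0
    for node_type, start, end in sorted(spans, key=lambda span: span[1]):
        open_tag, close_tag = STYLES[node_type]
        pieces.append(line[pos:start])  # unstyled text before this span
        pieces.append(f"{open_tag}{line[start:end]}{close_tag}")
        pos = end
    pieces.append(line[pos:])  # whatever follows the last span
    return "".join(pieces)


line = 'print("hello world")'
spans = [("string_start", 6, 7), ("string_content", 7, 18), ("string_end", 18, 19)]
print(highlight_line(line, spans))
# print(<quote>"</quote><text>hello world</text><quote>"</quote>)
```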

Let’s look at the string for "hello world":

string [0, 6] - [0, 19]
  string_start [0, 6] - [0, 7]
  string_content [0, 7] - [0, 18]
  string_end [0, 18] - [0, 19]

Map it to the code:

# Line 0: print("hello world")
#               ^ string_start (col 6)
#                ^ string_content starts (col 7)
#                           ^ string_end, the closing quote (col 18)
#                            ^ string ends here, exclusive (col 19)

The string node runs from the opening quote at column 6 up to column 19, one past the closing quote at column 18. The string_content is just hello world: it starts at column 7 and ends at column 18, so it covers columns 7 through 17. This level of detail lets Tree-sitter handle edge cases, like escaped quotes or multi-line strings.
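The three string parts fit together with no gaps: string_start ends where string_content begins, and string_content ends where string_end begins. The hypothetical tiles function below checks that property for any parent and its listed children. It is a sketch for exploring output, not something Tree-sitter provides.

```python
Range = tuple[tuple[int, int], tuple[int, int]]  # ((row, col), (row, col)), end exclusive


def tiles(parent: Range, children: list[Range]) -> bool:
    """True if the children cover the parent exactly, with no gaps and no overlaps."""
    if not children:
        return False
    ordered = sorted(children)  # by start point, then end point
    if ordered[0][0] != parent[0] or ordered[-1][1] != parent[1]:
        return False
    # Each child must end exactly where the next one begins.
    return all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


string = ((0, 6), (0, 19))
string_parts = [((0, 6), (0, 7)), ((0, 7), (0, 18)), ((0, 18), (0, 19))]
print(tiles(string, string_parts))  # True

# argument_list [0, 5] - [0, 20] holds the string [0, 6] - [0, 19], but the
# parentheses at columns 5 and 19 do not appear as nodes in the output.
print(tiles(((0, 5), (0, 20)), [string]))  # False
```

The second check fails on purpose. The parentheses around the argument leave uncovered columns at both ends of argument_list.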

Handling Multiple Statements: Spotting Patterns

The second expression_statement for print("bye world") follows the same structure:

expression_statement [1, 0] - [1, 18]
  call [1, 0] - [1, 18]
    function: identifier [1, 0] - [1, 5]
    arguments: argument_list [1, 5] - [1, 18]
      string [1, 6] - [1, 17]
        string_start [1, 6] - [1, 7]
        string_content [1, 7] - [1, 16]
        string_end [1, 16] - [1, 17]

Why does this statement end at [1, 18] rather than at column 20 like the first one? Because "bye world" is two characters shorter than "hello world":

# Line 1: print("bye world")
#         012345678901234567
# Length: 18 characters

The pattern is identical: expression_statement → call → identifier + argument_list → string with its parts. Once you spot this, you can predict the structure of any print call that takes a single string literal. For example, try this in the Playground:

print("test")

You’ll get:

module [0, 0] - [1, 0]
  expression_statement [0, 0] - [0, 13]
    call [0, 0] - [0, 13]
      function: identifier [0, 0] - [0, 5]
      arguments: argument_list [0, 5] - [0, 13]
        string [0, 6] - [0, 12]
          string_start [0, 6] - [0, 7]
          string_content [0, 7] - [0, 11]
          string_end [0, 11] - [0, 12]

This consistency is your friend. It means you can write tools that rely on Tree-sitter’s predictable output.
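That predictability can be written down. Take the narrow case of a line that is exactly print("...") starting at column 0: every range follows from the length of the text. The sketch below uses a hypothetical name, predict_print_ranges. It assumes non-empty ASCII text without quotes or escape sequences. Anything else may produce a different tree.

```python
Range = tuple[tuple[int, int], tuple[int, int]]


def predict_print_ranges(text: str, row: int = 0) -> dict[str, Range]:
    """Predict node ranges for a line that is exactly print("<text>").

    Assumes the call starts at column 0 and text is non-empty ASCII with no
    quotes or backslashes, so each character occupies one column.
    """
    n = len(text)
    line_end = n + 9  # 'print("' is 7 columns, the text is n, '")' is 2 more

    def span(start: int, end: int) -> Range:
        return ((row, start), (row, end))

    return {
        "expression_statement": span(0, line_end),
        "call": span(0, line_end),
        "identifier": span(0, 5),
        "argument_list": span(5, line_end),
        "string": span(6, n + 8),
        "string_start": span(6, 7),
        "string_content": span(7, n + 7),
        "string_end": span(n + 7, n + 8),
    }


print(predict_print_ranges("hello world")["call"])         # ((0, 0), (0, 20))
print(predict_print_ranges("bye world", row=1)["string"])  # ((1, 6), (1, 17))
print(predict_print_ranges("test")["argument_list"])       # ((0, 5), (0, 13))
```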

Practical Example: Parsing a More Complex Snippet

Let’s level up with a slightly more complex Python snippet to see how Tree-sitter handles nested structures. Here’s the code:

def greet(name):
    print("Hello, " + name)

Paste this into the Playground (try it here). You’ll get something like:

module [0, 0] - [2, 0]
  function_definition [0, 0] - [1, 27]
    name: identifier [0, 4] - [0, 9]
    parameters: parameters [0, 9] - [0, 15]
      identifier [0, 10] - [0, 14]
    body: block [1, 4] - [1, 27]
      expression_statement [1, 4] - [1, 27]
        call [1, 4] - [1, 27]
          function: identifier [1, 4] - [1, 9]
          arguments: argument_list [1, 9] - [1, 27]
            binary_operator [1, 10] - [1, 26]
              left: string [1, 10] - [1, 19]
                string_start [1, 10] - [1, 11]
                string_content [1, 11] - [1, 18]
                string_end [1, 18] - [1, 19]
              operator: + [1, 20] - [1, 21]
              right: identifier [1, 22] - [1, 26]

Key differences:

  • function_definition: Replaces expression_statement as the top-level child of module.
  • parameters: The (name) part is parsed as a parameters node containing an identifier.
  • binary_operator: The "Hello, " + name is a single argument, parsed as a binary_operator with left, operator, and right nodes.

This shows Tree-sitter’s ability to handle nested structures like function definitions and expressions. The ranges still follow the same logic, but the node types reflect Python’s syntax rules.
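Editors lean on this nesting constantly, for example to find the node under the cursor. Tree-sitter's C API exposes that lookup as ts_node_descendant_for_point_range. Below is a dependency-free sketch of the same idea. It transcribes the greet tree as (depth, type, start, end) tuples, with ranges counted by hand from the two source lines (line 1 is 27 characters long), then picks the deepest node containing a point. innermost_at is a hypothetical helper.

```python
from typing import Optional

Entry = tuple[int, str, tuple[int, int], tuple[int, int]]  # (depth, type, start, end)

# Hand transcription of: def greet(name): / print("Hello, " + name)
NODES: list[Entry] = [
    (0, "module", (0, 0), (2, 0)),
    (1, "function_definition", (0, 0), (1, 27)),
    (2, "identifier", (0, 4), (0, 9)),
    (2, "parameters", (0, 9), (0, 15)),
    (3, "identifier", (0, 10), (0, 14)),
    (2, "block", (1, 4), (1, 27)),
    (3, "expression_statement", (1, 4), (1, 27)),
    (4, "call", (1, 4), (1, 27)),
    (5, "identifier", (1, 4), (1, 9)),
    (5, "argument_list", (1, 9), (1, 27)),
    (6, "binary_operator", (1, 10), (1, 26)),
    (7, "string", (1, 10), (1, 19)),
    (8, "string_start", (1, 10), (1, 11)),
    (8, "string_content", (1, 11), (1, 18)),
    (8, "string_end", (1, 18), (1, 19)),
    (7, "+", (1, 20), (1, 21)),
    (7, "identifier", (1, 22), (1, 26)),
]


def innermost_at(nodes: list[Entry], point: tuple[int, int]) -> Optional[Entry]:
    """Return the deepest node whose range contains point (end exclusive), or None."""
    containing = [node for node in nodes if node[2] <= point < node[3]]
    # Containing nodes form one ancestor chain, so the deepest is the most specific.
    return max(containing, key=lambda node: node[0], default=None)


print(innermost_at(NODES, (1, 23))[1])  # identifier (the name after +)
print(innermost_at(NODES, (1, 12))[1])  # string_content
print(innermost_at(NODES, (1, 19))[1])  # binary_operator (the space before +)
print(innermost_at(NODES, (5, 0)))      # None: past the end of the file
```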

Where to Go From Here

Now that you can read Tree-sitter’s output, you’re ready to use it in real projects. Here are some practical next steps:

  • Experiment in the Playground: Try different Python snippets (loops, classes, etc.) to see how the tree changes. The Playground is your sandbox.
  • Build Tools: Use Tree-sitter in your own projects through its language bindings plus a grammar such as tree-sitter-python. For example, you could write a script to extract all function names from a file.
  • Debug Syntax Errors: When code doesn’t parse cleanly, Tree-sitter marks the trouble spots with ERROR or MISSING nodes, and their precise ranges help pinpoint the problem in your code or tools.
  • Visualize the Tree: Some tools, like Neovim with Tree-sitter integration, show the parse tree visually, which can reinforce your intuition.
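For the function-name idea in the list above, a real tool would parse the file through a Tree-sitter binding. As a stand-in that needs no dependencies, this sketch reads the Playground's text output instead and collects the name field of every function_definition. function_names and the hand-written sample output are assumptions for illustration. Exact output can vary between grammar versions.

```python
import re

# One Playground line: indentation, optional "field: ", node type, range.
LINE_RE = re.compile(
    r"^(?P<indent> *)(?:(?P<field>\w+): )?(?P<type>\S+) "
    r"\[(?P<sr>\d+), (?P<sc>\d+)\] - \[(?P<er>\d+), (?P<ec>\d+)\]$"
)


def function_names(output: str, source: str) -> list[str]:
    """Return the text of the name field of every function_definition.

    Slices by character columns, so it assumes ASCII source (Tree-sitter's own
    columns are byte offsets).
    """
    src_lines = source.splitlines()
    ancestors: list[tuple[int, str]] = []  # (indent, node_type) of the enclosing nodes
    names: list[str] = []
    for raw in output.splitlines():
        m = LINE_RE.match(raw.rstrip())
        if m is None:
            continue  # ignore blank or unrecognized lines
        indent = len(m["indent"])
        while ancestors and ancestors[-1][0] >= indent:
            ancestors.pop()
        parent = ancestors[-1][1] if ancestors else None
        if parent == "function_definition" and m["field"] == "name":
            row, start, end = int(m["sr"]), int(m["sc"]), int(m["ec"])
            names.append(src_lines[row][start:end])  # a name never spans lines
        ancestors.append((indent, m["type"]))
    return names


SOURCE = "def greet(name):\n    return name\n\ndef bye():\n    pass\n"

# Written by hand in the Playground's format; real output may differ by grammar version.
OUTPUT = """\
module [0, 0] - [5, 0]
  function_definition [0, 0] - [1, 15]
    name: identifier [0, 4] - [0, 9]
    parameters: parameters [0, 9] - [0, 15]
      identifier [0, 10] - [0, 14]
    body: block [1, 4] - [1, 15]
      return_statement [1, 4] - [1, 15]
        identifier [1, 11] - [1, 15]
  function_definition [3, 0] - [4, 8]
    name: identifier [3, 4] - [3, 7]
    parameters: parameters [3, 7] - [3, 9]
    body: block [4, 4] - [4, 8]
      pass_statement [4, 4] - [4, 8]
"""

print(function_names(OUTPUT, SOURCE))  # ['greet', 'bye']
```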

To solidify your understanding, try parsing this snippet and predict the output before checking the Playground:

x = 42
print(x)

This will introduce an assignment node and reuse the call structure you’ve seen. The key is to practice mapping nodes to code until it feels second nature.

Tree-sitter’s output might look dense at first, but it’s just a map of your code’s structure. By breaking it down into ranges, node types, and hierarchies, you can turn that map into a tool for building better software. Keep experimenting, and you’ll be navigating parse trees like a pro.