Making Sense of tree-sitter's C API

Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a first-of-its-kind tool that helps you automatically index API endpoints across all your repositories. LiveAPI helps you discover, understand, and use APIs in large tech infrastructures with ease.
Tree-sitter is a powerful parsing library that generates syntax trees for code, making it a go-to for tools like code editors and linters. Its C API is the backbone for integrating Tree-sitter into projects, but it can feel daunting with its many types and functions. This guide breaks down the Tree-sitter C API, focusing on practical usage with clear examples. We'll explore how to set up a parser, parse code, navigate syntax trees, and query them, all while keeping things developer-friendly.
Why Tree-sitter's C API Matters
The C API is the core interface for Tree-sitter, offering fine-grained control over parsing and syntax tree manipulation. It's used by language bindings (like Rust or Python) and directly in C/C++ projects for performance-critical applications. Understanding it helps you:
- Integrate Tree-sitter into custom tools.
- Optimize parsing for specific use cases.
- Debug issues when higher-level bindings fall short.
The API is defined in tree_sitter/api.h (available on GitHub). It revolves around a few key concepts: parsers, trees, nodes, and queries. Let’s dive into the essentials.
Setting Up a Parser
To use Tree-sitter, you first need a parser. The TSParser struct is your entry point, and setting it up involves creating it and assigning a language.
Key Functions
Function | Description |
---|---|
ts_parser_new | Creates a new parser. |
ts_parser_set_language | Assigns a language to the parser. |
ts_parser_delete | Frees the parser. |
Example: Initializing a Parser
Here’s how to set up a parser for JavaScript using a hypothetical tree_sitter_javascript language (you’d typically get this from a compiled language module).
#include <stdio.h>
#include <tree_sitter/api.h>

// Assume tree_sitter_javascript is defined elsewhere
extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    // Create parser
    TSParser *parser = ts_parser_new();

    // Set language; this fails if the grammar's ABI version is unsupported
    const TSLanguage *lang = tree_sitter_javascript();
    if (!ts_parser_set_language(parser, lang)) {
        fprintf(stderr, "Language version mismatch\n");
        ts_parser_delete(parser);
        return 1;
    }

    // Clean up
    ts_parser_delete(parser);
    return 0;
}
Output: No output if successful; prints an error if the language version is incompatible.
Notes:
- Language versioning is critical. The API supports languages with ABI versions between TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION (13) and TREE_SITTER_LANGUAGE_VERSION (15).
- Always check the return value of ts_parser_set_language to catch version mismatches.
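The range check described in the notes can be sketched on its own. This is a minimal illustration, not Tree-sitter's implementation: the two SKETCH_ constants are hypothetical names that simply copy the values quoted above, and abi_version_supported is a made-up helper. Real code should rely on the macros in tree_sitter/api.h and on the return value of ts_parser_set_language.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, copied from the range quoted in the notes. Real code
   should use the macros from tree_sitter/api.h instead of redefining them. */
#define SKETCH_MIN_ABI 13u
#define SKETCH_MAX_ABI 15u

/* Hypothetical helper: returns true when a grammar's ABI version falls
   inside the inclusive range [SKETCH_MIN_ABI, SKETCH_MAX_ABI]. */
bool abi_version_supported(uint32_t version) {
    assert(SKETCH_MIN_ABI <= SKETCH_MAX_ABI);
    return version >= SKETCH_MIN_ABI && version <= SKETCH_MAX_ABI;
}
```

A grammar generated with an ABI below the minimum or above the maximum falls outside this range, which is the situation the "Language version mismatch" branch in the example above is meant to catch.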
Parsing Code into a Syntax Tree
Once you have a parser, you can parse code to create a TSTree. The tree represents the code’s structure, with nodes for each syntactic element (e.g., functions, variables).
Key Functions
Function | Description |
---|---|
ts_parser_parse_string | Parses a string into a syntax tree. |
ts_tree_root_node | Gets the root node of the tree. |
ts_tree_delete | Frees the tree. |
Example: Parsing JavaScript Code
This example parses a simple JavaScript function and prints the root node’s type.
#include <stdio.h>
#include <string.h>
#include <tree_sitter/api.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(
        parser,
        NULL, // No old tree for first parse
        code,
        (uint32_t)strlen(code)
    );
    if (tree == NULL) {
        fprintf(stderr, "Parsing failed\n");
        ts_parser_delete(parser);
        return 1;
    }

    TSNode root = ts_tree_root_node(tree);
    printf("Root node type: %s\n", ts_node_type(root));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Root node type: program
Notes:
- The NULL old_tree parameter is used for initial parses. For incremental parsing (e.g., after code edits), pass the previous tree.
- The root node’s type (program) is language-specific, defined in the language’s grammar.
Navigating the Syntax Tree
The syntax tree is a hierarchy of TSNode objects, each representing a syntactic construct. You can traverse the tree to inspect nodes, their types, and their positions.
Key Functions
Function | Description |
---|---|
ts_node_child | Gets a child node by index. |
ts_node_named_child | Gets a named child by index, skipping anonymous nodes such as punctuation and keywords. |
ts_node_type | Returns the node’s type as a string. |
ts_node_start_point | Gets the node’s start position (row, column), both zero-based. |
Example: Traversing a Tree
This code parses a JavaScript function and prints its named children.
#include <stdio.h>
#include <string.h>
#include <tree_sitter/api.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, (uint32_t)strlen(code));
    TSNode root = ts_tree_root_node(tree);

    // Walk only the named children of the root node
    uint32_t child_count = ts_node_named_child_count(root);
    printf("Named children of root (%s):\n", ts_node_type(root));
    for (uint32_t i = 0; i < child_count; i++) {
        TSNode child = ts_node_named_child(root, i);
        TSPoint start = ts_node_start_point(child);
        printf(" %u: %s at (%u, %u)\n", i, ts_node_type(child), start.row, start.column);
    }

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Named children of root (program):
0: function_declaration at (0, 0)
Notes:
- Named vs. anonymous nodes: Named nodes (e.g., function_declaration) correspond to named grammar rules, while anonymous nodes (e.g., "(") are tokens written as literal strings in the grammar.
- Use ts_node_start_point and ts_node_end_point for precise code positions.
Using Tree Cursors for Efficient Traversal
For large trees, iterating with ts_node_child can be slow. The TSTreeCursor provides a more efficient way to traverse trees by maintaining state.
Key Functions
Function | Description |
---|---|
ts_tree_cursor_new | Creates a cursor starting at a node. |
ts_tree_cursor_goto_first_child | Moves to the first child. |
ts_tree_cursor_goto_next_sibling | Moves to the next sibling. |
ts_tree_cursor_current_node | Gets the current node. |
Example: Using a Tree Cursor
This example traverses the tree to find all named nodes.
#include <stdio.h>
#include <string.h>
#include <tree_sitter/api.h>

extern const TSLanguage *tree_sitter_javascript(void);

// Prints every named descendant of node, depth-first
void traverse(TSNode node) {
    TSTreeCursor cursor = ts_tree_cursor_new(node);
    if (ts_tree_cursor_goto_first_child(&cursor)) {
        do {
            TSNode current = ts_tree_cursor_current_node(&cursor);
            if (ts_node_is_named(current)) {
                printf("Node: %s\n", ts_node_type(current));
                traverse(current); // Recurse
            }
        } while (ts_tree_cursor_goto_next_sibling(&cursor));
    }
    ts_tree_cursor_delete(&cursor);
}

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, (uint32_t)strlen(code));
    TSNode root = ts_tree_root_node(tree);

    printf("Starting traversal:\n");
    traverse(root);

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Starting traversal:
Node: function_declaration
Node: identifier
Node: formal_parameters
Node: statement_block
Node: return_statement
Node: string
Notes:
- Cursors are faster than repeated ts_node_child calls because they cache traversal state.
- Always call ts_tree_cursor_delete to avoid memory leaks.
Querying the Syntax Tree
Queries let you search for patterns in the syntax tree, like finding all function declarations. The TSQuery API uses S-expressions to define patterns.
Key Functions
Function | Description |
---|---|
ts_query_new | Creates a query from an S-expression. |
ts_query_cursor_new | Creates a cursor for executing queries. |
ts_query_cursor_exec | Runs the query on a node. |
ts_query_cursor_next_match | Gets the next match. |
Example: Finding Function Declarations
This code queries for function declarations and prints their names.
#include <stdio.h>
#include <string.h>
#include <tree_sitter/api.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }\nfunction bye() {}";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, (uint32_t)strlen(code));

    // Create query
    const char *query_str = "(function_declaration name: (identifier) @func-name)";
    uint32_t error_offset;
    TSQueryError error_type;
    TSQuery *query = ts_query_new(
        tree_sitter_javascript(),
        query_str,
        (uint32_t)strlen(query_str),
        &error_offset,
        &error_type
    );
    if (!query) {
        fprintf(stderr, "Query error at offset %u\n", error_offset);
        ts_tree_delete(tree);
        ts_parser_delete(parser);
        return 1;
    }

    // Execute query
    TSQueryCursor *cursor = ts_query_cursor_new();
    ts_query_cursor_exec(cursor, query, ts_tree_root_node(tree));

    TSQueryMatch match;
    while (ts_query_cursor_next_match(cursor, &match)) {
        for (uint16_t i = 0; i < match.capture_count; i++) {
            TSQueryCapture capture = match.captures[i];
            // Nodes store byte ranges, so slice the name out of the source text
            uint32_t start = ts_node_start_byte(capture.node);
            uint32_t end = ts_node_end_byte(capture.node);
            printf("Found function: %.*s\n", (int)(end - start), code + start);
        }
    }

    ts_query_cursor_delete(cursor);
    ts_query_delete(query);
    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Found function: hello
Found function: bye
Notes:
- The query (function_declaration name: (identifier) @func-name) captures the identifier node as func-name.
- Check ts_query_new for errors, as invalid S-expressions will return NULL.
- Learn more about query syntax in the Tree-sitter documentation.
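To get the source text that a captured node covers, one option is to slice the original buffer by the node's byte range. The sketch below is standalone and does not include api.h: copy_node_text is a hypothetical helper, and it assumes start_byte and end_byte come from ts_node_start_byte and ts_node_end_byte on a node parsed from the same source buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies source[start_byte, end_byte) into out and
   NUL-terminates it. Truncates when out is too small and returns the
   number of bytes actually copied (excluding the terminator). */
size_t copy_node_text(const char *source, uint32_t start_byte, uint32_t end_byte,
                      char *out, size_t out_size) {
    assert(source != NULL && out != NULL && out_size > 0);
    assert(start_byte <= end_byte);
    size_t len = (size_t)(end_byte - start_byte);
    if (len > out_size - 1) {
        len = out_size - 1; /* leave room for the terminator */
    }
    memcpy(out, source + start_byte, len);
    out[len] = '\0';
    return len;
}
```

For the source "function hello() {}", the name occupies bytes 9 to 14, so calling the helper with those offsets and a large enough buffer yields "hello". Copying into a caller-provided buffer avoids a heap allocation per capture, at the cost of possible truncation.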
Handling Code Edits
Tree-sitter supports incremental parsing, which is crucial for real-time applications like editors. You edit the tree to reflect code changes and reparse only the affected parts.
Key Functions
Function | Description |
---|---|
ts_tree_edit | Updates the tree for an edit. |
ts_parser_parse / ts_parser_parse_string | Reparses, reusing the edited old tree for efficiency. |
Example: Updating a Tree
This code edits a JavaScript function and re-parses it.
#include <stdio.h>
#include <string.h>
#include <tree_sitter/api.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *old_code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, old_code, (uint32_t)strlen(old_code));

    // Simulate edit: change "hello" to "greet"
    TSInputEdit edit = {
        .start_byte = 9, // Start of "hello"
        .old_end_byte = 14, // End of "hello"
        .new_end_byte = 14, // End of "greet"
        .start_point = {0, 9},
        .old_end_point = {0, 14},
        .new_end_point = {0, 14}
    };
    ts_tree_edit(tree, &edit);

    // Reparse, passing the edited tree so unchanged parts can be reused
    const char *new_code = "function greet() { return 'world'; }";
    TSTree *new_tree = ts_parser_parse_string(parser, tree, new_code, (uint32_t)strlen(new_code));

    TSNode root = ts_tree_root_node(new_tree);
    TSNode func = ts_node_named_child(root, 0);
    TSNode name = ts_node_named_child(func, 0);
    // Nodes store byte ranges, so slice the name out of the new source text
    uint32_t start = ts_node_start_byte(name);
    uint32_t end = ts_node_end_byte(name);
    printf("Updated function name: %.*s\n", (int)(end - start), new_code + start);

    ts_tree_delete(tree);
    ts_tree_delete(new_tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Updated function name: greet
Notes:
- The TSInputEdit struct requires precise byte and point offsets, which you’d typically compute from an editor’s change events.
- Incremental parsing is usually much faster than re-parsing from scratch, because unchanged parts of the old tree are reused.
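Computing the offsets for an edit by hand is error-prone, so here is a minimal standalone sketch of one way to derive them from the old and new text. SketchPoint and SketchEdit are hypothetical stand-ins that mirror the field layout of TSPoint and TSInputEdit without including api.h, and point_at_byte and describe_edit are made-up helper names. Columns are counted in bytes and rows are zero-based.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins that mirror the fields of TSPoint and TSInputEdit. */
typedef struct { uint32_t row; uint32_t column; } SketchPoint;
typedef struct {
    uint32_t start_byte, old_end_byte, new_end_byte;
    SketchPoint start_point, old_end_point, new_end_point;
} SketchEdit;

/* Walks text[0, byte) and returns the zero-based row and byte column. */
SketchPoint point_at_byte(const char *text, uint32_t byte) {
    assert(text != NULL && byte <= strlen(text));
    SketchPoint p = {0, 0};
    for (uint32_t i = 0; i < byte; i++) {
        if (text[i] == '\n') {
            p.row++;
            p.column = 0;
        } else {
            p.column++;
        }
    }
    return p;
}

/* Describes replacing old_text[start, old_end) with the bytes that now
   occupy new_text[start, new_end). */
SketchEdit describe_edit(const char *old_text, const char *new_text,
                         uint32_t start, uint32_t old_end, uint32_t new_end) {
    assert(start <= old_end && start <= new_end);
    SketchEdit e;
    e.start_byte = start;
    e.old_end_byte = old_end;
    e.new_end_byte = new_end;
    e.start_point = point_at_byte(old_text, start);
    e.old_end_point = point_at_byte(old_text, old_end);
    e.new_end_point = point_at_byte(new_text, new_end);
    return e;
}
```

With real Tree-sitter types, the six values would be copied into a TSInputEdit before calling ts_tree_edit. Scanning from the start of the buffer is linear in the offset, so an editor would more likely keep a line index; the scan is used here only to keep the sketch short.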
Debugging and Logging
Tree-sitter provides tools to debug parsing, like logging and generating DOT graphs for visualization.
Key Functions
Function | Description |
---|---|
ts_parser_set_logger | Sets a callback for parse/lex logs. |
ts_parser_print_dot_graphs | Outputs DOT graphs to a file descriptor. |
Example: Adding a Logger
This code logs parsing events to stderr.
#include <stdio.h>
#include <string.h>
#include <tree_sitter/api.h>

extern const TSLanguage *tree_sitter_javascript(void);

// Called by the parser for every log message
void log_callback(void *payload, TSLogType type, const char *msg) {
    (void)payload; // Unused in this example
    fprintf(stderr, "[%s] %s\n", type == TSLogTypeParse ? "PARSE" : "LEX", msg);
}

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    TSLogger logger = { .payload = NULL, .log = log_callback };
    ts_parser_set_logger(parser, logger);

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, (uint32_t)strlen(code));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output (illustrative only; the actual messages depend on the Tree-sitter version and the grammar):
[PARSE] parsing rule: program
[LEX] token: function
...
Notes:
- Use ts_parser_print_dot_graphs with a file descriptor to visualize trees (pipe to dot -Tsvg for SVG output).
- Logging is verbose but invaluable for debugging grammar issues.
Practical Tips for Using the C API
To wrap up, here are actionable tips for working with Tree-sitter’s C API:
- Start small: Begin with simple parsing and traversal before tackling queries or incremental parsing.
- Check return values: Functions like ts_parser_set_language and ts_query_new report failure only through their return values, so an unchecked failure goes unnoticed.
- Use cursors for traversal: They’re faster and cleaner than manual node iteration for large trees.
- Leverage incremental parsing: For real-time applications, always edit and reuse trees to save time.
- Debug with logs and graphs: Enable logging or DOT output to understand parsing issues.
- Read the source: The api.h file is well-documented and the ultimate reference.
The C API is low-level but gives you total control over Tree-sitter’s capabilities. Whether you’re building a code editor, linter, or custom tool, mastering it unlocks powerful parsing features. Experiment with the examples, tweak them for your language, and you’ll be parsing like a pro in no time.