Abstract Syntax Tree (AST) Deep Dive: From Theory to Practical Compiler Implementation
Introduction In programming, parsing and transforming source code is a fundamental process. Whether it's syntax checking, formatting, or advanced operations like code minification and transpilation, a deep understanding of code structure is essential. At the heart of this process lies the Abstract Syntax Tree (AST)—a powerful tool for representing code in a structured, tree-like format. This article will guide you through the core concepts of AST and demonstrate its practical application by building a mini-compiler from scratch. By the end, you’ll learn: What is an AST? How does it represent code syntax? How to implement a compiler step-by-step, from parsing source code to generating output. Real-world applications of AST, including code transformation and syntax validation. 1. What is an AST? An Abstract Syntax Tree (AST) is a tree representation of source code where each node corresponds to a syntactic unit (e.g., variables, functions, operators). Its primary role is to convert raw text-based code into structured data, enabling programmatic analysis and manipulation. 2. Building a Mini-Compiler We’ll implement a compiler that transforms LISP-style function calls into C-style syntax: LISP: (add 2 (subtract 4 2)) C: add(2, subtract(4, 2)) 2.1. Compiler Workflow Parsing: Lexical Analysis: Convert source code into tokens. Syntax Analysis: Convert tokens into an AST. Transformation: Modify the AST (e.g., for language conversion). Code Generation: Generate new code from the transformed AST. 2.2. Lexical Analysis (Tokenizer) Convert source code into tokens: function tokenizer(input) { let current = 0; let tokens = []; while (current {};`; const ast = parser.parse(code); traverse.default(ast, { Identifier(path) { if (path.node.name === "hello") path.node.name = "world"; }, }); const result = generator.default(ast, {}, code); console.log(result.code); // Output: const world = () => {}; Conclusion ASTs are foundational to modern programming tools. By understanding them, you gain the ability to: Build custom compilers/transpilers. Analyze and refactor code programmatically. Create domain-specific languages (DSLs). For further reading, explore: Babel Handbook ESTree Specification AST Explorer Original article: https://webfem.com/post/ast. Translated to English.

Introduction
In programming, parsing and transforming source code is a fundamental process. Whether it's syntax checking, formatting, or advanced operations like code minification and transpilation, a deep understanding of code structure is essential. At the heart of this process lies the Abstract Syntax Tree (AST)—a powerful tool for representing code in a structured, tree-like format.
This article will guide you through the core concepts of AST and demonstrate its practical application by building a mini-compiler from scratch. By the end, you’ll learn:
- What is an AST? How does it represent code syntax?
- How to implement a compiler step-by-step, from parsing source code to generating output.
- Real-world applications of AST, including code transformation and syntax validation.
1. What is an AST?
An Abstract Syntax Tree (AST) is a tree representation of source code where each node corresponds to a syntactic unit (e.g., variables, functions, operators). Its primary role is to convert raw text-based code into structured data, enabling programmatic analysis and manipulation.
2. Building a Mini-Compiler
We’ll implement a compiler that transforms LISP-style function calls into C-style syntax:
LISP: (add 2 (subtract 4 2))
C: add(2, subtract(4, 2))
2.1. Compiler Workflow
-
Parsing:
- Lexical Analysis: Convert source code into tokens.
- Syntax Analysis: Convert tokens into an AST.
- Transformation: Modify the AST (e.g., for language conversion).
- Code Generation: Generate new code from the transformed AST.
2.2. Lexical Analysis (Tokenizer)
Convert source code into tokens:
function tokenizer(input) {
let current = 0;
let tokens = [];
while (current < input.length) {
let char = input[current];
if (char === '(') tokens.push({ type: 'paren', value: '(' });
else if (char === ')') tokens.push({ type: 'paren', value: ')' });
else if (/\s/.test(char)) current++; // Skip whitespace
else if (/[0-9]/.test(char)) {
let value = '';
while (/[0-9]/.test(char)) {
value += char;
char = input[++current];
}
tokens.push({ type: 'number', value });
}
else if (/[a-z]/i.test(char)) {
let value = '';
while (/[a-z]/i.test(char)) {
value += char;
char = input[++current];
}
tokens.push({ type: 'name', value });
}
else throw new TypeError(`Unknown character: ${char}`);
}
return tokens;
}
Output:
[
{ type: 'paren', value: '(' },
{ type: 'name', value: 'add' },
{ type: 'number', value: '2' },
{ type: 'paren', value: '(' },
{ type: 'name', value: 'subtract' },
{ type: 'number', value: '4' },
{ type: 'number', value: '2' },
{ type: 'paren', value: ')' },
{ type: 'paren', value: ')' },
]
2.3. Syntax Analysis (Parser)
Convert tokens into an AST:
function parser(tokens) {
let current = 0;
function walk() {
let token = tokens[current];
if (token.type === 'number') {
current++;
return { type: 'NumberLiteral', value: token.value };
}
if (token.type === 'paren' && token.value === '(') {
token = tokens[++current];
let node = { type: 'CallExpression', name: token.value, params: [] };
token = tokens[++current];
while (token.type !== 'paren' || token.value !== ')') {
node.params.push(walk());
token = tokens[current];
}
current++;
return node;
}
throw new TypeError(`Unknown token: ${token.type}`);
}
let ast = { type: 'Program', body: [] };
while (current < tokens.length) ast.body.push(walk());
return ast;
}
AST Output:
{
type: 'Program',
body: [{
type: 'CallExpression',
name: 'add',
params: [
{ type: 'NumberLiteral', value: '2' },
{
type: 'CallExpression',
name: 'subtract',
params: [
{ type: 'NumberLiteral', value: '4' },
{ type: 'NumberLiteral', value: '2' },
],
},
],
}],
}
2.4. Transformation
Modify the AST to match the target output structure:
function transformer(ast) {
let newAst = { type: 'Program', body: [] };
ast._context = newAst.body;
traverser(ast, {
NumberLiteral: {
enter(node, parent) {
parent._context.push({ type: 'NumberLiteral', value: node.value });
},
},
CallExpression: {
enter(node, parent) {
let expression = {
type: 'CallExpression',
callee: { type: 'Identifier', name: node.name },
arguments: [],
};
node._context = expression.arguments;
if (parent.type !== 'CallExpression') {
expression = {
type: 'ExpressionStatement',
expression: expression,
};
}
parent._context.push(expression);
},
},
});
return newAst;
}
Transformed AST:
{
type: 'Program',
body: [{
type: 'ExpressionStatement',
expression: {
type: 'CallExpression',
callee: { type: 'Identifier', name: 'add' },
arguments: [
{ type: 'NumberLiteral', value: '2' },
{
type: 'CallExpression',
callee: { type: 'Identifier', name: 'subtract' },
arguments: [
{ type: 'NumberLiteral', value: '4' },
{ type: 'NumberLiteral', value: '2' },
],
},
],
},
}],
}
2.5. Code Generation
Convert the transformed AST into C-style code:
function codeGenerator(node) {
switch (node.type) {
case 'Program': return node.body.map(codeGenerator).join('\n');
case 'ExpressionStatement': return `${codeGenerator(node.expression)};`;
case 'CallExpression':
return `${codeGenerator(node.callee)}(${node.arguments.map(codeGenerator).join(', ')})`;
case 'Identifier': return node.name;
case 'NumberLiteral': return node.value;
default: throw new TypeError(`Unknown node type: ${node.type}`);
}
}
Final Output:
add(2, subtract(4, 2));
3. Real-World Applications with Babel
Babel, a popular JavaScript compiler, uses AST for:
- Transpiling modern JS (e.g., ES6+ → ES5).
- Code optimization (e.g., dead code elimination).
- Custom transformations (e.g., JSX → JS).
Example: Renaming a Function with Babel
const parser = require("@babel/parser");
const traverse = require("@babel/traverse");
const generator = require("@babel/generator");
const code = `const hello = () => {};`;
const ast = parser.parse(code);
traverse.default(ast, {
Identifier(path) {
if (path.node.name === "hello") path.node.name = "world";
},
});
const result = generator.default(ast, {}, code);
console.log(result.code); // Output: const world = () => {};
Conclusion
ASTs are foundational to modern programming tools. By understanding them, you gain the ability to:
- Build custom compilers/transpilers.
- Analyze and refactor code programmatically.
- Create domain-specific languages (DSLs).
For further reading, explore:
Original article: https://webfem.com/post/ast. Translated to English.