Implementing Template Engine from Scratch (Like Jinja2 or Django Templates)

Leapcell: The Best of Serverless Web Hosting Implementation and Principle Analysis of a Simple Template Engine We will start writing a simple template engine and deeply explore its underlying implementation mechanism. Language Design The design of this template language is extremely basic, mainly using two types of tags: variable tags and block tags. Variable Tags Variable tags use {{ and }} as identifiers. Here is an example code: // Variables use `{{` and `}}` as identifiers {{template_variable}} Block Tags Block tags use {% and %} as identifiers. Most blocks require a closing tag {% end %} to end. The following is an example: // Blocks use `{%` and `%}` as identifiers {% each item_list %} {{current_item}} {% end %} This template engine can handle basic loop and conditional statements, and also supports calling callable objects within blocks. It is very convenient to call any Python function in the template. Loop Structure The loop structure can be used to iterate over collections or iterable objects. The example code is as follows: // Iterate over the people collection {% each person_list %} {{current_person.name}} {% end %} // Iterate over the [1, 2, 3] list {% each [1, 2, 3] %} {{current_num}} {% end %} // Iterate over the records collection {% each record_list %} {{..outer_name}} {% end %} In the above examples, person_list and the like are collections, and current_person and the like point to the current iterated element. A path separated by dots will be parsed as a dictionary attribute, and .. can be used to access objects in the outer context. Conditional Statements The logic of conditional statements is relatively intuitive. This language supports if and else structures, as well as operators such as ==, =, !=, is, . The example is as follows: // Output different content according to the value of num {% if num > 5 %} more than 5 {% else %} less than or equal to 5 {% end %} Calling Blocks Callable objects can be passed through the template context and called using ordinary positional arguments or named arguments. Calling blocks do not require the use of end to close. The examples are as follows: // Using ordinary arguments {% call format_date date_created %} // Using named arguments {% call log_message 'here' verbosity='debug' %} Compilation Principle and Process Step 1: Template Tokenization (tokenize) Principle Template tokenization is the starting step of compilation, and its core goal is to divide the template content into independent fragments. These fragments can be ordinary HTML text, or variable tags or block tags defined in the template. Mathematically, this is similar to splitting a complex string, breaking it into multiple substrings according to specific rules. Implementation Use regular expressions and the split() function to complete the text splitting. Here is the specific code example: import re # Define the start and end identifiers of variable tags VAR_TOKEN_START = '{{' VAR_TOKEN_END = '}}' # Define the start and end identifiers of block tags BLOCK_TOKEN_START = '{%' BLOCK_TOKEN_END = '%}' # Compile the regular expression for matching variable tags or block tags TOK_REGEX = re.compile(r"(%s.*?%s|%s.*?%s)" % ( VAR_TOKEN_START, VAR_TOKEN_END, BLOCK_TOKEN_START, BLOCK_TOKEN_END )) The meaning of the TOK_REGEX regular expression is to match variable tags or block tags to achieve the splitting of the text. The outermost parentheses of the expression are used to capture the matched text, and ? represents a non-greedy match, ensuring that the regular expression stops at the first match. The example is as follows: # Actually show the splitting effect of the regular expression >>> TOK_REGEX.split('{% each vars %}{{it}}{% endeach %}') ['{% each vars %}', '', '{{it}}', '', '{% endeach %}'] Subsequently, each fragment is encapsulated into a Fragment object, which contains the type of the fragment and can be used as a parameter for the compilation function. There are four types of fragments in total: # Define fragment type constants VAR_FRAGMENT = 0 OPEN_BLOCK_FRAGMENT = 1 CLOSE_BLOCK_FRAGMENT = 2 TEXT_FRAGMENT = 3 Step 2: Building an Abstract Syntax Tree (AST) Principle An Abstract Syntax Tree (AST) is a data structure that represents the source code in a structured way, presenting the syntactic structure of the code in the form of a tree. In template compilation, the purpose of building an AST is to organize the fragments obtained from tokenization into a hierarchical structure, facilitating subsequent processing and rendering. Mathematically, this is similar to building a tree diagram, where each node represents a syntactic unit, and the relationships between nodes reflect the logical structure of the code. Implementati

Apr 16, 2025 - 13:18
 0
Implementing Template Engine from Scratch (Like Jinja2 or Django Templates)

Image description

Leapcell: The Best of Serverless Web Hosting

Implementation and Principle Analysis of a Simple Template Engine

We will start writing a simple template engine and deeply explore its underlying implementation mechanism.

Language Design

The design of this template language is extremely basic, mainly using two types of tags: variable tags and block tags.

Variable Tags

Variable tags use {{ and }} as identifiers. Here is an example code:

// Variables use `{{` and `}}` as identifiers
<div>{{template_variable}}</div>

Block Tags

Block tags use {% and %} as identifiers. Most blocks require a closing tag {% end %} to end. The following is an example:

// Blocks use `{%` and `%}` as identifiers
{% each item_list %}
    <div>{{current_item}}</div>
{% end %}

This template engine can handle basic loop and conditional statements, and also supports calling callable objects within blocks. It is very convenient to call any Python function in the template.

Loop Structure

The loop structure can be used to iterate over collections or iterable objects. The example code is as follows:

// Iterate over the people collection
{% each person_list %}
    <div>{{current_person.name}}</div>
{% end %}

// Iterate over the [1, 2, 3] list
{% each [1, 2, 3] %}
    <div>{{current_num}}</div>
{% end %}

// Iterate over the records collection
{% each record_list %}
    <div>{{..outer_name}}</div>
{% end %}

In the above examples, person_list and the like are collections, and current_person and the like point to the current iterated element. A path separated by dots will be parsed as a dictionary attribute, and .. can be used to access objects in the outer context.

Conditional Statements

The logic of conditional statements is relatively intuitive. This language supports if and else structures, as well as operators such as ==, <=, >=, !=, is, <, >. The example is as follows:

// Output different content according to the value of num
{% if num > 5 %}
    <div>more than 5</div>
{% else %}
    <div>less than or equal to 5</div>
{% end %}

Calling Blocks

Callable objects can be passed through the template context and called using ordinary positional arguments or named arguments. Calling blocks do not require the use of end to close. The examples are as follows:

// Using ordinary arguments
<div class='date'>{% call format_date date_created %}</div>
// Using named arguments
<div>{% call log_message 'here' verbosity='debug' %}</div>

Compilation Principle and Process

Step 1: Template Tokenization (tokenize)

Principle

Template tokenization is the starting step of compilation, and its core goal is to divide the template content into independent fragments. These fragments can be ordinary HTML text, or variable tags or block tags defined in the template. Mathematically, this is similar to splitting a complex string, breaking it into multiple substrings according to specific rules.

Implementation

Use regular expressions and the split() function to complete the text splitting. Here is the specific code example:

import re

# Define the start and end identifiers of variable tags
VAR_TOKEN_START = '{{'
VAR_TOKEN_END = '}}'
# Define the start and end identifiers of block tags
BLOCK_TOKEN_START = '{%'
BLOCK_TOKEN_END = '%}'
# Compile the regular expression for matching variable tags or block tags
TOK_REGEX = re.compile(r"(%s.*?%s|%s.*?%s)" % (
    VAR_TOKEN_START,
    VAR_TOKEN_END,
    BLOCK_TOKEN_START,
    BLOCK_TOKEN_END
))

The meaning of the TOK_REGEX regular expression is to match variable tags or block tags to achieve the splitting of the text. The outermost parentheses of the expression are used to capture the matched text, and ? represents a non-greedy match, ensuring that the regular expression stops at the first match. The example is as follows:

# Actually show the splitting effect of the regular expression
>>> TOK_REGEX.split('{% each vars %}{{it}}{% endeach %}')
['{% each vars %}', '', '{{it}}', '', '{% endeach %}']

Subsequently, each fragment is encapsulated into a Fragment object, which contains the type of the fragment and can be used as a parameter for the compilation function. There are four types of fragments in total:

# Define fragment type constants
VAR_FRAGMENT = 0
OPEN_BLOCK_FRAGMENT = 1
CLOSE_BLOCK_FRAGMENT = 2
TEXT_FRAGMENT = 3

Step 2: Building an Abstract Syntax Tree (AST)

Principle

An Abstract Syntax Tree (AST) is a data structure that represents the source code in a structured way, presenting the syntactic structure of the code in the form of a tree. In template compilation, the purpose of building an AST is to organize the fragments obtained from tokenization into a hierarchical structure, facilitating subsequent processing and rendering. Mathematically, this is similar to building a tree diagram, where each node represents a syntactic unit, and the relationships between nodes reflect the logical structure of the code.

Implementation

After completing the tokenization, iterate over each fragment and build the syntax tree. Use the Node class as the base class for tree nodes, and create subclasses for each node type. Each subclass must provide process_fragment and render methods. process_fragment is used to further parse the fragment content and store the required attributes in the Node object; the render method is responsible for converting the content of the corresponding node into HTML using the provided context.

Here is the definition of the Node base class:

class TemplateNode(object):
    def __init__(self, fragment=None):
        # Store child nodes
        self.children = []
        # Mark whether to create a new scope
        self.creates_scope = False
        # Process the fragment
        self.process_fragment(fragment)

    def process_fragment(self, fragment):
        pass

    def enter_scope(self):
        pass

    def render(self, context):
        pass

    def exit_scope(self):
        pass

    def render_children(self, context, children=None):
        if children is None:
            children = self.children
        def render_child(child):
            child_html = child.render(context)
            return '' if not child_html else str(child_html)
        return ''.join(map(render_child, children))

Here is the definition of the variable node:

class TemplateVariable(_Node):
    def process_fragment(self, fragment):
        # Store the variable name
        self.name = fragment

    def render(self, context):
        # Resolve the variable value in the context
        return resolve_in_context(self.name, context)

To determine the type of the Node and initialize the correct class, it is necessary to check the type and text of the fragment. Text and variable fragments can be directly converted into text nodes and variable nodes, while block fragments require additional processing, and their types are determined by the block commands. For example, {% each items %} is a block node of the each type.

A node can also create a scope. During compilation, we record the current scope and make the new node a child node of the current scope. Once the correct closing tag is encountered, the current scope is closed, and the scope is popped from the scope stack, making the top of the stack the new scope. The example code is as follows:

def template_compile(self):
    # Create the root node
    root = _Root()
    # Initialize the scope stack
    scope_stack = [root]
    for fragment in self.each_fragment():
        if not scope_stack:
            raise TemplateError('nesting issues')
        # Get the current scope
        parent_scope = scope_stack[-1]
        if fragment.type == CLOSE_BLOCK_FRAGMENT:
            # Exit the current scope
            parent_scope.exit_scope()
            # Pop the current scope
            scope_stack.pop()
            continue
        # Create a new node
        new_node = self.create_node(fragment)
        if new_node:
            # Add the new node to the child node list of the current scope
            parent_scope.children.append(new_node)
            if new_node.creates_scope:
                # Add the new node to the scope stack
                scope_stack.append(new_node)
                # Enter the new scope
                new_node.enter_scope()
    return root

Step 3: Rendering

Principle

Rendering is the process of converting the constructed AST into the final HTML output. In this process, according to the type of AST nodes and context information, the variables and logic in the template need to be replaced with actual values and content. Mathematically, this is similar to traversing and evaluating a tree structure, converting and combining the information of each node according to the rules.

Implementation

The last step is to render the AST into HTML. This step will visit all the nodes in the AST and call the render method using the context parameter passed to the template. During the rendering process, render will continuously resolve the values of context variables. You can use the ast.literal_eval function to safely execute strings containing Python code. The example code is as follows:

import ast

def eval_expression(expr):
    try:
        return 'literal', ast.literal_eval(expr)
    except (ValueError, SyntaxError):
        return 'name', expr

If context variables are used instead of literals, their values need to be searched for in the context. Here, it is necessary to handle variable names containing dots and variables that access the outer context using two dots. Here is the implementation of the resolve function:

def resolve(name, context):
    if name.startswith('..'):
        # Get the outer context
        context = context.get('..', {})
        name = name[2:]
    try:
        for tok in name.split('.'):
            # Look up the variable in the context
            context = context[tok]
        return context
    except KeyError:
        raise TemplateContextError(name)

Conclusion

It is hoped that through this simple example, you can have a preliminary understanding of the working principle of the template engine. Although this code is still far from being production-level, it can serve as a basis for developing more complete tools.

Reference: https://github.com/alexmic/microtemplates

Leapcell: The Best of Serverless Web Hosting

Finally, I would like to recommend a platform that is most suitable for deploying Python services: Leapcell

Image description