From Scratch: Building HTTP/2 and WebSocket with Raw Python Sockets


Jun 5, 2025 - 17:40

Leapcell: The Best of Serverless Web Hosting

Implementation of HTTP/1.0, HTTP/2.0, and WebSocket Protocols Using Pure Python Sockets

Introduction

Network protocols serve as the foundation of the internet. HTTP/1.0, HTTP/2.0, and WebSocket each support modern web applications in different scenarios. This article will implement the core logic of these three protocols using pure Python sockets to gain an in-depth understanding of their underlying communication principles. All example code in this article has been verified in a Python 3.8+ environment, covering core technologies such as network programming, protocol parsing, and byte stream processing.

1. Implementation of HTTP/1.0 Protocol

1.1 Overview of HTTP/1.0 Protocol

HTTP/1.0 is an early stateless request-response protocol based on TCP connections. It uses short connections by default (closing the connection after each request). Its request consists of a request line, request headers, and a request body, while the response includes a status line, response headers, and a response body.
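To make the message layout concrete, here is a minimal sketch that takes a hand-written HTTP/1.0 request apart. The host name and header values are placeholders chosen for illustration, not values used elsewhere in this article.

```python
# A hand-written HTTP/1.0 request; host and user agent are placeholder values.
raw_request = (
    b"GET /hello HTTP/1.0\r\n"
    b"Host: example.test\r\n"
    b"User-Agent: demo-client\r\n"
    b"\r\n"
)

# A blank line (CRLF CRLF) separates the head from the optional body.
head, _, body = raw_request.partition(b"\r\n\r\n")

# The first line of the head is the request line; the rest are headers.
request_line, *header_lines = head.decode("ascii").split("\r\n")
method, target, version = request_line.split(" ")
headers = dict(line.split(": ", 1) for line in header_lines)

print(method, target, version)
print(headers)
print(body)
```

A GET request normally has an empty body, so everything after the blank line is an empty byte string here.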

1.2 Server-Side Implementation Steps

1.2.1 Creating a TCP Socket

import socket

def create_http1_server(port=8080):
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_socket.bind(('0.0.0.0', port))
    server_socket.listen(5)
    print(f"HTTP/1.0 Server listening on port {port}")
    return server_socket

1.2.2 Parsing Request Data

Use regular expressions to parse the request line and headers:

import re

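# Request line, then zero or more header lines, a blank line, and the body.
# Matching header lines explicitly also accepts requests that carry no headers.
REQUEST_PATTERN = re.compile(
    r'^([A-Z]+)\s+(\S+)\s+HTTP/1\.\d\r\n'
    r'((?:[^\r\n]+\r\n)*)\r\n(.*)',
    re.DOTALL | re.IGNORECASE
)

def parse_http1_request(data):
    try:
        text = data.decode('utf-8')
    except UnicodeDecodeError:
        return None  # Treat undecodable input as a malformed request
    match = REQUEST_PATTERN.match(text)
    if not match:
        return None
    method, path, headers_str, body = match.groups()
    headers = {}
    for line in headers_str.split('\r\n'):
        if ':' not in line:
            continue  # Skip blank or malformed header lines
        # Whitespace after the colon is optional, so split on ':' and strip
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()
    return {
        'method': method,
        'path': path,
        'headers': headers,
        'body': body
    }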

1.2.3 Generating Response Data

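REASON_PHRASES = {200: 'OK', 400: 'Bad Request', 404: 'Not Found'}

def build_http1_response(status_code=200, body='', headers=None):
    # Use the reason phrase that matches the status code instead of a fixed "OK"
    reason = REASON_PHRASES.get(status_code, 'Unknown')
    status_line = f'HTTP/1.0 {status_code} {reason}\r\n'
    # Content-Length counts bytes, not characters, so measure the encoded body
    header_lines = ['Content-Length: %d\r\n' % len(body.encode('utf-8'))]
    if headers:
        header_lines.extend([f'{k}: {v}\r\n' for k, v in headers.items()])
    return (status_line + ''.join(header_lines) + '\r\n' + body).encode('utf-8')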
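The reason Content-Length is computed from the encoded body rather than from the string is easy to check. The sample text below is an arbitrary choice for illustration:

```python
# Content-Length must count bytes on the wire, not Python characters.
body = "Grüße, 世界"  # arbitrary sample text with multi-byte characters

char_count = len(body)                   # number of Unicode code points
byte_count = len(body.encode("utf-8"))   # number of bytes actually sent

print(char_count, byte_count)
```

If the header carried the character count, a client would stop reading before the end of the body.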

1.2.4 Main Processing Loop

def handle_http1_connection(client_socket):
    try:
        request_data = client_socket.recv(4096)
        if not request_data:
            return
        request = parse_http1_request(request_data)
        if not request:
            response = build_http1_response(400, 'Bad Request')
        elif request['path'] == '/hello':
            response = build_http1_response(200, 'Hello, HTTP/1.0!')
        else:
            response = build_http1_response(404, 'Not Found')
        client_socket.sendall(response)
    finally:
        client_socket.close()

if __name__ == '__main__':
    server_socket = create_http1_server()
    while True:
        client_socket, addr = server_socket.accept()
        handle_http1_connection(client_socket)

1.3 Key Feature Explanations

  • Short Connection Handling: Immediately closes the connection after processing each request (client_socket.close()).
  • Request Parsing: Matches the request structure using regular expressions to handle common GET requests.
  • Response Generation: Manually constructs the status line, response headers, and response body, ensuring the accuracy of the Content-Length header.
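The short-connection behaviour can be observed from the client side: with HTTP/1.0, the end of the response is signalled by the server closing the socket. The sketch below is an illustration only; `http10_get` and `_toy_server` are hypothetical helpers, and a `socket.socketpair()` stands in for a real TCP connection so the example runs without a listening server.

```python
import socket
import threading

def http10_get(sock, path):
    """Send one HTTP/1.0 GET and read until the peer closes the connection."""
    sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # Empty read means the server closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)

def _toy_server(conn):
    """Stand-in server: answer a single request, then close (short connection)."""
    conn.recv(4096)
    body = b"Hello, HTTP/1.0!"
    head = b"HTTP/1.0 200 OK\r\nContent-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n"
    conn.sendall(head + body)
    conn.close()

# A connected socket pair replaces a real TCP connection for this demo.
client_side, server_side = socket.socketpair()
worker = threading.Thread(target=_toy_server, args=(server_side,))
worker.start()
raw_response = http10_get(client_side, "/hello")
worker.join()
client_side.close()

print(raw_response.decode("ascii"))
```

Against the real server from section 1.2, the same `http10_get` function could be used on a socket obtained from `socket.create_connection(("127.0.0.1", 8080))`.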

2. Implementation of HTTP/2.0 Protocol (Simplified Version)

2.1 Core Features of HTTP/2.0

HTTP/2.0 is based on a binary framing layer and supports features such as multiplexing, header compression (HPACK), and server push. Its core is to decompose requests/responses into frames and manage communication through streams.

2.2 Frame Structure Definition

An HTTP/2.0 frame consists of the following parts:

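+-----------------------------------------------+
|                 Length (24)                   |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31)                      |
+=+=============================================================+
|                   Frame Payload (0...)                      ...
+---------------------------------------------------------------+

The first 9 bytes form the fixed frame header: a 24-bit payload length, an 8-bit type, 8 bits of flags, one reserved bit (R), and a 31-bit stream identifier.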
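The header layout maps directly onto a few bytes operations. The following is a minimal sketch; the function names `pack_frame_header` and `unpack_frame_header` are made up for this illustration.

```python
import struct

def pack_frame_header(length, frame_type, flags, stream_id):
    """Pack the 9-byte HTTP/2 frame header."""
    if not 0 <= length < (1 << 24):
        raise ValueError("length must fit in 24 bits")
    return (
        length.to_bytes(3, "big")  # 24-bit payload length
        # 8-bit type, 8-bit flags, then 32 bits whose top (reserved) bit is cleared
        + struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    )

def unpack_frame_header(header):
    """Return (length, type, flags, stream_id) from a 9-byte header."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[:3], "big")
    frame_type, flags, raw_stream = struct.unpack(">BBI", header[3:])
    return length, frame_type, flags, raw_stream & 0x7FFFFFFF

# A DATA frame (type 0x00) with END_STREAM (0x01) on stream 1, 16-byte payload.
header = pack_frame_header(16, 0x00, 0x01, 1)
print(header.hex())
print(unpack_frame_header(header))
```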

2.3 Simplified Implementation Approach

Due to the high complexity of HTTP/2.0, this example implements the following features:

  1. Handles GET request headers frames (HEADERS Frame) and data frames (DATA Frame).
  2. Does not implement HPACK compression, transmitting raw headers directly.
  3. Single-stream processing, does not support multiplexing.
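Skipping HPACK means the header block is just text, which is not valid HTTP/2 and only works between matching toy endpoints. One detail needs care: pseudo-headers such as `:path` begin with a colon, so a naive split on the first colon produces an empty name. The sketch below shows one way to encode and decode such a block; both function names are hypothetical.

```python
def encode_plain_headers(headers):
    """Encode headers as 'name:value' lines. This is a toy format, NOT HPACK."""
    return "".join(f"{name}:{value}\r\n" for name, value in headers.items()).encode("utf-8")

def decode_plain_headers(block):
    """Decode the toy format, keeping the leading ':' of pseudo-headers."""
    headers = {}
    for line in block.split(b"\r\n"):
        # Search from index 1 so a pseudo-header's leading colon is not
        # mistaken for the name/value separator.
        sep = line.find(b":", 1)
        if sep > 0:
            headers[line[:sep].decode("utf-8")] = line[sep + 1:].decode("utf-8").strip()
    return headers

request_headers = {":method": "GET", ":path": "/hello", ":authority": "example.test:8080"}
block = encode_plain_headers(request_headers)
print(block)
print(decode_plain_headers(block))
```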

2.4 Server-Side Code Implementation

2.4.1 Frame Constructors

def build_frame(frame_type, flags, stream_id, payload):
    """Build a frame: 9-byte header (length, type, flags, stream ID) plus payload"""
    return (
        len(payload).to_bytes(3, 'big') +              # 24-bit payload length
        bytes([frame_type, flags]) +                   # 8-bit type, 8-bit flags
        (stream_id & 0x7FFFFFFF).to_bytes(4, 'big') +  # Reserved bit + 31-bit stream ID
        payload
    )

def build_headers_frame(stream_id, headers):
    """Build a HEADERS frame (simplified version without HPACK compression)"""
    header_block = ''.join([f'{k}:{v}\r\n' for k, v in headers.items()]).encode('utf-8')
    # TYPE=HEADERS (0x01), FLAGS=END_HEADERS (0x04): the whole header block is in this frame
    return build_frame(0x01, 0x04, stream_id, header_block)

def build_data_frame(stream_id, data):
    """Build a DATA frame"""
    # TYPE=DATA (0x00), FLAGS=END_STREAM (0x01): this frame ends the response
    return build_frame(0x00, 0x01, stream_id, data)

2.4.2 Connection Handling Logic

CLIENT_PREFACE = b'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'

def recv_exact(sock, n):
    """Read exactly n bytes; returns fewer only if the peer closes early"""
    data = b''
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            break
        data += chunk
    return data

def handle_http2_connection(client_socket):
    try:
        # The client sends the 24-byte connection preface; the server only reads it
        if recv_exact(client_socket, len(CLIENT_PREFACE)) != CLIENT_PREFACE:
            return

        # The server preface is a SETTINGS frame (type 0x04); an empty one is valid
        client_socket.sendall(b'\x00\x00\x00\x04\x00\x00\x00\x00\x00')

        # Read client frame (simplified processing, assuming the first frame is a HEADERS frame)
        frame_header = recv_exact(client_socket, 9)
        if len(frame_header) != 9:
            return

        length = int.from_bytes(frame_header[:3], 'big')
        frame_type = frame_header[3]
        # Byte 4 holds the flags; bytes 5-8 hold the reserved bit and the 31-bit stream ID
        stream_id = int.from_bytes(frame_header[5:9], 'big') & 0x7FFFFFFF

        if frame_type != 0x01:  # Non-HEADERS frame
            return

        # Read header data (simplified processing, does not parse HPACK)
        header_data = recv_exact(client_socket, length)
        headers = {}
        for line in header_data.split(b'\r\n'):
            # Search from index 1 so the leading ':' of pseudo-headers stays in the name
            sep = line.find(b':', 1)
            if sep > 0:
                headers[line[:sep].decode()] = line[sep + 1:].decode().strip()

        # Process request path
        path = headers.get(':path', '/')
        if path == '/hello':
            response_data = b'Hello, HTTP/2.0!'
            response_headers = {
                ':status': '200',
                'content-type': 'text/plain',
                'content-length': str(len(response_data))
            }
        else:
            response_data = b'Not Found'
            response_headers = {':status': '404'}

        # Send response frames
        headers_frame = build_headers_frame(stream_id, response_headers)
        data_frame = build_data_frame(stream_id, response_data)
        client_socket.sendall(headers_frame + data_frame)

    except Exception as e:
        print(f"HTTP/2 Error: {e}")
    finally:
        client_socket.close()
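Because this server does not speak HPACK, standard clients such as curl or a browser cannot talk to it. A matching toy client is needed for testing. The sketch below only builds the request bytes; it assumes the server reads the standard 24-byte client preface first and then expects a HEADERS frame whose payload is plain `name:value` lines. `build_toy_request` is a hypothetical helper, not part of any library.

```python
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # fixed 24-byte client preface

def build_toy_request(path, stream_id=1):
    """Build preface + one HEADERS frame with a plain-text (non-HPACK) header block."""
    header_block = f":method:GET\r\n:path:{path}\r\n".encode("utf-8")
    frame_header = (
        len(header_block).to_bytes(3, "big")  # 24-bit payload length
        + bytes([0x01, 0x05])                 # HEADERS, END_STREAM | END_HEADERS
        + stream_id.to_bytes(4, "big")        # client streams use odd IDs
    )
    return CLIENT_PREFACE + frame_header + header_block

request = build_toy_request("/hello")
print(len(request), request[24:33].hex())
```

These bytes could then be written to a socket connected to the toy server with `sendall`, and the reply read back frame by frame.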

2.5 Implementation Limitations

  • HPACK Compression Not Implemented: Transmits plaintext headers directly, differing from standard HTTP/2.
  • Single-Stream Processing: Each connection handles only one stream, does not implement multiplexing.
  • Simplified Frame Parsing: Handles only HEADERS and DATA frames, does not process error frames, settings frames, etc.
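To give a feel for what multiplexing would require, the sketch below splits a byte buffer into frames and reassembles DATA payloads per stream. It is an illustration under simplifying assumptions (no flow control, no padding, no error handling), and `iter_frames` and `data_frame` are names invented here.

```python
from collections import defaultdict

def iter_frames(buffer):
    """Yield (frame_type, flags, stream_id, payload) for each complete frame."""
    offset = 0
    while offset + 9 <= len(buffer):
        length = int.from_bytes(buffer[offset:offset + 3], "big")
        if offset + 9 + length > len(buffer):
            break  # Incomplete frame: a real server would wait for more bytes
        frame_type = buffer[offset + 3]
        flags = buffer[offset + 4]
        stream_id = int.from_bytes(buffer[offset + 5:offset + 9], "big") & 0x7FFFFFFF
        yield frame_type, flags, stream_id, buffer[offset + 9:offset + 9 + length]
        offset += 9 + length

def data_frame(stream_id, payload, end_stream=False):
    """Build a DATA frame (type 0x00), optionally with END_STREAM (0x01)."""
    flags = 0x01 if end_stream else 0x00
    return (len(payload).to_bytes(3, "big") + bytes([0x00, flags])
            + stream_id.to_bytes(4, "big") + payload)

# Frames from streams 1 and 3 interleaved on one connection.
wire = (data_frame(1, b"AAA") + data_frame(3, b"xx")
        + data_frame(1, b"BBB", end_stream=True) + data_frame(3, b"yy", end_stream=True))

bodies = defaultdict(bytes)
finished = []
for frame_type, flags, stream_id, payload in iter_frames(wire):
    if frame_type == 0x00:  # DATA
        bodies[stream_id] += payload
        if flags & 0x01:    # END_STREAM
            finished.append(stream_id)

print(dict(bodies), finished)
```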

3. Implementation of WebSocket Protocol

3.1 Overview of WebSocket Protocol

WebSocket establishes a connection based on an HTTP handshake and then provides full-duplex communication through binary frames. Its core process includes:

  1. HTTP Handshake: The client sends an upgrade request, and the server confirms the protocol switch.
  2. Frame Communication: Uses binary frames of a specific format to transmit data, supporting operations such as text, binary, and closing.

3.2 Handshake Protocol Implementation

3.2.1 Handshake Request Parsing

import base64
import hashlib

def parse_websocket_handshake(data):
    headers = {}
    lines = data.decode('utf-8').split('\r\n')
    for line in lines[1:]:  # Skip the request line
        if not line:
            break
        # Header names are case-insensitive and whitespace after the colon is optional
        key, _, value = line.partition(':')
        headers[key.strip().lower()] = value.strip()
    return {
        'sec_websocket_key': headers.get('sec-websocket-key'),
        'origin': headers.get('origin')
    }
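The parser above only extracts the key. A stricter server would also confirm that the request really is a WebSocket upgrade before answering. The following sketch shows one possible set of checks; `is_websocket_upgrade` is a hypothetical helper and the request text is a hand-written sample.

```python
def is_websocket_upgrade(request_text):
    """Return True if the request looks like a WebSocket opening handshake."""
    lines = request_text.split("\r\n")
    if not lines[0].startswith("GET "):
        return False  # The opening handshake must be a GET request
    headers = {}
    for line in lines[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Connection may list several tokens, e.g. "keep-alive, Upgrade"
    connection_tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    return (
        headers.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
        and headers.get("sec-websocket-version") == "13"
        and bool(headers.get("sec-websocket-key"))
    )

sample_handshake = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.test\r\n"
    "Upgrade: websocket\r\n"
    "Connection: keep-alive, Upgrade\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Sec-WebSocket-Version: 13\r\n"
    "\r\n"
)
print(is_websocket_upgrade(sample_handshake))
print(is_websocket_upgrade("GET / HTTP/1.1\r\nHost: example.test\r\n\r\n"))
```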

3.2.2 Handshake Response Generation

def build_websocket_handshake_response(key):
    guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
    hash_data = (key + guid).encode('utf-8')
    sha1_hash = hashlib.sha1(hash_data).digest()
    accept_key = base64.b64encode(sha1_hash).decode('utf-8')
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key}\r\n"
        "\r\n"
    ).encode('utf-8')
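The accept-key computation can be checked against the sample values given in RFC 6455, which pairs the key `dGhlIHNhbXBsZSBub25jZQ==` with the accept value `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. A standalone version of the calculation, with `compute_accept_key` as an illustrative name:

```python
import base64
import hashlib

WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed by RFC 6455

def compute_accept_key(client_key):
    """SHA-1 of key + GUID, then Base64, as used for Sec-WebSocket-Accept."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

accept = compute_accept_key("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)
```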

3.3 Frame Protocol Implementation

3.3.1 Frame Structure

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Extended payload length continued, if payload len == 127  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Masking-key, if MASK set                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Payload Data                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
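Reading the first two bytes of a frame by hand helps connect the diagram to real traffic. The sketch below decodes the header bits of the unmasked "Hello" text frame that RFC 6455 uses as an example; `describe_frame_header` is a name invented for this illustration.

```python
def describe_frame_header(first_byte, second_byte):
    """Split the first two bytes of a WebSocket frame into their bit fields."""
    return {
        "fin": bool(first_byte & 0x80),      # bit 0: final fragment
        "rsv": (first_byte >> 4) & 0x07,     # bits 1-3: reserved, normally 0
        "opcode": first_byte & 0x0F,         # bits 4-7: frame type
        "masked": bool(second_byte & 0x80),  # bit 8: MASK flag
        "length_field": second_byte & 0x7F,  # bits 9-15: 0-125, or 126/127 as markers
    }

# Unmasked single-frame text message "Hello" (example bytes from RFC 6455).
unmasked_hello = bytes.fromhex("810548656c6c6f")
fields = describe_frame_header(unmasked_hello[0], unmasked_hello[1])
payload = unmasked_hello[2:2 + fields["length_field"]]

print(fields)
print(payload)
```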

3.3.2 Parsing Received Frames

def parse_websocket_frame(data):
    if len(data) < 2:
        return None
    first_byte, second_byte = data[0], data[1]
    fin = (first_byte >> 7) & 0x01
    opcode = first_byte & 0x0F
    mask = (second_byte >> 7) & 0x01
    payload_len = second_byte & 0x7F

    if payload_len == 126:
        payload_len = int.from_bytes(data[2:4], 'big')
        offset = 4
    elif payload_len == 127:
        payload_len = int.from_bytes(data[2:10], 'big')
        offset = 10
    else:
        offset = 2

    if mask:
        mask_key = data[offset:offset+4]
        offset += 4

    # Take exactly payload_len bytes; recv() may have returned more than one frame
    raw = data[offset:offset+payload_len]
    if mask:
        # Client-to-server frames are masked: XOR each byte with the 4-byte key
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(raw))
    else:
        payload = bytes(raw)

    return {
        'fin': fin,
        'opcode': opcode,
        'payload': payload
    }
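TCP delivers a byte stream, so a single `recv` call may return half a frame or several frames at once. One way to cope, sketched here under the assumption that the caller keeps a receive buffer, is a parser that reports how many bytes it consumed and returns `None` when the frame is still incomplete. `try_parse_frame` is a hypothetical helper, not a replacement for the function above.

```python
def try_parse_frame(buffer):
    """Return (opcode, payload, bytes_consumed), or None if the frame is incomplete."""
    if len(buffer) < 2:
        return None
    opcode = buffer[0] & 0x0F
    masked = bool(buffer[1] & 0x80)
    length = buffer[1] & 0x7F
    offset = 2
    if length == 126:
        if len(buffer) < 4:
            return None
        length, offset = int.from_bytes(buffer[2:4], "big"), 4
    elif length == 127:
        if len(buffer) < 10:
            return None
        length, offset = int.from_bytes(buffer[2:10], "big"), 10
    mask_key = b""
    if masked:
        if len(buffer) < offset + 4:
            return None
        mask_key = buffer[offset:offset + 4]
        offset += 4
    if len(buffer) < offset + length:
        return None  # Payload not fully received yet
    payload = buffer[offset:offset + length]
    if masked:
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return opcode, bytes(payload), offset + length

# Masked "Hello" text frame (example bytes from RFC 6455).
masked_hello = bytes.fromhex("818537fa213d7f9f4d5158")

buffer = masked_hello[:6]                   # First recv: only part of a frame
partial_result = try_parse_frame(buffer)
buffer += masked_hello[6:] + masked_hello   # Second recv: the rest plus a whole frame

messages = []
while True:
    parsed = try_parse_frame(buffer)
    if parsed is None:
        break
    opcode, payload, consumed = parsed
    messages.append(payload)
    buffer = buffer[consumed:]              # Keep any leftover bytes for the next round

print(partial_result, messages, buffer)
```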

3.3.3 Building Frames for Sending

def build_websocket_frame(data, opcode=0x01):  # Opcode 0x01 indicates a text frame
    payload = data.encode('utf-8') if isinstance(data, str) else data
    payload_len = len(payload)
    frame = bytearray()

    frame.append(0x80 | opcode)  # FIN=1, set opcode

    if payload_len < 126:
        frame.append(payload_len)
    elif payload_len <= 0xFFFF:
        frame.append(126)
        frame.extend(payload_len.to_bytes(2, 'big'))
    else:
        frame.append(127)
        frame.extend(payload_len.to_bytes(8, 'big'))

    frame.extend(payload)
    return bytes(frame)
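The builder above produces unmasked frames, which is correct for a server. Frames sent by a client must be masked, so a test client needs a slightly different builder. The sketch below is one possible version; `build_masked_frame` and its `mask_key` parameter are assumptions made for this example, with the key normally chosen at random.

```python
import os

def build_masked_frame(payload, opcode=0x01, mask_key=None):
    """Build a client-to-server frame: MASK bit set, payload XORed with a 4-byte key."""
    if mask_key is None:
        mask_key = os.urandom(4)  # A fresh random key per frame
    frame = bytearray([0x80 | opcode])  # FIN=1 plus opcode
    length = len(payload)
    if length < 126:
        frame.append(0x80 | length)
    elif length <= 0xFFFF:
        frame.append(0x80 | 126)
        frame += length.to_bytes(2, "big")
    else:
        frame.append(0x80 | 127)
        frame += length.to_bytes(8, "big")
    frame += mask_key
    frame += bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(frame)

# With the key from the RFC 6455 example, the output matches the documented bytes.
frame = build_masked_frame(b"Hello", mask_key=bytes.fromhex("37fa213d"))
print(frame.hex())
```

Passing a fixed key is only useful for reproducing known byte sequences in tests.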

3.4 Complete Server-Side Implementation

def handle_websocket_connection(client_socket):
    try:
        # Read the handshake request
        handshake_data = client_socket.recv(1024)
        handshake = parse_websocket_handshake(handshake_data)
        if not handshake['sec_websocket_key']:
            return

        # Send the handshake response
        response = build_websocket_handshake_response(handshake['sec_websocket_key'])
        client_socket.sendall(response)

        # Enter the message loop
        while True:
            frame_data = client_socket.recv(4096)
            if not frame_data:
                break

            frame = parse_websocket_frame(frame_data)
            if not frame:
                break

            if frame['opcode'] == 0x01:  # Text frame
                message = bytes(frame['payload']).decode('utf-8')
                print(f"Received: {message}")
                response_frame = build_websocket_frame(f"Echo: {message}")
                client_socket.sendall(response_frame)
            elif frame['opcode'] == 0x09:  # Ping frame: reply with a pong carrying the same payload
                client_socket.sendall(build_websocket_frame(bytes(frame['payload']), opcode=0x0A))
            elif frame['opcode'] == 0x08:  # Close frame: echo a close frame, then stop
                client_socket.sendall(build_websocket_frame(b'', opcode=0x08))
                break

    except Exception as e:
        print(f"WebSocket Error: {e}")
    finally:
        client_socket.close()
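A close frame may carry a 2-byte status code followed by a UTF-8 reason, which the loop above ignores. The following sketch shows how such a payload could be built and read; both helper names are made up for illustration, and 1000 is the status code for a normal closure.

```python
import struct

def build_close_frame(code=1000, reason=""):
    """Build an unmasked close frame (opcode 0x08) with a status code and reason."""
    payload = struct.pack(">H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    return bytes([0x88, len(payload)]) + payload  # 0x88 = FIN + opcode 0x08

def parse_close_payload(payload):
    """Return (code, reason); code is None when the close frame has no payload."""
    if len(payload) < 2:
        return None, ""
    (code,) = struct.unpack(">H", payload[:2])
    return code, payload[2:].decode("utf-8")

close_frame = build_close_frame(1000, "bye")
print(close_frame.hex())
print(parse_close_payload(close_frame[2:]))
```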

4. Protocol Comparison and Practical Recommendations

4.1 Protocol Feature Comparison

| Feature           | HTTP/1.0            | HTTP/2.0                       | WebSocket                            |
|-------------------|---------------------|--------------------------------|--------------------------------------|
| Connection Method | Short Connection    | Long Connection (Multiplexing) | Long Connection (Full Duplex)        |
| Protocol Format   | Text                | Binary Framing                 | Binary Frames                        |
| Typical Scenarios | Simple Web Requests | High-Performance Websites      | Real-Time Communication (Chat, Push) |
| Header Handling   | Plaintext           | HPACK Compression              | No Compression (Extensible)          |

4.2 Limitations and Applications of Pure Socket Implementation

  • Limitations:

    1. Does not handle protocol details (e.g., HTTP/2 flow control, error frames).
    2. Performance issues (lack of connection pooling, asynchronous processing).
    3. Security gaps (no TLS encryption implemented).
  • Application Value:

    1. Learn the underlying principles of protocols and understand the essence of request-response.
    2. Develop lightweight services (e.g., embedded device communication).
    3. Develop debugging tools (custom protocol analysis).

4.3 Recommendations for Production Environments

  • HTTP/1.0/2.0: Use mature libraries such as requests (HTTP/1.1 client), aiohttp (asynchronous HTTP/1.1 client and server), or h2 (HTTP/2 protocol implementation).
  • WebSocket: Recommended library is websockets, which supports asynchronous communication and standard protocols.
  • Performance Optimization: Combine with asynchronous frameworks (e.g., asyncio) or ASGI servers (e.g., uvicorn) to improve concurrency.
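As a rough idea of what the asynchronous approach looks like, the sketch below serves the same `/hello` route with the standard library's asyncio streams. It is a minimal illustration, not a production server: the handler name, the response text, and the use of an ephemeral port on 127.0.0.1 are all choices made for this example.

```python
import asyncio

async def handle_client(reader, writer):
    """Answer one HTTP/1.0 request without blocking other connections."""
    try:
        head = await reader.readuntil(b"\r\n\r\n")  # Read the full request head
        parts = head.split(b"\r\n", 1)[0].split()
        path = parts[1].decode("ascii", "replace") if len(parts) >= 2 else "/"
        if path == "/hello":
            status, body = b"200 OK", b"Hello, asyncio!"
        else:
            status, body = b"404 Not Found", b"Not Found"
        length = str(len(body)).encode("ascii")
        writer.write(b"HTTP/1.0 " + status + b"\r\nContent-Length: " + length + b"\r\n\r\n" + body)
        await writer.drain()
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        pass  # Malformed or truncated request: just drop the connection
    finally:
        writer.close()

async def demo():
    # Port 0 asks the OS for any free port, which keeps the demo self-contained.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET /hello HTTP/1.0\r\n\r\n")
    await writer.drain()
    response = await reader.read()  # Read until the server closes the connection
    writer.close()
    await writer.wait_closed()

    server.close()
    return response

response = asyncio.run(demo())
print(response.decode("ascii"))
```

Each connection runs as its own coroutine, so a slow client no longer blocks the accept loop the way it does in the sequential server from section 1.2.4.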

5. Conclusion

By implementing the three protocols using pure sockets, we have gained an in-depth understanding of the underlying mechanisms of network communication:

  • HTTP/1.0 is a basic request-response model suitable for simple scenarios.
  • HTTP/2.0 improves performance through binary framing and multiplexing, but its implementation complexity increases significantly.
  • WebSocket provides an efficient full-duplex channel for real-time communication and is widely used in modern web applications.

In actual development, priority should be given to using mature libraries and frameworks, but manual implementation helps deepen understanding of protocols. Learning network protocols requires combining specification documents (e.g., RFC 1945 for HTTP/1.0, RFC 7540 for HTTP/2, since superseded by RFC 9113, and RFC 6455 for WebSocket) with practical debugging to gradually master their design concepts and engineering implementations.


Finally, I recommend the best platform for deploying Python services: Leapcell
