From Scratch: Building HTTP/2 and WebSocket with Raw Python Sockets
Leapcell: The Best of Serverless Web Hosting
Implementation of HTTP/1.0, HTTP/2.0, and WebSocket Protocols Using Pure Python Sockets
Introduction
Network protocols serve as the foundation of the internet. HTTP/1.0, HTTP/2.0, and WebSocket each support modern web applications in different scenarios. This article will implement the core logic of these three protocols using pure Python sockets to gain an in-depth understanding of their underlying communication principles. All example code in this article has been verified in a Python 3.8+ environment, covering core technologies such as network programming, protocol parsing, and byte stream processing.
1. Implementation of HTTP/1.0 Protocol
1.1 Overview of HTTP/1.0 Protocol
HTTP/1.0 is an early stateless request-response protocol based on TCP connections. It uses short connections by default (closing the connection after each request). Its request consists of a request line, request headers, and a request body, while the response includes a status line, response headers, and a response body.
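To make the message layout concrete, here is a minimal sketch that splits a hand-written HTTP/1.0 request into its three parts. The sample request and the helper name `split_http_message` are illustrative assumptions and are not part of the server built below.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP/1.0 message into (start_line, headers, body)."""
    # A blank line (CRLF CRLF) separates the head from the body
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Header names are case-insensitive, so normalize to lowercase
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical request used only for illustration
sample_request = (
    b'POST /submit HTTP/1.0\r\n'
    b'Host: example.com\r\n'
    b'Content-Length: 5\r\n'
    b'\r\n'
    b'hello'
)
start_line, headers, body = split_http_message(sample_request)
print(start_line)  # POST /submit HTTP/1.0
print(headers)     # {'host': 'example.com', 'content-length': '5'}
print(body)        # b'hello'
```

A response has the same shape, with a status line such as `HTTP/1.0 200 OK` in place of the request line.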
1.2 Server-Side Implementation Steps
1.2.1 Creating a TCP Socket
import socket

def create_http1_server(port=8080):
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for the old socket to time out
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_socket.bind(('0.0.0.0', port))
    server_socket.listen(5)
    print(f"HTTP/1.0 Server listening on port {port}")
    return server_socket
1.2.2 Parsing Request Data
Use regular expressions to parse the request line and headers:
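import re

# Request line, zero or more header lines, a blank line, then the body.
# Matching header lines one by one also accepts requests with no headers.
REQUEST_PATTERN = re.compile(
    r'^([A-Z]+)\s+(\S+)\s+HTTP/1\.\d\r\n'
    r'((?:[^\r\n]+\r\n)*)\r\n(.*)',
    re.DOTALL | re.IGNORECASE
)

def parse_http1_request(data):
    try:
        text = data.decode('utf-8')
    except UnicodeDecodeError:
        return None
    match = REQUEST_PATTERN.match(text)
    if not match:
        return None
    method, path, headers_str, body = match.groups()
    headers = {}
    for line in headers_str.split('\r\n'):
        if ':' in line:
            # Whitespace after the colon is optional, so strip it instead of splitting on ': '
            name, value = line.split(':', 1)
            headers[name.strip()] = value.strip()
    return {
        'method': method,
        'path': path,
        'headers': headers,
        'body': body
    }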
1.2.3 Generating Response Data
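from http import HTTPStatus

def build_http1_response(status_code=200, body='', headers=None):
    # Look up the standard reason phrase instead of always sending "OK"
    reason = HTTPStatus(status_code).phrase
    status_line = f'HTTP/1.0 {status_code} {reason}\r\n'
    # Content-Length counts bytes, not characters
    header_lines = ['Content-Length: %d\r\n' % len(body.encode('utf-8'))]
    if headers:
        header_lines.extend([f'{k}: {v}\r\n' for k, v in headers.items()])
    return (status_line + ''.join(header_lines) + '\r\n' + body).encode('utf-8')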
1.2.4 Main Processing Loop
def handle_http1_connection(client_socket):
    try:
        # Simplification: assumes the whole request arrives in a single recv() call
        request_data = client_socket.recv(4096)
        if not request_data:
            return
        request = parse_http1_request(request_data)
        if not request:
            response = build_http1_response(400, 'Bad Request')
        elif request['path'] == '/hello':
            response = build_http1_response(200, 'Hello, HTTP/1.0!')
        else:
            response = build_http1_response(404, 'Not Found')
        client_socket.sendall(response)
    finally:
        # Short connection: always close after one request
        client_socket.close()

if __name__ == '__main__':
    server_socket = create_http1_server()
    while True:
        client_socket, addr = server_socket.accept()
        handle_http1_connection(client_socket)
1.3 Key Feature Explanations
- Short Connection Handling: Immediately closes the connection after processing each request (client_socket.close()).
- Request Parsing: Matches the request structure using regular expressions to handle common GET requests.
- Response Generation: Manually constructs the status line, response headers, and response body, ensuring the accuracy of the Content-Length header.
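The short-connection behavior can be seen from the client side: because the server closes the socket after one response, a client can simply read until `recv()` returns an empty result. The sketch below simulates both ends with `socket.socketpair()` so that it runs without a live server; the helper name `read_until_close` is an assumption made for this illustration.

```python
import socket

def read_until_close(sock: socket.socket) -> bytes:
    """Collect everything the peer sends until it closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # an empty read means the peer closed the connection
            break
        chunks.append(chunk)
    return b''.join(chunks)


# Simulate the server and client ends with a connected socket pair
server_side, client_side = socket.socketpair()
server_side.sendall(b'HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nhi')
server_side.close()  # short connection: the server closes after one response
response = read_until_close(client_side)
client_side.close()
print(response)
```

In this model the end of the response is signaled by the closed connection, which is why a new TCP connection is needed for every request.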
2. Implementation of HTTP/2.0 Protocol (Simplified Version)
2.1 Core Features of HTTP/2.0
HTTP/2.0 is based on a binary framing layer and supports features such as multiplexing, header compression (HPACK), and server push. Its core is to decompose requests/responses into frames and manage communication through streams.
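As a rough illustration of how streams allow multiplexing, the sketch below interleaves frames from two streams on one connection and regroups the DATA payloads by stream ID. The tuples are a simplified, hypothetical stand-in for real frames, not the wire format.

```python
from collections import defaultdict

# Each tuple is (stream_id, frame_type, payload); hypothetical frames for illustration
interleaved = [
    (1, 'HEADERS', b':path=/a'),
    (3, 'HEADERS', b':path=/b'),
    (1, 'DATA', b'first half of A, '),
    (3, 'DATA', b'all of B'),
    (1, 'DATA', b'second half of A'),
]

def reassemble(frames):
    """Group DATA payloads by stream ID, preserving the order within each stream."""
    bodies = defaultdict(bytes)
    for stream_id, frame_type, payload in frames:
        if frame_type == 'DATA':
            bodies[stream_id] += payload
    return dict(bodies)

bodies = reassemble(interleaved)
print(bodies[1])  # b'first half of A, second half of A'
print(bodies[3])  # b'all of B'
```

Because every frame carries its stream ID, the receiver can rebuild each response even though the frames arrive mixed together.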
2.2 Frame Structure Definition
An HTTP/2.0 frame consists of the following parts:
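+-----------------------------------------------+
|                 Length (24)                   |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31)                      |
+=+=============================================================+
|                   Frame Payload (0...)                      ...
+---------------------------------------------------------------+
The frame header is always 9 bytes: a 3-byte payload length, a 1-byte type, a 1-byte flags field, and a 4-byte field holding one reserved bit (R) followed by the 31-bit stream identifier.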
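The header layout can be decoded with the standard `struct` module. This is a minimal sketch assuming the 9-byte header defined in RFC 7540 (3-byte length, type, flags, then a 4-byte field whose top bit is reserved); the function name `parse_frame_header` is introduced here for illustration only.

```python
import struct

def parse_frame_header(header: bytes):
    """Decode a 9-byte HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError('an HTTP/2 frame header is exactly 9 bytes')
    # '>BHBBI' = 1 + 2 bytes of length, 1 byte type, 1 byte flags, 4-byte stream field
    len_hi, len_lo, frame_type, flags, stream_field = struct.unpack('>BHBBI', header)
    length = (len_hi << 16) | len_lo
    stream_id = stream_field & 0x7FFFFFFF  # clear the reserved top bit
    return length, frame_type, flags, stream_id


# An empty SETTINGS frame on stream 0
print(parse_frame_header(b'\x00\x00\x00\x04\x00\x00\x00\x00\x00'))  # (0, 4, 0, 0)
# A HEADERS frame with a 13-byte payload and END_HEADERS set, on stream 1
print(parse_frame_header(b'\x00\x00\x0d\x01\x04\x00\x00\x00\x01'))  # (13, 1, 4, 1)
```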
2.3 Simplified Implementation Approach
Due to the high complexity of HTTP/2.0, this example implements the following features:
- Handles GET request headers frames (HEADERS Frame) and data frames (DATA Frame).
- Does not implement HPACK compression, transmitting raw headers directly.
- Single-stream processing, does not support multiplexing.
2.4 Server-Side Code Implementation
2.4.1 Frame Constructors
def build_frame(frame_type, flags, stream_id, payload):
    """Prefix a payload with the 9-byte HTTP/2 frame header"""
    return (
        len(payload).to_bytes(3, 'big') +              # Length (24 bits), payload only
        bytes([frame_type, flags]) +                   # Type (8 bits) and Flags (8 bits)
        (stream_id & 0x7FFFFFFF).to_bytes(4, 'big') +  # Reserved bit (0) + 31-bit stream ID
        payload
    )

def build_headers_frame(stream_id, headers):
    """Build a HEADERS frame (simplified version without HPACK compression)"""
    header_block = ''.join([f'{k}:{v}\r\n' for k, v in headers.items()]).encode('utf-8')
    # TYPE=HEADERS (0x01), FLAGS=END_HEADERS (0x04): the whole header block is in this frame
    return build_frame(0x01, 0x04, stream_id, header_block)

def build_data_frame(stream_id, data):
    """Build a DATA frame"""
    # TYPE=DATA (0x00), FLAGS=END_STREAM (0x01): this is the last frame on the stream
    return build_frame(0x00, 0x01, stream_id, data)
2.4.2 Connection Handling Logic
HTTP2_PREFACE = b'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'

def recv_exact(sock, n):
    """Read exactly n bytes; return None if the peer closes first"""
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def handle_http2_connection(client_socket):
    try:
        # The client sends the connection preface; the server only verifies it
        if recv_exact(client_socket, len(HTTP2_PREFACE)) != HTTP2_PREFACE:
            return
        # The server preface is a SETTINGS frame (type 0x04, stream 0), empty here
        client_socket.sendall(b'\x00\x00\x00\x04\x00\x00\x00\x00\x00')
        # Read client frame (simplified processing, assuming the first frame is a HEADERS frame)
        frame_header = recv_exact(client_socket, 9)
        if frame_header is None:
            return
        length = int.from_bytes(frame_header[:3], 'big')
        frame_type = frame_header[3]
        # Stream ID: last 4 bytes of the header with the reserved top bit cleared
        stream_id = int.from_bytes(frame_header[5:9], 'big') & 0x7FFFFFFF
        if frame_type != 0x01:  # Non-HEADERS frame
            return
        # Read header data (simplified processing, does not parse HPACK)
        header_data = recv_exact(client_socket, length) or b''
        headers = {}
        for line in header_data.split(b'\r\n'):
            # Pseudo-headers such as ":path" start with a colon, so look for
            # the separator from the second character onward
            sep = line.find(b':', 1)
            if sep == -1:
                continue
            headers[line[:sep].decode()] = line[sep + 1:].decode().strip()
        # Process request path
        path = headers.get(':path', '/')
        if path == '/hello':
            response_data = b'Hello, HTTP/2.0!'
            response_headers = {
                ':status': '200',
                'content-type': 'text/plain',
                'content-length': str(len(response_data))
            }
        else:
            response_headers = {':status': '404'}
            response_data = b'Not Found'
        # Send response frames
        headers_frame = build_headers_frame(stream_id, response_headers)
        data_frame = build_data_frame(stream_id, response_data)
        client_socket.sendall(headers_frame + data_frame)
    except Exception as e:
        print(f"HTTP/2 Error: {e}")
    finally:
        client_socket.close()
2.5 Implementation Limitations
- HPACK Compression Not Implemented: Transmits plaintext headers directly, differing from standard HTTP/2.
- Single-Stream Processing: Each connection handles only one stream, does not implement multiplexing.
- Simplified Frame Parsing: Handles only HEADERS and DATA frames, does not process error frames, settings frames, etc.
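To give a sense of what the skipped HPACK step does, here is a minimal sketch of its simplest case: a header that exactly matches an entry in the HPACK static table (RFC 7541, Appendix A) is sent as a single byte. Only a handful of table entries are reproduced, and the function name `encode_indexed_fields` is an assumption for this example; literal headers, Huffman coding, and the dynamic table are left out.

```python
# A few entries from the HPACK static table (RFC 7541, Appendix A)
STATIC_TABLE_INDEX = {
    (':method', 'GET'): 2,
    (':method', 'POST'): 3,
    (':path', '/'): 4,
    (':scheme', 'http'): 6,
    (':scheme', 'https'): 7,
    (':status', '200'): 8,
    (':status', '404'): 13,
}

def encode_indexed_fields(headers):
    """Encode headers that match the static table exactly, one byte each."""
    out = bytearray()
    for name, value in headers:
        index = STATIC_TABLE_INDEX.get((name, value))
        if index is None:
            # Anything else needs a literal representation, which this sketch omits
            raise ValueError(f'{name}: {value} is not in the partial static table')
        out.append(0x80 | index)  # indexed header field: leading 1 bit + 7-bit index
    return bytes(out)


block = encode_indexed_fields([(':method', 'GET'), (':scheme', 'https'), (':path', '/')])
print(block.hex())  # 828784
```

Three common request headers fit in three bytes, compared with dozens of bytes when sent as plaintext.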
3. Implementation of WebSocket Protocol
3.1 Overview of WebSocket Protocol
WebSocket establishes a connection through an HTTP handshake and then provides full-duplex communication through binary frames. Its core process includes:
- HTTP Handshake: The client sends an upgrade request, and the server confirms the protocol switch.
- Frame Communication: Uses binary frames of a specific format to transmit data, supporting operations such as text, binary, and closing.
3.2 Handshake Protocol Implementation
3.2.1 Handshake Request Parsing
import base64
import hashlib

def parse_websocket_handshake(data):
    headers = {}
    lines = data.decode('utf-8').split('\r\n')
    for line in lines[1:]:  # Skip the request line
        if not line:
            break
        # Header names are case-insensitive; whitespace after the colon is optional
        key, _, value = line.partition(':')
        headers[key.strip().lower()] = value.strip()
    return {
        'sec_websocket_key': headers.get('sec-websocket-key'),
        'origin': headers.get('origin')
    }
3.2.2 Handshake Response Generation
def build_websocket_handshake_response(key):
    # Fixed GUID defined by RFC 6455
    guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
    hash_data = (key + guid).encode('utf-8')
    sha1_hash = hashlib.sha1(hash_data).digest()
    accept_key = base64.b64encode(sha1_hash).decode('utf-8')
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key}\r\n"
        "\r\n"
    ).encode('utf-8')
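The accept-key computation can be checked against the sample handshake given in RFC 6455 (section 1.3), where the client key `dGhlIHNhbXBsZSBub25jZQ==` must produce `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. The sketch below repeats the same SHA-1 and Base64 steps in a standalone helper; the name `compute_accept` is introduced only for this check.

```python
import base64
import hashlib

WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

def compute_accept(key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from a Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode('ascii')).digest()
    return base64.b64encode(digest).decode('ascii')


print(compute_accept('dGhlIHNhbXBsZSBub25jZQ=='))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```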
3.3 Frame Protocol Implementation
3.3.1 Frame Structure
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|   Extended payload length continued, if payload len == 127    |
+---------------------------------------------------------------+
|                   Masking-key, if MASK set                    |
+---------------------------------------------------------------+
|                         Payload Data                          |
+---------------------------------------------------------------+
3.3.2 Parsing Received Frames
def parse_websocket_frame(data):
    if len(data) < 2:
        return None
    first_byte, second_byte = data[0], data[1]
    fin = (first_byte >> 7) & 0x01
    opcode = first_byte & 0x0F
    mask = (second_byte >> 7) & 0x01
    payload_len = second_byte & 0x7F
    if payload_len == 126:
        # 16-bit extended length
        payload_len = int.from_bytes(data[2:4], 'big')
        offset = 4
    elif payload_len == 127:
        # 64-bit extended length
        payload_len = int.from_bytes(data[2:10], 'big')
        offset = 10
    else:
        offset = 2
    if mask:
        mask_key = data[offset:offset + 4]
        offset += 4
    # Take exactly payload_len bytes; anything after that belongs to the next frame
    payload = data[offset:offset + payload_len]
    if mask:
        # Client-to-server frames are masked: XOR each byte with the 4-byte key
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return {
        'fin': fin,
        'opcode': opcode,
        'payload': payload
    }
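Masking is easiest to understand from the sending side. The sketch below builds a masked client-to-server text frame and reproduces the masked "Hello" example from RFC 6455 (section 5.7). The helper name `build_masked_text_frame` and the restriction to payloads of at most 125 bytes are assumptions made to keep the example short.

```python
import os
from typing import Optional

def build_masked_text_frame(text: str, mask_key: Optional[bytes] = None) -> bytes:
    """Build a single masked text frame, as a client would send it."""
    payload = text.encode('utf-8')
    if len(payload) > 125:
        raise ValueError('this sketch only handles payloads up to 125 bytes')
    if mask_key is None:
        mask_key = os.urandom(4)  # clients pick a fresh random key per frame
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    # 0x81 = FIN + text opcode; 0x80 sets the MASK bit next to the 7-bit length
    return bytes([0x81, 0x80 | len(payload)]) + mask_key + masked


frame = build_masked_text_frame('Hello', mask_key=bytes.fromhex('37fa213d'))
print(frame.hex())  # 818537fa213d7f9f4d5158

# XOR is its own inverse: applying the same key again recovers the text
key, body = frame[2:6], frame[6:]
unmasked = bytes(b ^ key[i % 4] for i, b in enumerate(body))
print(unmasked)  # b'Hello'
```

A frame built this way is a convenient test input for a server-side frame parser.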
3.3.3 Building Frames for Sending
def build_websocket_frame(data, opcode=0x01):  # Opcode 0x01 indicates a text frame
    payload = data.encode('utf-8') if isinstance(data, str) else data
    payload_len = len(payload)
    frame = bytearray()
    frame.append(0x80 | opcode)  # FIN=1, set opcode
    # Server-to-client frames are not masked, so the MASK bit stays 0
    if payload_len < 126:
        frame.append(payload_len)
    elif payload_len <= 0xFFFF:
        frame.append(126)
        frame.extend(payload_len.to_bytes(2, 'big'))
    else:
        frame.append(127)
        frame.extend(payload_len.to_bytes(8, 'big'))
    frame.extend(payload)
    return bytes(frame)
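The three length encodings are easiest to see at their boundaries. This standalone sketch isolates the length field of an unmasked frame and prints it for a few payload sizes; `encode_payload_length` is a name introduced here for illustration.

```python
def encode_payload_length(n: int) -> bytes:
    """Return the length bytes that follow the first byte of an unmasked frame."""
    if n < 126:
        return bytes([n])                            # fits in the 7-bit field
    if n <= 0xFFFF:
        return bytes([126]) + n.to_bytes(2, 'big')   # marker 126 + 16-bit length
    return bytes([127]) + n.to_bytes(8, 'big')       # marker 127 + 64-bit length


for size in (125, 126, 65535, 65536):
    print(size, encode_payload_length(size).hex())
# 125 7d
# 126 7e007e
# 65535 7effff
# 65536 7f0000000000010000
```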
3.4 Complete Server-Side Implementation
def handle_websocket_connection(client_socket):
    try:
        # Read the handshake request
        handshake_data = client_socket.recv(1024)
        handshake = parse_websocket_handshake(handshake_data)
        if not handshake['sec_websocket_key']:
            return
        # Send the handshake response
        response = build_websocket_handshake_response(handshake['sec_websocket_key'])
        client_socket.sendall(response)
        # Enter the message loop
        # Simplification: assumes each recv() call returns exactly one complete frame
        while True:
            frame_data = client_socket.recv(4096)
            if not frame_data:
                break
            frame = parse_websocket_frame(frame_data)
            if not frame:
                break
            if frame['opcode'] == 0x01:  # Text frame
                message = frame['payload'].decode('utf-8')
                print(f"Received: {message}")
                response_frame = build_websocket_frame(f"Echo: {message}")
                client_socket.sendall(response_frame)
            elif frame['opcode'] == 0x08:  # Close frame
                # Reply with a close frame before closing the connection
                client_socket.sendall(build_websocket_frame(b'', opcode=0x08))
                break
    except Exception as e:
        print(f"WebSocket Error: {e}")
    finally:
        client_socket.close()
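TCP is a byte stream, so a single `recv()` call may return half a frame or several frames at once. One way to cope, sketched below under the assumption that incoming bytes are appended to a `bytearray`, is to pop only complete frames from the front of the buffer and leave any partial frame for the next read. The function name `extract_frames` is hypothetical.

```python
def extract_frames(buffer: bytearray):
    """Pop every complete frame from the front of buffer; keep partial data."""
    frames = []
    while len(buffer) >= 2:
        length = buffer[1] & 0x7F
        masked = bool(buffer[1] & 0x80)
        offset = 2
        if length == 126:
            if len(buffer) < 4:
                break
            length = int.from_bytes(buffer[2:4], 'big')
            offset = 4
        elif length == 127:
            if len(buffer) < 10:
                break
            length = int.from_bytes(buffer[2:10], 'big')
            offset = 10
        total = offset + (4 if masked else 0) + length
        if len(buffer) < total:
            break  # the rest of this frame has not arrived yet
        frames.append(bytes(buffer[:total]))
        del buffer[:total]
    return frames


# One complete unmasked frame ("hi") followed by the start of a second one
buffer = bytearray(b'\x81\x02hi' + b'\x81\x05He')
first = extract_frames(buffer)
print(first, bytes(buffer))  # [b'\x81\x02hi'] b'\x81\x05He'

buffer.extend(b'llo')        # the remaining bytes arrive later
second = extract_frames(buffer)
print(second, bytes(buffer))  # [b'\x81\x05Hello'] b''
```

Each extracted frame can then be passed to a frame parser on its own.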
4. Protocol Comparison and Practical Recommendations
4.1 Protocol Feature Comparison
| Feature | HTTP/1.0 | HTTP/2.0 | WebSocket |
|---|---|---|---|
| Connection Method | Short Connection | Long Connection (Multiplexing) | Long Connection (Full Duplex) |
| Protocol Format | Text | Binary Framing | Binary Frames |
| Typical Scenarios | Simple Web Requests | High-Performance Websites | Real-Time Communication (Chat, Push) |
| Header Handling | Plaintext | HPACK Compression | No Compression (Extensible) |
4.2 Limitations and Applications of Pure Socket Implementation
- Limitations:
  - Does not handle protocol details (e.g., HTTP/2 flow control, error frames).
  - Performance issues (lack of connection pooling, asynchronous processing).
  - Security gaps (no TLS encryption implemented).
- Application Value:
  - Learn the underlying principles of protocols and understand the essence of request-response.
  - Develop lightweight services (e.g., embedded device communication).
  - Develop debugging tools (custom protocol analysis).
4.3 Recommendations for Production Environments
- HTTP/1.0/2.0: Use mature libraries such as requests (client), aiohttp (server), or h2 (HTTP/2 specific).
- WebSocket: The recommended library is websockets, which supports asynchronous communication and standard protocols.
- Performance Optimization: Combine with asynchronous frameworks (e.g., asyncio) or HTTP servers (e.g., uvicorn) to improve concurrency.
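As a rough idea of what the asynchronous route looks like, the sketch below serves a fixed HTTP/1.0 response with the standard-library `asyncio` streams API and queries it once from the same event loop. It assumes the loopback interface is available; the handler, the fixed response, and the use of port 0 (letting the operating system pick a free port) are choices made for this illustration, not part of the servers above.

```python
import asyncio

async def handle(reader, writer):
    """Answer every request with the same small response, then close."""
    await reader.read(4096)  # read (and ignore) the request
    writer.write(b'HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok')
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def demo():
    # Port 0 asks the operating system for any free port
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'GET / HTTP/1.0\r\n\r\n')
    await writer.drain()
    response = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    server.close()
    return response

response = asyncio.run(demo())
print(response)
```

Unlike the blocking `accept()` loop, the event loop can keep many such connections open at once, because each handler yields control whenever it waits on the network.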
5. Conclusion
By implementing the three protocols using pure sockets, we have gained an in-depth understanding of the underlying mechanisms of network communication:
- HTTP/1.0 is a basic request-response model suitable for simple scenarios.
- HTTP/2.0 improves performance through binary framing and multiplexing, but its implementation complexity increases significantly.
- WebSocket provides an efficient full-duplex channel for real-time communication and is widely used in modern web applications.
In actual development, priority should be given to mature libraries and frameworks, but manual implementation helps deepen understanding of the protocols. Learning network protocols requires combining the specification documents (e.g., RFC 1945 for HTTP/1.0, RFC 7540 for HTTP/2, and RFC 6455 for WebSocket) with practical debugging to gradually master their design concepts and engineering implementations.
Finally, I recommend the best platform for deploying Python services: Leapcell