Hacking with Pickle: Python Deserialization Attacks Explained


Apr 18, 2025 - 12:13

Leapcell: The Best of Serverless Web Hosting

Deserialization Attacks and Prevention of the Pickle Module in Python

Introduction

Hello everyone! Python's pickle module carries a well-known security risk: deserialization attacks. Before looking at how these attacks work, we need to be clear about what serialization and deserialization are.

Conceptually, serialization is the process of converting a data structure or object into a byte stream. Through this conversion, data can be conveniently saved to a file or transmitted over a network. Deserialization, on the other hand, is the reverse process, which converts the byte stream back into the original data structure or object.

In Python, the Pickle module is one of the most commonly used tools for serialization and deserialization. It offers a simple interface for saving complex Python objects and restoring them later. However, this convenience comes with a serious security risk.

Overview of Deserialization Attacks

Deserialization is not inherently safe. When we deserialize data from an untrusted source, we expose ourselves to deserialization attacks: an attacker can craft serialized data that causes code of their choosing to run as soon as the data is deserialized. The consequences can be serious, including data leakage, system crashes, and even remote control of the system by the attacker.

Overview of the Python Pickle Module

Basic Functions of Pickle

The Pickle module is part of the Python standard library and can be used without additional installation. Its main function is to implement the serialization and deserialization of Python objects. Whether it is a simple basic data type or a complex data structure (such as lists, dictionaries, class instances, etc.), Pickle can convert it into a byte stream for storage or transmission and restore it to the original object form when needed.

Working Principle of Pickle

The working principle of Pickle is relatively intuitive. During serialization, it converts a Python object into a byte stream according to the pickle protocol. The stream records the object's type information and data as a sequence of instructions (opcodes). During deserialization, Pickle reads the stream, executes those instructions one by one, and rebuilds the corresponding Python object.
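
To make these rules a little more concrete, the sketch below uses the standard library's pickletools module to list the instructions stored inside a small pickle. The dictionary is only sample data, and protocol 4 is pinned here as an assumption so that the same opcodes appear regardless of the interpreter's default protocol.

```python
import pickle
import pickletools

# Sample data (illustrative only)
data = {'name': 'Leapcell', 'age': 29}

# Serialize with an explicitly chosen protocol
payload = pickle.dumps(data, protocol=4)

# genops() parses the stream without executing it and yields
# (opcode, argument, position) for every instruction
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcode_names)

# Deserializing replays those instructions to rebuild the object
restored = pickle.loads(payload)
print(restored == data)
```

pickletools.dis(payload) prints the same information as an annotated listing, which is handy when inspecting a pickle by hand.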

Serialization and Deserialization of Pickle

Serialization: Pickle provides two main serialization functions: pickle.dump and pickle.dumps. pickle.dump writes the serialized object directly to an open binary file, while pickle.dumps returns the serialized data as a bytes object.

import pickle

# Create an object
data = {'name': 'Leapcell', 'age': 29, 'city': 'New York'}

# Serialize the object and write it to a file
with open('data.pickle', 'wb') as file:
    pickle.dump(data, file)

# Or return a byte stream
data_bytes = pickle.dumps(data)

Deserialization: There are two matching functions for deserialization: pickle.load and pickle.loads. pickle.load reads the byte stream from an open binary file and deserializes it, while pickle.loads deserializes a bytes object directly.

import pickle

# Deserialize the object from the file
with open('data.pickle', 'rb') as file:
    data = pickle.load(file)

# Or directly deserialize a byte stream
data = pickle.loads(data_bytes)
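
As a small self-contained illustration of the four functions working together, the sketch below round-trips a nested structure entirely in memory. It uses io.BytesIO as a stand-in for a real file, and the variable names are only illustrative. It also shows one detail of Pickle's behavior: an object referenced twice is restored as a single shared object.

```python
import io
import pickle

# One list referenced from two places in the same structure
shared = [1, 2, 3]
data = {'first': shared, 'second': shared}

# dumps/loads: work directly with bytes
data_bytes = pickle.dumps(data)
from_bytes = pickle.loads(data_bytes)

# dump/load: work with any binary file object (BytesIO stands in for a file)
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)  # rewind before reading back
from_file = pickle.load(buffer)

# The restored data is equal to the original
print(from_bytes == data, from_file == data)

# The shared reference survives: both keys point to the same list object
print(from_bytes['first'] is from_bytes['second'])
```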

Principle of Deserialization Attacks

Attack Mechanism

The core of a deserialization attack is that a pickle byte stream can do more than describe data: it can tell the unpickler to import a callable and call it with arguments chosen by whoever produced the stream. When the target system deserializes such a stream, the call is made and the attacker's goal is achieved. In other words, deserializing data without strictly verifying where it came from opens the door for attackers to execute arbitrary code on the system.

What Attackers Can Do

Attackers can use deserialization vulnerabilities to perform a variety of malicious operations, such as executing arbitrary system commands, modifying important data, or stealing sensitive information. Any of these can seriously damage the security and stability of the system.

Example Code

To more clearly demonstrate the process of deserialization attacks, let's look at a specific example:

import pickle
import os

# Construct malicious code
class Malicious:
    def __reduce__(self):
        return (os.system, ('echo Hacked!',))

# Serialize the malicious object
malicious_data = pickle.dumps(Malicious())

# Execute malicious code during deserialization
pickle.loads(malicious_data)

In this example:

Construct malicious code: We define a class named Malicious whose __reduce__ method returns the callable os.system together with the argument tuple ('echo Hacked!',). __reduce__ is a special method that Pickle calls during serialization to ask an object how it should be reconstructed; the callable and arguments it returns are written into the byte stream.
Serialize the malicious object: Use the pickle.dumps function to serialize an instance of the Malicious class. The resulting byte stream malicious_data contains no Malicious object at all, only the instruction to call os.system('echo Hacked!').
Deserialize the malicious object: When pickle.loads processes malicious_data, it imports os.system and calls it with the recorded argument, so the command runs and prints "Hacked!". The Malicious class does not need to exist on the victim's system, and its __reduce__ method is not called at this stage.
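
The timing of these steps can be checked safely with a harmless variant of the example. In the sketch below, the Demo class, its reduce_calls counter, and the use of len in place of os.system are all assumptions made for illustration. The counter records when __reduce__ runs, and pickletools reads the stream without executing it.

```python
import pickle
import pickletools

class Demo:
    reduce_calls = 0  # counts how often __reduce__ runs

    def __reduce__(self):
        type(self).reduce_calls += 1
        # Harmless stand-in for os.system: "rebuild" by calling len("Hacked!")
        return (len, ("Hacked!",))

# __reduce__ runs here, while the object is being pickled
payload = pickle.dumps(Demo(), protocol=4)
calls_after_dumps = Demo.reduce_calls

# The unpickler imports builtins.len and calls it with the stored argument
result = pickle.loads(payload)
calls_after_loads = Demo.reduce_calls  # unchanged by loading

# String arguments stored in the stream: the callable's module and name,
# followed by the argument it will be called with
stored_strings = [
    arg for opcode, arg, pos in pickletools.genops(payload)
    if isinstance(arg, str)
]

print(calls_after_dumps, calls_after_loads)
print(result)
print(stored_strings)
```

The value returned by pickle.loads is whatever the callable returns (here an integer), not an instance of Demo. With os.system in place of len, that same call is the one that executes the attacker's command.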

How to Prevent Pickle Deserialization Attacks

Principles of Secure Deserialization

The primary principle of preventing deserialization attacks is never to deserialize data from untrusted sources. Call pickle.load or pickle.loads only when the data source is fully trusted and the data cannot have been tampered with in transit or in storage.
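
One way to check that pickled data really comes from your own system is to sign it with the standard library's hmac module and verify the signature before unpickling. The following is a minimal sketch under stated assumptions: SECRET_KEY, sign_dumps, and verify_loads are hypothetical names introduced here, and in practice the key would have to be generated and stored securely rather than hard-coded.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; load it from secure configuration in practice
SECRET_KEY = b"replace-with-a-real-secret"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def sign_dumps(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

def verify_loads(blob, key=SECRET_KEY):
    """Unpickle blob only if its tag matches; otherwise raise ValueError."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

blob = sign_dumps({'name': 'Leapcell', 'age': 29})
restored = verify_loads(blob)

# Flip one bit in the payload to simulate tampering
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verify_loads(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(restored, tamper_detected)
```

A signature only proves that the data was produced by someone holding the key. It does not make a pickle from an untrusted party safe to load.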

Practical Defense Methods

Example of Secure Deserialization Code: If deserialization using Pickle is unavoidable, the globals that a pickle is allowed to load can be limited by overriding the find_class method of pickle.Unpickler, thereby restricting the scope of deserialization.

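import builtins
import io
import pickle

# Names from the builtins module that pickled data is allowed to reference
SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}

# Custom Unpickler to restrict deserializable types
class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Allow only the allowlisted built-in types
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Reject every other global, including os.system and eval
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(s):
    """Like pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(s)).load()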

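In the above code, we define a RestrictedUnpickler class that inherits from pickle.Unpickler and overrides the find_class method. The unpickler calls find_class whenever the byte stream asks for a global (a class or function), so any request outside the allowlist of built-in types fails with an UnpicklingError before anything is called. This reduces the attack surface, but it is a mitigation and not a replacement for the principle of only deserializing trusted data.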
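
To see the restriction in action, the sketch below repeats the restricted unpickler so that it runs on its own, then feeds it both ordinary data and the malicious payload from the earlier example. The variable names are illustrative. Because find_class raises before os.system is ever looked up, the command is not executed.

```python
import builtins
import io
import os
import pickle

SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allowlisted names from the builtins module may be loaded
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()

class Malicious:
    def __reduce__(self):
        return (os.system, ('echo Hacked!',))

# Plain data needs no globals, so it loads normally
safe = restricted_loads(pickle.dumps({'name': 'Leapcell', 'tags': ['a', 'b']}))

# The malicious payload asks for os.system and is rejected
try:
    restricted_loads(pickle.dumps(Malicious()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("rejected:", exc)

print(safe, blocked)
```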

Use Other Secure Serialization Modules (such as JSON): A safer option is to use the json module instead of Pickle for serialization and deserialization. JSON supports only basic data types (strings, numbers, booleans, null, arrays, and objects), and parsing it never calls functions named by the input, so loading untrusted JSON cannot trigger arbitrary code execution the way loading a pickle can.

import json

# Serialize the object
data = {'name': 'Leapcell', 'age': 29, 'city': 'New York'}
data_json = json.dumps(data)

# Deserialize the object
data = json.loads(data_json)
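
The trade-off is that JSON represents fewer types than Pickle. The sketch below, using sample values chosen only for illustration, shows three behaviors worth knowing when switching: tuples come back as lists, sets are rejected at serialization time, and malformed input raises an exception instead of being interpreted.

```python
import json

# Tuples are written as JSON arrays and come back as lists
record = {'name': 'Leapcell', 'point': (1, 2)}
restored = json.loads(json.dumps(record))
print(restored)

# Types with no JSON equivalent, such as set, raise TypeError on dumps
try:
    json.dumps({'tags': {'a', 'b'}})
    set_rejected = False
except TypeError:
    set_rejected = True

# Malformed input raises JSONDecodeError; nothing from it is executed
try:
    json.loads('{"name": "Leapcell"')
    malformed_rejected = False
except json.JSONDecodeError:
    malformed_rejected = True

print(set_rejected, malformed_rejected)
```

Data parsed from JSON should still be validated (expected keys, types, and ranges) before the application relies on it.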

Conclusion

This article introduced serialization and deserialization in Python and the role the Pickle module plays in both. It explained how deserialization attacks work and demonstrated one with a concrete code example. Finally, it covered how to prevent Pickle deserialization attacks: never deserialize untrusted data, restrict the types that may be deserialized, and prefer safer serialization formats such as JSON. Hopefully this gives you a deeper understanding of deserialization attacks and helps you take effective preventive measures in your own code. If you have any questions or suggestions about this article, you are welcome to discuss them in the comment section.
