Learn Python Basics for Data Engineering (with Mini Project)

In today’s data-driven world, data engineers are the architects of pipelines that move and transform data. Python has emerged as a key language for data engineering thanks to its simplicity, its easy-to-import libraries, and its reliable community support. Whether you're cleaning raw CSVs or automating database workflows, Python lets you build quickly and effectively.
This article introduces you to the fundamentals of Python programming, setting the foundation for your journey into data engineering.
1. Python fundamentals that you should learn.
a. Variables and Data types.
Variables are identifiers in Python. They are like 'number plates' for our data: each one labels a piece of data so that we can tell it apart from the rest.
For example:
x = 5
person = "John"
car = "Mercedes"
Some rules for naming variables are enforced by Python itself and raise errors when broken; others are conventions recommended by PEP 8, the Python style guide. A Python Enhancement Proposal (PEP) is a document that provides information to the Python community or describes a new feature for Python, its processes or its environment. You can learn more about PEPs on the official Python website.
Rules to follow include:
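i. Start variable names with a lowercase letter. A capitalised name such as Person is valid Python and will not raise an error, but by convention (PEP 8) capitalised names are used for classes, while variables use lowercase words separated by underscores.
Person = 'John' # valid, but discouraged for variables
person = 'John' # preferred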
ii. Do not start variable names with a number.
001_student = 'Mary' # wrong
student_001 = 'Mary' # right
iii. Do not use Python's reserved keywords as variable names. Examples include def, which is used to declare functions, from, which is used when importing Python modules and packages, and and, which is a logical operator.
def = 'some word' # wrong: raises a SyntaxError
For more naming rules and conventions, see PEP 8 on the official Python website.
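The rules that Python enforces can be checked in code before a name is used. The sketch below relies on the standard library's keyword.iskeyword and the string method isidentifier; the helper is_valid_name is a hypothetical name made up for this illustration.

```python
import keyword


def is_valid_name(name):
    """Return True if `name` can legally be used as a variable name."""
    # isidentifier() rejects names that start with a digit or contain spaces.
    # iskeyword() rejects reserved words such as def, from and and.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["student_001", "001_student", "def", "from"]:
    print(candidate, is_valid_name(candidate))
```

Only student_001 passes both checks; the other three names would cause a SyntaxError if you tried to assign to them.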
Python has various data types, including integers, floats, strings and booleans. You will work with these data types every day in your data engineering career, so let's dive in!
i. Integer.
An integer is a whole number, whether negative, zero or positive, such as -10, 0, 10 or 10000.
x = 10
ii. Float.
A float is a number with a decimal point, such as 10.12 or 677.75.
x = 677.75
iii. String.
A string is a sequence of characters (text) written inside quotes, such as 'John' or 'school'.
student = 'Maria'
iv. Boolean.
A boolean has only two possible values: True or False.
is_admin = False
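To see which type a value has, you can use the built-in type() function. The snippet below is a small sketch; the variable names price, age and height are made up for this illustration. It also shows int() and float(), which convert text into numbers, since values read from files such as CSVs usually arrive as strings.

```python
x = 10
price = 677.75
student = "Maria"
is_admin = False

# type(value).__name__ gives the short name of the type: int, float, str, bool.
for value in (x, price, student, is_admin):
    print(value, "is of type", type(value).__name__)

# Convert strings to numbers before doing any maths with them.
age = int("18")
height = float("1.75")
print(age + 1)
```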
b. Data structures in Python
Data structures in Python are used to store collections of data. They fall into two groups: mutable and immutable.
Mutable data structures.
These data structures can be changed after they are created.
They include lists, sets and dictionaries.
i. Lists:
A list is a mutable, ordered collection of items.
Lists are written with square brackets [].
An example of a list:
student = ["Maria", "John", "Jane"]
A single list can also hold items of different data types:
new_list = ["Maria", 15, True]
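Because lists are mutable, they can be changed in place after creation. A minimal sketch using the built-in list methods append and remove (the list name students and the extra names are made up for this illustration):

```python
students = ["Maria", "John", "Jane"]

students.append("Peter")   # add an item to the end
students[0] = "Mary"       # replace the first item (indexes start at 0)
students.remove("John")    # delete an item by its value

print(students)       # ['Mary', 'Jane', 'Peter']
print(len(students))  # 3
```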
ii. Sets
A set is an unordered collection of unique items. It’s useful when you want to remove duplicates or perform operations like unions and intersections.
set_1 = {10, 20, 30, 40, 50}
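The operations mentioned above can be sketched as follows. The names raw_ids, unique_ids and set_2 are made up for this illustration, and sorted() is used only so the printed order is predictable, since sets themselves are unordered.

```python
# Removing duplicates: converting a list to a set keeps one copy of each value.
raw_ids = [10, 20, 20, 30, 30, 30]
unique_ids = set(raw_ids)
print(sorted(unique_ids))     # [10, 20, 30]

set_1 = {10, 20, 30, 40, 50}
set_2 = {40, 50, 60}

print(sorted(set_1 | set_2))  # union: [10, 20, 30, 40, 50, 60]
print(sorted(set_1 & set_2))  # intersection: [40, 50]
```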
iii. Dictionaries
A dictionary stores data in key-value pairs. It’s perfect when you want to label your data and access it efficiently by key.
student_info = {
    'name': 'John Doe',
    'age': 18,
    'id': 200,
    'grade': 12
}
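Here is a short sketch of reading, updating and adding entries by key. The 'school' and 'email' keys and their values are made up for this illustration.

```python
student_info = {"name": "John Doe", "age": 18, "id": 200, "grade": 12}

print(student_info["name"])   # look up a value by its key

student_info["age"] = 19                        # update an existing key
student_info["school"] = "Hypothetical High"    # add a new key

# .get() returns a default instead of raising KeyError when a key is missing.
print(student_info.get("email", "not provided"))
```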
Immutable data structures.
These data structures cannot be changed once declared. An example of these data structures is a tuple.
i. Tuples:
A tuple is similar to a list, but it’s immutable (you can’t change it after creation). Tuples are often used for fixed data collections like coordinates or return values from functions.
coordinates = (10, 35)
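The sketch below shows both sides of a tuple: its values can be read and unpacked into variables, but trying to change one raises a TypeError. The names x and y are chosen here for illustration.

```python
coordinates = (10, 35)

# Unpacking: assign each item of the tuple to its own variable.
x, y = coordinates
print(x, y)

# Tuples are immutable, so item assignment is not allowed.
try:
    coordinates[0] = 99
except TypeError as error:
    print("Cannot change a tuple:", error)
```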
c. Conditionals.
Conditionals let your program make decisions based on certain conditions. Python uses if, elif (else if), and else to control the flow of logic.
An example of a conditional logic flow is as follows:
temperature = 36
if temperature > 30:
    print("Today's too hot, I might need some sunscreen")
else:
    print("Hmm, it ain't that hot.")
The code above checks whether temperature is above 30. Since 36 is greater than 30, it prints the first message; for any value of 30 or below, it would print the message in the else branch instead.
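The example above uses only if and else. When there are more than two cases, elif adds further checks in between. A minimal sketch, where the function name describe and the thresholds 30 and 20 are arbitrary choices for this illustration:

```python
def describe(temperature):
    # Conditions are checked from top to bottom; the first one that is
    # True wins and the remaining branches are skipped.
    if temperature > 30:
        return "hot"
    elif temperature > 20:
        return "warm"
    else:
        return "cold"


for reading in (36, 25, 12):
    print(reading, "is", describe(reading))
```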
d. Loops.
Loops allow you to repeat a block of code multiple times. Python has two main types of loops: for and while.
i. for loop
This loop is great for looping through a list of items, or even a dictionary.
For example:
shopping_list = ["Rice", "Sugar", "Salt", "Eggs", "Olive oil"]
for i in shopping_list:
    print(i)
The output would be:
Rice
Sugar
Salt
Eggs
Olive oil
SECRET HINT: If you want your list to print on a single line, pass the end=' ' argument to print().
shopping_list = ["Rice", "Sugar", "Salt", "Eggs", "Olive oil"]
for i in shopping_list:
    print(i, end=' ')
Output:
Rice Sugar Salt Eggs Olive oil
Looks neat, right?
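A for loop works on dictionaries too. One common way, sketched below, is the dictionary's items() method, which hands you each key together with its value. The dictionary contents and the lines list are made up for this illustration.

```python
student_info = {"name": "John Doe", "age": 18, "grade": 12}

lines = []
for key, value in student_info.items():
    # Each pass through the loop receives one key and its matching value.
    lines.append(f"{key}: {value}")

for line in lines:
    print(line)
```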
ii. while loop
A while loop keeps running its block of code for as long as its condition is True and stops as soon as the condition becomes False.
Example:
count = 0
while count < 3:
    print("Count is", count)
    count += 1
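To see exactly when a while loop stops, it can help to record each value it visits. In this sketch the list seen is a name made up for the illustration. The line count += 1 is what eventually makes the condition False; without it the loop would never end.

```python
count = 0
seen = []

while count < 3:
    seen.append(count)  # remember the value used in this pass
    count += 1          # move towards the stopping condition

print(seen)   # [0, 1, 2]
print(count)  # 3
```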
e. Mathematical Operators.
Python supports standard math operations you can use to manipulate numbers and data. These operations include addition, subtraction, multiplication, division among other operations.
a = 10
b = 15
print(a + b) # addition
print(a - b) # subtraction
print(a * b) # multiplication
print(b / a) # division
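A few of the other operators you are likely to meet, shown as a short sketch with the same values of a and b:

```python
a = 10
b = 15

print(b / a)    # true division always returns a float: 1.5
print(b // a)   # floor division drops the fractional part: 1
print(b % a)    # modulo gives the remainder: 5
print(a ** 2)   # exponentiation (a squared): 100
```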
f. Functions.
Functions are blocks of reusable code that can be used in the same Python file or imported into other Python files. They keep your code clean because they make it both reusable and readable.
To declare a function, we use the def keyword as shown below.
def say_hello():
    print("Hello world!")
When called, this function prints Hello world! in the terminal. We can reuse it inside another function or import it into another Python file, which is what makes functions so useful in Python.
To call a function in Python:
say_hello()
This will run the function and output our message in the terminal.
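Functions can also accept inputs (parameters) and hand a result back with return, so the caller decides what to do with it. A minimal sketch, where greet and message are names made up for this illustration:

```python
def greet(name):
    # Build the greeting and give it back to the caller instead of printing it.
    return "Hello " + name + "!"


message = greet("Maria")
print(message)
```

Returning a value rather than printing it makes the function easier to reuse, because the result can be stored, compared or passed to another function.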
2. Small Python project.
Since we have learnt some Python fundamentals, how about we put those skills to practice?
Let's create a function that calculates the area of a rectangle.
i. Let's declare the measurements of the rectangle:
Create a file called area.py in an IDE of your choice. If you have not downloaded an IDE, download Visual Studio Code here.
Let l be the length of the rectangle and w be the width of the rectangle. All measurements are in centimetres (cm).
l = 10
w = 30
ii. Let's declare our function
Let's now declare our function rectangle_area, which takes the length and width as arguments and calculates the area of the rectangle.
def rectangle_area(l, w):
    area = l * w
    print('The area is:', area)
After declaring the function, call it with our measurements to calculate the area.
rectangle_area(l, w)
Now, run your file and your output should be:
The area is: 300
There you go! You have created your first Python project which calculates the area of a rectangle.
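If you would like to take the project one step further, here is one possible variation, offered as a sketch rather than the only way to do it. This version of rectangle_area returns the area instead of printing it, and a for loop reuses it for several rectangles stored as tuples. The list rectangles and its measurements are made up for this illustration.

```python
def rectangle_area(l, w):
    # Return the result so the caller can print it, store it or reuse it.
    return l * w


# Each tuple holds (length, width) in centimetres.
rectangles = [(10, 30), (5, 4), (2.5, 8)]

for l, w in rectangles:
    print("The area of a", l, "x", w, "rectangle is:", rectangle_area(l, w))
```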
3. Python resources.
This post alone cannot make you a Python guru! You need more practice and more knowledge of Python fundamentals, such as reading and writing files and Object-Oriented Programming (OOP), among others.
I'd recommend the following resources to assist you in the path of being a Python beast.
For those who prefer reading as a mode of learning:
For those who prefer videos as a mode of learning:
Final Remarks.
This post covers some of the important Python fundamentals that are useful in data engineering. Continue learning and practising and you will be a Python guru in no time!
Thanks for reading this post. Please leave a like or a comment and, if you do not mind, share this post with other people who would like to start learning Python.