Python for Data Science Cheatsheet

May 10, 2025 - 12:15

Pandas is a Python package for data manipulation and visualization. It is built on top of NumPy and Matplotlib (data visualization). In pandas, data analysis is done on rectangular data, which is represented as dataframes; in SQL these are referred to as tables.
We will be using the data

Exploring a Dataframe using the table

To get the first few rows of the dataframe (five by default)

print(alcohol_df.head())

To get the column names, their data types, and whether they contain missing values. info() prints its report directly and returns None, so it does not need to be wrapped in print().

alcohol_df.info()

To get the number of rows and columns of the dataframe as a tuple. This is an attribute, not a method, so it takes no parentheses.

print(alcohol_df.shape)

To get the summary statistics of the dataframe (numeric columns by default)

print(alcohol_df.describe())

To get the values as a 2D NumPy array

print(alcohol_df.values)

To get the column names

print(alcohol_df.columns)

To get the row labels (row numbers or row names)

print(alcohol_df.index)
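
To try the exploration calls above without the original dataset, the sketch below builds a small stand-in for alcohol_df. The region and year columns come from the examples in this post; the litres column and every value are made up for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for alcohol_df; all values are invented for illustration.
alcohol_df = pd.DataFrame(
    {
        "region": ["Europe", "Africa", "Asia", "Europe"],
        "year": [2010, 2010, 2015, 2015],
        "litres": [10.9, 6.2, 4.5, 9.8],  # assumed column, not from the original data
    }
)

first_rows = alcohol_df.head(2)        # pass a number to override the default of five rows
print(first_rows)
print(alcohol_df.shape)                # (rows, columns) tuple
print(list(alcohol_df.columns))        # column labels as a plain Python list
print(alcohol_df.index)                # default RangeIndex when no row names are set

# to_numpy() returns the same 2D array as .values and is the call
# the pandas documentation recommends for new code.
values = alcohol_df.to_numpy()
print(values.shape)
```

Because shape, columns and index are attributes, they are read without parentheses, while head() and to_numpy() are methods and need them.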

1. Dataframes

Change the order of the rows by sorting them on a particular column. By default this sorts in ascending order, from smallest to largest.

sorted_drinks = alcohol_df.sort_values("region")
print(sorted_drinks)

To sort in descending order

sorted_drinks = alcohol_df.sort_values("region", ascending=False)
print(sorted_drinks)

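Sort by multiple variables by passing a list of column names to the method. The ascending argument can also take a list, with one True/False value per column.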

sorted_drinks = alcohol_df.sort_values(["region", "year"], ascending=[True, False])
print(sorted_drinks)
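
As a minimal sketch of how the multi-column sort behaves, the example below uses a tiny made-up dataframe (the values are assumptions, not the real dataset). Rows are ordered by region first, and year only breaks ties within each region.

```python
import pandas as pd

# Hypothetical data for illustration; only the column names come from the post.
alcohol_df = pd.DataFrame(
    {
        "region": ["Europe", "Africa", "Europe", "Africa"],
        "year": [2010, 2010, 2015, 2015],
    }
)

# region ascending (A to Z), then year descending (newest first) within each region
sorted_drinks = alcohol_df.sort_values(["region", "year"], ascending=[True, False])

# sort_values keeps the original row labels; reset_index(drop=True) renumbers them from 0
sorted_drinks = sorted_drinks.reset_index(drop=True)
print(sorted_drinks)
```

Note that sort_values returns a new dataframe and leaves alcohol_df unchanged, which is why the result is assigned to sorted_drinks.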

Subset a single column by passing its name in square brackets

print(alcohol_df["region"])

To subset multiple columns, pass a list of column names inside the square brackets

print(alcohol_df[["region", "year"]])
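
The two forms of subsetting return different types, which the sketch below illustrates on a small made-up dataframe (the values are assumptions for illustration). Single brackets with one column name give a Series, while double brackets give a DataFrame, even when the list holds only one name.

```python
import pandas as pd

# Hypothetical data for illustration; only the column names come from the post.
alcohol_df = pd.DataFrame(
    {
        "region": ["Europe", "Africa", "Asia"],
        "year": [2010, 2010, 2015],
    }
)

one_col = alcohol_df["region"]             # single brackets: a Series
two_cols = alcohol_df[["region", "year"]]  # list of names: a DataFrame
one_col_df = alcohol_df[["region"]]        # one-item list: still a DataFrame

print(type(one_col).__name__)     # Series
print(type(two_cols).__name__)    # DataFrame
print(type(one_col_df).__name__)  # DataFrame
```

The outer brackets do the subsetting and the inner brackets build the list of column names, so the double-bracket form is needed whenever the result should stay a dataframe.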

2. Aggregating Data

3. Slicing and Indexing Data

4. Creating and Visualizing Data