Python for Data Science Cheatsheet

Pandas is a Python package for data manipulation and visualization. It is built on top of NumPy and Matplotlib (for data visualization). In pandas, data analysis is done on rectangular data, which is represented as dataframes; in SQL, these are referred to as tables.
We will be using the data
Exploring a Dataframe using the table
To get the first few rows of the dataframe (five by default)
print(alcohol_df.head())
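To get the column names, their data types, and the number of non-null values in each column (which shows whether any values are missing). info() prints its report directly and returns None, so it does not need to be wrapped in print()
alcohol_df.info()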
To get the number of rows and columns of the dataframe as a tuple. This is an attribute, not a method, so it takes no parentheses.
print(alcohol_df.shape)
To get summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns of the dataframe
print(alcohol_df.describe())
To get the values in a 2D NumPy array
print(alcohol_df.values)
To get the column names
print(alcohol_df.columns)
To get the row numbers or row names (the index)
print(alcohol_df.index)
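The exploration calls above can be tried on a small stand-in table. This is only a sketch: the column names and values below are hypothetical and are not the real alcohol_df data. It also uses isna().sum(), which is not covered above, as one possible way to count missing values per column.

```python
import pandas as pd

# Hypothetical stand-in data; the real alcohol_df columns may differ.
alcohol_df = pd.DataFrame(
    {
        "region": ["Europe", "Africa", "Asia", "Europe"],
        "year": [2015, 2015, 2016, 2016],
        "litres_per_capita": [10.2, 4.8, None, 9.9],
    }
)

# shape is an attribute, so it is read without parentheses.
n_rows, n_cols = alcohol_df.shape

# columns is an Index object; list() turns it into a plain Python list.
column_names = list(alcohol_df.columns)

# Count the missing values in each column.
missing_per_column = alcohol_df.isna().sum()

print(n_rows, n_cols)
print(column_names)
print(missing_per_column)
```

Unpacking shape into two names is a small convenience when only the row count or only the column count is needed later.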
1. Dataframes
Change the order of the rows by sorting them on a particular column. By default this sorts in ascending order, from smallest to largest.
sorted_drinks = alcohol_df.sort_values("region")
print(sorted_drinks)
To sort in descending order
sorted_drinks = alcohol_df.sort_values("region", ascending=False)
print(sorted_drinks)
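Sort by multiple variables by passing a list of column names to the method. The ascending argument can also take a list, with one True/False value per column.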
sorted_drinks = alcohol_df.sort_values(["region", "year"], ascending=[True, False])
print(sorted_drinks)
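To see how a multi-column sort orders the rows, here is a minimal sketch on a hypothetical four-row table (the values are invented for illustration and are not the real alcohol_df data). Rows are ordered by region from A to Z, and within each region by year from newest to oldest.

```python
import pandas as pd

# Hypothetical stand-in data; the real alcohol_df may differ.
alcohol_df = pd.DataFrame(
    {
        "region": ["Europe", "Africa", "Europe", "Africa"],
        "year": [2015, 2016, 2016, 2015],
    }
)

# region ascending, then year descending within each region.
sorted_drinks = alcohol_df.sort_values(["region", "year"], ascending=[True, False])
print(sorted_drinks)

# sort_values returns a new dataframe; the original row order is unchanged.
print(alcohol_df)
```

The sorted result keeps the original index labels, so the index column in the printed output appears shuffled; that is expected.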
Subset a single column by putting its name in square brackets
print(alcohol_df["region"])
To subset multiple columns, pass a list of column names inside the square brackets (hence the double brackets)
print(alcohol_df[["region", "year"]])
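The two subsetting forms return different types, which the sketch below checks on a hypothetical stand-in table (invented values, not the real alcohol_df data): single brackets with one name give a Series, while a list of names gives a dataframe.

```python
import pandas as pd

# Hypothetical stand-in data; the real alcohol_df may differ.
alcohol_df = pd.DataFrame(
    {
        "region": ["Europe", "Africa", "Asia"],
        "year": [2015, 2016, 2016],
        "litres_per_capita": [10.2, 4.8, 6.1],
    }
)

# One column name in single brackets returns a pandas Series.
one_column = alcohol_df["region"]

# A list of column names returns a dataframe with only those columns.
two_columns = alcohol_df[["region", "year"]]

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns.shape)
```

Passing a one-item list such as alcohol_df[["region"]] also returns a dataframe, which can be useful when later code expects a table rather than a Series.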