Introduction to Data Analysis using Microsoft Excel

More and more businesses now rely on data, rather than intuition or speculation, to make decisions. This shift has been driven by the rise of big data; Facebook, for example, has about 3.06 billion monthly active users.
When properly analysed, such data benefits not only business owners, who can optimize their operations, but also users, who get more personalized experiences, among other advantages. Consequently, a new data analyst needs a powerful tool with a fairly gentle learning curve to analyze complex datasets. Microsoft Excel fits this description and is widely used by data analysts.
Useful Data Analysis Features in Excel
- Formulas and Functions
- Sorting and Filtering Data
- Logical Functions
- Conditional Formatting and Data Validation
- Excel Tables
- Charts and Graphics
- PivotTables and PivotCharts
- Data Cleaning
Formulas and Functions
A formula is an expression that performs a calculation on data in a worksheet. Every formula starts with an equal sign (=).
Example: =A1+A2
A function, on the other hand, is a predefined formula that simplifies complex calculations.
Examples:
=SUM(A1:A10)- adds up the values in cells A1 to A10
=AVERAGE(B1:B5)- calculates the mean of the values in cells B1 to B5
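For readers who also work in Python, the same three calculations can be mirrored with the standard library. This is only an analogy: the lists below stand in for worksheet ranges, and the cell values are made up for illustration.

```python
from statistics import mean

# Hypothetical worksheet data: column A holds ten values, column B holds five
column_a = [4, 8, 15, 16, 23, 42, 7, 1, 0, 10]  # stands in for A1:A10
column_b = [2, 4, 6, 8, 10]                     # stands in for B1:B5

# =A1+A2 adds two individual cells (Python lists are 0-indexed)
a1_plus_a2 = column_a[0] + column_a[1]

# =SUM(A1:A10) adds every value in the range
sum_a = sum(column_a)

# =AVERAGE(B1:B5) returns the arithmetic mean of the range
average_b = mean(column_b)

print(a1_plus_a2, sum_a, average_b)
```

One difference worth knowing: Excel's AVERAGE skips empty cells and text in the range, whereas `statistics.mean` would fail if the list contained `None` or a string.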
Sorting and Filtering Data
Sorting arranges data in a meaningful order, either ascending or descending.
Filtering displays only the rows that meet the criteria you specify and hides the rows you do not want to see.
In Excel, select Sort & Filter under the Home tab to display the available sorting options and the Filter option.
For example, suppose a data analyst has a list of customer orders with columns for “Product” and “Price.” To find the most expensive orders for a specific product, they would first filter the “Product” column to show only the desired item and then sort the “Price” column in descending order.
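The filter-then-sort steps in this example can be sketched in Python as an analogy, assuming the orders are held as a list of dictionaries with hypothetical "Product" and "Price" keys and made-up values. Unlike Excel's filter, which only hides rows, the list comprehension builds a new list and leaves the original orders untouched.

```python
# Hypothetical customer orders with "Product" and "Price" columns
orders = [
    {"Product": "Laptop", "Price": 950},
    {"Product": "Mouse", "Price": 25},
    {"Product": "Laptop", "Price": 1200},
    {"Product": "Monitor", "Price": 300},
    {"Product": "Laptop", "Price": 780},
]

# Step 1: filter the "Product" column to show only the desired item
laptops = [order for order in orders if order["Product"] == "Laptop"]

# Step 2: sort the filtered rows by "Price" in descending order
most_expensive_first = sorted(laptops, key=lambda order: order["Price"], reverse=True)

for order in most_expensive_first:
    print(order["Product"], order["Price"])
```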
Logical Functions
Excel offers four functions for combining or reversing logical tests, i.e., AND, OR, XOR, and NOT. These functions are useful when you need to test multiple conditions instead of a single one.
- AND- returns TRUE only if all of its arguments are TRUE
e.g. =AND(A1>10,A2<30): returns TRUE if the value in cell A1 is greater than 10 and the value in A2 is less than 30, FALSE otherwise.
- OR- returns TRUE if at least one argument is TRUE
e.g. =OR(A1>=10, B1<10): returns TRUE if A1 is greater than or equal to 10, B1 is less than 10, or both. If A1 is less than 10 and B1 is greater than or equal to 10, the formula returns FALSE.
- XOR- returns the logical exclusive OR of all arguments
e.g. =XOR(A1>=10, A2<10): returns TRUE if exactly one of the two conditions is TRUE (A1 is greater than or equal to 10, or A2 is less than 10, but not both) and FALSE otherwise. With more than two arguments, XOR returns TRUE when an odd number of them are TRUE.
- NOT- reverses the logical value of its argument
e.g. =NOT(A1>10): returns FALSE if the value in A1 is greater than 10, TRUE otherwise.
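To make the behaviour of the four logical functions concrete, here is a Python sketch with hypothetical helper names (`excel_and`, `excel_or`, `excel_xor`, `excel_not`) that mimic them. These helpers are illustrations, not part of any library, and the cell values are invented. Note how `excel_xor` works: it returns TRUE when an odd number of conditions are TRUE, which for two conditions means exactly one.

```python
from functools import reduce
from operator import xor


def excel_and(*conditions: bool) -> bool:
    """TRUE only if every condition is TRUE, like =AND(...)."""
    return all(conditions)


def excel_or(*conditions: bool) -> bool:
    """TRUE if at least one condition is TRUE, like =OR(...)."""
    return any(conditions)


def excel_xor(*conditions: bool) -> bool:
    """TRUE if an odd number of conditions are TRUE, like =XOR(...).
    With two conditions, this means exactly one of them is TRUE."""
    return reduce(xor, conditions, False)


def excel_not(condition: bool) -> bool:
    """Reverses a logical value, like =NOT(...)."""
    return not condition


# Hypothetical cell values
a1, a2, b1 = 12, 25, 10

print(excel_and(a1 > 10, a2 < 30))   # =AND(A1>10, A2<30)
print(excel_or(a1 >= 10, b1 < 10))   # =OR(A1>=10, B1<10)
print(excel_xor(a1 >= 10, a2 < 10))  # =XOR(A1>=10, A2<10)
print(excel_not(a1 > 10))            # =NOT(A1>10)
```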
In addition to these logical functions, Excel provides the conditional functions IF, IFS, and SWITCH, which return different results depending on whether a condition is met.
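The difference between IF, IFS, and SWITCH can be sketched in Python. Each function below is hypothetical and reproduces the logic of the Excel formula in its comment; the pass mark, grade thresholds, and day labels are assumptions chosen for illustration.

```python
def if_pass(score: float) -> str:
    # =IF(A1>=50, "Pass", "Fail"): one test, two possible results
    return "Pass" if score >= 50 else "Fail"


def ifs_grade(score: float) -> str:
    # =IFS(A1>=90, "A", A1>=75, "B", TRUE, "C"): tests are checked in order,
    # the first TRUE condition wins, and TRUE at the end acts as a catch-all
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    else:
        return "C"


def switch_day(day_number: int) -> str:
    # =SWITCH(B1, 1, "Mon", 2, "Tue", "Unknown"): compares one value against
    # a list of exact matches and falls back to a default if none match
    return {1: "Mon", 2: "Tue"}.get(day_number, "Unknown")


print(if_pass(62), ifs_grade(80), switch_day(2))
```

IF suits a single yes/no test, IFS replaces a chain of nested IFs, and SWITCH fits when one value is compared against several exact matches.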
Conditional Formatting and Data Validation
Conditional Formatting, found under the Home tab, helps analysts visually emphasize data and identify trends. For example, to find the products that generated the highest and lowest revenue, a data analyst would select the “Revenue” column and apply a conditional formatting rule. This highlights top-performing and underperforming products in different colours.
Data Validation, found under the Data tab, is equally important because it restricts the type of data that can be entered into a cell. For example, data validation rules can be applied to the “Age” column to prevent illogical entries such as negative numbers or extremely large numbers.
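The logic behind a Top/Bottom conditional formatting rule (Home > Conditional Formatting > Top/Bottom Rules) can be sketched in Python: rank the values, then tag the highest and lowest ones. The revenue figures and the `format_rule` helper are hypothetical, and Excel would apply colours rather than text labels.

```python
# Hypothetical "Revenue" column keyed by product name
revenue = {
    "Laptop": 48000,
    "Mouse": 3200,
    "Monitor": 21000,
    "Keyboard": 5400,
    "Webcam": 2100,
}


def format_rule(values: dict[str, float], n: int = 1) -> dict[str, str]:
    """Label the n highest values "top" and the n lowest "bottom";
    every other product gets an empty label (no formatting)."""
    ranked = sorted(values, key=values.get, reverse=True)  # highest first
    labels = {name: "" for name in values}
    for name in ranked[:n]:
        labels[name] = "top"
    for name in ranked[-n:]:
        labels[name] = "bottom"
    return labels


labels = format_rule(revenue)
print(labels)
```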
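An "Age" validation rule can be approximated in Python as a check that each entry is a whole number within a range, similar to choosing Whole number and between in Excel's Data Validation dialog. The bounds of 0 and 120, the `is_valid_age` helper, and the sample entries are assumptions for illustration.

```python
MIN_AGE, MAX_AGE = 0, 120  # hypothetical bounds; pick limits that suit your data


def is_valid_age(value) -> bool:
    """Accept only whole numbers between MIN_AGE and MAX_AGE (inclusive).
    bool is excluded explicitly because Python treats True/False as ints."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and MIN_AGE <= value <= MAX_AGE
    )


entries = [34, -5, 27, 999, "forty"]
rejected = [value for value in entries if not is_valid_age(value)]
print(rejected)
```

Where Excel blocks an invalid entry at the moment it is typed, this check can only flag values after they exist, which is closer to Excel's Circle Invalid Data option.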
Excel Tables
Using tables offers advantages such as automatic formatting and filtering, structured references, automatic range expansion when new rows are added, and better readability.
To convert data to a table:
- Select the data range, then click the Insert tab
- Select Table and tick “My table has headers” if the first row contains column names
- Click OK.
Charts and Graphics
Charts present data visually in different forms depending on the variables being represented. When choosing a chart, consider which one best tells the story in your data.
Some of the charts in Excel include:
- Column chart- compares values across categories or over time
- Bar chart- compares categories with long labels
- Line chart- shows trends over time
- Pie chart- shows parts of a whole
- Area chart- visualizes cumulative changes of multiple variables over time
- Scatter chart- visualizes the relationship between two numerical variables
Frequently used in dashboards, these charts can be created by selecting the data range and choosing the chart type under the Insert tab.
Additionally, sparklines are tiny charts (line, column, or win/loss) embedded in a single cell to show data trends. Similarly, data bars are a type of conditional formatting that fills cells with coloured bars whose length indicates how large each value is relative to the other values in the range.
PivotTables and PivotCharts
A PivotTable helps an analyst group and summarize large datasets. One can add or remove fields, perform calculations, and filter and sort the data.
For example, consider a large dataset with numerous columns and rows of numerical data, such as website traffic and user demographics. A PivotTable lets the analyst summarize this information with calculations such as sums and counts, making it easy to answer questions like how much total website traffic different categories of users generated in a certain month.
In Excel, select the data range and click PivotTable under the Insert tab. Drag fields into the Rows, Columns, Values, or Filters areas, depending on the insight being sought. A PivotChart visualizes the PivotTable's summary, and adding slicers makes it more interactive.
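As an analogy, the following Python sketch does what a simple PivotTable does: it filters to one month, groups rows by user type, and computes the sum and count of page views. The field names (`month`, `user_type`, `page_views`), the sample data, and the `pivot` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical traffic log: one row per visit
visits = [
    {"month": "2024-01", "user_type": "New", "page_views": 5},
    {"month": "2024-01", "user_type": "Returning", "page_views": 12},
    {"month": "2024-01", "user_type": "New", "page_views": 3},
    {"month": "2024-02", "user_type": "Returning", "page_views": 8},
    {"month": "2024-01", "user_type": "Returning", "page_views": 4},
]


def pivot(rows, month):
    """Like a PivotTable with user_type in Rows, page_views in Values
    (as Sum and Count) and month in Filters."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        if row["month"] != month:  # plays the role of the Filters area
            continue
        totals[row["user_type"]] += row["page_views"]
        counts[row["user_type"]] += 1
    return {key: (totals[key], counts[key]) for key in totals}


for user_type, (total, count) in sorted(pivot(visits, "2024-01").items()):
    print(f"{user_type}: sum={total}, count={count}")
```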
Data Cleaning
The quality of an analyst's output, including visualizations, depends heavily on how thoroughly the data is preprocessed. Since data cleaning is one of the major steps of preprocessing, it is worth putting real effort into making the dataset as clean and consistent as possible.
Some of the cleaning techniques used in Excel include:
- Removing duplicate rows (Data tab > Remove Duplicates)
- Handling missing values, e.g., filling in a specific value such as 0 or the column mean (=AVERAGE(A:A)), among other approaches
- Standardizing text, e.g., =LOWER(A1), =UPPER(A1), or =TRIM(A1), which removes extra spaces
- Combining columns, e.g., joining first name and last name with =A1&" "&B1
- Converting data types, e.g., turning numeric fields mistakenly stored as text into numbers with =VALUE(A1)
- Using the Find and Replace feature
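Several of the cleaning steps in this list can be sketched in Python with the standard library, as an analogy to the Excel features. The sample rows and the `excel_trim` helper are hypothetical; `excel_trim` only approximates TRIM, since Python's `split()` also treats tabs and line breaks as whitespace.

```python
from statistics import mean

# Hypothetical raw rows: (first name, last name, age stored as text)
raw_rows = [
    ("  alice ", "SMITH", "34"),
    ("Bob", "jones", ""),         # missing age
    ("  alice ", "SMITH", "34"),  # exact duplicate of the first row
    ("carol", "  Diaz", "41"),
]


def excel_trim(text: str) -> str:
    """Approximate =TRIM(): drop leading/trailing spaces and collapse
    runs of internal whitespace to a single space."""
    return " ".join(text.split())


# 1. Remove duplicate rows, keeping the first occurrence of each
rows = list(dict.fromkeys(raw_rows))

# 2. Convert ages stored as text into numbers (like =VALUE()); blanks become None
ages = [int(age) if age.strip() else None for _, _, age in rows]

# 3. Fill missing ages with the mean of the known ages (like =AVERAGE())
known_ages = [age for age in ages if age is not None]
fill_value = mean(known_ages)
ages = [fill_value if age is None else age for age in ages]

# 4. Standardize text (like =LOWER(TRIM(A1))) and
# 5. combine first and last name into one value (like =A1&" "&B1)
full_names = [
    excel_trim(first).lower() + " " + excel_trim(last).lower()
    for first, last, _ in rows
]

print(full_names)
print(ages)
```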
The features above are the basics that someone new to data analysis will find useful in Microsoft Excel. The list is not exhaustive, but it should point you in the right direction.