Technical, Data Analytics

Course 2: Ask questions to make data driven decisions, Module 3: Spreadsheet magic

make data driven dicisions

Spreadsheets are a key data analytics tool. Here, we will learn both why and how data analysts use spreadsheets in their work. We will also investigate how structured thinking helps analysts understand problems and come up with solutions.

Learning Objectives


  • Discuss the data analyst’s use of spreadsheets with reference to roles and responsibilities
  • Demonstrate the use of spreadsheets to complete basic tasks of the data analyst, including entering and organizing data
  • Demonstrate an understanding of the use of formulas in spreadsheets including a definition and specific examples
  • Compare formulas and functions with reference to similarities and differences
  • Describe the key ideas associated with structured thinking including the problem domain, scope of work, and context

Spreadsheets and the data life cycle

To better understand the benefits of using spreadsheets in data analytics, let’s explore how they relate to each phase of the data life cycle: plan, capture, manage, analyse, archive, and destroy.

  • Plan for the users who will work within a spreadsheet by developing organizational standards. This can mean formatting your cells, the headings you choose to highlight, the color scheme, and the way you order your data points. When you take the time to set these standards, you will improve communication, ensure consistency, and help people be more efficient with their time.Capture data by the source by connecting spreadsheets to other data sources, such as an online survey application or a database. This data will automatically be updated in the spreadsheet. That way, the information is always as current and accurate as possible.Manage different kinds of data with a spreadsheet. This can involve storing, organizing, filtering, and updating information. Spreadsheets also let you decide who can access the data, how the information is shared, and how to keep your data safe and secure. Analyze data in a spreadsheet to help make better decisions. Some of the most common spreadsheet analysis tools include formulas to aggregate data or create reports, and pivot tables for clear, easy-to-understand visuals. Archive any spreadsheet that you don’t use often, but might need to reference later with built-in tools. This is especially useful if you want to store historical data before it gets updated. Destroy your spreadsheet when you are certain that you will never need it again, if you have better backup copies, or for legal or security reasons. Keep in mind, lots of businesses are required to follow certain rules or have measures in place to make sure data is destroyed properly.

  • Resources for more information

Spreadsheet shortcuts can help you become more efficient with spreadsheets. If you’d like to learn more, you can explore the collection of Google Sheets shortcuts, or visit the Microsoft Excel shortcuts page if you are using Excel. Both of these resources contain a list of spreadsheet shortcuts you can save and reference as you work more with spreadsheets on your own.

In this module, we are learning about hands-on spreadsheet tutorial and basics. It won’t be possible for me to share the tutorials in writing, because there are lots of official and high quality free tutorial out there. Rather, if you like to learn more about spreadsheet then please go to those link above or go to W3 School site (my personal preference). Good luck 🙂

Common spreadsheet errors and fixes

The .pdf file in the attachment below includes common spreadsheet errors and fixes. You can save this file for future reference:

Some Tips and Tricks

Here are a few best practices and helpful tips. These strategies will help you avoid spreadsheet errors to begin with, making your life in analytics a lot less stressful:

  1. Filter data to make your spreadsheet less complex and busy.
  2. Use and freeze headers so you know what is in each column, even when scrolling.
  3. When multiplying numbers, use an asterisk (*) not an X.
  4. Start every formula and function with an equal sign (=).
  5. Whenever you use an open parenthesis, make sure there is a closed parenthesis on the other end to match.
  6. Change the font to something easy to read.
  7. Set the border colours to white so that you are working in a blank sheet. 
  8. Create a tab with just the raw data, and a separate tab with just the data you need.

Know your errors

ErrorDescriptionExample
#DIV/0!A formula is trying to divide a value in a cell by 0 (or an empty cell with no value)=B2/B3, when the cell B3 contains the value 0
#ERROR!(Google Sheets only)  Something can’t be interpreted as it has been input. This is also known as a parsing error.=COUNT(B1:D1 C1:C10) is invalid because the cell ranges aren’t separated by a comma
#N/AA formula can’t find the dataThe cell being referenced can’t be found
#NAME?The name of a formula or function used isn’t recognizedThe name of a function is misspelled
#NUM!The spreadsheet can’t perform a formula calculation because a cell has an invalid numeric value=DATEDIF(A4, B4, “M”) is unable to calculate the number of months between two dates because the date in cell A4 falls after the date in cell B4
#REF!A formula is referencing a cell that isn’t validA cell used in a formula was in a column that was deleted
#VALUE!A general error indicating a problem with a formula or with referenced cellsThere could be problems with spaces or text, or with referenced cells in a formula; you may have additional work to find the source of the problem.

Pro tip: Spotting errors in spreadsheets with conditional formatting

Conditional formatting can be used to highlight cells a different colour based on their contents. This feature can be extremely helpful when you want to locate all errors in a large spreadsheet. For example, using conditional formatting, you can highlight in yellow all cells that contain an error, and then work to fix them.

Conditional formatting in Microsoft Excel

To set up conditional formatting in Microsoft Excel to highlight all cells in a spreadsheet that contain errors, do the following:

  1. Click the gray triangle above row number 1 and to the left of Column A to select all cells in the spreadsheet.
  2. From the main menu, click Home, and then click Conditional Formatting to select Highlight Cell Rules > More Rules.
  3. For Select a Rule Type, choose Use a formula to determine which cells to format.
  4. For Format values where this formula is true, enter =ISERROR(A1).
  5. Click the Format button, select the Fill tab, select yellow (or any other color), and then click OK.
  6. Click OK to close the format rule window.

To remove conditional formatting, click Home and select Conditional Formatting, and then click Manage Rules. Locate the format rule in the list, click Delete Rule, and then click OK.

Conditional formatting in Google Sheets

To set up conditional formatting in Google Sheets to highlight all cells in a spreadsheet that contain errors, do the following:

  1. Click the empty rectangle above row number 1 and to the left of Column A to select all cells in the spreadsheet.
  2. From the main menu, click Format and select Conditional Formatting to open the Conditional format rules pane on the right.
  3. While in the Single Colour tab, under Format rules, use the drop-down to select Custom formula is, enter =ISERROR(A1), select yellow (or any other colour) for the formatting style, and then click Done.

To remove conditional formatting, click Format and select Conditional Formatting, and then click the Trash icon for the format rule.

Spreadsheet error resources

To learn more and read about additional examples of errors and solutions, explore these resources:

  • Microsoft Formulas and Functions: This resource describes how to avoid broken formulas and how to correct errors in Microsoft Excel. This is a useful reference to have saved in case you run into a specific error and need to find solutions quickly while working in Excel. 
  • When Your Formula Doesn’t Work: Formula Parse Errors in Google Sheets: This resource is a guide to finding and fixing some common errors in Google Sheets. If you are working with Google Sheets, you can use this as a quick reference for solving problems you might encounter working on your own.

Formulas VS Functions


Quick Reference: Functions in spreadsheets

As a quick refresher, a function is a preset command that automatically performs a specific process or task using the data in a spreadsheet. Functions give data analysts the ability to do calculations, which can be anything from simple arithmetic to complex equations. Use this reading to help you keep track of some of the most useful options.

Functions

The basics

  • Just like formulas, start all of your functions with an equal sign; for example =SUM. The equal sign tells the spreadsheet that what follows is part of a function, not just a word or number in a cell. 
  • After you enter the equal sign, most spreadsheet applications will display an autocomplete menu that lists valid functions, names, and text strings. This is a great way to create and edit functions while avoiding typing and syntax errors.
  • A fun way to learn new functions is by simply typing an equal sign and a single letter of the alphabet. Choose one of the options that pops up and learn what that function does.

Difference between formulas and functions

  • A formula is a set of instructions used to perform a calculation using the data in a spreadsheet.
  • A function is a preset command that automatically performs a specific process or task using the data in a spreadsheet.

Popular functions

A lot of people don’t realize that keyboard shortcuts like cut, save, and find are actually functions. These functions are built into an application and are amazing time-savers. Using shortcuts lets you do more with less effort. They can make you more efficient and productive because you are not constantly reaching for the mouse and navigating menus. Use these links to discover the most popular shortcuts, for Chromebook, PC, and Mac.

Auto-filling

The lower-right corner of each cell has a fill handle. It is a small green square in Microsoft Excel and a small blue circle in Google Sheets.

  • Click the fill handle for a cell and drag it down a column to auto-fill other cells in the column with the same formula or function used in that cell. 
  • Click the fill handle for a cell and drag it across a row to auto-fill other cells in the row with the same formula or function used in that cell. 

Relative, absolute, and mixed references

  • Relative references (cells referenced without a dollar sign, like A2) will change when you copy and paste the function into a different cell. With relative references, the location of the cell that contains the function determines the cells used by the function.
  • Absolute references (cells fully referenced with a dollar sign, like $A$2) will not change when you copy and paste the function into a different cell. With absolute references, the cells referenced always remain the same.
  • Mixed references (cells partially referenced with a dollar sign, like $A2 or A$2) will change when you copy and paste the function into a different cell. With mixed references, the location of the cell that contains the function determines the cells used by the function, but only the row or column is relative (not both).
  • In spreadsheets, you can press the F4 key to toggle between relative, absolute, and mixed references in a function. Click the cell containing the function, highlight the referenced cells in the formula bar, and then press F4 to toggle between and select relative, absolute, or mixed referencing.

Data ranges

  • When you click a cell that contains a function, colored data ranges in the formula bar indicate which cells are being used in the spreadsheet. There are different colors for each unique range in a function.
  • Colored data ranges help prevent you from getting lost in complex functions.
  • In spreadsheets, you can press the F2 key to highlight the range of data used by a function. Click the cell containing the function, highlight the range of data used by the function in the formula bar, and then press F2. The spreadsheet will go to and highlight the cells specified by the range.

Data ranges evaluated for a condition

COUNTIF is an example of a function that returns a value based on a condition that the data range is evaluated for. The function counts the number of cells that meet the criteria. For example, in an expense spreadsheet, use COUNTIF to count the number of cells that contain a reimbursement for “airfare.”

For more information, refer to:

Keyboard shortcuts

You can save these shortcuts for future reference. Feel free to download a .pdf version of the shortcuts below:

Function Keayboard Shortcuts


Formulas Keayboard Shortcuts

Scope Of Work or SOW (not Statement of Work)

What is SOW?

As a data analyst, it’s hard to overstate the importance of an SOW document. A well-defined SOW keeps you, your team, and everyone involved with a project on the same page. It ensures that all contributors, sponsors, and stakeholders share the same understanding of the relevant details. It can include data collection, cleaning, analysis, visualisation, reporting etc

Why do you need an SOW?

The point of data analysis projects is to complete business tasks that are useful to the stakeholders. Creating an SOW helps to make sure that everyone involved, from analysts and engineers to managers and stakeholders, shares the understanding of what those business goals are, and the plan for accomplishing them.

Clarifying requirements and setting expectations are two of the most important parts of a project. Recall the first phase of the Data Analysis Process—asking questions

As you ask more and more questions to clarify requirements, goals, data sources, stakeholders, and any other relevant info, an SOW helps you formalize it all by recording all the answers and details. In this context, the word “ask” means two things. Preparing to write an SOW is about asking questions to learn the necessary information about the project, but it’s also about clarifying and defining what you’re being asked to accomplish, and what the limits or boundaries of the “ask” are. After all, if you can’t make a distinction between the business questions you are and aren’t responsible for answering, then it’s hard to know what success means!

What is a good SOW?

There’s no standard format for an SOW. They may differ significantly from one organization to another, or from project to project. However, they all have a few foundational pieces of content in common. 

  • Deliverables: What work is being done, and what things are being created as a result of this project? When the project is complete, what are you expected to deliver to the stakeholders? Be specific here. Will you collect data for this project? How much, or for how long?   

Avoid vague statements. For example, “fixing traffic problems” doesn’t specify the scope. This could mean anything from filling in a few potholes to building a new overpass. Be specific! Use numbers and aim for hard, measurable goals and objectives. For example: “Identify top 10 issues with traffic patterns within the city limits, and identify the top 3 solutions that are most cost-effective for reducing traffic congestion.”

  • Milestones: This is closely related to your timeline. What are the major milestones for progress in your project? How do you know when a given part of the project is considered complete? 

Milestones can be identified by you, by stakeholders, or by other team members such as the Project Manager. Smaller examples might include incremental steps in a larger project like “Collect and process 50% of required data (100 survey responses)”, but may also be larger examples like ”complete initial data analysis report” or “deliver completed dashboard visualizations and analysis reports to stakeholders”. 

  • Timeline: Your timeline will be closely tied to the milestones you create for your project. The timeline is a way of mapping expectations for how long each step of the process should take. The timeline should be specific enough to help all involved decide if a project is on schedule. When will the deliverables be completed? How long do you expect the project will take to complete? If all goes as planned, how long do you expect each component of the project will take? When can we expect to reach each milestone?
  • Reports: Good SOWs also set boundaries for how and when you’ll give status updates to stakeholders. How will you communicate progress with stakeholders and sponsors, and how often? Will progress be reported weekly? Monthly? When milestones are completed? What information will status reports contain?

At a minimum, any SOW should answer all the relevant questions in the above areas. Note that these areas may differ depending on the project. But at their core, the SOW document should always serve the same purpose by containing information that is specific, relevant, and accurate. If something changes in the project, your SOW should reflect those changes.

What is in and out of scope?

SOWs should also contain information specific to what is and isn’t considered part of the project. The scope of your project is everything that you are expected to complete or accomplish, defined to a level of detail that doesn’t leave any ambiguity or confusion about whether a given task or item is part of the project or not. 

Notice how the previous example about studying traffic congestion defined its scope as the area within the city limits. This doesn’t leave any room for confusion — stakeholders need only to refer to a map to tell if a stretch of road or intersection is part of the project or not.  Defining requirements can be trickier than it sounds, so it’s important to be as specific as possible in these documents, and to use quantitative statements whenever possible. 

For example, assume that you’re assigned to a project that involves studying the environmental effects of climate change on the coastline of a city: How do you define what parts of the coastline you are responsible for studying, and which parts you are not?

In this case, it would be important to define the area you’re expected to study using GPS locations, or landmarks. Using specific, quantifiable statements will help ensure that everyone has a clear understanding of what’s expected.

An example SoW Document

The importance of context

Context in data analytics is the condition and circumstances that surround and give meaning to the data. Context is important in data analytics because it helps make disorganized data accessible and understood. The fact is, data has little value if it is not paired with context.

Understanding the context behind the data can help us make it more meaningful at every stage of the data analysis process. For example, you might be able to make a few guesses about what you’re looking at in the following table, but you couldn’t be certain without more context. 

201028000
200518000
200023000
199510000

On the other hand, if the first column was labeled to represent the years when a survey was conducted, and the second column showed the number of people who responded to that survey, then the table would start to make a lot more sense. Take this a step further, and you might notice that the survey is conducted every 5 years. This added context helps you understand why there are five-year gaps in the table. 

Years (Collected every 5 years)Respondents
201028000
200518000
200023000
199510000

Context can turn raw data into meaningful information. It is very important for data analysts to contextualize their data. This means giving the data perspective by defining it. To do this, you need to identify:

  • Who: The person or organization that created, collected, and/or funded the data collection
  • What: The things in the world that data could have an impact on
  • Where: The origin of the data
  • When: The time when the data was created or collected
  • Why: The motivation behind the creation or collection
  • How: The method used to create or collect it

Understanding and including the context is important during each step of your analysis process, so it is a good idea to get comfortable with it early in your career. For example, when you collect data, you’ll also want to ask questions about the context to make sure that you understand the business and business process. During organisation, the context is important for your naming conventions, how you choose to show relationships between variables, and what you choose to keep or leave out. And finally, when you present, it is important to include contextual information so that your stakeholders understand your analysis.

Glossary terms from module 3

Terms and definitions for Course 2, Module 3

AVERAGE: A spreadsheet function that returns an average of the values from a selected range 

Borders: Lines that can be added around two or more cells on a spreadsheet

Cell reference: A cell or a range of cells in a worksheet typically used in formulas and functions

COUNT: A spreadsheet function that counts the number of cells in a range that meet a specific criteria

Equation: A calculation that involves addition, subtraction, multiplication, or division (also called a math expression)

Fill handle: A box in the lower-right-hand corner of a selected spreadsheet cell that can be dragged through neighboring cells in order to continue an instruction

Filtering: The process of showing only the data that meets a specified criteria while hiding the rest

Header: The first row in a spreadsheet that labels the type of data in each column

Math expression: A calculation that involves addition, subtraction, multiplication, or division (also called an equation)

Math function: A function that is used as part of a mathematical formula

MAX: A spreadsheet function that returns the largest numeric value from a range of cells

MIN: A spreadsheet function that returns the smallest numeric value from a range of cells

Open data: Data that is available to the public

Operator: A symbol that names the operation or calculation to be performed

Order of operations: Using parentheses to group together spreadsheet values in order to clarify the order in which operations should be performed

Problem domain: The area of analysis that encompasses every activity affecting or affected by a problem

Range: A collection of two or more cells in a spreadsheet

Report: A static collection of data periodically given to stakeholders 

Return on investment (ROI): A formula that uses the metrics of investment and profit to evaluate the success of an investment

Revenue: The total amount of income generated by the sale of goods or services 

Scope of work (SOW): An agreed-upon outline of the tasks to be performed during a project

Sorting: The process of arranging data into a meaningful order to make it easier to understand, analyze, and visualize

SUM: A spreadsheet function that adds the values of a selected range of cells

Series Navigation<< Course 2: Ask questions to make data driven decisions, Module 2: Make data-driven decisionsCourse 2: Ask questions to make data driven decisions, Module 4: Always remember the stakeholder >>

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.