- Foundation Course Module 1: Introduction of data analytics and analytical thinking
- Foundation Course Module 2 : The wonderful world of data
- Foundation Course Module 3 : Setup of data analytics toolbox
- Foundation Course Module 4: Becoming a fair and impactful data professional
- Foundation Course: Glossary
- Course 2: Ask questions to make data driven decisions, Module 1: Ask effective questions
- Course 2: Ask questions to make data driven decisions, Module 2: Make data-driven decisions
- Course 2: Ask questions to make data driven decisions, Module 3: Spreadsheet magic
- Course 2: Ask questions to make data driven decisions, Module 4: Always remember the stakeholder
- Course 3: Prepare Data For Exploration: Learning objectives and overviews
- Course 3: Prepare Data For Exploration, Module 1: Data types and structures
- Course 3: Prepare Data For Exploration, Module 2: Data responsibility
- Course 3: Prepare Data For Exploration, Module 3: Database Essentials
- Course 3: Prepare Data For Exploration, Module 4: Organise and Secure Data
- Course 4: Process Data from Dirty to Clean: Overview
- Course 4: Process Data from Dirty to Clean, Module 1: The importance of integrity
- Course 4: Process Data from Dirty to Clean, Module 2: Clean it up
- Course 4: Process Data from Dirty to Clean, Module 3: SQL
- Course 4: Process Data from Dirty to Clean, Module 4: Verify and Report Results
- Course 5: Analyse Data to Answer Questions, Module 1: Organise data for more effective analysis
- Course 5: Analyse Data to Answer Questions, Module 2: Format and adjust data
- Course 5: Analyse Data to Answer Questions, Module 3: Aggregate data for analysis
- Course 5: Analyse Data to Answer Questions, Module 4: Perform Data Calculations
- Course 6: Share Data Through the Art of Visualisation, Course Overview plus Module 1: Visualise Data
- Course 6: Share Data Through the Art of Visualisation, Course Overview plus Module 2: Create Data Visualisation with Tableau
In this part of the course, we will learn about the data life cycle and data analysis process. They are both relevant to our work in this program and on the job. We will also be introduced to applications that help guide data through the data analysis process.
Module 2 Learning Objectives
- Identify key software applications critical to the work of a data analyst, including spreadsheets, databases, query languages, and visualization tools
- Identify relationships between the data analysis process and the courses in the Google Data Analytics Certificate
- Explain the data analysis process, making specific reference to the ask, prepare, process, analyse, share, and act phases
- Discuss the use of data in everyday life decisions
- Discuss the role of spreadsheets, query languages, and data visualization tools in data analytics
- Discuss the phases of the data life cycle
Variations of the data life cycle
You have learned that there are six stages to the data life cycle. Here’s a recap:
- Plan: Decide what kind of data is needed, how it will be managed, and who will be responsible for it.
- Capture: Collect or bring in data from a variety of different sources.
- Manage: Care for and maintain the data. This includes determining how and where it is stored and the tools used to do so.
- Analyse: Use the data to solve problems, make decisions, and support business goals.
- Archive: Keep relevant data stored for long-term and future reference.
- Destroy: Remove data from storage and delete any shared copies of the data.
Note: Be careful not to confuse the six stages of the data life cycle (plan, capture, manage, analyse, archive, and destroy) with the six phases of the data analysis process (ask, prepare, process, analyse, share, and act). They are not interchangeable.
The data life cycle provides a generic or common framework for how data is managed.
Fact vs Gut feeling/ instinct: Why gut instinct can be a problem
At the heart of data-driven decision-making is data. Therefore, it’s essential that data analysts focus on the data to ensure they make informed decisions. If you ignore data by preferring to make decisions based on your own experience, your decisions may be biased. But even worse, decisions based on gut instinct without any data to back them up can cause mistakes.
Consider an example of a restaurant entrepreneur, partnering with a well-known chef to develop a new restaurant in a bustling part of the city’s central shopping district. The chef has several restaurants across the city. Banking on their reputation, the restaurant entrepreneur and chef followed gut instinct and created another uniquely themed restaurant. However, fundraising efforts fell short to fund the opening of the restaurant after months of planning and preparation. The property will go back on the market to be sold at a loss. Had the entrepreneur done more research, they would’ve found data showing prospective customers in this new restaurant location were very different from the chef’s other restaurants.
The more you understand the data related to a project, the easier it will be to figure out what is required. These efforts will also help you identify errors and gaps in your data so you can communicate your findings more effectively. Sometimes experience helps you make a connection that no one else would notice. For example, a detective might be able to crack open a case because they remember an old case just like the one they’re solving today. It’s not just gut instinct.
Data + business knowledge = mystery solved
Blending data with business knowledge, plus maybe a touch of gut instinct, will be a common part of your process as a junior data analyst. The key is figuring out the exact mix for each particular project. A lot of times, it will depend on the goals of your analysis. That is why analysts often ask, “How do I define success for this project?”
In addition, try asking yourself these questions about a project to help find the perfect balance:
- What kind of results are needed?
- Who will be informed?
- Am I answering the question being asked?
- How quickly does a decision need to be made?
For instance, if you are working on a rush project, you might need to rely on your own knowledge and experience more than usual. There just isn’t enough time to thoroughly analyse all the available data. But if you get a project that involves plenty of time and resources, then the best strategy is to be more data-driven. It’s up to you, the data analyst, to make the best possible choice. You will probably blend data and knowledge a million different ways over the course of your data analytics career. And the more you practice, the better you will get at finding that perfect blend.
Anyway, More on the phases of data analysis
Each step in the data analysis process—ask, prepare, process, analyse, share, and act—plays a crucial role in extracting meaningful insights from data. As you navigate through each phase, from asking the right questions to taking informed actions, you harness the true power of data. In this reading, you’ll explore how the data analysis process guides this program.
The ask phase
At the start of any successful data analysis, the data analyst:
- Takes the time to fully understand stakeholder expectations
- Defines the problem to be solved
- Decides which questions to answer in order to solve the problem
Qualifying stakeholder expectations means determining who the stakeholders are, what they want, when they want it, why they want it, and how best to communicate with them. Defining the problem means looking at the current state and identifying the ways in which it’s different from the ideal state. With expectations qualified, and the problem defined, you can derive questions that will help achieve these goals.
In an upcoming course, you’ll learn how to ask effective questions and define the problem by working with stakeholders. You’ll also cover strategies that can help you share what you discover in a way that keeps people interested.
The prepare phase
In the prepare phase, the emphasis is on identifying and locating data you can use to answer your questions. In an upcoming course, you’ll learn more about the different types of data and how to identify which kinds of data are most useful for solving a particular problem. You’ll also discover why it’s so important that data and results are objective and unbiased. In other words, any decisions made from an analysis should always be based on facts and be fair and impartial.
The process phase
In this phase, the aim is to refine the data. Data analysts find and eliminate any errors and inaccuracies that can get in the way of results. This usually means:
- Cleaning data
- Transforming data into a more useful format
- Combining two or more datasets to make information more complete
- Removing outliers (data points that could skew the information)
After data analysts process data, they check the data they prepared to make sure it’s complete and correct. This phase is all about getting the details right. Accordingly, the data analyst will refine strategies for verifying and sharing their data cleaning with stakeholders. In this course, we will use spreadsheets and Structured Query Language or SQL, to clean data.
The analysis phase
With a solid foundation of well-defined questions and clean data, you’ll delve into the analysis phase. This is when you turn the data you’ve gathered, prepared, and processed into actionable information. Data analysts use many powerful tools in their work. We will use two of them for this course: spreadsheets and SQL.
The share phase
This phase is exactly what it sounds like: It’s time to share what you’ve learned with your stakeholders! In this part of the program, you’ll learn how data analysts interpret results and share them with others to help stakeholders make effective, data-driven decisions. In the share phase, visualization is a data analyst’s best friend. So, an upcoming course will highlight why visualization is essential to getting others to understand what your data is telling you. In this course, we will learn how to visualize data with R.
The act phase
The data analysis journey culminates in the act phase, when data insights are put to work. This is where we find the solution of the ask phase problem and use it to our business logic.
Key data analyst tools for this course
As we are learning, the most common programs and solutions used by data analysts include spreadsheets, query languages, and visualization tools. In this reading, you will learn more about each one. You will cover when to use them, and why they are so important in data analytics.
Spreadsheets
Data analysts rely on spreadsheets to collect and organize data. Two popular spreadsheet applications you will probably use a lot in your future role as a data analyst are Microsoft Excel and Google Sheets.
Spreadsheets structure data in a meaningful way by letting you
- Collect, store, organize, and sort information
- Identify patterns and piece the data together in a way that works for each specific data project
- Create excellent data visualizations, like graphs and charts.
Databases and query languages
A database is a collection of structured data stored in a computer system. Some popular Structured Query Language (SQL) programs include MySQL, Microsoft SQL Server, and BigQuery.
Query languages
- Allow analysts to isolate specific information from a database(s)
- Make it easier for you to learn and understand the requests made to databases
- Allow analysts to select, create, add, or download data from a database for analysis
Visualization tools
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and more. Two popular visualization tools are Tableau and Looker.
These tools
- Turn complex numbers into a story that people can understand
- Help stakeholders come up with conclusions that lead to informed decisions and effective business strategies
- Have multiple features
– Tableau‘s simple drag-and-drop feature lets users create interactive graphs in dashboards and worksheets
– Looker communicates directly with a database, allowing you to connect your data right to the visual
tool you choose
A career as a data analyst also involves using programming languages, like R and Python, which are used a lot for statistical analysis, visualization, and other data analysis.
Choose the right tool for the job
As a data analyst, you will usually have to decide which program or solution is right for the particular project you are working on. In this reading, you will learn more about how to choose which tool you need and when.
Depending on which phase of the data analysis process you’re in, you will need to use different tools. For example, if you are focusing on creating complex and eye-catching visualizations, then the visualization tools we discussed earlier are the best choice. But if you are focusing on organizing, cleaning, and analyzing data, then you will probably be choosing between spreadsheets and databases using queries. Spreadsheets and databases both offer ways to store, manage, and use data. The basic content for both tools are sets of values. Yet, there are some key differences, too:
Spreadsheets | Databases |
---|---|
Accessed through a software application | Database accessed using a query language |
Structured data in a row and column format | Structured data using rules and relationships |
Organizes information in cells | Organizes information in complex collections |
Provides access to a limited amount of data | Provides access to huge amounts of data |
Manual data entry | Strict and consistent data entry |
Generally one user at a time | Multiple users |
Controlled by the user | Controlled by a database management system |
You don’t have to choose one or the other because each serves its own purpose. Generally, data analysts work with a combination of the two, as both tools are very useful in data analytics. For example, you can store data in a database, then export it to a spreadsheet for analysis. Or, if you are collecting information in a spreadsheet, and it becomes too much for that particular platform, you can import it into a database. And, later in this course, you will learn about programming languages like R that give you even greater control of your data, its analysis, and the visualizations you create.
Terms and definitions for Module 2
Database: A collection of data stored in a computer system
Formula: A set of instructions used to perform a calculation using the data in a spreadsheet
Function: A preset command that automatically performs a specified process or task using the data in a spreadsheet
Query: A request for data or information from a database
Query language: A computer programming language used to communicate with a database
Stakeholders: People who invest time and resources into a project and are interested in its outcome
Structured Query Language: A computer programming language used to communicate with a database
Spreadsheet: A digital worksheet
SQL: (Refer to Structured Query Language)