Knowing a variety of ways to clean data can make a data analyst’s job much easier. Learning Objectives: How a junior data analyst uses SQL. In this reading, you will learn more about how to decide when to use SQL, or Structured Query Language. As a data analyst, you will be tasked with handling a lot of data, and SQL…
Category: Technical
Course 4: Process Data from Dirty to Clean, Module 2: Clean it up
What is dirty data? Earlier, we discussed that dirty data is data that is incomplete, incorrect, or irrelevant to the problem you are trying to solve. This section summarizes the types of dirty data, giving a description, possible causes, and potential harm to businesses for each. Duplicate data. Description: Any data record that shows up more than once. Possible causes: Manual data entry, batch data imports, or data migration…
Course 4: Process Data from Dirty to Clean, Module 1: The importance of integrity
Scenario: calendar dates for a global company. Calendar dates are represented in a lot of different short forms. Depending on where you live, a different format might be used. Now, think about what would happen if you were working as a data analyst for a global company and didn’t check date formats. Well, your data integrity would probably be questionable.…
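To make the date-format scenario concrete, here is a minimal Python sketch. It assumes a hypothetical value, "03/04/2023", whose regional format was not recorded with the data; the helper name is_ambiguous is also invented for illustration and does not come from the course material.

```python
from datetime import date, datetime

# Hypothetical value: the source system did not record which format it used.
raw = "03/04/2023"

# The same string read under two regional conventions.
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # month/day/year
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # day/month/year

print(as_month_first.isoformat())  # 2023-03-04
print(as_day_first.isoformat())    # 2023-04-03


def is_ambiguous(text: str) -> bool:
    """Return True if text is valid under both conventions and yields different dates."""
    results = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            results.add(datetime.strptime(text, fmt).date())
        except ValueError:
            # Not a valid date under this convention (for example, month 25).
            pass
    return len(results) > 1
```

A check like this only flags values that could be read two ways; deciding which reading is correct still depends on knowing where the data came from.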
Course 4: Process Data from Dirty to Clean: Overview
This course is the fourth in the Google Data Analytics Certificate program. It will teach you how to clean data using spreadsheets and SQL, as well as how to verify and report your data cleaning results. This is an important skill for data analysts, as it ensures that the data they are working with is accurate and reliable. Here are…
Course 3: Prepare Data For Exploration, Module 4: Organise and Secure Data
File organisation guidelines: Every data analyst’s goal is to conduct efficient data analysis. One way to increase the efficiency of your analyses is to streamline processes that help save time and energy in the long run. Meaningful, logical, and consistent file names help data analysts organise their data and automate their analysis process. When you use consistent guidelines to describe…
Course 3: Prepare Data For Exploration, Module 3: Database Essentials
Maximise databases in data analytics: Databases enable analysts to manipulate, store, and process data. This helps them search through data a lot more efficiently to get the best insights. Relational databases: A relational database is a database that contains a series of tables that can be connected to form relationships. Basically, they allow data analysts to organise and link data…
Course 3: Prepare Data For Exploration, Module 2: Data responsibility
Data Responsibility Rundown. Key Learnings. Specific Topics Covered: Data anonymization. What is data anonymization? We have been learning about the importance of privacy in data analytics. Now, it is time to talk about data anonymization and what types of data should be anonymized. Personally identifiable information, or PII, is information that can be used by itself or with other data to…
Course 3: Prepare Data For Exploration, Module 1: Data types and structures
Select the right data: The following are some data-collection considerations to keep in mind for your analysis. How the data will be collected: Decide if you will collect the data using your own resources or receive (and possibly purchase) it from another party. Data that you collect yourself is called first-party data. Data sources: If you don’t collect the data using…
Course 3: Prepare Data For Exploration: Learning objectives and overviews
A massive amount of data is generated every single day. In this part of the course, you will discover how this data is generated and how analysts decide which data to use for analysis. You’ll also learn about structured and unstructured data, data types, and data formats as you start thinking about how to prepare your data for analysis. Learning…
SQL best practices
Content collected from the Google Data Analytics course. These best practices include guidelines for entering SQL queries, developing documentation, and examples that demonstrate these practices. This is a great resource to have handy when you are using SQL yourself; you can just go straight to the relevant section to review…