Technical, Data Analytics

Course 5: Analyse Data to Answer Questions, Module 2: Format and adjust data

graphical user interface

In this module, you will learn about converting data from one format to another, data validation and adjust the data for our analysis. One of the ways to help ensure that you have an accurate analysis of your data is by putting all of it in the correct format. This is true even if you have already cleaned and processed your data. As a part of getting your data ready for analysis, you will need to convert and format your data early on in the process.

As a data analyst, there are lots of scenarios when you might need to convert data in a spreadsheet:

String to date

  • How to convert text to date in Excel: Transforming a series of numbers into dates is a common scenario you will encounter. This resource will help you learn how to use Excel functions to convert text and numbers to dates, and how to turn text strings into dates without a formula. 
  • Google Sheets: Change date format: If you are working with Google Sheets, this resource will demonstrate how to convert your text strings to dates and how to apply the different date formats available in Google Sheets. 

String to numbers

  • How to convert text to number in Excel: Even though you will have values in your spreadsheet that resemble numbers, they may not actually be numbers. This conversion is important because it will allow your numbers to add up and be used in formulas without errors in Excel. 
  • How to convert text to numbers in Google Sheets: This resource is useful if you are working in Google Sheets; it will demonstrate how to convert text strings to numbers in Google Sheets. It also includes multiple formulas you can apply to your own sheets, so you can find the method that works best for you. 

Combining columns

  • Convert text from two or more cells: Sometimes you may need to merge text from two or more cells. This Microsoft Support page guides you through two distinct ways you can accomplish this task without losing or altering your data. It also includes a step-by-step video tutorial to help guide you through the process.
  • How to split or combine cells in Google Sheets: This guide will demonstrate how to to split or combine cells using Google Sheets specifically. If you are using Google Sheets, this is a useful resource to reference if you need to combine cells. It includes an example using real data. 

Number to percentage

  • Format numbers as percentages: Formatting numbers as percentages is a useful skill to have on any project. This Microsoft Support page will provide several techniques and tips for how to display your numbers as percentages. 
  • TO_PERCENT: This Google Sheets support page demonstrates how to use the TO_PERCENT formula to convert numbers to percentages. It also includes links to other formulas that can help you convert strings. 

Pro tip: Keep in mind that you may have lots of columns of data that require different formats. Consistency is key, and best practice is to make sure an entire column has the same format. 

Additional resources

If you find yourself needing to convert other types of data, you can find resources on Microsoft Support for Excel or Google Docs Editor Help for Google Sheets. 

Converting data is quick and easy, and the same functions can be used again and again. You can also keep these links bookmarked for future use, so you will always have them ready in case any of these issues arise. Now that you know how to convert data, you are on your way to becoming a successful data analyst.

Data Validation: Transform data with SQL

Data analysts usually need to convert data from one format to another to complete an analysis. But what if you are using SQL rather than a spreadsheet? Just like spreadsheets, SQL uses standard rules to convert one type of data to another. If you are wondering why data transformation is an important skill to have as a data analyst, think of it like being a driver who is able to change a flat tire. Being able to convert data to the right format speeds you along in your analysis. You don’t have to wait for someone else to convert the data for you. 

In this reading, you will go over the conversions that can be done using the CAST function. There are also more specialized functions like COERCION to work with big numbers, and UNIX_DATE to work with dates. UNIX_DATE returns the number of days that have passed since January 1, 1970 and is used to compare and work with dates across multiple time zones. You will likely use CAST most often. 

Common conversions 

The following table summarizes some of the more common conversions made with the CAST function. Refer to Conversion Rules in Standard SQL for a full list of functions and associated rules. 

Starting withCAST function can convert to:
Numeric (number)– Integer – Numeric (number) – Big number – Floating integer – String
String– Boolean – Integer – Numeric (number) – Big number – Floating integer – String – Bytes – Date – Date time – Time – Timestamp
Date– String – Date – Date time – Timestamp

The CAST function (syntax and examples)

CAST is an American National Standards Institute (ANSI) function used in lots of programming languages, including BigQuery. This section provides the BigQuery syntax and examples of converting the data types in the first column of the previous table. The syntax for the CAST function is as follows:

SQL
CAST(expression AS typename)

Where expression is the data to be converted and typename is the data type to be returned.

Converting a number to a string

The following CAST statement returns a string from a numeric identified by the variable MyCount in the table called MyTable

SQL
SELECT CAST(MyCount AS STRING) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • STRING indicates that you are converting the data to a string
  • FROM indicates which table you are selecting the data from

Converting a string to a number

The following CAST statement returns an integer from a string identified by the variable MyVarcharCol in the table called MyTable. (An integer is any whole number.)

SQL
SELECT CAST(MyVarcharCol AS INT) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • INT indicates that you are converting the data to an integer
  • FROM indicates which table you are selecting the data from

Converting a date to a string

The following CAST statement returns a string from a date identified by the variable MyDate in the table called MyTable

SQL
SELECT CAST(MyDate AS STRING) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • STRING indicates that you are converting the data to a string
  • FROM indicates which table you are selecting the data from

Converting a date to a datetime

Datetime values have the format of YYYY-MM-DD hh: mm: ss format, so date and time are retained together. The following CAST statement returns a datetime value from a date. 

SQL
SELECT CAST (MyDate AS DATETIME) FROM MyTable

In the above SQL statement, the following occurs:

  • SELECT indicates that you will be selecting data from a table
  • CAST indicates that you will be converting the data you select to a different data type
  • AS comes before and identifies the data type which you are casting to
  • DATETIME indicates that you are converting the data to a datetime value
  • FROM indicates which table you are selecting the data from

The SAFE_CAST function

Using the CAST function in a query that fails returns an error in BigQuery. To avoid errors in the event of a failed query, use the SAFE_CAST function instead. The SAFE_CAST function returns a value of Null instead of an error when a query fails. 

The syntax for SAFE_CAST is the same as for CAST. Simply substitute the function directly in your queries. The following SAFE_CAST statement returns a string from a date.

SQL
SELECT SAFE_CAST(MyDate AS STRING) FROM MyTable

More information

Browse these resources for more information about data conversion using other SQL dialects (instead of BigQuery):

Advanced spreadsheet tips and tricks

Spreadsheets will get easier the more you practice. This reading provides you with a list of resources that may help advance your knowledge and experience with spreadsheet functions and functionality. The goal is to provide you with access to a variety of advanced tips and tricks that will help make you more efficient and effective when working with spreadsheets to perform data analysis. Review the description of each resource below, click the links to learn more, and save or bookmark any links that are useful to you. You can immediately start practising anything that you learn to increase the chances of your understanding and to build your familiarity with spreadsheets.

Here are some handy spreadsheet resources:

Google Sheets

  • Keyboard shortcuts for Google Sheets: This is a great resource for quickly learning a range of keyboard shortcuts that can make regular tasks quicker and easier, like navigating your spreadsheet or accessing formulas and functions. This list contains shortcuts for the desktop and mobile versions of Google Sheets so that you can apply them to your work no matter what device you are using. 
  • List of Google Sheets Functions: This is a comprehensive list of the Google Sheets functions and syntax. Each function is listed with a link to learn more.
  • 23 Must-Know Google Sheet Formulas: This blog article from Layer summarizes and describes 20 of the most useful Google Sheets formulas.
  • 18 Google Sheets Formula Tips and Techniques: These are tips for using Google Sheets shortcuts when working with formulas.

Excel

  • Keyboard shortcuts in Excel: Earlier in this list, you were provided with a resource for keyboard shortcuts in Google Sheets. Similarly, this resource provides a list of keyboard shortcuts in Excel that will make performing regular spreadsheet tasks more efficient. This includes keyboard shortcuts for both desktop and mobile versions of Excel, so you can apply them no matter what platform you are working on. 
  • 222 Excel shortcuts: A compilation of shortcuts includes links to more detailed explanations about how to use them. This is a great way to quickly reference keyboard shortcuts. The list has been organized by functionality, so you can go directly to the sections that are most useful to you. 
  • List of spreadsheet functions: This is a comprehensive list of Excel spreadsheet functions with links to more detailed explanations. This is a useful resource to save so that you can reference it often; that way, you’ll have access to functions and examples that you can apply to your work. 
  • List of spreadsheet formulas: Similar to the previous resource, this comprehensive list of Excel spreadsheet formulas with links to more detailed explanations and can be saved and referenced any time you need to check out a formula for your analysis. 
  • Essential Excel Skills for Analyzing Data: This blog post includes more advanced functionalities of some spreadsheet tools that you have previously learned about, like pivot tables and conditional formatting. These skills have been identified as particularly useful for data analysis. Each section includes a how-to video that will take you through the process of using these functions step-by-step, so that you can apply them to your own analysis. 
  • Advanced Spreadsheet Skills: Mark Jhon C. Oxillo’s presentation starts with a basic overview of spreadsheet but also includes advanced functions and exercises to help you apply formulas to actual data in Excel. This is a great way to review some basic concepts and practice the skills you have been learning so far. 

There are lots of resources online about advanced spreadsheet tips and tricks. You’ll probably discover new resources and tools on your own, but this list is a great starting point as you become more familiar with spreadsheets. 

Series Navigation<< Course 5: Analyse Data to Answer Questions, Module 1: Organise data for more effective analysisCourse 5: Analyse Data to Answer Questions, Module 3: Aggregate data for analysis >>
Tagged ,

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.