File organisation guidelines
Every data analyst’s goal is to conduct efficient data analysis. One way to increase the efficiency of your analyses is to streamline processes that save time and energy in the long run. Meaningful, logical, and consistent file names help data analysts organise their data and automate their analysis process. When you use consistent guidelines to describe a file’s content, date, or version in its name, you’re using file naming conventions.
Best practices for naming files
File-naming conventions help you organise, access, process, and analyse data because they act as quick reference points to identify what’s in a file. One important practice is to decide on file naming conventions—as a team or company—early in a project. This will prevent you from spending time updating file names later, which can be a time-consuming process. In addition, you should align your project’s file names with your team’s or company’s existing file-naming conventions. You don’t want to spend time learning a new file-naming convention each time you look up a file in a new project!
It’s also critical to ensure that file names are meaningful, consistent, and easy to read. File names should include:
- The project’s name
- The file creation date
- Revision version
- Consistent style and order
Because a file name serves as a quick reference to what the file contains, it should also be short and to the point.
In the following sections, you’ll explore each part of a sales report file name that follows an established naming convention, SalesReport_20231125_v02. This example will help you understand the key parts of a strong file name and why they’re important.
Name
Giving a file a meaningful name to describe its contents makes searching for it straightforward. It also makes it easy to understand the type of data the file contains.
In the example, the file name includes the text SalesReport, a succinct description of what the file contains: a sales report.
Creation date
Knowing when a file was created can help you understand if it is relevant to your current analysis. For example, you might want to analyse only data from 2023.
In the example, the date is written as 20231125. This reads as the sales report from November 25, 2023, following the year, month, and day (YYYYMMDD) format of the international date standard (ISO 8601). Keep in mind that different countries follow different date conventions, so make sure you know the date standard your company follows.
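As a small illustration of the YYYYMMDD format, the sketch below builds the date part of a file name with Python's standard library. The helper name date_stamp is made up for this example and is not part of any established tool.

```python
from datetime import date


def date_stamp(d: date) -> str:
    """Return the date as an eight-digit YYYYMMDD string (hypothetical helper)."""
    return d.strftime("%Y%m%d")


print(date_stamp(date(2023, 11, 25)))  # 20231125

# Because the year comes first, sorting the stamps as text
# also sorts them in chronological order.
stamps = [date_stamp(date(2024, 1, 5)), date_stamp(date(2023, 11, 25))]
print(sorted(stamps))  # ['20231125', '20240105']
```

One practical benefit of this format is visible in the last line: files named this way line up in date order in any file browser that sorts by name.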
Revision version
Including a revision version helps ensure you’re working with the correct file. You wouldn’t want to make edits to an old version of a file without realising it! When you include revision numbers in a file name, lead with a zero. This way, if your team reaches more than nine rounds of revisions, double digits are already built into your convention.
In the example, the version is described as v02. The v is short for the version of the file, and the number following the v indicates which round of revisions the file is currently in.
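The leading zero can be sketched in a few lines of Python. The helper name version_tag is an assumption for illustration only; the point is that two-digit tags keep sorting correctly once a file reaches its tenth revision.

```python
def version_tag(revision: int) -> str:
    """Return a zero-padded version tag such as v02 (hypothetical helper)."""
    return f"v{revision:02d}"


print(version_tag(2))   # v02
print(version_tag(10))  # v10

# Without the leading zero, text sorting puts v10 before v2.
print(sorted(["v2", "v10"]))    # ['v10', 'v2']
# With the leading zero, the order matches the revision order.
print(sorted([version_tag(10), version_tag(2)]))  # ['v02', 'v10']
```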
Consistent order and style
Make sure the information you include in a file name follows a consistent order. For example, you wouldn’t want version three of the sales report in the example to be titled 20231125_v03_SalesReport. It would be difficult to find and compare multiple documents.
When you use spaces and special characters in a file name, some software may not recognise them, which can cause errors. An alternative is to use hyphens, underscores, and capital letters. The example includes underscores between each piece of information, but your team could also choose to use hyphens between year, month, and day: SalesReport_2023-11-25_v02.
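One way to check that file names follow a consistent order and style is a small pattern check. The sketch below assumes the convention used in the example (a capitalised name with no spaces, an eight-digit date, and a two-digit version, separated by underscores). The pattern and the function name parse_file_name are illustrative assumptions, so a team would adjust them to its own convention.

```python
import re
from datetime import datetime

# Assumed convention: Name_YYYYMMDD_vNN, e.g. SalesReport_20231125_v02
PATTERN = re.compile(
    r"(?P<name>[A-Z][A-Za-z0-9]*)_(?P<date>\d{8})_v(?P<version>\d{2})"
)


def parse_file_name(stem: str):
    """Split a file name (without extension) into its parts.

    Returns None when the name does not follow the assumed convention.
    """
    match = PATTERN.fullmatch(stem)
    if match is None:
        return None
    try:
        # Reject eight-digit strings that are not real dates.
        created = datetime.strptime(match["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return {
        "name": match["name"],
        "created": created,
        "version": int(match["version"]),
    }


print(parse_file_name("SalesReport_20231125_v02"))
print(parse_file_name("20231125_v03_SalesReport"))   # None: wrong order
print(parse_file_name("Sales Report_20231125_v02"))  # None: contains a space
```

Returning None for names that break the convention makes it easy to list the files that need renaming before an automated analysis runs.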
Ensure team consistency
To ensure all team members use the agreed-upon file naming conventions, create a sample text file that lists all the naming conventions for a project. This helps new team members get up to speed quickly and gives current team members a refresher when they need one.
File organisation
To keep your files organised, create folders and subfolders—in a logical hierarchy—to ensure related files are stored together and can be found easily later. A hierarchy is a way of organising files and folders. Broader-topic folders are located at the top of the hierarchy, and more specific subfolders and files are contained within those folders. Each folder can contain other folders and files. This allows you to group related files together and makes it easier to find the files you need. In addition, it’s a best practice to store completed files separately from in-progress files so the files you need are easy to find. Archive older files in a separate folder or in an external storage location.
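The folder hierarchy described above can be sketched with Python's pathlib module. The folder names (in_progress, completed, archive), the project name, and both helper functions are assumptions chosen for illustration; the demo runs inside a temporary directory so it does not touch real files.

```python
import tempfile
from pathlib import Path


def create_project_folders(root: Path) -> list[Path]:
    """Create one subfolder per file status under the project root."""
    created = []
    for name in ["in_progress", "completed", "archive"]:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def archive_file(path: Path, archive_dir: Path) -> Path:
    """Move an older file into the archive folder and return its new path."""
    target = archive_dir / path.name
    path.rename(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "SalesAnalysis"
    in_progress, completed, archive = create_project_folders(root)

    # Create a placeholder for an outdated revision, then archive it.
    old_file = in_progress / "SalesReport_20231125_v01.csv"
    old_file.write_text("placeholder")
    moved = archive_file(old_file, archive)

    for item in sorted(root.rglob("*")):
        print(item.relative_to(root).as_posix())
```

Keeping in-progress, completed, and archived files in separate folders means a quick look at the folder name tells you the status of everything inside it.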
Data Security
Data security means protecting data from unauthorised access or corruption by putting safety measures in place. Usually, the purpose of data security is to keep unauthorised users from accessing or viewing sensitive data. Data analysts have to find a way to balance data security with their actual analysis needs. This can be tricky: we want to keep our data safe and secure, but we also want to use it as soon as possible so that we can make meaningful and timely observations.
To do this, companies need to balance their data security measures with their data access needs.
Luckily, there are a few security measures that can help companies do just that. The two we will talk about here are encryption and tokenisation.
Encryption uses a unique algorithm to alter data and make it unusable by users and applications that don’t have the “key.” The key is the piece of information needed to reverse the encryption, so if you have the key, you can still use the data in its original form.
Tokenisation replaces the data elements you want to protect with randomly generated data referred to as a “token.” The original data is stored in a separate location and mapped to the tokens. To access the complete original data, the user or application needs to have permission to use the tokenised data and the token mapping. This means that even if the tokenised data is hacked, the original data is still safe and secure in a separate location.
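To make the idea of tokenisation concrete, here is a minimal sketch using only Python's standard library. The TokenVault class, the tok_ prefix, and the sample customer record are all made up for illustration. In a real system the mapping would live in a separate, access-controlled store rather than in an in-memory dictionary.

```python
import secrets


class TokenVault:
    """Toy token mapping kept in memory (illustrative assumption only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenise(self, value: str) -> str:
        """Replace a sensitive value with a randomly generated token."""
        if value in self._value_to_token:
            # Reuse the existing token so the same value always maps to it.
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenise(self, token: str) -> str:
        """Look up the original value; requires access to the vault."""
        return self._token_to_value[token]


vault = TokenVault()
record = {"customer": "A. Example", "card_number": "4111111111111111"}

# The shared data set carries only the token, not the card number.
safe_record = dict(record, card_number=vault.tokenise(record["card_number"]))
print(safe_record)

# Only code with access to the vault can recover the original value.
print(vault.detokenise(safe_record["card_number"]) == record["card_number"])  # True
```

Because the token is random, it reveals nothing about the original value. Someone who obtains only the tokenised data cannot recover the card number without the mapping.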
Encryption and tokenisation are just some of the data security options out there. There are a lot of others, like using authentication devices for AI technology.
As a junior data analyst, you probably won’t be responsible for building out these systems. A lot of companies have entire teams dedicated to data security or hire third-party companies that specialise in data security to create these systems. But it is important to know that all companies have a responsibility to keep their data secure, and to understand some of the potential systems your future employer might use.
However, one thing you absolutely can do to help strike the right balance is to use version control best practices. Version control enables all collaborators within a file to track changes over time. You can understand who made what changes to a file, when they were made, and why.
Here’s a simple example: Perhaps you’re working on a project with a team of other people. You are all collaborating within the same set of files, but each person is responsible for a different part of the project. Without version control, it would be very difficult to keep track of who made what changes to the files and when. This would lead to confusion and, even worse, people accidentally overwriting each other’s work! Version control is essential for data analytics professionals because it allows users to effectively collaborate with others and experiment with new ideas without fear of losing their work.
Please read here for more about version control (in Kaggle)