Technical, Data Analytics

Basic Overview of SQL Language / RDBMS

SQL photo

What is SQL?

SQL stands for Structured Query Language. SQL is a standard language for accessing and manipulating databases.

Just as humans use different languages to communicate with others, so do computers. Structured Query Language (or SQL, often pronounced “sequel”) enables us to talk to their databases. SQL is one of the most useful data analyst tools, especially when working with large datasets in tables. It can help you investigate huge databases, track down text (referred to as strings) and numbers, and filter for the exact kind of data you need—much faster than a spreadsheet can. 

If you haven’t used SQL before, this reading will help you learn the basics so you can appreciate how useful SQL is and how useful SQL queries are in particular. You will be writing SQL queries in no time at all.

Why SQL?

Well SQL is almost the backbones of our everyday database systems. It’s reliable, fast and very efficient. Hundreds and thousands of databases are written in SQL and it is not going anywhere soon. It is an integral part of Data Analytics, Software and Web Application developement etc etc.

What Can SQL do?

  • SQL can execute queries against a database
  • SQL can retrieve data from a database
  • SQL can insert records in a database
  • SQL can update records in a database
  • SQL can delete records from a database
  • SQL can create new databases
  • SQL can create new tables in a database
  • SQL can create stored procedures in a database
  • SQL can create views in a database
  • SQL can set permissions on tables, procedures, and views

RDBMS

RDBMS stands for Relational Database Management System.

RDBMS is the basis for SQL, and for all modern database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.

The data in RDBMS is stored in database objects called tables. A table is a collection of related data entries, and it consists of columns and rows.

Let’s learn the basics of SQL

To access a SQL database, we need to learn how to do it. Most basic, fundamental and integral part of accessing a database is by using a database or SQL Query. What is a query, I hear you ask..

What is a query?

A query is a request for data or information from a database. When you query databases, you use SQL to communicate your question or request. You and the database can always exchange information as long as you speak the same language.

Every programming language, including SQL, follows a unique set of guidelines known as syntax. Syntax is the predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement. As soon as you enter your search criteria using the correct syntax, the query starts working to pull the data you’ve requested from the target database.

The syntax of every SQL query is the same: 

  • Use SELECT to choose the columns you want to return.
  • Use FROM to choose the tables where the columns you want are located.
  • Use WHERE to filter for certain information.

A SQL query is like filling in a template. You will find that if you are writing a SQL query from scratch, it is helpful to start a query by writing the SELECT, FROM, and WHERE keywords in the following format: 

SQL
SELECT
FROM
WHERE

Next, enter the table name after the FROM; the table columns you want after the SELECT; and, finally, the conditions you want to place on your query after the WHERE. Make sure to add a new line and indent when adding these, as shown below:

SELECTSpecifies the columns from which to retrieve data
FROMSpecifies the table from which to retrieve data
WHERESpecifies criteria that the data must meet 

Following this method each time makes it easier to write SQL queries. It can also help you make fewer syntax errors.

Example of a query

Here is how a simple query would appear in BigQuery, a data warehouse on the Google Cloud Platform.

SQL

SELECT first_name
FROM customer_data.customer_name
WHERE first_name = 'Tony'

  1. SELECT the column named first_name
  2. FROM a table named customer_name (in a dataset named customer_data) (The dataset name is always followed by a dot, and then the table name.)
  3. But only return the data WHERE the first_name is ‘Tony’

The results from the query might be similar to the following:

first_name
Tony
Tony
Tony

As you can conclude, this query had the correct syntax, but wasn’t very useful after the data was returned.

Multiple columns in a query

Of course, as a data professional, you will need to work with more data beyond customers named Tony. Multiple columns that are chosen by the same SELECT command can be indented and grouped together.

If you are requesting multiple data fields from a table, you need to include these columns in your SELECT command. Each column is separated by a comma as shown below:

SQL


SELECT
	  ColumnA,
	  ColumnB,
	  ColumnC
FROM
	  Table where the data lives
WHERE
	  Certain condition is met
	
	

Here is an example of how it would appear, a basic query:

SQL
SELECT
    customer_id,
    first_name,
    last_name
FROM
    customer_data.customer_name
WHERE
    first_name = 'Tony'

The above query uses three commands to locate customers with the first_name, ‘Tony’.

  1. SELECT the columns named customer_id, first_name, and last_name
  2. FROM a table named customer_name (in a dataset named customer_data) (The dataset name is always followed by a dot, and then the table name.)
  3. But only return the data WHERE the first_name is ‘Tony’

The only difference between this query and the previous one is that more data columns are selected. The previous query selected first_name only while this query selects customer_id and last_name in addition to first_name. In general, it is a more efficient use of resources to select only the columns that you need. For example, it makes sense to select more columns if you will actually use the additional fields in your WHERE clause. If you have multiple conditions in your WHERE clause, they may be written like this:

SQL
SELECT
ColumnA,
ColumnB,
ColumnC
FROM
	Table where the data lives
WHERE
	Condition 1
	AND Condition 2
	AND Condition 3

Notice that unlike the SELECT command that uses a comma to separate fields / variables / parameters, the WHERE command uses the AND statement to connect conditions. As you become a more advanced writer of queries, you will make use of other connectors / operators such as OR and NOT

Here is an example with multiple fields used in a WHERE clause:

SQL
SELECT
	customer_id,
	first_name,
	last_name
FROM
	customer_data.customer_name
WHERE
	customer_id > 0
	AND first_name = 'Tony'
	AND last_name = 'Magnolia'

The above query uses three commands to locate customers with a valid (greater than 0), customer_id whose first_name is ‘Tony’ and last_name is ‘Magnolia’.

  1. SELECT the columns named customer_id, first_name, and last_name
  2. FROM a table named customer_name (in a dataset named customer_data) (The dataset name is always followed by a dot, and then the table name.)
  3. But only return the data WHERE customer_id is greater than 0, first_name is Tony, and last_name is Magnolia.

Note that one of the conditions is a logical condition that checks to see if customer_id is greater than zero.

If only one customer is named Tony Magnolia, the results from the query could be:

customer_idfirst_namelast_name
1967TonyMagnolia

If more than one customer has the same name, the results from the query could be:

customer_idfirst_namelast_name
1967TonyMagnolia
7689TonyMagnolia

The SELECT, FROM, and WHERE clauses are the essential building blocks of SQL queries. Queries with multiple fields will become simpler after practice.

SQL Rules

OK so now we have learned that a SQL query uses SELECT, FROM, and WHERE to specify the data to be returned from the query. This reading provides more detailed information about formatting queries, using WHERE conditions, selecting all columns in a table, adding comments, and using aliases. All of these make it easier for you to understand (and write) queries to put SQL in action.  

Capitalization, indentation, and semicolons

We can write our SQL queries in all lowercase and don’t have to worry about extra spaces between words. However, using capitalization and indentation can help us read the information more easily. Try to keep our queries neat, and they will be easier to review or troubleshoot if we need to check them later on.

SQL
SELECT field1
FROM table
WHERE field1 = condition;

Notice that the SQL statement shown above has a semicolon at the end. The semicolon is a statement terminator and is part of the American National Standards Institute (ANSI) SQL-92 standard, which is a recommended common syntax for adoption by all SQL databases. However, not all SQL databases have adopted or enforce the semicolon, so it’s possible we may come across some SQL statements that aren’t terminated with a semicolon. If a statement works without a semicolon, it’s fine.

WHERE conditions

In the query shown above, the SELECT clause identifies the column we want to pull data from by name, field1, and the FROM clause identifies the table where the column is located by name, table. Finally, the WHERE clause narrows our query so that the database returns only the data with an exact value match or the data that matches a certain condition that we want to satisfy. 

For example, if we are looking for a specific customer with the last name Chavez, the WHERE clause would be: 

SQL
WHERE field1 = 'Chavez'

However, if we are looking for all customers with a last name that begins with the letters “Ch,” the WHERE clause would be:

SQL
WHERE field1 LIKE 'Ch%'

We can conclude that the LIKE clause is very powerful because it allows us to tell the database to look for a certain pattern! The percent sign % is used as a wildcard to match one or more characters. In the example above, both Chavez and Chen would be returned. Note that in some databases an asterisk * is used as the wildcard instead of a percent sign %.

SELECT all columns

Can you use  SELECT * ?

In the example, if you replace SELECT field1 with SELECT * , you would be selecting all of the columns in the table instead of the field1 column only. From a syntax point of view, it is a correct SQL statement, but you should use the asterisk * sparingly and with caution. Depending on how many columns a table has, you could be selecting a tremendous amount of data. Selecting too much data can cause a query to run slowly.

Comments

Some tables aren’t designed with descriptive enough naming conventions. In the example, field1 was the column for a customer’s last name, but you wouldn’t know it by the name. A better name would have been something such as last_name. In these cases, you can place comments alongside your SQL to help you remember what the name represents. Comments are text placed between certain characters, /* and */, or after two dashes ) as shown below. 

SQL
SELECT
	field1 /* this is the last name column */
FROM
	table -- this is the customer data table  
WHERE
	field1 LIKE 'Ch%';

Comments can also be added outside of a statement as well as within a statement. You can use this flexibility to provide an overall description of what you are going to do, step-by-step notes about how you achieve it, and why you set different parameters/conditions.

SQL
-- This is an important query used later to join with the accounts table 
SELECT
 		rowkey,  -- key used to join with account_id
Info.date,  -- date is in string format YYYY-MM-DD HH:MM:SS
Info.code  -- e.g., 'pub-###'

FROM  Publishers

The more comfortable you get with SQL, the easier it will be to read and understand queries at a glance. Still, it never hurts to have comments in a query to remind yourself of what you’re trying to do. This also makes it easier for others to understand your query if your query is shared. As your queries become more and more complex, this practice will save you a lot of time and energy to understand complex queries you wrote months or years ago. 

Example of a query with comments

Here is an example of how comments could be written in Google Clouds BigQuery:

SQL
-- Pull basic information from the customer table
SELECT
	customer_id, --main ID used to join with customer_addresss
	first_name, --customer's first name from loyalty program
	last_name --customer's last name
FROM
	customer_data.customer_name

In the above example, a comment has been added before the SQL statement to explain what the query does. Additionally, a comment has been added next to each of the column names to describe the column and its use. Two dashes are generally supported. So it is best to use and be consistent with it. You can use # in place of in the above query, but # is not recognized in all SQL versions; for example, MySQL doesn’t recognize #.  You can also place comments between /* and */ if the database you are using supports it. 

As you develop your skills professionally, depending on the SQL database you use, you can pick the appropriate comment delimiting symbols you prefer and stick with those as a consistent style. As your queries become more and more complex, the practice of adding helpful comments will save you a lot of time and energy to understand queries that you may have written months or years prior.

Aliases

You can also make it easier on yourself by assigning a new name or alias to the column or table names to make them easier to work with (and avoid the need for comments). This is done with a SQL AS clause. In the example below, aliases are used for both a table name and a column. Within the database, the table is called actual_table_name and the column in that table is called actual_column_name. They are aliased as my_table_alias and my_column_alias, respectively. These aliases are good for the duration of the query only. An alias doesn’t change the actual name of a column or table in the database.

SQL
SELECT 
	my_table_alias.actual_column_name AS my_column_alias
FROM
	actual_table_name AS my_table_alias

So there it is! A basic Overview of SQL! There are fantastic tutorials and documentations all over the internet if you would like to dive deep in SQL learning.

  • W3Schools SQL Tutorial: If you would like to explore a detailed tutorial of SQL, this is the perfect place to start. This tutorial includes interactive examples you can edit, test, and recreate. Use it as a reference or complete the whole tutorial to practice using SQL. Click the green Start learning SQL now button or the Next button to begin the tutorial.
Tagged , ,

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.