What Can SQL do?#
This tutorial will give you a basic introduction to SQL, the standard language for accessing different data sources within DataDistillr.
Using SQL syntax will provide you with the ability to fully access your data, run analysis on your datasets, and quickly present desired results. SQL can execute, retrieve, insert, update, delete, and create records in a data source.
SQL is an ANSI standard, but different versions of the SQL language exist. There are basic operators that must stay the
WHERE), but there are some slight differences that can trip up even the
most talented analyst.
The SQL you will use to write queries in DataDistillr supports the ANSI standard, but includes special operators and functions that enable users to drill into nested data formats. It is specifically formulated to query against complex data: data made up of various types of records and fields, rather than discrete rows and columns.
SQL is not case-sensitive except inside of strings. It is a standard practice to capitalize clauses for clarity.
The order of clauses matters. The order of precedence is:
LIMIT. Not all of these clauses are required, details will be provided below.
To write a comment in SQL use
;is not part of a query. The database server uses it to separate two SQL statements.
Practice Basic SQL#
Start off by downloading the following files then Connect a Data Source and Upload a File.
This clause extracts data from a data source. A data source can be an uploaded file (xls, csv, json, etc.), or an externally connected source such as a database, cloud storage, or an API.
- To return the whole table, use a
- To return specific columns from a table, use the column name(s)
- You can also perform simple calculations within the
This clause constrains the number of rows returned by a
This clause sorts the result set based on specified criteria in ascending or descending orders.
- By default, this sorts by in ascending order. The
ASCkeyword does the same thing.
- To change to descending order, add the
DESCkeyword after the column name
This clause filters out duplicate from the selected column or whole dataset and returns only unique records.
This clause filters columns to extract only records that fulfill the specified condition.
You can use the following comparison operators to test if conditions are True or False
|=||Equal||column = expression|
|>||Greater Than||column > expression|
|<||Less Than||column < expression|
|>=||Greater Than or Equal||column >= expression|
|<=||Less Than or Equal to||column <= expression|
|<> , !=||Not Equal to||column != expression|
The WHERE clause can also be combined with the following operators detailed below:
This operator filters records based on more than one condition and displays a record if the conditions on both sides of
AND are TRUE.
This operator filters records based on more than one condition and displays a record if any of the conditions on either
side of the
OR are TRUE.
This operator filters records based on more than one condition and displays a record if the condition(s) is NOT TRUE.
This operator allows you to specify multiple values in a WHERE clause and can be shorthand for multiple OR conditions.
This operator selects values within a given range. The range is inclusive meaning the begin and end values are included. The data type can be numbers, text, or dates.
DataDistillr allows you to link different datasource on the platform. This will show you how to link tables
together using the
JOIN clause which combines rows from two or more tables, based on a related column between them.
There are different types of joins in SQL:
- FULL OUTER JOIN
- INNER JOIN
- LEFT JOIN
- RIGHT JOIN
This clause allows you to give a temporary name to a table or column through aliases. In the table join
examples below, we will demonstrate the following reasons why the
AS clause comes in handy:
- Write clear column names for readability
- If you don't want to rewrite full column names every time you reference a column, you can rename columns to an acronym
- If you have duplicate column names across different tables, you can rename the columns to make them unique
FULL OUTER JOIN
This join includes all rows from the joined tables whether or not the other table has the matching row
This join returns rows if there is, at least, one row in both tables that matches the join condition. This eliminates the rows that do not match with a row of the other table
For each row in column_name1, the inner join clause finds the matching rows in column_name2. If a row is matched, it is included in the final result set.
The query below shouldn't return anything because first names are not repeated in both tables
This join returns all rows from the left table regardless of whether there is a matching row in the right table.
This clause returns all rows from the right table whether or not there is a matching row in the left table.
SQL has built-in functions that allow you to perform basic calculations on your dataset detailed below:
This function returns the smallest value of the selected column.
This function returns the largest value of the selected column.
This function returns the number of rows that matches a specified criterion.
This function returns the average value of a numeric column. This calculates the average value of a set.
This function returns the total sum of a numeric column only. This can be distinct values or all values within the column