Rarely will your data be located in a single table. There is actually good reason for that — to avoid data redundancy.
Suppose I have a table
Customers structured as follows.
Now, suppose that for each person we want to capture a dynamic data point, transaction amount.
So far so good.
However, as soon as someone makes two purchases, we begin to see data redundancy.
We have captured the entries
We can instead drop the
txn_amnt column from the first table and create a
Transactions, say. This table will have a
txn_amnt column, as shown below.
The problem now is that our data lives in two separate tables.
Ok, now our data lives in two tables. The common denominator between the above
two tables is the
customer_id column. Now suppose that we would like to return
records with all three columns
If you read about joins you will see a slew of different types. Despite the many different types of joins, they all follow from the following simple fact.
The process is the following. A field which both tables share in common is
chosen. In the above case,
The key point is this: you cannot know a priori (unless you explicitly check)
whether or not the value of
customer_id in a particular record in one table
will exist in a record in the other table.
Every join is a particular way of addressing this fact.
This is the case where we wish only to include those records for which there exists a match between both tables in the chosen mutually shared field.
customer_id value assumes the value
1 in both columns. That is, there
exists a match for
customer_id=1.These records would therefore be returned in
SELECT sys.Customers.customer_id, sys.Transactions.txn_amnt FROM sys.Customers INNER JOIN sys.Transactions ON sys.Customers.customer_id=sys.Transactions.Id;
Note the importance of specifying the parent table for each column. This is to avoid namespace ambiguity in the case that both tables have a column with the same name.
INNER JOIN can be visualized as follows.
Consider the following additional entry in the
It is conceivable that
Jane Doe may not yet have made a purchase, so that
customer_id=2 has no match in the
Transactions table. If we were to perform
INNER JOIN then, the
Jane Doe records would not appear.
You can override this by using a
LEFT JOIN will behave like an
INNER JOIN, the difference being that all
records in the table specified on the left of the
LEFT JOIN statement will
be included irrespective of whether or not there exists a match in the
field in the second table. The catch is that the fields of any such unmatched
records in the right table will render as
NULL in the result.
SELECT sys.Customers.customer_id, sys.Transactions.txn_amnt FROM sys.Customers LEFT JOIN sys.Transactions ON sys.Customers.ID=sys.Transactions.customer_id;
LEFT JOIN can be visualized as follows.
You can think of
RIGHT JOIN as the reflection of
LEFT JOIN. You can actually
use symmetry to always replace one with the other. That is,
T1 LEFT JOIN T2 ~ T2 RIGHT JOIN T1
Suppose the records of interest are precisely those for which there is no
match in the
ON field. A join that returns such a result is often referred to
as an anti-join.
Consider again the
A left anti-join in this context would mean we are interested in those customers
who have not made a purchase. For example,
There are several ways to construct such a join.
SELECT sys.Customers.customer_id FROM sys.Customers WHERE sys.Customers.customer_id NOT IN (SELECT customer_id FROM Transactions);
Here we are achieving an anti-join without performing a join at all!
SELECT sys.Customers.customer_id FROM sys.Customers LEFT JOIN sys.Transactions.customer_id ON sys.Customers.customer_id=sys.Transactions.customer_id WHERE sys.Trasactions.customer_id IS NULL
The idea is that the
customer_id field will be
NULL after the
precisely for those records for which there is no match.
You can use symmetry to construct the right anti-join case.