Page 1 of 1

Table Joins

#1 andrewsw  Icon User is online

  • lashings of ginger beer
  • member icon

Reputation: 6333
  • View blog
  • Posts: 25,533
  • Joined: 12-December 12

Posted 12 October 2015 - 08:16 AM

Exploring and demonstrating the different join types.

I'm using T-SQL and running the statements in Microsoft SQL Server Management Studio. However, the examples are simple enough, and the SQL syntax consistent across databases, that all the code should run from any database with little, if any, alteration. However, not all databases support FULL joins (MySQL for example), and there may be some that don't support CROSS JOIN (although I provide code that creates such a join without the keyword CROSS).

The Data
CREATE DATABASE Critters
GO
USE Critters 
GO


PetTypes
CREATE TABLE PetTypes (
    PetTypeID int PRIMARY KEY,
    PetType varchar(40)
)
GO

INSERT INTO PetTypes VALUES 
    (1, 'Dog'), (2, 'Cat'), (3, 'Frog'), (4, 'Alligator');

SELECT * FROM PetTypes


Pets
CREATE TABLE Pets (
    PetID int PRIMARY KEY,
    PetName varchar(30),
    PetTypeID int,
    CONSTRAINT fkPetType FOREIGN KEY (PetTypeID) REFERENCES PetTypes(PetTypeID)
)
GO

INSERT INTO Pets VALUES
    (1, 'Tiddles', 2), (2, 'Fluffy', 2), (4, 'Rover', 1),
    (5, 'Kermit', 3), (6, 'Nemo', NULL);

SELECT * FROM Pets


PetTypeID   PetType
----------- ----------------------------------------
1           Dog
2           Cat
3           Frog
4           Alligator

(4 row(s) affected)

PetID       PetName                        PetTypeID
----------- ------------------------------ -----------
1           Tiddles                        2
2           Fluffy                         2
4           Rover                          1
5           Kermit                         3
6           Nemo                           NULL

(5 row(s) affected)


The CONSTRAINT syntax should be the same across databases, but it isn't needed for the examples to work.

The data is obviously very simple. You could use your own data if you like. In order to recognise differences between the join types just have one or two records in each table that don't appear in other table(s).

I have a follow-up tutorial here which joins three tables.

Join Types, In Brief

Here I am just listing the different join types for reference, without going into any details. Scan this section for the moment; return to it once you've gone through a few of my examples and it will make more sense.

There are two main join types, INNER and OUTER. INNER is the most commonly used.

An INNER join is often referred to as an equi-join, but OUTER joins can also be equi-joins because it refers to the use of the equality sign. However, in my experience, when people say 'equi-join' they tend to mean an INNER join.

OUTER joins can be either LEFT or RIGHT (OUTER) joins. The three join types INNER, LEFT OUTER and RIGHT OUTER are the ones that you will use nearly all the time, rarely using the others that I will now mention.

There is also a FULL OUTER JOIN, for those rare occasions when you need to see everything, and a CROSS JOIN (which is neither INNER nor OUTER). I'll demonstrate these.

One more that I will demonstate is a self-join. This isn't a distinct join type like the others already mentioned; it doesn't introduce a new SQL keyword. It is just a way of joining a table to a copy of itself.

Not discussed in this tutorial are natural, theta, semi- and anti- joins. Don't worry about these until you have a good understanding of the more important joins demonstrated in this tutorial. For later reference:

SQL JOIN and different types of JOINs
Performing Semi-Joins with EXCEPT and INTERSECT

INNER JOIN
SELECT PetName, PetType 
FROM Pets INNER JOIN PetTypes 
    ON Pets.PetTypeID = PetTypes.PetTypeID;

PetName                        PetType
------------------------------ ----------------------------------------
Tiddles                        Cat
Fluffy                         Cat
Rover                          Dog
Kermit                         Frog


Hint: Try to list tables in the ON clause in the same order they appear in the FROM clause. This isn't necessary but makes it easier to read and decipher the statement.

Inner joins are the most commonly used. Records must appear in both tables in order to appear in the results. For our example, a Pet must have a valid PetTypeID in order to be listed in the output.

A Pet would also not be listed if it had a PetTypeID that wasn't present in the PetTypes table. This should not happen and our CONSTRAINT prevents us from assigning a PetTypeID to a Pet that doesn't already exist in the PetTypes table.



Consider it this way. Imagine an empty table with the column-headings, the field-names, that we have selected. The query will look through the Pets table examining each Pet's PetTypeID. If the Pet has a PetTypeID, and this PetTypeID exists in the PetTypes table, then the matched details will be added as a row to our results table. If a Pet doesn't have a PetTypeID, or it does but this ID doesn't exist in PetTypes, then the row is skipped.



NULL does not match NULL. NULL means no value. Joins are attempting to match values.
(Null Values and Joins). This isn't a concern with our example because, in PetTypes, PetTypeID is the primary key, so cannot be NULL anyway.

The same output will be achieved with the order of the tables switched:
SELECT PetName, PetType 
FROM PetTypes INNER JOIN Pets
	ON PetTypes.PetTypeID = Pets.PetTypeID;


You could consider that each PetTypeID is examined in the PetTypes table, then, for each PetTypeID, all matching Pets are listed. The output is the same. That is, for an INNER JOIN, it doesn't matter which table you consider first, the set of rows returned are the same.

However, notice that the cats are listed first, even though Dog was created, and numbered, first in the PetTypes table. First and foremost, databases do not guarantee the order that records are returned. If you want your output in a particular order use the ORDER BY clause.

Databases optimize queries. They build a query execution plan. This execution plan determines (but does not guarantee) the order that records are returned, and tables are parsed.

The same output could be obtained using a WHERE clause:
SELECT PetName, PetType 
    FROM Pets, PetTypes 
    WHERE Pets.PetTypeID = PetTypes.PetTypeID


This was the previous syntax used. It is valid to use, and there is no difference in performance, but using join syntax is preferred. Discussed here.

LEFT OUTER JOIN
SELECT PetName, PetType 
FROM Pets LEFT OUTER JOIN PetTypes 
    ON Pets.PetTypeID = PetTypes.PetTypeID;

PetName                        PetType
------------------------------ ----------------------------------------
Tiddles                        Cat
Fluffy                         Cat
Rover                          Dog
Kermit                         Frog
Nemo                           NULL


The word OUTER is optional. I prefer to write it.

This returns all the records from the table written on the left, even if they don't have a matching record in the table written on the right (i.e. Nemo). The missing details from the right are populated with NULLs.

Imagine that you have a table of staff members and another with their holiday details. You might want to produce a holiday report that lists all the staff members even if they haven't booked a holiday.

People tend to interpret LEFT and RIGHT joins as being quite distinct. They aren't. They simply refer to either the table written on the left, or the right, of the join. If there is more than one LEFT and/or RIGHT join though, then the order in which tables are parsed becomes important. This is discussed further in my follow-up tutorial Join Three Tables.

RIGHT OUTER JOIN
SELECT PetName, PetType 
FROM Pets RIGHT OUTER JOIN PetTypes 
    ON Pets.PetTypeID = PetTypes.PetTypeID;

PetName                        PetType
------------------------------ ----------------------------------------
Rover                          Dog
Tiddles                        Cat
Fluffy                         Cat
Kermit                         Frog
NULL                           Alligator


RIGHT refers to the table PetTypes. All records from PetTypes are returned even if there isn't any Pet of a given type (i.e. Alligator).



With LEFT or RIGHT joins we can use a WHERE clause to collect the NULL records. Here we are listing only those Pets that don't have a PetType:
SELECT PetName, PetType 
FROM Pets LEFT OUTER JOIN PetTypes 
    ON Pets.PetTypeID = PetTypes.PetTypeID
    WHERE PetTypes.PetTypeID IS NULL;

PetName                        PetType
------------------------------ ----------------------------------------
Nemo                           NULL

(1 row(s) affected)


The database populates the missing values with NULLs, which we then filter.

Similarly, this will show the PetTypes that haven't yet occurred in the Pets table:
SELECT PetName, PetType 
FROM Pets RIGHT OUTER JOIN PetTypes 
    ON Pets.PetTypeID = PetTypes.PetTypeID
    WHERE Pets.PetTypeID IS NULL;

PetName                        PetType
------------------------------ ----------------------------------------
NULL                           Alligator

(1 row(s) affected)


FULL OUTER JOIN
SELECT PetName, PetType 
FROM Pets FULL OUTER JOIN PetTypes 
    ON Pets.PetTypeID = PetTypes.PetTypeID;

PetName                        PetType
------------------------------ ----------------------------------------
Tiddles                        Cat
Fluffy                         Cat
Rover                          Dog
Kermit                         Frog
Nemo                           NULL
NULL                           Alligator


Records from both tables are returned in full, with all missing (unmatched) values populated with NULLs.

As noted earlier, not all databases support FULL joins, although there are ways to obtain the same output for these databases.

Again the word OUTER is optional.

The following collects only those items that are unmatched. That is, only Pets without a PetType and PetTypes not appearing in the Pets table:
SELECT PetName, PetType 
FROM Pets FULL OUTER JOIN PetTypes 
	ON Pets.PetTypeID = PetTypes.PetTypeID
	WHERE Pets.PetTypeID IS NULL OR PetTypes.PetTypeID IS NULL;

PetName                        PetType
------------------------------ ----------------------------------------
Nemo                           NULL
NULL                           Alligator


CROSS JOIN
SELECT PetName, PetType 
FROM Pets CROSS JOIN PetTypes;

PetName                        PetType
------------------------------ ----------------------------------------
Tiddles                        Dog
Fluffy                         Dog
Rover                          Dog
Kermit                         Dog
Nemo                           Dog
Tiddles                        Cat
Fluffy                         Cat
Rover                          Cat
Kermit                         Cat
Nemo                           Cat
Tiddles                        Frog
Fluffy                         Frog
Rover                          Frog
Kermit                         Frog
Nemo                           Frog
Tiddles                        Alligator
Fluffy                         Alligator
Rover                          Alligator
Kermit                         Alligator
Nemo                           Alligator

(20 row(s) affected)


This creates a cartesian product of the two tables, matching all records from the first with all records from the second table. There are 5x4 rows. This has limited, or specialized, usage. For example, it can be useful with date and/or time values to help generate a timetable or calendar.

I believe that some databases might not support the keyword CROSS. In which case the same results can be obtained simply by listing tables but omitting a join:
SELECT PetName, PetType 
FROM Pets, PetTypes;


It is with this syntax that quite often a cartesian product of records is produced by mistake, by forgetting to include a join. Be aware of this possiblity, particularly with large tables.

Self Join

A self join is not a distinct join type, it is simply a way to use a table alias to allow a table to join to a copy of itself.

This example doesn't do anything meaningful as my sample table only has three fields. It is included just to present the basic syntax for a self join.
SELECT p1.PetName, p2.PetTypeID 
FROM Pets AS p1 INNER JOIN Pets AS p2 
    ON p1.PetName = p2.PetName;


We only actually need one table alias:
SELECT Pets.PetName, p2.PetTypeID 
FROM Pets INNER JOIN Pets AS p2 
    ON Pets.PetName = p2.PetName;


An example would be a staff table, that also has a column for each staff member's line manager, using the line manager's staff id. Using a self join we could display staff details alongside some details for their line manager (name, etc.).

Join Three Tables

This post has been edited by andrewsw: 12 October 2015 - 11:08 AM


Is This A Good Question/Topic? 1
  • +

Replies To: Table Joins

#2 modi123_1  Icon User is online

  • Suitor #2
  • member icon



Reputation: 13382
  • View blog
  • Posts: 53,409
  • Joined: 12-June 08

Posted 12 October 2015 - 08:19 AM

Let's not forget the almighty venn diagrams!

Posted Image
Was This Post Helpful? 1
  • +
  • -

#3 andrewsw  Icon User is online

  • lashings of ginger beer
  • member icon

Reputation: 6333
  • View blog
  • Posts: 25,533
  • Joined: 12-December 12

Posted 12 October 2015 - 09:59 AM

Yes, I meant to include this link at some point:

A Visual Explanation of SQL Joins
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1