Microsoft KB Archive/136699

= Description of the usage of joins in Microsoft Query =

Article ID: 136699

Article Last Modified on 3/2/2007

-

APPLIES TO


 * Microsoft Office Excel 2007
 * Microsoft Office Excel 2003
 * Microsoft Excel 2002 Standard Edition
 * Microsoft Excel 2000 Standard Edition
 * Microsoft Excel 97 Standard Edition
 * Microsoft Excel 95 Standard Edition
 * Microsoft Excel 5.0 Standard Edition
 * Microsoft Excel X for Mac
 * Microsoft Excel 2001 for Mac
 * Microsoft Excel 98 for Macintosh
 * Microsoft Query 2000

-



This article was previously published under Q136699



SUMMARY
This article discusses using joins in Microsoft Query. A join is a method you can use with databases to define and illustrate relationships between two tables. In Microsoft Query, you can create and manipulate various types of joins. In many cases, Microsoft Query joins the tables in your query for you. Joins are represented by lines connecting the tables in the Table pane.

Topics covered in this article include the following:

  What Is a Join? Inner Joins SQL Statements Outer Joins Subtract Joins Full Outer Joins Self Joins Equi-Joins Natural Joins Cartesian Products



What Is a Join?
A join is a connection between two tables where the two tables are merged according to a field that they have in common, creating a new virtual table (which can be saved out as a real table). For example, with the following two tables:

     Color_Table:

Join_Field  Color_Field 1           Red 2           Blue 3           Green

Pattern_Table:

Join_Field  Pattern_Field 2           Striped 3           Checkered 4           Polka-Dot

a simple join resembles the following:

  Join_Field   Color_Field      Pattern_Field 2           Blue             Striped 3           Green            Checkered

The result table contains only records 2 and 3 in the Join_Field field because they are the only records that exist in both the Color_Table and the Pattern_Table.

A practical example of a join is a retailer's list of products and manufacturers; a Products table and a Vendors table could be joined on a Product ID field.

The Inner Join
The join in the previous example, called an inner join, is the simplest type of join. Usually, you want to use only a fraction of the fields in the tables. For example, in a real inner join, such as the one above, you may want to exclude Join_Field, so that the join looks like this:

  Color_Field      Pattern_Field Blue            Striped Green           Checkered

In Microsoft Query, inner joins are the default join type (for more information, see page 105 in the "Microsoft Query User's Guide," version 1.0).

The SQL Statement
A Structured Query Language (SQL) SELECT statement is a type of macro that you can use when you create a join. Note that SQL is very different from Microsoft Excel's other macro languages (Visual Basic for Applications, and the Excel 4.0 macro language).

It is NOT necessary to understand SQL in order to easily create joins in Microsoft Query.

Every join has a SELECT statement associated with it. You can view the SELECT Statement for any join in Microsoft Query by clicking "SQL" on the toolbar. As with Microsoft Excel's macro recorder, you can use Query to record the SELECT statement. Unlike Microsoft Excel's macro recorder, the SELECT statement recorder is always on and cannot be turned off. Here's how the SELECT statement might look in Microsoft Query for the above inner join:

  SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field FROM `c:\database`.Color_Table, `c:\database`.Pattern_Table WHERE Color_Table.Join_Field = Pattern_Table.Join_Field

Note that we use the database name 'C:\database' whose .mdb is the file name extension for a Microsoft Access database, which can have multiple tables in one file. In some other databases, such as dBASE, Paradox, and FoxPro, each table must have its own file. In these cases, the SQL syntax may appear redundant because the table name is always the same as the file name without the extension.

SQL syntax varies among querying engines; for example, in Microsoft Access the query from the above example resembles the following:

  SELECT Color_Table.[Color_Field], Pattern_Table.Pattern_Field FROM Pattern_Table INNER JOIN Color_Table ON  Pattern_Table.[Join_Field] = Color_Table.[Join_Field];

The path to the table is not used in Microsoft Access because the table is in a Microsoft Access .mdb file. Even if an external table is attached and used in the query, the Microsoft Access SQL statement does not display the path to the external table.

The Outer Join
Another kind of join is called an outer join. With an outer join, you get all the records from one table and only those records from the other table that have matching values from the first table. This may leave some of the field entries blank, or "Null." For any two tables to be joined, there are two possible outer joins, the "left outer join" and the "right outer join," (so named because you usually view the tables side-by-side). Using the previous two tables in an example, the following is one of the two possible outer joins:

  Join_Field   Color_Field   Pattern_Field 1           Red           (NULL) 2           Blue          Striped 3           Green         Checkered

The other possible join is as follows:

  Join_Field   Color_Field   Pattern_Field 2           Blue          Striped 3           Green         Checkered 4           (NULL)         Polka-Dot

NOTE: In an join, you do not see the word "NULL" when you view the data; use the keyword "NULL" when you are working with joins.

In Microsoft Query, both kinds of outer join can be created easily using the mouse (for more information about this procedure, see page 112 of the "Microsoft Query User's Guide," version 1.0).

The following is how the SQL statement might look for the second example of an outer join:

  SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field FROM {oj `c:\database`.Color_Table LEFT OUTER JOIN `c:\database`.Pattern_Table ON Color_Table.Join_Field = Pattern_Table.Join_Field}

To create a practical example of an outer join, make a list of a company's products with sales figures for the products that had been sold, but not excluding products that had not been sold. To do this, use a Product ID field to join a Products table and a Sales table.

The Subtract Join
A third kind of join is the subtract join. A subtract join is the opposite of an outer join; it includes ONLY those records in one table that don't match any record in the other table. Like outer joins, there are two possible subtract joins for any two tables that you want to join; however, they are not usually referred to as "Left subtract join" or "Right subtract join." A subtract join normally return fields from only one of the tables, because by definition the other table's fields return only NULL values. The following is one possible subtract join:

  join_Field   Color_Field 1           Red

and here is the other:

  Join_Field   Pattern_Field 4           Polka-Dot

In Microsoft Query, a subtract join is created by first creating an Outer join, and then using the criteria "IS NULL" on an appropriate field (Pattern_Field in the first example above; Color_Field in the second example) to exclude records that match between the tables.

The following is how the SQL statement might look for the first subtract join above:

  SELECT Color_Table.Join_Field, Color_Table.Color_Field FROM {oj `c:\database`.Color_Table LEFT OUTER JOIN `c:\database`.Pattern_Table ON Color_Table.Join_Field = Pattern_Table.Join_Field} WHERE (Pattern_Table.Pattern_Field Is Null)

To create a practical example of a subtract join, a list of customers that had not ordered recently. To do this, use an Order ID field to join a Customers table and an Orders table.

The Full Outer Join
A fourth kind of join is the full outer join. A full outer join is a combination of an outer join with its complimentary subtract join. A full outer join includes all records from both tables and merges those records that are common between the two tables. The following is a full outer join:

  Join_Field   Color_Field   Pattern_Field 1           Red           (NULL) 2           Blue          Striped 3           Green         Checkered 4           (NULL)        Polka-Dot

In Microsoft Query, a full outer join is created by inserting the UNION operator on its own line between the outer join's SELECT statement and the subtract join's SELECT statement (see above).

To create a full outer join in Microsoft Query, use the following steps to create the corresponding SQL sentence:


 * 1) Create an outer join, then click New on the File menu and create a subtract join.
 * 2) Copy the subtract join's SQL.
 * 3) Switch to the outer join, type the word UNION on its own line below the outer join's SQL statement, paste the subtract join's SQL below the word UNION, and close the subtract join's window.

Note The results of the following table miss the Join_Field's value "4" because the record that Join_Field's value equals 4 is in the Pattern_Table table. The below SQL sentence dones't select the Pattern_Table.John_Field field.

<pre class="fixed_text">  Join_Field   Color_Field   Pattern_Field (NULL)       Polka-Dot 1           Red           (NULL) 2           Blue          Striped 3           Green         Checkered

Here is how the SQL might look for the above full outer join (commands below the UNION operator is pasted):

<pre class="fixed_text">  SELECT Color_Table.Join_Field, Color_Table.Color_Field, Pattern_Table.Pattern_Field FROM {oj `C:\database`.Pattern_Table LEFT OUTER JOIN `C:\database`.Color_Table ON Color_Table.Join_Field = Pattern_Table.Join_Field} UNION SELECT Color_Table.Join_Field, Color_Table.Color_Field, Pattern_Table.Pattern_Field FROM {oj `C:\database`.Color_Table LEFT OUTER JOIN `C:\database`.Pattern_Table ON Color_Table.Join_Field = Pattern_Table.Join_Field} WHERE (Color_Table.Color_Field Is Null) or(Pattern_Table.Pattern_Field Is Null)

To create a practical example of a full outer join, merge overlapping customer lists used by different departments, including fax numbers (which were only on the first list), and Internet e-mail names (which are only in the second list). Each department could continue to use its partial list while having the complete, joined list available. They could be joined on a Customer ID field.

The Self-Join
A fifth kind of join is the self-join. A self-join is a connections where a field in a table is matched with a different field in a copy of the same table. Using this example table:

<pre class="fixed_text">     Table_Three

Employee_ID  Employee_Name   Reports_To 1            Bob             3 2            Sue             (NULL) 3            Jim             2 4            Jane            3

and a copy of it, as follows:

<pre class="fixed_text">     Table_Three_01

Employee_ID  Employee_Name   Reports_To 1            Bob             3 2            Sue             (NULL) 3            Jim             2 4            Jane            3

A self-join could be used to create a list of employee names with their supervisor's names. The Employee_ID in Table_Three would be joined with the Reports_To in Table_Three_01. The following is how it might look at first:

<pre class="fixed_text">  Employee_Name   Employee_Name Bob            Jim Sue            (NULL) Jim            Sue Jane           Jim

However, because it is confusing to have the same field name for both fields, change one of the field names, as follows:

<pre class="fixed_text">  Employee_Name   Supervisor Bob            Jim Sue            (NULL) Jim            Sue Jane           Jim

The following is how the SQL might look for the above self-join:

<pre class="fixed_text">  SELECT table_three.Employee_Name, table_three_01.Employee_Name 'Supervisor' FROM `c:\database`.table_three, `c:\database`.table_three_01 WHERE table_three.Employee_ID = table_three_01.Reports_To

When you return data to Microsoft Excel, it is of no use to rename the field in Microsoft Query. This is true because Microsoft Excel uses the original field name. For more information about this issue, please see the following article in the Microsoft Knowledge Base:

121551 : XL5: Field Instead of Column Name in MSQUERY Returned to Excel

A Microsoft Excel macro must change the column name every time the returned data is refreshed (unless you return the data in a PivotTable, in which case the Pivot itself can create and keep a custom field name).

The Equi-Join and Natural Join
Almost all joins, including all examples given so far, are equi-joins and natural joins. The meanings of these terms are of little significance to the average user of Microsoft Query, but the next two paragraphs attempt to explain the terms for those who may be curious.

The equi-join is a join in which records are retrieved based on whether the join fields have matching values in both tables. That may seem like just a definition of a join, but it isn't. An example of a non-equal join is a join where records in the first table are joined to those records in the second table where the joined field in the first table is greater than (instead of equal to) the joined field in the second table (or less than, or anything besides equal to). Naturally, this returns more records than an equi-join.

A natural join is one in which only one of the two tables' joined fields is returned. Since these two fields are by definition identical in an equi-join, it is redundant to include both. For a non-equal join, it is important to include both of those fields. So, equi-joins and natural joins go together. You would want an equi- joins (which describes most joins) to be natural join by returning only one of the joined fields; but, if you ever use a non-equal join, you might also want to make it a non-natural join by returning both of the joined fields.

There are other kinds of joins. The full spectrum of joins was most recently defined in 1992 and this standard is known as SQL-92. Some joins are not important to Microsoft Excel users because these joins do things that are easier to do in Microsoft Excel.

The Cartesian Product
Trying to return data from two or more tables without any joins creates what is called a "Cartesian product." A Cartesian product is defined as all possible combinations of rows in all tables. Be sure you have joins before trying to return data, because a Cartesian product on tables with many records and/or on many tables could take several hours to complete. The following is a Cartesian product as used on the two example tables; note that this table is only 3 records times 3 records, which yields a total of 9 records. However, imagine if instead, the table was 100 records times 1,000 records times 10,000 records; then the table would contain 1,000,000,000 records!

<pre class="fixed_text">  Join_Field   Join_Field   Color_Field   Pattern_Field 1           2            Red           Striped 1           3            Red           Checkered 1           4            Red           Polka-Dot 2           2            Blue          Striped 2           3            Blue          Checkered 2           4            Blue          Polka-Dot 3           2            Green         Striped 3           3            Green         Checkered 3           4            Green         Polka-Dot

Occasionally, some users want to use a Cartesian product; however, most users who get them do so accidentally, and are often confused by them. Because most users exclude most of the fields in a join, a real Cartesian product can easily look as perplexing as this:

<pre class="fixed_text">  Color_Field Red Red Red Blue Blue Blue Green Green Green

If 100 records are added to Pattern_Table, this query would have 309 records (103 records each of Red, Blue, and Green).

Cartesian products have relatively simple SELECT statements. The following is how the SQL might look for the above Cartesian product:

<pre class="fixed_text">  SELECT Color_Table.Color_Field, Pattern_Table.Pattern_Field FROM `c:\database`.Color_Table, `c:\database`.Pattern_Table

A practical example of a Cartesian product would be to create a list of all possible combinations of options on a merchandise product, with price totals for each combination.

<div class="references_section">