Microsoft KB Archive/172199

= How To Optimize Queries in Visual Basic =

Article ID: 172199

Article Last Modified on 7/1/2004

-

APPLIES TO


 * Microsoft Visual Basic 5.0 Professional Edition
 * Microsoft Visual Basic 5.0 Enterprise Edition
 * Microsoft Visual Basic 4.0 Professional Edition
 * Microsoft Visual Basic 4.0 Professional Edition
 * Microsoft Visual Basic 4.0 16-bit Enterprise Edition
 * Microsoft Visual Basic 4.0 32-Bit Enterprise Edition
 * Microsoft Visual Basic 3.0 Professional Edition
 * Microsoft Visual Basic 3.0 Professional Edition

-



This article was previously published under Q172199



SUMMARY
Visual Basic allows you to retrieve data from Jet databases (MDB files) by using Structured Query Language (SQL). These query operations can be made more efficient by implementing some of the suggestions in this article.

This article assumes that you are using the Microsoft Jet database engine. If you are querying ODBC tables, many of these points still apply. For more information regarding improving performance of ODBC queries, please search on the following words in the Microsoft Knowledge Base:

ODBC and Optimizing and Tables



MORE INFORMATION
Here are some tips for optimizing your SQL queries:

Performance Analyzer
If you have Microsoft Access 95 or 97, you can open the database and use the Performance Analyzer to profile your queries and suggest improvements.

Table Design
When defining a field in a table, choose the smallest data type appropriate for the data in the field. This increases the number of records that can fit on a page.

Fields you use in joins should have the same or compatible data types.

Compact the Database
This has two performance benefits:

 The Microsoft Jet database engine uses a cost-based method of optimization. As your database grows, the optimization scheme may no longer be efficient. Compacting the database updates the database statistics and re-optimizes all queries.  As your database grows, it will become fragmented. Compacting writes all the data in a table into contiguous pages on the hard disk, improving performance of sequential scans.

To compact your database, use the CompactDatabase statement. This example compacts the database and makes a backup:

  DBEngine.CompactDatabase "C:\VB\BIBLIO.MDB", "C:\VB\BIBLIO2.MDB" Kill "C:\VB\BIBLIO.BAK" Name "C:\VB\BIBLIO.MDB" As "C:\VB\BIBLIO.BAK" Name "C:\VB\BIBLIO2.MDB" As "C:\VB\BIBLIO.MDB"

If your database is updated heavily, you may want to consider compacting nightly. 

Avoid Expressions in Query Output
Expressions in query output can cause query optimization problems if the query is used later as input to another query and you add criteria to the calculated output. In the following example, Query1 is used as input for a second SELECT statement:

Dim DB As Database Dim RS As RecordSet Set DB = DBEngine.Workspaces(0).Opendatabase("Biblio.MDB") DB.CreateQueryDef("Query1", _     "SELECT IIF(Au_ID=1,'Hello','Goodbye') AS X FROM Authors") Set RS = DB.OpenRecordSet("SELECT * FROM Query1 WHERE X='Hello'")

Because the IIF expression in Query1 cannot be optimized, the WHERE condition in second SELECT statement also cannot be optimized. If an expression gets buried deeply enough in a query tree, you can forget that it is there. As a result, your entire string of queries cannot be optimized.

If you can, merge the SQL into a single level of nesting:

Set RS = DB.OpenRecordSet("SELECT * FROM Authors WHERE Au_ID=1")

For more complex nested queries, expose the fields that make up the expression:

DB.CreateQueryDef("Query1", _     "SELECT IIF(Au_ID=1,'Hello','Goodbye') AS X, Au_ID, FROM Authors") Set RS = DB.OpenRecordSet("SELECT * FROM Query1 WHERE Au_ID=1")

If you cannot avoid calculated values in query output, place them in the top-level query and not in lower-level queries.

Output Only the Fields Needed
When creating a query, return only the fields you need. If a field doesn't have to be in the SELECT clause, don't add it. The above example of exposing additional fields to make nested queries more efficient is an exception.

GROUP BY, Joins, and Aggregates
This is an issue when you are joining two tables. For example, if you join two tables on the Customer Name field, and also GROUP BY the Customer Name field, make sure that both the GROUP BY field (Customer Name) and the field that is in the aggregate (Sum, Count, and so on) come from the same table.

NOTE: This query is less efficient because the SUM aggregate is on the Ord table and the GROUP BY clause is on the Cust table:

SELECT Cust.CustID, FIRST(Cust.CustName) AS CustName, SUM(Ord.Price) AS Total FROM Cust INNER JOIN Ord ON Cust.CustID = Ord.CustID GROUP BY Cust.CustID

A more efficient query would be to GROUP BY on Ord.CustID:

SELECT Ord.CustID, FIRST(Cust.CustName) AS CustName, SUM(Ord.Price) AS Total FROM Cust INNER JOIN Ord ON Cust.CustID = Ord.CustID GROUP BY Ord.CustID

NOTE: The First and Last functions do not have the overhead of other aggregates and should not weigh very heavily in this decision.

GROUP BY As Few Fields As Possible
The more fields in the GROUP BY clause, the longer the query takes to execute. Use the First aggregate function to help reduce the number of fields required in the GROUP BY clause.

Less efficient:

SELECT Cust.CustID, Cust.CustName, Cust.Phone, SUM(Ord.Price) AS Total FROM Cust INNER JOIN Ord ON Cust.CustID = Ord.CustID GROUP BY Cust.CustID, Cust.CustName, Cust.Phone

More efficient:

SELECT Ord.CustID, FIRST(Cust.CustName) AS CustName, FIRST(Cust.Phone) AS Phone, SUM(Ord.Price) AS Total FROM Cust INNER JOIN Ord ON Cust.CustID = Ord.CustID GROUP BY Ord.CustID

Nest GROUP BY Clause Before Joining
If you are joining two tables and only grouping by fields in one of them, it may be more efficient to split the SELECT statement into two queries. making the SELECT statement with the GROUP BY clause into a nested query joined to the non-grouped table in the top-level query.

Less efficient:

SELECT Ord.CustID, FIRST(Cust.CustName) AS CustName, FIRST(Cust.Phone) AS Phone, SUM(Ord.Price) AS Total FROM Cust INNER JOIN Ord ON Cust.CustID = Ord.CustID GROUP BY Ord.CustID

More efficient:

Query1: SELECT CustID, SUM(Price) AS Total FROM Ord GROUP BY CustID

Query2: SELECT Query1.CustID, Cust.CustName, Cust.Phone, Query1.Total FROM Cust INNER JOIN Ord ON Cust.CustID = Ord.CustID

Index Both Fields Use in Join
When joining tables, try to index the fields on both sides of a join. This can speed query execution by allowing the query optimizer to use more sophisticated internal join strategy.

However, if you know one table is going to remain relatively small (occupy 1-2 2K pages), it may be more efficient to remove indexes in that table because fewer pages will have to be read into memory. You should try this on a case-by-case basis.

Add Indexes to Speed Searches and Sorts
Place an index on all fields that are used in a join or in a restriction. With the use of Rushmore query optimization technology, the Microsoft Jet 2.0 and later database engine is able to take advantage of multiple indexes on a single table, which makes indexing multiple fields advantageous.

Avoid restrictive query criteria on calculated and non-indexed columns whenever possible.

Use sorting judiciously, especially with calculated and non-indexed fields.

Use Optimizable Expressions
Try to construct your queries so that Rushmore technology can be used to help optimize them. Rushmore is a data-access technology that permits sets of records to be queried very efficiently. With Rushmore, when you use certain types of expressions in query criteria, your query will run much faster. Rushmore does not automatically speed up all your queries. You must construct your queries in a certain way for Rushmore to be able to improve them.

Use the REFERENCES section at the end of the article to locate more specific information.

Use COUNT(*) Instead of COUNT([Column Name])
The Microsoft Jet database engine has special optimizations that allow COUNT(*) to be executed much faster than COUNT([Column Name]).

NOTE: These two operations also have slightly different behavior:


 * Count(*) counts all rows returned.
 * Count([Column Name]) counts all rows where [Column Name] is not NULL.

Avoid LIKE on Parameters
Because the value of the parameter is unknown at the time the query is compiled, indexes will not be used. You can gain performance by concatenating the parameter value as a literal in the SQL statement.

Use the REFERENCES section at the end of the article to locate more specific information.

Avoid LIKE and Leading Wildcard
If you use the LIKE operator with a wildcard to find approximate matches, use only one asterisk at the end of character string to ensure that an index is used. For example, the following criteria uses an index:

  Like "Smith" Like "Sm*"

The following criteria does not use an index:

  Like "*sen" Like "*sen*"

Test Joins with Restrictions
If you use criteria to restrict the values in a field used in a join, test whether the query runs faster with the criteria placed on the "one" side or the "many" side of the join. In some queries, you get faster performance by adding the criteria to the field on the "one" side of the join instead of the "many" side.

Use Intermediate Tables
Use SELECT INTO statements to create work tables, especially if the results are going to be used in a number of other queries. The more work you can do up-front, the more efficient the process.

Avoid NOT IN with SubSelects
Using sub-selects and NOT IN is poorly optimized. Converting to nested queries or OUTER JOINs are more efficient. The following example finds customers without orders:

Less efficient:

SELECT Customers.* FROM Customers WHERE Customers.[Customer ID] NOT IN (SELECT [Customer ID] FROM Orders);

More efficient:

SELECT Customers.* FROM Customers LEFT JOIN Orders ON Customers.[Customer ID] = Orders.[Customer ID] WHERE ((Orders.[Customer ID] Is Null));

