Microsoft KB Archive/826112

From BetaArchive Wiki

Article ID: 826112

Article Last Modified on 1/18/2007



APPLIES TO

  • Microsoft Office Excel 2007
  • Microsoft Office Excel 2003
  • Microsoft Excel 2004 for Mac



SUMMARY

The purpose of this article is to describe the VAR function in Microsoft Office Excel 2003 and in later versions of Excel, to illustrate how the VAR function is used, and to compare the results of the VAR function in Excel 2003 and in later versions of Excel with the results of the VAR function in earlier versions of Excel.

Microsoft Excel 2004 for Mac information

The statistical functions in Excel 2004 for Mac were updated by using the same algorithms that were used to update the statistical functions in Excel 2003 and in later versions of Excel. Any information in this article that describes how a function works or how a function was modified for Excel 2003 and for later versions of Excel also applies to Excel 2004 for Mac.

MORE INFORMATION

The VAR function returns the sample variance for a sample whose values are contained in an Excel worksheet and whose values are specified by the arguments to VAR.

Syntax

The following code illustrates the VAR function (where value1, value2, and value3 represent up to 30 value arguments):

VAR(value1, value2, value3, ...)

The most common usage of VAR includes only one value argument that specifies a range of cells that contain the sample (for example, VAR(A1:B100)).

Example of usage

Create a blank Excel worksheet, copy the following table, select cell A1 in your blank Excel worksheet, and then paste the entries so that the following table fills cells A1:D17 in your worksheet.

Data
6 Sample Mean =AVERAGE(A3:A8)
4 Sample Size =Count(A3:A8)
2 VAR =VAR(A3:A8)
1 Pre-Excel 2003 VAR v1 =(D4*SUMSQ(A3:A8) - SUM(A3:A8)^2)/(D4*(D4 - 1))
3 Pre-Excel 2003 VAR v2 =(SUMSQ(A3:A8) - (SUM(A3:A8)^2)/D4)/(D4 - 1)
5 VAR in Excel 2003 and in later versions of Excel =DEVSQ(A3:A8)/(D4 - 1)
Modified Data Power of 10 Added to Data 1
=A3 + 10^$D$10 Sample Mean =AVERAGE(A12:A17)
=A4 + 10^$D$10 Sample Size =COUNT(A12:A17)
=A5 + 10^$D$10 VAR =VAR(A12:A17)
=A6 + 10^$D$10 Pre-Excel 2003 VAR v1 =(D13*SUMSQ(A12:A17) - SUM(A12:A17)^2)/(D13*(D13 - 1))
=A7 + 10^$D$10 Pre-Excel 2003 VAR v2 =(SUMSQ(A12:A17) - (SUM(A12:A17)^2)/D13)/(D13 - 1)
=A8 + 10^$D$10 VAR in Excel 2003 and in later versions of Excel =DEVSQ(A12:A17)/(D13 - 1)

After you paste this table into your new Excel worksheet, click Paste Options, and then click Match Destination Formatting. With the pasted range still selected, use one of the following procedures, as appropriate for the version of Excel that you are running:

  • In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
  • In Excel 2003, point to Column on the Format menu, and then click AutoFit Selection.

Cells A3:A8 contain six data points that are used in this example.

Cell D5 contains the value of VAR for your current version of Excel. If you use Excel 2003 or a later version of Excel, this value should agree with the value in cell D8 because cell D8 shows the value of VAR for Excel 2003 and for later versions of Excel, regardless of which version of Excel you use. Cells D6 and D7 show two approximations of the value of VAR that were computed by earlier versions of Excel. The formula in cell D6 is the formula that is shown in the Help file for Microsoft Excel 2002 and earlier versions of Excel.

In this example, all versions return the value 3.5000000. There are no computational problems that may cause differences in VAR between versions of Excel.

Rows 10 to 17 let you experiment with modified data by adding a constant (in this case, a power of 10 is added) to each data point. Adding a constant to each data point should not affect the value of sample variance.

If you change the value in cell D10 (for example, you change the value to 1, 2, 3, 4, 5, 6, or 7), the revised data values are in cells A12:A17 and all versions of VAR are well-behaved in these seven cases.

However, if you try the values 8, 9, and 10 in cell D10, the value for Excel 2003 and for later versions of Excel remains at 3.5 (as it should), while the values for Excel 2002 and for earlier versions of Excel change, even though they should remain constant at 3.5. This behavior would not occur if computations could be performed with infinite precision.

Earlier versions of Excel show answers that are not correct in these cases because the effects of round-off errors are more profound with the computational formula that is used in these versions. However, the cases in this example are rather extreme.

Results in earlier versions of Excel

In extreme cases where there are many significant digits in the data that have little variation, the old computational formula causes results that are not accurate. Earlier versions of Excel use a one-pass process through the data to compute the sum of squares of the data values, the sum of the data values, and the count of the data values (sample size). These quantities are then combined in the computational formula that is provided in the Help files in earlier versions of Excel.

Results in Excel 2003 and in later versions of Excel

The procedure that is used in Excel 2003 and in later versions of Excel uses a two-pass process through the data.

First, the sum and count of the data values are computed. The sample mean, or average, is computed from these values. In the second pass, the squared difference between each data point and the sample mean is found and these squared differences are summed.

In the numeric examples, even with a high power of 10 in cell D10, these squared differences are not affected, and the results of the second pass are independent of the entry in cell D10. Therefore, the results in Excel 2003 and in later versions of Excel are more stable numerically.

Conclusions

Replacing a one-pass process with a two-pass process guarantees better numeric performance of VAR in Excel 2003 and in later versions of Excel. Results that you obtain in Excel 2003 and in later versions of Excel are never less accurate than results that you obtain in earlier versions of Excel.

In most practical examples, however, you are not likely to see a difference between results in later versions of Excel and results in earlier versions of Excel because typical data is not likely to exhibit the kind of unusual behavior that this example illustrates. Numeric instability is most likely to appear in earlier versions of Excel when the data contains many significant digits that have relatively little variation.

If you use an earlier version of Excel, and if you want to determine whether switching to Excel 2003 or a later version of Excel will make a difference, compare the results of

VAR(values)

with the results of

DEVSQ(values)/(COUNT(values) – 1)

If the results match the level of accuracy that you want, the value of VAR is not affected if you switch to Excel 2003 or to a later version of Excel.

If you use Excel 2003 or a later version of Excel, and if you want to determine whether the computed value of VAR(values) has changed from the value that you found in an earlier version of Excel, compare

VAR(values)

with

(SUMSQ(values) - (SUM(values)^2)/COUNT(values))/(COUNT(values) - 1)

This comparison provides a good approximation of the value of VAR in earlier versions of Excel.

The procedure of finding the sum of squared deviations about a sample mean by finding the sample mean, computing each squared deviation, and then summing the squared deviations is more accurate than the following procedure:

  • Find the sum of squares of all observations, the sample size, and the sum of all observations.
  • Compute the sum of squares of all observations minus ((sum of all observations)^2)/sample size).

This procedure is frequently referred to as the "calculator formula" because you can use it on a calculator for a small number of data points.

There are many other functions that have been improved for Excel 2003 and for later versions of Excel. These functions were improved by replacing the one-pass process with the two-pass process that finds the sample mean on the first pass and computes the sum of squared deviations on the second pass.

The following functions have been improved in Excel 2003 and in later versions of Excel:

  • VAR
  • VARP
  • STDEV
  • STDEVP
  • DVAR
  • DVARP
  • DSTDEV
  • DSTDEVP
  • FORECAST
  • SLOPE
  • INTERCEPT
  • PEARSON
  • RSQ
  • STEYX

Similar improvements have been made in each of the three Analysis of Variance tools in the Analysis ToolPak.

Keywords: kbformula kbexpertisebeginner kbinfo KB826112