Microsoft KB Archive/66403

{| = How to Calculate the Correlation Coefficient in Excel =
 * width="100%"|

Last reviewed: November 4, 1994

Article ID: Q66403

SUMMARY
The formula for the correlation coefficient is as follows:

{=SQRT(SUM((TREND(Y's,X's)-AVERAGE(Y's))^2/SUM((Y's-AVERAGE(Y's))^2))}

This MUST be entered as an array formula.

The example assumes the following:

Known-Ys     Known-Xs

A1: 3100    B1: 3000 A2: 4500    B2: 4000 A3: 4400    B3: 5000 A4: 5400    B4: 6000 A5: 7500    B5: 7000 A6: 8100    B6: 8000 {=SQRT(SUM((TREND(A1:A6,B1-B6)-AVERAGE(A1:A6)) ^2/SUM((A1:A6-AVERAGE(A1:A6))^2)))} In this case, the known &quot;Y&quot; values are close to the estimated &quot;Y&quot; values. Applying the above formula for the correlation coefficient returns a value of .9335. This value is quite close to +1, indicating that the TREND function returns Y values that are good predictions.

MORE INFORMATION
You may want to know how closely the TREND function predicts new Ys according to your actual data. The correlation coefficient is a measure used primarily in psychological and educational research and helps to predict the &quot;error&quot; of the New Ys. The correlation coefficient is plus or minus the square root of the coefficient of determination. The correlation coefficient gives an approximation of the error of the Known-Ys in relation to the New-Ys (which are returned by the TREND function). The correlation coefficient is a number between -1 and +1.

A value close to 0 means that the residuals (distance between the Known-Ys and New-Ys) are quite large, and that the plotted points are relatively far from the regression line. In other words, the New-Ys do not predict the Known-Ys very well, and the model (best-fit line) is a failure. Conversely, if the coefficient is close to +1, the residuals are relatively small and the plotted points are close to the regression line.

In such a situation, the model is a relatively good predictor of the dependent variable (Known-Ys).
 * }