Microsoft KB Archive/41678

From BetaArchive Wiki

Method to Calculate Simple Linear Regression in Excel PSS ID Number: Q41678 Article last modified on 02-26-1993 PSS database name: W_eXceL

2.00 2.01 2.10 3.00 | 2.20 2.21 3.00

WINDOWS | OS/2

The information in this article applies to:
- Microsoft Excel for Windows, versions 2.0, 2.1 and 3.0 - Microsoft Excel for OS/2, version 2.2 and 3.0

Summary:

The following article discusses the manual method of calculating simple linear regression. Please note that there are two additional articles discussing curve fitting and regression techniques using the Excel TREND function and the Newton method of curve fitting. For more information about the Newton method, query on the words:

Excel and Regress and Newton

More Information:

The formula used to calculate the coefficient of determination (the R Squared) value is as follows:

(n * sum(X * Y)) - (sum(X) * sum(Y)) —————————————————- <-divided by Sqrt[n * sum(X) squared] * Sqrt[n * sum(Y) squared]

where n = the number of observations plotted X = the independent variable (Predictor) Y = the dependent variable (Predicted)

This formula yields the number that determines approximately how much of the data is explained/accounted for by the data gathered. Therefore, a coefficient of variation of .85 explains about 85% of the distribution of the plotted data.

The regression equation itself is determined by first calculating the slope of the line (m) and the value of Y when x = 0 (b) (that is, the height of the line at the Y intercept).

The “m” variable is determined by the following function:

(n * sum(X * Y)) - (sum(X) * sum(Y)) —————————————— <-divided by (n * sum(Xsquared)) - (sum(X) squared)

The “a” variable is determined by the following function:

sum(Y)sum(Xsquared) - sum(X)sum(XY) ————————————– <-divided by n*sum(Xsquared) - (sum(X) squared)

The best fit line is therefore calculated by the following formula:

Y = mX + b

Therefore, Y is calculated by multiplying the slope of the line (m) by each predicted X value. The resulting values (when plotted) are the regression (or best fit) line.

Additional reference words: 2.0 2.00 2.01 2.1 2.10 2.2 2.20 2.21 3.0 3.00

Copyright Microsoft Corporation 1993.