Microsoft KB Archive/51026

{| = Excel: Calculating Linear Regression =
 * width="100%"|

Last reviewed: November 2, 1994

Article ID: Q51026

SUMMARY
Microsoft Excel can be very useful when you need to make predictions of future data based on historical data. One of the most common techniques used for making such predictions is that of linear regression.

MORE INFORMATION
For example, assume you have accumulated eight years' worth of sales figures for a company. These figures are as follows:

|  A  |       B   1 | Year | Quantity Sold 2 |  1  |     1250   3 |   2  |     1540   4 |   3  |     1798   5 |   4  |     3300   6 |   5  |     4000   7 |   6  |     3700   8 |   7  |     3100   9 |   8  |     3484 From these numbers, you want to predict the sales for the next years (years 9, 10, and 11). To accomplish this, use the Excel TREND function. The TREND function accepts arrays of known y's and known x's, and returns a linear trend of values along the best fit line for these x's and y's. (Note: The cells in column C are formatted as &quot;0.00&quot;.) |  A  |        B      |     C    - 1 | Year | Quantity Sold | Predicted 2 |  1  |      1250     |  1528.00    3 |   2  |      1540     |  1883.29    4 |   3  |      1798     |  2238.57    5 |   4  |      3300     |  2593.86    6 |   5  |      4000     |  2949.14    7 |   6  |      3700     |  3304.43    8 |   7  |      3100     |  3659.71    9 |   8  |      3484     |  4015.00   10 |   9  |               |  4370.29   11 |  10  |               |  4725.57   12 |  11  |               |  5080.86 In cells C2:C12, enter the formula =TREND(B2:B9,A2:A9,A2:A12) as an array (by selecting C2:C12, typing the formula, and pressing COMMAND+ENTER). In this formula, cells B2:B9 are the known y's, A2:A9 are the known x's, and A2:A12 are the new x's (the points on the line for which we want to have y values returned). You can now chart the data to view your results. If you're using Excel 2.20, do the following:


 * 1) Select the area A1:C12.
 * 2) From the Edit menu, choose Copy to copy this area to the Clipboard.
 * 3) From the File menu, choose New, select Chart, and click OK.
 * 4) From the Edit menu, choose Paste Special (Values in Columns, Series Names in First Row, Categories in First Column).
 * 5) From the Gallery menu, choose Line.

If you're using Excel version 1.00, 1.03, 1.04, 1.06, or 1.50, do the following:
 * 1) Select the area C1:C12.
 * 2) From the Edit menu, choose Copy to copy this area to the Clipboard.
 * 3) From the File menu, choose New, select Chart, and click OK.
 * 4) From the Edit menu, choose Paste.
 * 5) Use the Gallery menu to make a line chart.
 * 6) Using the Window menu, activate your worksheet.
 * 7) Select the area B1:B9.
 * 8) From the Edit menu, choose Copy.
 * 9) Using the Window menu, activate the chart you created in Step 3.
 * 10) From the Edit menu, choose Paste.

The chart created should look similar to the following (without the labels): 6000.00|        |         |  5000.00|     Predicted ==> ! |             ___ /         |             /  /\  % <== Actual 4000.00|           / /   \/         |           |/         |          /|  3000.00|        / /         |      /  |         |    /   /  2000.00|  /   _/         |/  _/         ! /  1000.00%         |         |     0.00|         --         1 2 3 4 5 6 7 8 9 10 11 After predicting the values, you need to determine how accurately the predicted line reflects the actual data. To do this, calculate the coefficient of determination, or the &quot;R squared&quot; value, as follows: r2=sum(Yest-Yavg)2/sum(Y-Yavg)2 To begin with, calculate the average for the y's (Quantity Sold). In cell B15, enter the formula =AVERAGE(B2:B9) and receive the result of 2771.50. The worksheet now looks like the following: |  A  |       B       |       C    -- 1 | Year | Quantity Sold | Predicted 2 |  1  |     1250      |   1528.00    3 |   2  |     1540      |   1883.29    4 |   3  |     1798      |   2238.57    5 |   4  |     3300      |   2593.86    6 |   5  |     4000      |   2949.14    7 |   6  |     3700      |   3304.43    8 |   7  |     3100      |   3659.71    9 |   8  |     3484      |   4015.00   10 |   9  |               |   4370.29   11 |  10  |               |   4725.57   12 |  11  |               |   5080.86   13 |      |               |   14 |      |               |   15 | Yavg |               |   2771.5 Next, establish two columns to represent the two parts of your equation. In column D, enter the formula for (Yest-Yavg)2, and in column E, enter the formula for (Y-Yavg)2. In cell D2, enter the formula =(C2-$B$15)^2 and then fill the formula through cell D9. In cell E2, enter the formula =(B2-$B$15)^2 and then fill this formula down through cell E9. Now, calculate the coefficient of determination in cell E15 by entering the formula =SUM(D2:D9)/SUM(E2:E9). (Note: Columns C, D, and E are formatted as &quot;0.00&quot;, and cell E15 is formatted as &quot;General&quot;.) |  |   A  |       B       |     C     |       D       |      E     | | 1 | Year | Quantity Sold | Predicted | (Yest-Yavg)^2 | (Y-Yavg)^2 | The closer the R squared value comes to 1, the better the fit of data points to the predicted values. In this case, the R squared value returned is .65895, which represents a fairly good fit, but also indicates that our predicted values are not very reliable.
 * 2 |  1  |     1250      |  1528.00  |  1546292.25   | 2314962.25 |
 * 3 |  2  |     1540      |  1883.29  |   788924.62   | 1516592.25 |
 * 4 |  3  |     1798      |  2238.57  |   284012.86   |  947702.25 |
 * 5 |  4  |     3300      |  2593.86  |    31556.98   |  279312.25 |
 * 6 |  5  |     4000      |  2949.14  |    31556.98   | 1509212.25 |
 * 7 |  6  |     3700      |  3304.43  |   284012.86   |  862112.25 |
 * 8 |  7  |     3100      |  3659.71  |   788924.62   |  107912.25 |
 * 9 |  8  |     3484      |  4015.00  |  1546292.25   |  507656.25 |
 * 10 |  9  |               |  4370.29  |               |            |
 * 11 | 10  |               |  4725.57  |               |            |
 * 12 | 11  |               |  5080.86  |               |            |
 * 13 |     |               |           |               |            |
 * 14 |     |               |           |               |            |
 * 15 | Yavg |    2771.5    |           |               | 0.658952019|
 * }