Microsoft KB Archive/827459

= Excel statistical functions: BINOMDIST =

Article ID: 827459

Article Last Modified on 1/17/2007

-

APPLIES TO


 * Microsoft Office Excel 2007
 * Microsoft Office Excel 2003
 * Microsoft Excel 2004 for Mac

-



SUMMARY
This article describes the BINOMDIST function in Microsoft Office Excel 2003 and in later versions of Excel, illustrates how to use the function, and compares the results of the function for Excel 2003 and for later versions of Excel with its results for earlier versions of Excel.

Microsoft Excel 2004 for Mac information
The statistical functions in Excel 2004 for Mac were updated by using the same algorithms that were used to update the statistical functions in Excel 2003 and in later versions of Excel. Any information in this article that describes how a function works or how a function was modified for Excel 2003 and for later versions of Excel also applies to Excel 2004 for Mac.



MORE INFORMATION
When  = TRUE, the BINOMDIST function returns the probability of   or fewer successes in   independent Bernoulli trials. Each of the trials has an associated probability  of success (and probability 1-  of failure). When  = FALSE, BINOMDIST returns the probability of exactly   successes.

Syntax
BINOMDIST(x, n, p, cumulative)

Parameters

 * is a non-negative integer
 * is a positive integer
 * 0 <  < 1
 * is a logical variable that takes on the values TRUE or FALSE

Example of usage
Make the following assumptions:
 * In baseball, a &quot;.300 hitter&quot; hits (succeeds) with probability 0.300 each time he comes to bat (each trial).
 * Successive times at bat are independent Bernoulli trials.

You can use the following table to find the probability that such a batter gets exactly 0, 1, 2, ..., or 10 hits in 10 trials and the probability that the batter gets 0, 1 or fewer, 2 or fewer, ..., 9 or fewer, or 10 or fewer hits in 10 trials.

If the batter gets 50 hits in his first 200 trials (a .250 average), he must get 100 hits in his next 300 trials to have 150 hits and a .300 average over 500 trials. You can use the following table to analyze the chance that the batter gets sufficient hits to maintain his average. Baseball commentators frequently allude to the &quot;law of averages&quot; when they say that fans do not have to worry about the performance of this batter with only 50 hits in his first 200 trials because &quot;by the end of the season his average will be .300.&quot; If the trials really were independent, and the batter really had a 0.3 chance of success on any one trial, this reasoning is fallacious because the outcomes of the first 200 trials do not affect the success or the failure over the last 300 trials.

To illustrate the use of BINOMDIST, create a blank Excel worksheet, copy the following table, select cell A1 in your blank Excel worksheet, and then paste the entries so that the following table fills cells A1:C22 in your worksheet.

Note After you paste this table into your new Excel worksheet, click the Paste Options button, and then click Match Destination Formatting. With the pasted range still selected, use one of the following procedures, as appropriate for the version of Excel that you are running:
 * In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
 * In Excel 2003 and in earlier versions of Excel, point to Column on the Format menu, and then click AutoFit Selection.

You may want to format cells B4:C22 for consistent readability (for example, format numbers to five decimal places).

Cells B4:B14 show the probabilities of exactly  successes in 10 trials. The most likely number of successes is 3. The chances of 0, 6, 7, 8, 9, or 10 successes are each less than 0.05 and add to about 0.076. So the chances of 1, 2, 3, 4, or 5 successes is about 1 – 0.076 = 0.924. Cells C4:C14 show the probabilities of  or fewer successes in 10 trials. You can verify that the entries in column C in any row are each equal to the sum of all the entries in column B, down to and including that row.

B18:B20 show that the most likely number of successes in 300 trials is 90. The probability of exactly  successes increases as   increases to 90, and then decreases as   continues to increase higher than 90. The chance of 90 or fewer successes is just over 50%, as C20 shows. The chance of 99 or fewer successes is about 0.884. Therefore, there is only an 11.6% chance (0.116 = 1 – 0.884) of 100 or more successes.

Results in earlier versions of Excel
Knusel (see note 1) documented instances where BINOMDIST does not return a numeric answer and yields #NUM! instead because of a numeric overflow. When BINOMDIST returns numeric answers, they are correct. BINOMDIST returns #NUM! only when the number of trials is greater than or equal to 1030. There are no computational problems if  < 1030. In practice, such high values of  are unlikely. With such a high number of independent trials, a user may want to approximate the Binomial distribution by a normal distribution (if *  and  *(1- ) are sufficiently high, for example, each is greater than 30) or by a Poisson distribution otherwise.

Note 1 Knusel, L. &quot;On the Accuracy of Statistical Distributions in Microsoft Excel 97&quot;, Computational Statistics and Data Analysis (1998), 26: 375-377.

For the non-cumulative case, BINOMDIST(, ,  , false) uses the following formula

COMBIN(n,x)*(p^x)*((1-p)^(n-x))

COMBIN is an Excel function that gives the number of combinations of  items in a population of   items. COMBIN is sometimes written C, and named a &quot;combinatorial coefficient&quot; or just, &quot;  choose  &quot;. If you experiment with COMBIN by typing =COMBIN(1029,515) in one cell and =COMBIN(1030,515) in a different cell, the first cell returns an astronomical number, 1.4298E+308, and the second cell returns #NUM! because it is even larger. The overflow of COMBIN causes an overflow of BINOMDIST in earlier versions of Excel.

COMBIN has not been modified for Excel 2003 and for later versions of Excel.

Results in Excel 2003 and in later versions of Excel
Because Microsoft has diagnosed when an overflow causes BINOMDIST to return #NUM! and knows that BINOMDIST is well-behaved when overflow does not occur, Microsoft has implemented a conditional algorithm in Excel 2003 and in later versions of Excel.

The algorithm uses BINOMDIST code from earlier versions of Excel (the computational formula mentioned earlier in this article) when  < 1030. When  >= 1030, Excel 2003 and later versions of Excel use the alternative algorithm that is described later in this article.

Typically, COMBIN overflows because it is astronomical, but ^  and (1- )^( - ) are each infinitesimal. If it were possible to multiply them together, the product would be a realistic probability between 0 and 1. However, because existing finite arithmetic cannot multiply them, an alternative algorithm avoids the evaluation of COMBIN.

Microsoft's approach calculates an unscaled sum of all the probabilities of exactly  successes that are used later for scaling purposes. It also calculates an unscaled value of the probability that you want BINOMDIST to return. Finally, it uses the scaling factor to return a correct BINOMDIST value.

The algorithm takes advantage of the fact that the ratio of successive terms of the form COMBIN*( ^ )*((1- )^( - )) has a simple form. The algorithm proceeds as described in the pseudocode in the following steps.

Step 0: (Initialization). Initialize the TotalUnscaledProbability and the UnscaledResult properties to 0. Initialize the constant EssentiallyZero to a very small number, for example, 10^(-12).

Step 1: Find *  and round down to the nearest whole number,. The most likely number of successes in  trials is either   or  +1. COMBIN*( ^ )*((1- )^( - )) decreases as  decreases from   to  -1 to  -2, and so on. Also, COMBIN*( ^ )*((1- )^( - )) decreases as  increases from  +1 to  +2 to  +3, and so on.

TotalUnscaledProbability = TotalUnscaledProbability + 1; If (m == x) then UnscaledResult = UnscaledResult + 1; If (cumulative && m < x) then UnscaledResult = UnscaledResult + 1;

Step 2: Calculate the unscaled probabilities for  >  :

PreviousValue = 1; Done = FALSE; k = m + 1; While (not Done && k <= n) { CurrentValue = PreviousValue * (n – k + 1) * p / (k * (1 – p)); TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue; If (k == x) then UnscaledResult = UnscaledResult + CurrentValue; If (cumulative && k < x) then UnscaledResult = UnscaledResult + CurrentValue; If (CurrentValue <= EssentiallyZero) then Done = TRUE; PreviousValue = CurrentValue; k = k+1; } end While;

Step 3: Calculate the unscaled probabilities for  <  :

PreviousValue = 1; Done = FALSE; k = m - 1; While (not Done && k >= 0) {   CurrentValue = PreviousValue * k+1 * (1-p) / ((n – k) * p); TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue; If (k == x) then UnscaledResult = UnscaledResult + CurrentValue; If (cumulative && k < x) then UnscaledResult = UnscaledResult + CurrentValue; If (CurrentValue <= EssentiallyZero) then Done = TRUE; PreviousValue = CurrentValue; k = k-1; } end While;

Step 4: Combine the unscaled results:

Return UnscaledResult/TotalUnscaledProbability;

Although this method is used only for  >= 1030, you can use the following additions to the Excel worksheet to help you hand-execute this algorithm to calculate BINOMDIST(3, 10, 0.3, TRUE) (in the baseball example, the chance of 3 or fewer hits in 10 trials for a .300 batter).

To illustrate this, copy the following table, select cell D4 in the Excel worksheet that you created earlier, and then paste the entries so that the following table fills cells D1:E15 in your worksheet.

Column D contains the unscaled probabilities. The 1 in cell D6 is the result of Step 1 of the algorithm. Excel 2003 and later versions of Excel calculate the entries in cells D7, D8, ..., D14 (in that order) in Step 2. Excel calculates the entries in cells D5 and D4 (in that order) in Step 3. The sum of all unscaled probabilities appears in D15.

To calculate the probability of 3 or fewer successes, type the following formula in any blank cell:

= SUM(D4:D7)/D15

In the previous example, EssentiallyZero does not stop Steps 2 or 3. However, if you want to evaluate BINOMDIST(550, 2000, 0.3, TRUE), EssentiallyZero may stop Step 2 or Step 3. A binomial random variable with  = 2000 and   = 0.3 has a distribution that is approximated by the normal with mean 600 and standard deviation SQRT(2000*0.3*(1 – 0.3)) = SQRT(420) = 20.5. Then 805 is 10 standard deviations higher than the mean and 395 is 10 standard deviations lower than the mean. Depending on your setting of EssentiallyZero, EssentiallyZero may stop Step 2 before you reach 805 and may stop Step 3 before you reach 395.

Conclusions
Inaccuracies in versions of Excel that are earlier than Excel 2003 occur only when the number of trials is greater than or equal to 1030. In such cases, BINOMDIST returns #NUM! in earlier versions of Excel because one term overflows in a sequence of terms that are multiplied together. To correct this behavior, Excel 2003 and later versions of Excel use the alternative procedure that is mentioned earlier in this article when such an overflow would otherwise occur.

The CRITBINOM, HYPGEOMDIST, NEGBINOMDIST, and POISSON function exhibit similar behavior in earlier versions of Excel. These functions also return either correct numeric results or #NUM! or #DIV/0!. Again, problems occur because of overflow (or underflow).

It is easy to determine when and how these problems occur. Excel 2003 and later versions of Excel use an alternative algorithm that is similar to the one for BINOMDIST to return correct answers in cases where earlier versions of Excel return #NUM!.

Keywords: kbexpertisebeginner kbinfo KB827459

-

[mailto:TECHNET@MICROSOFT.COM Send feedback to Microsoft]

© Microsoft Corporation. All rights reserved.