Microsoft KB Archive/828117

= Excel statistical functions: CRITBINOM =

Article ID: 828117

Article Last Modified on 1/18/2007

-

APPLIES TO


 * Microsoft Office Excel 2007
 * Microsoft Office Excel 2003
 * Microsoft Excel 2004 for Mac

-



SUMMARY
This article describes the CRITBINOM function in Microsoft Office Excel 2003 and in later versions of Excel, illustrates how the function is used, and compares the results of the function in Excel 2003 and in later versions of Excel with the results of CRITBINOM in earlier versions of Excel.

Microsoft Excel 2004 for Macintosh information
The statistical functions in Excel 2004 for Mac were updated by using the same algorithms that were used to update the statistical functions in Excel 2003 and in later versions of Excel. Any information in this article that describes how a function works or how a function was modified for Excel 2003 and for later versions of Excel also applies to Excel 2004 for Mac.



MORE INFORMATION
The CRITBINOM(n, p, alpha) function examines for various x the cumulative probability of x successes in n independent Bernoulli trials. Each trial has the associated probability p of success. CRITBINOM returns the smallest value of x for which this cumulative probability is greater than or equal to alpha. This cumulative probability is greater than or equal to alpha.

Syntax
CRITBINOM(n, p, alpha)

Parameter n is a positive integer, 0 < p < 1, and 0 <= alpha <= 1 (although the cases where alpha = 0 and alpha = 1 are of limited interest).

Example of usage
In baseball, you can make the simplifying supposition that a &quot;.300 hitter&quot; receives a hit (success) with probability 0.300 each time he comes to bat (each trial), and that successive times at bat are independent Bernoulli trials. In 100 trials, 30 successes is the batter's expected number of hits.

You can use CRITBINOM to answer the following questions:
 * The smallest number x of successes where the batter has at least a 0.25 chance of getting x or fewer successes (x would be less than 30)
 * (Small variation) The smallest number of successes x' where the batter has at least a 0.25 chance of not getting at least x' successes (x' would also be less than 30)
 * The smallest number y of successes so that the chance of getting y or more successes is less than 0.10 (y would be greater than 30)

To illustrate the CRITBINOM function, create a blank Excel worksheet, and copy the following table. Then, select cell A1 in your blank Excel worksheet and paste the entries so that the table fills cells A1:B21 in your worksheet.

After you paste this table into your new Excel worksheet, click the Paste Options button, and then click Match Destination Formatting. With the pasted range still selected, use one of the following procedures, as appropriate for the version of Excel that you are running:
 * In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
 * In Excel 2003, point to Column on the Format menu, and then click AutoFit Selection.

Cells B4:B16 show values of CRITBINOM for respective alpha values in A4:A16. For question 1, CRITBINOM(100, 0.3, 0.25) in cell B7 is 27 so the probability of getting 27 or fewer successes is greater than or equal to 0.25 whereas the probability of getting 26 or fewer successes is less than 0.25. So, 27 is the smallest number of successes x such that the batter has at least a 0.25 chance of getting x or fewer successes.

Question 2 is just a small variation on question 1. Because there is at least a 0.25 chance of 27 or fewer successes, there is at most a 0.75 chance of getting 28 or more successes, and therefore at least a 0.25 chance of not getting 28 or more successes.

For question 3, CRITBINOM(100, 0.3, 0.9) is 36. So there is at least a 0.9 chance of getting 36 or fewer successes. Therefore, there is at most a 0.1 chance of getting 37 or more successes. Also, 37 is the smallest number that satisfies this condition.

Questions 2 and 3 illustrate the fact that many questions about the cumulative binomial distribution require finding a CRITBINOM value. CRITBINOM makes a statement like &quot;the smallest x that gives a cumulative probability associated with x that is greater than or equal to alpha is x = 27.&quot; This statement might then be used indirectly to answer the question. For example, if P(number of successes <= 27) >= alpha, then P(number of successes >= 28) < 1 – alpha.

Cells A18:B21 illustrate performance of CRITBINOM. Correct values were found by using Knusel's software, ELV. Knusel used this software as the basis for computations in his paper that is cited below. If you have a version of Excel that is earlier than Excel 2003, the entries in A20 and A21 will be #NUM!. If you have Excel 2003 or a later version of Excel, the entries in A19:A21 and B19:B21 will agree.

Results in earlier versions of Excel
CRITBINOM depends on BINOMDIST. Knusel (see note 1) documented instances where BINOMDIST does not return a numeric answer and yields #NUM! instead because of a numeric overflow. When numeric answers are returned by BINOMDIST, they are correct. #NUM! is returned only when the number of trials is greater than or equal to 1030. There are no computational problems when n < 1030. The following article about BINOMDIST discusses this more thoroughly.

Note 1: Knusel, L. On the Accuracy of Statistical Distributions in Microsoft Excel 97, Computational Statistics and Data Analysis, 26, 375-377, 1998.

For more information about BINOMDIST, click the following article number to view the article in the Microsoft Knowledge Base:

827459 Excel statistical functions: BINOMDIST

Knusel did not report on CRITBINOM in his paper. However, as expected, CRITBINOM gives the same problems, but again only when n >= 1030.

Results in Excel 2003 and in later versions of Excel
The approach to improvements in Excel 2003 and in later versions of Excel is exactly the same as with BINOMDIST: use existing pre-Excel 2003 code if n < 1030 and switch to an alternative plan if n >= 1030. The remainder of the discussion in this section deals with only the case where n >= 1030. The alternative plan is built in the same way as for BINOMDIST: find the modal value m (m = approximately n*p), assign an unscaled probability of 1 to m, find unscaled probabilities of m+1, m+2, m+3, ... stopping when such probabilities become infinitesimal, find unscaled probabilities of m-1, m-2, m-3, ... stopping when such probabilities become infinitesimal. Finally, scale the appropriate probabilities.

CRITBINOM is a kind of inverse function for a discrete distribution that is similar to NORMSINV for the continuous standard normal distribution. NORMSINV is computed through a search process that frequently calls NORMSDIST as it homes in on the result. Because BINOMDIST is computationally expensive, you want to avoid a similar process for CRITBINOM that would repeatedly call BINOMDIST. The procedure establishes an initial guess, executes the code below (similar to a single call to BINOMDIST), and then adjusts the guess to arrive at a final answer.

First, find a guess by using a normal approximation to the binomial distribution. You can assume a normal distribution with the same mean and standard deviation as the binomial, namely, a mean n*p and a standard deviation SQRT(n*p*(1-p)). This approximation should be reasonably accurate as long as n*p*(1-p) > 30. This will be the case with n >= 1030 unless p is very close to 0 or very close to 1. Because you require only an approximate value, use a quick approximation to NORMSINV instead of calling NORMSINV itself. The approximation comes from 26.2.23 in Abramowitz, M. and I. A. Stegun, Handbook of Mathematical Functions, Dover, 1972, p. 933.

If alpha <= 0.5, define t = SQRT(LN(1/(alpha^2))) then NORMSINV(alpha) is approximated by NApprox = -t + (2.515517 + 0.802853*t + 0.010328*t*t)/(1 + 1.432788*t + 0.189269*t*t + 0.001308*t*t*t)

If alpha > 0.5, define t = SQRT(LN(1/((1 – alpha)^2))) then NORMSINV(alpha) is approximated by NApprox = t - (2.515517 + 0.802853*t + 0.010328*t*t)/(1 + 1.432788*t + 0.189269*t*t + 0.001308*t*t*t)

The initial guessed value of CRITBINOM is as follows.

Guess = FLOOR(n*p + NApprox*SQRT(n*p*(1-p)))

You have to make sure that this is not outside the range [0,n]. Therefore, run the following code:

If (Guess < 0) then Guess = 0; If (Guess > n) then Guess = n;

The pseudocode that follows is adapted from the pseudocode for BINOMDIST (in the article on BINOMDIST) to find both the cumulative and non-cumulative probabilities for Guess. In other words, when this pseudocode terminates, you will have computed BINOMDIST(Guess, n, p, TRUE) and BINOMDIST(Guess, n, p, FALSE).

Step 0: (Initialization). Initialize TotalUnscaledProbability, UnscaledPGuess (for non-cumulative probability), and UnscaledCumPGuess (for cumulative probability) all to 0. Initialize the constant EssentiallyZero to a very small number, such as 10^(-12).

Step 1: find n*p and round down to the nearest whole number, m. The most likely number of successes in n trials is either m or m+1.

TotalUnscaledProbability = TotalUnscaledProbability + 1; If (m == Guess) then UnscaledPGuess = UnscaledPGuess + 1; If (m <= Guess) then UnscaledCumPGuess = UnscaledCumPGuess + 1;

Step 2: (Calculate unscaled probabilities for k > m):

PreviousValue = 1; Done = FALSE; k = m + 1; While (not Done && k <= n) { CurrentValue = PreviousValue * (n – k + 1) * p / (k * (1 – p)); TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue; If (k == Guess) then UnscaledPGuess = UnscaledPGuess + CurrentValue; If (k <= Guess) then UnscaledCumPGuess = UnscaledCumPGuess + CurrentValue; If (CurrentValue <= EssentiallyZero) then Done = TRUE; PreviousValue = CurrentValue; k = k+1; } end While;

Step 3: (Calculate unscaled probabilities for k < m):

PreviousValue = 1; Done = FALSE; k = m - 1; While (not Done && k >= 0) {   CurrentValue = PreviousValue * k+1 * (1-p) / ((n – k) * p); TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue; If (k == Guess) then UnscaledPGuess = UnscaledPGuess + CurrentValue; If (k <= Guess) then UnscaledCumPGuess = UnscaledCumPGuess + CurrentValue; If (CurrentValue <= EssentiallyZero) then Done = TRUE; PreviousValue = CurrentValue; k = k-1; } end While;

Step 4: (Combine unscaled results):

PGuess = UnscaledPGuess/TotalUnscaledProbability; CumPGuess = UnscaledCumPGuess/TotalUnscaledProbability;

Step 5: Adjust the result. At this point you have values for Guess, PGuess, and CumPGuess. Let CumPGuessMinus1 be the cumulative probability of Guess – 1. Then Guess is the correct answer if CumPGuessMinus1 < alpha but CumPGuess >= alpha. If both of these are < alpha, increase Guess by 1. If both of these are >= alpha, decrease Guess by 1. This is a simple process. First

CumPGuessMinus1 = CumPGuess – PGuess;

If you have to increase Guess then

PGuessPlus1 = PGuess * (n – Guess) * p / Guess / (1 – p); CumPGuessMinus1 = CumPGuess; CumPGuess = CumPGuess + PGuessPlus1; PGuess = PGuessPlus1; Guess = Guess + 1;

If you have to decrease Guess then

PGuessMinus1 = PGuess * Guess * (1 – p) / (n – Guess + 1) / p; CumPGuess = CumPGuessMinus1; CumPGuessMinus1 = CumPGuessMinus1 – PGuess; PGuess = PGuessMinus1; Guess = Guess – 1;

You might have to increase Guess more than one time or you might have to decrease Guess more than one time, but you would never have to both increase Guess at least one time and decrease Guess at least one time.

CRITBINOM has been thoroughly tested for accuracy. However, only casual anecdotal testing has been done to investigate how close the initial Guess is to the correct answer and how many times the Guess has to be increased or decreased. The normal approximation generally provides an excellent value of Guess; in our limited casual tests, we never had to increase or decrease the initial Guess by more than 2.

Conclusions
Inaccuracies in earlier versions of Excel occur only when the number of trials is greater than or equal to 1030. In such cases, CRITBINOM returns #NUM! in earlier versions of Excel. This issue occurs because one term in a sequence of terms to be multiplied together when you evaluate BINOMDIST overflows. This issue has been corrected in Excel 2003 and in later versions of Excel by implementing an alternative procedure when such an overflow would otherwise occur. This alternative procedure was described earlier in this article.

Five functions that exhibit similar behavior in earlier versions of Excel are BINOMDIST, CRITBINOM, HYPGEOMDIST, NEGBINOMDIST, and POISSON. These functions always return correct numeric results or #NUM! or #DIV/0!. Again, problems occur because of overflow (or underflow).

The conditions under which these problems will occur are easily identified, and an alternative plan algorithm that is in the same spirit as the algorithm for BINOMDIST (or CRITBINOM) is implemented in Excel 2003 and in later versions of Excel. This algorithm is implemented to return correct answers in cases where earlier versions of Excel return #NUM! or #DIV/0!.

Keywords: kbformula kbexpertisebeginner kbinfo KB828117

-

[mailto:TECHNET@MICROSOFT.COM Send feedback to Microsoft]

© Microsoft Corporation. All rights reserved.