Center for Freelance Writing: Binomial vs. Poisson vs. Normal Distribution

Consider this scenario: given 100 bottles, each of which independently has a 10% chance of being defective, what is the chance that at most 0 are defective? At most 1? At most 2? 3? ... In the general case, given (n) samples, each of which independently has probability (p) of returning true, what is the probability that at most (k) of the samples return true? The binomial distribution gives the exact answer, while the Poisson and normal distributions approximate it, with accuracy that depends on the scenario.
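For reference, the three quantities being compared can be written out as follows. This is a sketch using the standard textbook forms. The symbols X, λ, and Φ (the standard normal CDF) are notation introduced here, and the 0.5 term is the continuity correction that also appears in the normcdf line of the MATLAB code.

```latex
% Exact binomial cumulative probability
P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1-p)^{n-i}

% Poisson approximation, with \lambda = np
P(X \le k) \approx \sum_{i=0}^{k} \frac{e^{-\lambda} \lambda^{i}}{i!}

% Normal approximation with continuity correction
P(X \le k) \approx \Phi\!\left( \frac{k + 0.5 - np}{\sqrt{np(1-p)}} \right)
```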

Write MATLAB code as follows (binocdf, poisscdf, and normcdf come from the Statistics Toolbox). Here n = 100, p = 0.05, and k = 0 to 10, but all of those values can easily be changed by hand. The first column of the table holds the manually chosen (k) values for which the cumulative probabilities are calculated:

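table = zeros(11,4);
table(:,1) = [0;1;2;3;4;5;6;7;8;9;10];   % k values to evaluate
n = 100;
p = 0.05;

for i = 1:size(table,1)
    k = table(i,1);   % use the k value from column 1, not the loop index
    table(i,2) = binocdf(k,n,p);                         % exact binomial CDF
    table(i,3) = poisscdf(k,n*p);                        % Poisson approximation, lambda = n*p
    table(i,4) = normcdf((k+0.5-n*p)/sqrt(n*p*(1-p)));   % normal approximation with continuity correction
end
table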

Now tweak the program above (the values of n, p, and table(:,1)) to run through two cases. In case 1, n is large while p is small. In case 2, p is relatively large. A combination of MATLAB results and Excel calculations was used to produce the tables below, where each error column is the relative difference between the approximation and the binomial value:

Case 1: n = 200, p = 0.02, λ = 4

k	Binomial	Poisson	Normal	Pois Error	Norm Error
0	0.0176	0.0183	0.0385	3.98%	118.75%
1	0.0894	0.0916	0.1034	2.46%	15.66%
2	0.2351	0.2381	0.2243	1.28%	-4.59%
3	0.4315	0.4335	0.4003	0.46%	-7.23%
4	0.6288	0.6288	0.5997	0.00%	-4.63%
5	0.7867	0.7851	0.7757	-0.20%	-1.40%
6	0.8914	0.8893	0.8966	-0.24%	0.58%
7	0.9507	0.9489	0.9615	-0.19%	1.14%
8	0.9798	0.9786	0.9885	-0.12%	0.89%
9	0.9925	0.9919	0.9973	-0.06%	0.48%
10	0.9975	0.9972	0.9995	-0.03%	0.20%

Case 2: n = 100, p = 0.4, λ = 40

k	Binomial	Poisson	Normal	Pois Error	Norm Error
15	0	0	0	n/a	n/a
20	0	0.0004	0	n/a	n/a
25	0.0012	0.0076	0.0015	533.33%	25.00%
30	0.0248	0.0617	0.0262	148.79%	5.65%
35	0.1795	0.2424	0.1792	35.04%	-0.17%
40	0.5433	0.5419	0.5406	-0.26%	-0.50%
45	0.8689	0.8097	0.8692	-6.81%	0.03%
50	0.9832	0.9474	0.984	-3.64%	0.08%
55	0.9991	0.9903	0.9992	-0.88%	0.01%
60	1	0.9988	1	-0.12%	0.00%
65	1	0.9999	1	-0.01%	0.00%

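These data show that the Poisson distribution is the better approximation when p is small, while the normal distribution is the better approximation when p is large. In case 1, the Poisson values stay within 4% of the binomial values at every k, while the normal value is off by more than 100% at k = 0. In case 2, the normal values stay within 1% from k = 35 upward, while the Poisson values are off by several hundred percent in the lower tail. For the smallest k values in case 2 the binomial value rounds to 0, so no percentage error can be computed. Because the numbers were copied from MATLAB into Excel, rounding has distorted the percentage error calculations slightly.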
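One way to avoid the rounding distortion is to compute the percentage errors in MATLAB at full precision instead of in Excel. The following is a minimal sketch, assuming the same Statistics Toolbox functions as the program above. The variable names (exact, poisApprox, normApprox, poisErr, normErr, results) are illustrative and not from the original program.

```matlab
% Sketch: percentage errors computed directly in MATLAB (no Excel step).
% Assumes binocdf, poisscdf, and normcdf are available (Statistics Toolbox).
n = 200;              % case 1 values; for case 2 use n = 100, p = 0.4
p = 0.02;
k = (0:10)';          % column of k values; for case 2 use (15:5:65)'

exact      = binocdf(k, n, p);                            % exact binomial CDF
poisApprox = poisscdf(k, n*p);                            % Poisson approximation
normApprox = normcdf((k + 0.5 - n*p) / sqrt(n*p*(1-p)));  % normal approximation

% Relative error in percent: (approximation - exact) / exact
poisErr = 100 * (poisApprox - exact) ./ exact;
normErr = 100 * (normApprox - exact) ./ exact;

results = [k, exact, poisApprox, normApprox, poisErr, normErr]
```

Because the division uses unrounded binomial values, the results may differ slightly from the percentages in the tables above, and the rows where the rounded binomial value is 0 should produce a finite error rather than a division by zero.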

Monday, February 27, 2012
