# Central Limit Theorem

The central limit theorem is one of the most remarkable results of the theory of probability. In its simplest form, the theorem states that the sum of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution. Moreover, the approximation steadily improves as the number of observations increases. The theorem is considered the heart of probability theory, although a better name would be normal convergence theorem.
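The statement above can be made precise in one common form, often called the Lindeberg-Lévy version. The sketch below assumes that the observations $X_1, X_2, \dots$ are independent and identically distributed with finite mean $\mu$ and finite variance $\sigma^2 > 0$. The symbols $S_n$, $\mu$, $\sigma$ and $\Phi$ are notation introduced here for illustration.

$$
S_n = X_1 + \dots + X_n, \qquad
\lim_{n \to \infty} P\left( \frac{S_n - n\mu}{\sigma \sqrt{n}} \le x \right)
= \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt
\quad \text{for every real } x .
$$

In words, for large $n$ the sum $S_n$ is approximately normal with mean $n\mu$ and variance $n\sigma^2$.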

For example, suppose an ordinary coin is tossed 100 times and the number of heads is counted. This is equivalent to scoring 1 for a head and 0 for a tail and computing the total score. Thus, the total number of heads is the sum of 100 independent, identically distributed random variables. By the central limit theorem, the distribution of the total number of heads will be, to a very high degree of approximation, normal. This can be illustrated graphically by repeating the experiment many times and displaying the results in a diagram. The total score (the number of heads) is arranged along the horizontal axis, and the percentage of experiments that produced each score is arranged along the vertical axis. After a large number of repetitions, a curve appears that looks like the normal curve.

It has been empirically observed that various natural phenomena, such as the heights of individuals, follow approximately a normal distribution. A suggested explanation is that these phenomena are sums of a large number of independent random effects and hence are approximately normally distributed by the central limit theorem.
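The coin-tossing experiment described above can be imitated with a short simulation. The following is a minimal sketch, not a definitive implementation: the function names, the number of repetitions (10,000) and the random seed are assumptions chosen for illustration. It uses only the Python standard library and compares the simulated percentages with the normal curve that has mean 50 and standard deviation 5.

```python
import math
import random
from collections import Counter


def simulate_head_counts(n_tosses=100, n_experiments=10_000, seed=0):
    """Repeat the coin experiment and return {heads: percentage of experiments}."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    counts = Counter()
    for _ in range(n_experiments):
        # Score 1 for a head and 0 for a tail, then add up the scores.
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        counts[heads] += 1
    return {k: 100.0 * v / n_experiments for k, v in sorted(counts.items())}


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


percentages = simulate_head_counts()
mean = 100 * 0.5                   # n * p
sd = math.sqrt(100 * 0.5 * 0.5)    # sqrt(n * p * (1 - p))

# Compare simulated percentages with the normal curve at a few scores.
for k in range(40, 61, 5):
    simulated = percentages.get(k, 0.0)
    predicted = 100 * normal_pdf(k, mean, sd)
    print(f"heads={k}  simulated={simulated:5.2f}%  normal curve={predicted:5.2f}%")
```

Plotting `percentages` as a bar chart would reproduce the kind of diagram described above. Increasing `n_experiments` should make the bars follow the normal curve more closely.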

## Normal Limit of the Binomial Distribution

In the movie we show how the approximation improves for a binomial distribution with success probability p = 1/4. The binomial distribution is displayed with red bars, and the normal distribution is displayed by a blue line. We increase the number of observations n from 10 to 50. The distributions shift to the right because the mean equals np, which gets larger when we have more observations. The distributions also become flatter because the variance equals np(1 - p), which likewise grows with the number of observations.
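The comparison shown in the movie can also be checked numerically. The following is a minimal sketch under stated assumptions: the helper names `binomial_pmf`, `normal_pdf` and `max_gap` are hypothetical, and the "gap" (the largest absolute difference between a red bar and the blue line at integer points) is just one simple way to measure the quality of the approximation.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def max_gap(n, p=0.25):
    """Largest absolute difference between the binomial bars and the normal curve."""
    mean = n * p                      # the mean np moves right as n grows
    sd = math.sqrt(n * p * (1 - p))   # the variance np(1 - p) grows with n
    return max(
        abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, sd)) for k in range(n + 1)
    )


p = 0.25
for n in range(10, 51, 10):
    print(
        f"n={n:2d}  mean={n * p:5.2f}  variance={n * p * (1 - p):5.2f}  "
        f"max gap={max_gap(n, p):.4f}"
    )
```

The printed mean and variance grow in proportion to n, which matches the rightward shift and the flattening described above, while the gap for n = 50 is smaller than for n = 10.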
