Hassett, chapter 5: Commonly-used discrete distributions.
5.1 The binomial distribution (113)
A binomial random variable $X$ is the number of successes in an experiment where 1. there are $n$ trials, 2. each trial has the same probability $P(\text{success})=p$ and $P(\text{failure})=q=1-p$, and 3. each trial is independent of the others.
Coin flip, guess on an exam, etc.
The probability of $s$ successes should be $$ P(X=s) = {n\choose s}p^s q^{n-s} $$
This gives expected mean and variance $$ \begin{aligned} E(X) &= np \\ V(X) &= npq = np(1-p) \\ \end{aligned} $$
I'd like to briefly prove these, and to see what functions are available on my calculator and in `scipy.stats`.
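Before that, a quick sanity check of the pmf itself (a sketch; the numbers $n=10$, $p=1/6$, $s=3$ are arbitrary):

```python
# Sketch: check the binomial pmf formula against scipy.stats.binom.
from math import comb
from scipy import stats

n, p, s = 10, 1/6, 3
by_hand = comb(n, s) * p**s * (1 - p)**(n - s)
print(by_hand)                    # ~0.155
print(stats.binom.pmf(s, n, p))   # should match
```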
Expectation value and variance
The expectation value seems intuitive, but writing out the definition makes it seem less obvious: $$ \begin{aligned} E(X) = \sum k\ P(k) = \sum_{k=0}^{n} k {n\choose k} p^k q^{n-k} \end{aligned} $$
This seems like perhaps one of those differentiate-the-sum problems. To reduce clutter, and keeping in mind that $q=1-p$, let's find $$ \begin{aligned} \frac{\mathrm d}{\mathrm dp}\left( p^k q^{n-k} \right) &= k\,p^{k-1}q^{n-k} - (n-k) p^k q^{n-k-1} \\&= \left( k\left( \frac1q + \frac1p\right) - \frac nq \right) p^k q^{n-k} \\ pq \frac{\mathrm d}{\mathrm dp}\left( p^k q^{n-k} \right) &= \left( k\left( p + q \right) - np \right) p^k q^{n-k} = (k-np) p^k q^{n-k} \end{aligned} $$ Since all of the ${n\choose k}p^k q^{n-k} = P(k)$ add up to unity, whose derivative is zero, we can include the coefficients and the sum to get $$ \begin{aligned} 0 &= pq \frac{\mathrm d}{\mathrm dp} \left( \sum P(k) \right) \\ &= \sum k P(k) - np \sum P(k) \\ np &= E(X) \end{aligned} $$
To get the variance, let's take the derivative again: $$ \begin{aligned} pq \frac{\mathrm d}{\mathrm dp} \big( E(X) \big) &= \sum \left(k^2 - knp\right) P(k) \\ pq \cdot n &= E(X^2) - np \cdot E(X) \\ (np)^2 + pq\cdot n &= E(X^2) \end{aligned} $$ This gives variance $V(X) = E(X^2) - (E(X))^2 = npq$, which is what we wanted to demonstrate.
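Here's a numerical version of the same statement: sum $k\,P(k)$ and $k^2\,P(k)$ directly and compare against $np$ and $npq$ (a sketch; the values of $n$ and $p$ are arbitrary).

```python
# Sketch: brute-force the first two moments of the binomial pmf.
import numpy as np
from scipy import stats

n, p = 12, 0.3
k = np.arange(n + 1)
pmf = stats.binom.pmf(k, n, p)

EX = np.sum(k * pmf)
EX2 = np.sum(k**2 * pmf)
print(EX, n * p)                     # both 3.6
print(EX2 - EX**2, n * p * (1 - p))  # both 2.52
```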
Features of np.random
Here's one method to draw from the distribution:

>>> np.random.binomial(100, 1/6, 10)
array([18, 18, 13, 19, 19, 16, 21, 25, 15, 10])
However, the documentation for `np.random.binomial` says
.. note:: New code should use the `~numpy.random.Generator.binomial` method of a `~numpy.random.Generator` instance instead; please see the :ref:`random-quick-start`.
There the minimal working example seems to be
>>> from numpy.random import Generator, PCG64
>>> g = Generator(PCG64())
>>> g.binomial(100, 1/6, 10)
array([24, 16, 17, 15, 17, 19, 24, 16, 22, 14])
The `PCG64` object is an `np.random.BitGenerator`, which seems to be the way to separate the real-number distributions from the pure stream of randomness needed to undergird them.
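In practice the usual shortcut seems to be `np.random.default_rng`, which builds a `Generator` on top of `PCG64` and takes an optional seed; a sketch:

```python
# Sketch: default_rng wraps PCG64 in a Generator; the seed makes draws reproducible.
import numpy as np

rng = np.random.default_rng(seed=42)
print(rng.binomial(100, 1/6, size=10))
```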
Features of scipy.stats.binom
As an instance of the `rv_discrete` class, the `binom` object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.
Methods:

- `rvs(n, p, loc=0, size=1, random_state=None)`: Random variates.
- `pmf(k, n, p, loc=0)`: Probability mass function.
- `logpmf(k, n, p, loc=0)`: Log of the probability mass function.
- `cdf(k, n, p, loc=0)`: Cumulative distribution function.
- `logcdf(k, n, p, loc=0)`: Log of the cumulative distribution function.
- `sf(k, n, p, loc=0)`: Survival function (also defined as `1 - cdf`, but `sf` is sometimes more accurate).
- `logsf(k, n, p, loc=0)`: Log of the survival function.
- `ppf(q, n, p, loc=0)`: Percent point function (inverse of `cdf`, i.e. percentiles).
- `isf(q, n, p, loc=0)`: Inverse survival function (inverse of `sf`).
- `stats(n, p, loc=0, moments='mv')`: Mean ('m'), variance ('v'), skew ('s'), and/or kurtosis ('k').
- `entropy(n, p, loc=0)`: (Differential) entropy of the RV.
- `expect(func, args=(n, p), loc=0, lb=None, ub=None, conditional=False)`: Expected value of a function (of one argument) with respect to the distribution.
- `median(n, p, loc=0)`: Median of the distribution.
- `mean(n, p, loc=0)`: Mean of the distribution.
- `var(n, p, loc=0)`: Variance of the distribution.
- `std(n, p, loc=0)`: Standard deviation of the distribution.
- `interval(confidence, n, p, loc=0)`: Confidence interval with equal areas around the median.
I think I know what all of these do. Testing them would be a fun little project for a graduate student with infinite time.
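Still, here's a small taste of a few of them, with dice-flavored numbers ($n=100$ trials, $p=1/6$) chosen just for illustration:

```python
# Sketch: a handful of the generic methods on binom(n=100, p=1/6).
from scipy import stats

n, p = 100, 1/6
print(stats.binom.pmf(20, n, p))              # P(X = 20)
print(stats.binom.cdf(20, n, p))              # P(X <= 20)
print(stats.binom.sf(20, n, p))               # P(X > 20) = 1 - cdf
print(stats.binom.ppf(0.95, n, p))            # smallest k with cdf(k) >= 0.95
print(stats.binom.stats(n, p, moments='mv'))  # (mean, variance) = (np, npq)
print(stats.binom.interval(0.95, n, p))       # central 95% interval
```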
5.2 The hypergeometric distribution (122)
This is for choices without replacement. Example: choose five people randomly from twenty men plus thirty women. What is the probability you get two men and three women? It's combinatorics: $$ \begin{aligned} P(\text{two men, three women}) &= \frac{{20\choose 2}{30\choose 3}}{50\choose 5} \end{aligned} $$
In general, if we are

- choosing $n$ from $N$ items, and
- we want $k$ items from a subgroup with population $r$,

we have probability $$ \begin{aligned} P(k) &= \frac{ {r\choose k}{N-r \choose n-k} }{ N \choose n }. \end{aligned} $$
There's a footnote about relaxing the restriction that $r \geq n$: that is, we're choosing fewer items than are in the population of interest. In fact this relation holds only for $$ \begin{aligned} \max(0, r-(N-n)) \leq k \leq \min(r,n) \end{aligned} $$
The lower bound is from the pigeonhole principle; the upper bound prevents us from choosing more than all of our subset. If we have twenty men and thirty women, and we choose forty people, then at least ten of them must be men, but no more than twenty of them can be men. The binomial coefficients do strange things in these limits, anyway.
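A quick check of both the motivating example and the support bounds, using `scipy.stats.hypergeom` (its argument order is `(k, M, n, N)` with `M` the population size, `n` the size of the tagged subgroup, and `N` the number drawn; the numbers below are from the examples above):

```python
# Sketch: "two men, three women" out of five draws, and the 40-person support bounds.
from math import comb
from scipy import stats

by_hand = comb(20, 2) * comb(30, 3) / comb(50, 5)
print(by_hand)                            # ~0.364
print(stats.hypergeom.pmf(2, 50, 20, 5))  # same number

# Draw 40 of the 50: between 10 and 20 of them must be men.
print(stats.hypergeom.pmf([9, 10, 20, 21], 50, 20, 40))  # the values outside the support are 0
```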
Expectation and variance
The expectation value is pretty sensible: $$ \begin{aligned} E(X) &= n\cdot \frac rN \end{aligned} $$ The variance is kind of a big mess, though: $$ \begin{aligned} V(X) &= n \cdot \frac rN \cdot \left(1-\frac rN\right) \cdot \frac{N-n}{N-1} \end{aligned} $$
What's the right way to think of this? In the limit where you're taking a small sample of a large population, $n\ll N$, the final fraction looks like $\frac{1-n/N}{1-1/N} \to 1$, and we recover the result $V(X) = np(1-p)$ for independent samples. The factor $\frac{N-n}{N-1}$ is apparently called the "finite population correction factor," and a secondhand reference suggests neglecting it for $\frac nN\lesssim \frac1{10}$.
To actually compute these moments of the distribution, we'd need to do $$ \begin{aligned} E(X) = \sum k P(k) &= \frac{1}{N\choose n} \sum k{r\choose k}{N-r\choose n-k} \end{aligned} $$
The method probably involves some dumb identities with the binomial coefficients. The one that lets you recursively build Pascal's triangle is $$ \begin{aligned} {n\choose k} &= {n-1\choose k-1} + {n-1\choose k}, \end{aligned} $$ which is a special case of an identity named for Vandermonde, $$ \begin{aligned} {m+n\choose r} &= \sum_{k=0}^r {m\choose k}{n \choose r-k} \end{aligned} $$
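Rather than grind through the algebra, here's a brute-force check of both moments (finite population correction included) and of Vandermonde's identity (a sketch; the numbers are arbitrary):

```python
# Sketch: hypergeometric moments by direct summation, plus Vandermonde's identity.
from math import comb
import numpy as np
from scipy import stats

N, r, n = 50, 20, 5
k = np.arange(n + 1)
pmf = stats.hypergeom.pmf(k, N, r, n)

EX = np.sum(k * pmf)
VX = np.sum(k**2 * pmf) - EX**2
print(EX, n * r / N)                                  # both 2.0
print(VX, n * (r/N) * (1 - r/N) * (N - n) / (N - 1))  # both ~1.102

# Vandermonde: sum_j C(m, j) C(n2, s-j) = C(m + n2, s)
m, n2, s = 7, 9, 6
print(sum(comb(m, j) * comb(n2, s - j) for j in range(s + 1)), comb(m + n2, s))
```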
Computational stuff
From `scipy.stats.hypergeom`:

The hypergeometric distribution models drawing objects from a bin. `M` is the total number of objects, `n` is the total number of Type I objects. The random variate represents the number of Type I objects in `N` drawn without replacement from the total population.

As an instance of the `rv_discrete` class, the `hypergeom` object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.
Methods:

- `rvs(M, n, N, loc=0, size=1, random_state=None)`: Random variates.
- `pmf(k, M, n, N, loc=0)`: Probability mass function.
Let's watch the probabilities of the hypergeometric distribution converge to the binomial distribution as we make the population larger.
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np

n = 5
k = np.arange(n + 1)
plt.plot(k, stats.binom.pmf(k, n, 3/5), 'k--', label='binomial')
for M in (5, 10, 25, 100, 500):  # population of M, of which 3/5 are Type I
    plt.plot(k, stats.hypergeom.pmf(k, M, M * 3 // 5, n), label=f'M={M}')
plt.legend()
plt.show()
5.3 The Poisson distribution (126)
The Poisson distribution is also used to model numbers of events, where the events occur independently of one another at a known average rate. For example, "this intersection has an average of $\lambda=2$ accidents per month."
The distribution is $$ \begin{aligned} P(X=k) &= \frac{e^{-\lambda} \lambda^k}{k!} \\ E(X) &= \lambda \\ V(X) &= \lambda \end{aligned} $$
I feel like Poisson should arise in some limit of binomial. Let's match the means $\lambda = np$, and consider the case where there are zillions of trials and only a few successes, $k\ll n$. Then the binomial probability
$$ P_\text{binomial}(k) = {n\choose k} p^k q^{n-k} $$
has three factors, two of which we need to approximate (the $p^k$ stays as it is). The coefficient goes like
$$ {n\choose k} \approx \frac{n^k}{k!}. $$
For example, ${100\choose 3} = \frac{100\times 99\times 98}{3!} \approx \frac{100^3}{3!}$. Looking at $q=(1-p)$, we have
$$ \begin{aligned} (1-p)^{n-k} &\approx (1-p)^n \\ &= 1 - np + {n\choose 2}p^2 - {n\choose 3}{p^3} + \cdots \\ &\approx 1 - \frac{np}{1!} + \frac{(np)^2}{2!} - \cdots \\ &= e^{-np} \end{aligned} $$
So the relation overall is $$ \begin{aligned} P(k) &= {n\choose k} \cdot p^k \cdot q^{n-k} \\ &\approx \frac{n^k}{k!} \cdot p^k \cdot e^{-np} \\ &= \frac{(np)^k e^{-np}}{k!} = P_\text{poisson}(k) \end{aligned} $$
Oh, look! That's the next section in the book:
If $n$ is large and $p = \lambda/n$ is small, then $P(X=k)$ can be calculated approximately using either the Poisson or the binomial distributions:
$$ \frac{e^{-\lambda}\lambda^k}{k!} \approx {n\choose k} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n-k}. $$
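Numerically, the approximation is already decent for modest $n$ (a sketch, with $\lambda=2$ as in the intersection example):

```python
# Sketch: Poisson(lambda) pmf vs binomial(n, lambda/n) pmf as n grows.
import numpy as np
from scipy import stats

lam = 2
k = np.arange(6)
print(stats.poisson.pmf(k, lam))
for n in (10, 100, 1000):
    print(n, stats.binom.pmf(k, n, lam / n))
```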
5.4 The geometric distribution (132)
The geometric distribution is for waiting-time problems, like "how many dice rolls until a six?" There are (at least) two definitions, depending on whether you count the success or not — that is, whether a six on the first die roll counts as $k=0$ or as $k=1$. With $X$ meaning "failures before success" and $Y = X+1$ meaning "attempts including success," we have $$ \begin{aligned} P(X=k) &= q^k p \qquad & P(Y=k) &= q^{k-1} p \\ E(X) &= \frac{q}{p} & E(Y) &= 1 + \frac qp = \frac 1p \\ V(X) &= \frac{q}{p^2} & V(Y) &= V(X+1) = V(X) = \frac{q}{p^2} \end{aligned} $$
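For what it's worth, `scipy.stats.geom` uses the $Y$ convention (support starting at 1); shifting with `loc=-1` gives the failures-only $X$ version. A sketch with $p=1/6$:

```python
# Sketch: scipy's geom counts trials including the success (Y); loc=-1 shifts to failures (X).
from scipy import stats

p = 1/6
print(stats.geom.mean(p))          # E(Y) = 1/p = 6
print(stats.geom.var(p))           # V(Y) = q/p^2 = 30
print(stats.geom.mean(p, loc=-1))  # E(X) = q/p = 5
print(stats.geom.pmf(1, p))        # P(Y=1) = p
```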
5.5 The negative binomial distribution (136)
This extends the geometric distribution. Suppose I want to know how many dice rolls it takes until the third time I roll a six. One such sequence is $\set{1,4,3,5,6,3,4,5,2,6,3,4,6}$. Counting such sequences goes back to combinatorics, which is why the binomial expansion gets involved. The probability that I get my third six on the twelfth roll is the probability that the twelfth roll itself is a six, times the probability that the first eleven rolls contain exactly two sixes.
A series of independent trials has probability $P(S)=p$ on each trial. Let $X$ be the number of failures before the $r$-th success.
$$ \begin{aligned} P(k) = {r+k-1 \choose r-1} q^k p^r \end{aligned} $$
For $r=1$, we get the geometric back: $$ \begin{aligned} P(k; r=1) &= {1+k-1\choose 1-1} q^k p = {k\choose 0} q^k p = q^k p \end{aligned} $$
Delightfully, the mean and variance are $$ \begin{aligned} E(X) &= \frac{rq}{p} & \qquad V(X) &= \frac{rq}{p^2} \end{aligned} $$
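As far as I can tell, `scipy.stats.nbinom` uses the same failures-before-the-$r$-th-success convention; here's a sketch with dice numbers ($r=3$, $p=1/6$), including the third-six-on-the-twelfth-roll probability ($k=9$ failures):

```python
# Sketch: negative binomial pmf and moments for r=3 sixes with p=1/6.
from math import comb
from scipy import stats

r, p, q = 3, 1/6, 5/6
print(stats.nbinom.mean(r, p), r * q / p)    # both 15
print(stats.nbinom.var(r, p), r * q / p**2)  # both 90

k = 9  # nine failures + three sixes = twelve rolls
print(stats.nbinom.pmf(k, r, p), comb(r + k - 1, r - 1) * q**k * p**r)
```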
5.6 The discrete uniform distribution (141)
This one's simple: $$ \begin{aligned} P(k) &= \frac 1n, &&k\in\set{1\cdots n} \\ E(X) &= \frac{n+1}{2} \\ V(X) &= \frac{n^2-1}{12} \end{aligned} $$
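In scipy this one appears to be `scipy.stats.randint(low, high)`, which is uniform on the integers from `low` to `high - 1`, so the book's $\set{1\cdots n}$ is `randint(1, n+1)`. A sketch with a die ($n=6$):

```python
# Sketch: discrete uniform on {1, ..., 6} via randint(1, 7); high is exclusive.
from scipy import stats

n = 6
print(stats.randint.mean(1, n + 1), (n + 1) / 2)     # both 3.5
print(stats.randint.var(1, n + 1), (n**2 - 1) / 12)  # both ~2.917
```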
Problems
The problems are split up as follows:
- 1–10 Binomial
- 11–15 Hypergeometric
- 16–22 Poisson
- 23–26 Geometric
- 27–31 Negative Binomial
- 33–35 Discrete Uniform
- 36–41 Sample actuarial problems