Hassett, chapter 5: Commonly-used discrete distributions.
5.1 The binomial distribution (113)
A binomial random variable $X$ is the number of successes in an experiment where 1. there are $n$ trials, 2. each trial has the same probability $P(\text{success})=p$ and $P(\text{failure})=q=1-p$, and 3. each trial is independent of the others.
Coin flip, guess on an exam, etc.
The probability of $s$ successes should be $$ P(X=s) = {n\choose s}p^s q^{n-s} $$
This gives expected mean and variance $$ \begin{aligned} E(X) &= np \\ V(X) &= npq = np(1-p) \\ \end{aligned} $$
I'd like to briefly prove these, and to see what functions are available on my calculator and in `scipy.stats`.
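Before that, a quick sanity check of the pmf itself (a sketch; the numbers $n=10$, $p=1/6$, $s=3$ are arbitrary):

```python
# Sketch: check the binomial pmf formula against scipy.stats.binom.
from math import comb
from scipy import stats

n, p, s = 10, 1/6, 3
by_hand = comb(n, s) * p**s * (1 - p)**(n - s)
print(by_hand)                    # ~0.155
print(stats.binom.pmf(s, n, p))   # should match
```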
Expectation value and variance
The expectation value seems intuitive, but writing out the definition makes it seem less obvious: $$ \begin{aligned} E(X) = \sum k\ P(k) = \sum_{k=0}^{n} k {n\choose k} p^k q^{n-k} \end{aligned} $$
This seems like perhaps one of those differentiate-the-sum problems. To reduce clutter, and keeping in mind that $q=1-p$, let's find $$ \begin{aligned} \frac{\mathrm d}{\mathrm dp}\left( p^k q^{n-k} \right) &= k\,p^{k-1}q^{n-k} - (n-k) p^k q^{n-k-1} \\&= \left( k\left( \frac1q + \frac1p\right) - \frac nq \right) p^k q^{n-k} \\ pq \frac{\mathrm d}{\mathrm dp}\left( p^k q^{n-k} \right) &= \left( k\left( p + q \right) - np \right) p^k q^{n-k} = (k-np) p^k q^{n-k} \end{aligned} $$ Since all of the ${n\choose k}p^k q^{n-k} = P(k)$ add up to unity, whose derivative is zero, we can include the coefficients and the sum to get $$ \begin{aligned} 0 &= pq \frac{\mathrm d}{\mathrm dp} \left( \sum P(k) \right) \\ &= \sum k P(k) - np \sum P(k) \\ np &= E(X) \end{aligned} $$
To get the variance, let's take the derivative again: $$ \begin{aligned} pq \frac{\mathrm d}{\mathrm dp} \big( E(X) \big) &= \sum \left(k^2 - knp\right) P(k) \\ pq \cdot n &= E(X^2) - np \cdot E(X) \\ (np)^2 + pq\cdot n &= E(X^2) \end{aligned} $$ This gives variance $V(X) = E(X^2) - (E(X))^2 = npq$, which is what we wanted to demonstrate.
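Here's a numerical version of the same statement: sum $k\,P(k)$ and $k^2\,P(k)$ directly and compare against $np$ and $npq$ (a sketch; the values of $n$ and $p$ are arbitrary).

```python
# Sketch: brute-force the first two moments of the binomial pmf.
import numpy as np
from scipy import stats

n, p = 12, 0.3
k = np.arange(n + 1)
pmf = stats.binom.pmf(k, n, p)

EX = np.sum(k * pmf)
EX2 = np.sum(k**2 * pmf)
print(EX, n * p)                     # both 3.6
print(EX2 - EX**2, n * p * (1 - p))  # both 2.52
```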
Features of np.random
Here's one method to draw from the distribution:

>>> np.random.binomial(100, 1/6, 10)
array([18, 18, 13, 19, 19, 16, 21, 25, 15, 10])
However, the documentation for `np.random.binomial` says
.. note:: New code should use the `~numpy.random.Generator.binomial` method of a `~numpy.random.Generator` instance instead; please see the :ref:`random-quick-start`.
There the minimal working example seems to be
>>> from numpy.random import Generator, PCG64
>>> g = Generator(PCG64())
>>> g.binomial(100, 1/6, 10)
array([24, 16, 17, 15, 17, 19, 24, 16, 22, 14])
The `PCG64` object is an `np.random.BitGenerator`, which seems to be the way to separate the real-number distributions from the pure stream of randomness needed to undergird them.
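In practice the usual shortcut seems to be `np.random.default_rng`, which builds a `Generator` on top of `PCG64` and takes an optional seed; a sketch:

```python
# Sketch: default_rng wraps PCG64 in a Generator; the seed makes draws reproducible.
import numpy as np

rng = np.random.default_rng(seed=42)
print(rng.binomial(100, 1/6, size=10))
```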
Features of scipy.stats.binom
As an instance of the `rv_discrete` class, the `binom` object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.
Methods:

- `rvs(n, p, loc=0, size=1, random_state=None)`: Random variates.
- `pmf(k, n, p, loc=0)`: Probability mass function.
- `logpmf(k, n, p, loc=0)`: Log of the probability mass function.
- `cdf(k, n, p, loc=0)`: Cumulative distribution function.
- `logcdf(k, n, p, loc=0)`: Log of the cumulative distribution function.
- `sf(k, n, p, loc=0)`: Survival function (also defined as `1 - cdf`, but `sf` is sometimes more accurate).
- `logsf(k, n, p, loc=0)`: Log of the survival function.
- `ppf(q, n, p, loc=0)`: Percent point function (inverse of `cdf`, i.e. percentiles).
- `isf(q, n, p, loc=0)`: Inverse survival function (inverse of `sf`).
- `stats(n, p, loc=0, moments='mv')`: Mean ('m'), variance ('v'), skew ('s'), and/or kurtosis ('k').
- `entropy(n, p, loc=0)`: (Differential) entropy of the RV.
- `expect(func, args=(n, p), loc=0, lb=None, ub=None, conditional=False)`: Expected value of a function (of one argument) with respect to the distribution.
- `median(n, p, loc=0)`: Median of the distribution.
- `mean(n, p, loc=0)`: Mean of the distribution.
- `var(n, p, loc=0)`: Variance of the distribution.
- `std(n, p, loc=0)`: Standard deviation of the distribution.
- `interval(confidence, n, p, loc=0)`: Confidence interval with equal areas around the median.
I think I know what all of these do. Testing them would be a fun little project for a graduate student with infinite time.
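Still, here's a small taste of a few of them, with dice-flavored numbers ($n=100$ trials, $p=1/6$) chosen just for illustration:

```python
# Sketch: a handful of the generic methods on binom(n=100, p=1/6).
from scipy import stats

n, p = 100, 1/6
print(stats.binom.pmf(20, n, p))              # P(X = 20)
print(stats.binom.cdf(20, n, p))              # P(X <= 20)
print(stats.binom.sf(20, n, p))               # P(X > 20) = 1 - cdf
print(stats.binom.ppf(0.95, n, p))            # smallest k with cdf(k) >= 0.95
print(stats.binom.stats(n, p, moments='mv'))  # (mean, variance) = (np, npq)
print(stats.binom.interval(0.95, n, p))       # central 95% interval
```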
5.2 The hypergeometric distribution (122)
This is for choices without replacement. Example: choose five people randomly from twenty men plus thirty women. What is the probability you get two men and three women? It's combinatorics: $$ \begin{aligned} P(\text{two men, three women}) &= \frac{{20\choose 2}{30\choose 3}}{50\choose 5} \end{aligned} $$
In general, if we are

- choosing $n$ from $N$ items, and
- we want $k$ items from a subgroup with population $r$,

we have probability $$ \begin{aligned} P(k) &= \frac{ {r\choose k}{N-r \choose n-k} }{ N \choose n }. \end{aligned} $$
There's a footnote about relaxing the restriction that $r \geq n$: that is, we're choosing fewer items than are in the population of interest. In fact this relation holds only for $$ \begin{aligned} \max(0, r-(N-n)) \leq k \leq \min(r,n) \end{aligned} $$
The lower bound is from the pigeonhole principle; the upper bound prevents us from choosing more than all of our subset. If we have twenty men and thirty women, and we choose forty people, then at least ten of them must be men, but no more than twenty of them can be men. The binomial coefficients do strange things in these limits, anyway.
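A quick check of both the motivating example and the support bounds, using `scipy.stats.hypergeom` (its argument order is `(k, M, n, N)` with `M` the population size, `n` the size of the tagged subgroup, and `N` the number drawn; the numbers below are from the examples above):

```python
# Sketch: "two men, three women" out of five draws, and the 40-person support bounds.
from math import comb
from scipy import stats

by_hand = comb(20, 2) * comb(30, 3) / comb(50, 5)
print(by_hand)                            # ~0.364
print(stats.hypergeom.pmf(2, 50, 20, 5))  # same number

# Draw 40 of the 50: between 10 and 20 of them must be men.
print(stats.hypergeom.pmf([9, 10, 20, 21], 50, 20, 40))  # the values outside the support are 0
```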
Expectation and variance
The expectation value is pretty sensible: $$ \begin{aligned} E(X) &= n\cdot \frac rN \end{aligned} $$ The variance is kind of a big mess, though: $$ \begin{aligned} V(X) &= n \cdot \frac rN \cdot \left(1-\frac rN\right) \cdot \frac{N-n}{N-1} \end{aligned} $$
What's the right way to think of this? In the limit where you're taking a small sample of a large population, $n\ll N$, the final fraction looks like $\frac{1-n/N}{1-1/N} \to 1$, and we recover the result $V(X) = np(1-p)$ for independent samples. The factor $\frac{N-n}{N-1}$ is apparently called the "finite population correction factor," and a secondhand reference suggests neglecting it for $\frac nN\lesssim \frac1{10}$.
To actually compute these moments of the distribution, we'd need to do $$ \begin{aligned} E(X) = \sum k P(k) &= \frac{1}{N\choose n} \sum k{r\choose k}{N-r\choose n-k} \end{aligned} $$
The method probably involves some dumb identities with the binomial coefficients. The one that lets you recursively build Pascal's triangle is $$ \begin{aligned} {n\choose k} &= {n-1\choose k-1} + {n-1\choose k}, \end{aligned} $$ which is a special case of an identity named for Vandermonde, $$ \begin{aligned} {m+n\choose r} &= \sum_{k=0}^r {m\choose k}{n \choose r-k} \end{aligned} $$
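Rather than grind through the algebra, here's a brute-force check of both moments (finite population correction included) and of Vandermonde's identity (a sketch; the numbers are arbitrary):

```python
# Sketch: hypergeometric moments by direct summation, plus Vandermonde's identity.
from math import comb
import numpy as np
from scipy import stats

N, r, n = 50, 20, 5
k = np.arange(n + 1)
pmf = stats.hypergeom.pmf(k, N, r, n)

EX = np.sum(k * pmf)
VX = np.sum(k**2 * pmf) - EX**2
print(EX, n * r / N)                                  # both 2.0
print(VX, n * (r/N) * (1 - r/N) * (N - n) / (N - 1))  # both ~1.102

# Vandermonde: sum_j C(m, j) C(n2, s-j) = C(m + n2, s)
m, n2, s = 7, 9, 6
print(sum(comb(m, j) * comb(n2, s - j) for j in range(s + 1)), comb(m + n2, s))
```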
Computational stuff
From `scipy.stats.hypergeom`:

The hypergeometric distribution models drawing objects from a bin. `M` is the total number of objects, `n` is the total number of Type I objects. The random variate represents the number of Type I objects in `N` drawn without replacement from the total population.

As an instance of the `rv_discrete` class, the `hypergeom` object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.
Methods:

- `rvs(M, n, N, loc=0, size=1, random_state=None)`: Random variates.
- `pmf(k, M, n, N, loc=0)`: Probability mass function.
Let's watch the probabilities of the hypergeometric distribution converge to the binomial distribution as we make the population larger.
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np

n = 5
k = np.arange(n + 1)
plt.plot(k, stats.binom.pmf(k, n, 3/5), 'k--', label='binomial')
for M in (5, 10, 25, 100, 500):  # population of M, of which 3/5 are Type I
    plt.plot(k, stats.hypergeom.pmf(k, M, M * 3 // 5, n), label=f'M={M}')
plt.legend()
plt.show()
5.3 The Poisson distribution (126)
The Poisson distribution is also used to model numbers of events, where the events occur independently of one another at a known average rate. For example, "this intersection has an average of $\lambda=2$ accidents per month."
The distribution is $$ \begin{aligned} P(X=k) &= \frac{e^{-\lambda} \lambda^k}{k!} \\ E(X) &= \lambda \\ V(X) &= \lambda \end{aligned} $$
I feel like Poisson should arise in some limit of binomial. Let's match the means $\lambda = np$, and consider the case where there are zillions of trials and only a few successes, $k\ll n$. Then the binomial probability
$$ P_\text{binomial}(k) = {n\choose k} p^k q^{n-k} $$
has three factors, two of which we need to approximate (the $p^k$ stays as it is). The coefficient goes like
$$ {n\choose k} \approx \frac{n^k}{k!}. $$
For example, ${100\choose 3} = \frac{100\times 99\times 98}{3!} \approx \frac{100^3}{3!}$. Looking at $q=(1-p)$, we have
$$ \begin{aligned} (1-p)^{n-k} &\approx (1-p)^n \\ &= 1 - np + {n\choose 2}p^2 - {n\choose 3}{p^3} + \cdots \\ &\approx 1 - \frac{np}{1!} + \frac{(np)^2}{2!} - \cdots \\ &= e^{-np} \end{aligned} $$
So the relation overall is $$ \begin{aligned} P(k) &= {n\choose k} \cdot p^k \cdot q^{n-k} \\ &\approx \frac{n^k}{k!} \cdot p^k \cdot e^{-np} \\ &= \frac{(np)^k e^{-np}}{k!} = P_\text{poisson}(k) \end{aligned} $$
Oh, look! That's the next section in the book:
If $n$ is large and $p = \lambda/n$ is small, then $P(X=k)$ can be calculated approximately using either the Poisson or the binomial distributions:
$$ \frac{e^{-\lambda}\lambda^k}{k!} \approx {n\choose k} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n-k}. $$
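Numerically, the approximation is already decent for modest $n$ (a sketch, with $\lambda=2$ as in the intersection example):

```python
# Sketch: Poisson(lambda) pmf vs binomial(n, lambda/n) pmf as n grows.
import numpy as np
from scipy import stats

lam = 2
k = np.arange(6)
print(stats.poisson.pmf(k, lam))
for n in (10, 100, 1000):
    print(n, stats.binom.pmf(k, n, lam / n))
```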
5.4 The geometric distribution (132)
The geometric distribution is for waiting-time problems, like "how many dice rolls until a six?" There are (at least) two definitions, depending on whether you count the success or not — that is, whether a six on the first die roll counts as $k=0$ or as $k=1$. With $X$ meaning "failures before success" and $Y = X+1$ meaning "attempts including success," we have $$ \begin{aligned} P(X=k) &= q^k p \qquad & P(Y=k) &= q^{k-1} p \\ E(X) &= \frac{q}{p} & E(Y) &= 1 + \frac qp = \frac 1p \\ V(X) &= \frac{q}{p^2} & V(Y) &= V(X+1) = V(X) = \frac{q}{p^2} \end{aligned} $$
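For what it's worth, `scipy.stats.geom` uses the $Y$ convention (support starting at 1); shifting with `loc=-1` gives the failures-only $X$ version. A sketch with $p=1/6$:

```python
# Sketch: scipy's geom counts trials including the success (Y); loc=-1 shifts to failures (X).
from scipy import stats

p = 1/6
print(stats.geom.mean(p))          # E(Y) = 1/p = 6
print(stats.geom.var(p))           # V(Y) = q/p^2 = 30
print(stats.geom.mean(p, loc=-1))  # E(X) = q/p = 5
print(stats.geom.pmf(1, p))        # P(Y=1) = p
```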
5.5 The negative binomial distribution (136)
This extends the geometric distribution. Suppose I want to know how many dice rolls it takes until the third time I roll a six. One such sequence is $\set{1,4,3,5,6,3,4,5,2,6,3,4,6}$. Counting such sequences goes back to combinatorics, which is why the binomial expansion gets involved. The probability that I get my third six on the twelfth roll is the probability that the twelfth roll itself is a six, times the probability that the first eleven rolls contain exactly two sixes.
A series of independent trials has probability $P(S)=p$ on each trial. Let $X$ be the number of failures before the $r$-th success.
$$ \begin{aligned} P(k) = {r+k-1 \choose r-1} q^k p^r \end{aligned} $$
For $r=1$, we get the geometric back: $$ \begin{aligned} P(k; r=1) &= {1+k-1\choose 1-1} q^k p = {k\choose 0} q^k p = q^k p \end{aligned} $$
Delightfully, the mean and variance are $$ \begin{aligned} E(X) &= \frac{rq}{p} & \qquad V(X) &= \frac{rq}{p^2} \end{aligned} $$
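As far as I can tell, `scipy.stats.nbinom` uses the same failures-before-the-$r$-th-success convention; here's a sketch with dice numbers ($r=3$, $p=1/6$), including the third-six-on-the-twelfth-roll probability ($k=9$ failures):

```python
# Sketch: negative binomial pmf and moments for r=3 sixes with p=1/6.
from math import comb
from scipy import stats

r, p, q = 3, 1/6, 5/6
print(stats.nbinom.mean(r, p), r * q / p)    # both 15
print(stats.nbinom.var(r, p), r * q / p**2)  # both 90

k = 9  # nine failures + three sixes = twelve rolls
print(stats.nbinom.pmf(k, r, p), comb(r + k - 1, r - 1) * q**k * p**r)
```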
5.6 The discrete uniform distribution (141)
This one's simple: $$ \begin{aligned} P(k) &= \frac 1n, &&k\in\set{1\cdots n} \\ E(X) &= \frac{n+1}{2} \\ V(X) &= \frac{n^2-1}{12} \end{aligned} $$
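In scipy this one appears to be `scipy.stats.randint(low, high)`, which is uniform on the integers from `low` to `high - 1`, so the book's $\set{1\cdots n}$ is `randint(1, n+1)`. A sketch with a die ($n=6$):

```python
# Sketch: discrete uniform on {1, ..., 6} via randint(1, 7); high is exclusive.
from scipy import stats

n = 6
print(stats.randint.mean(1, n + 1), (n + 1) / 2)     # both 3.5
print(stats.randint.var(1, n + 1), (n**2 - 1) / 12)  # both ~2.917
```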
Problems
The problems are split up as follows:
- 1–10 Binomial
- 11–15 Hypergeometric
- 16–22 Poisson
- 23–26 Geometric
- 27–31 Negative Binomial
- 33–35 Discrete Uniform
- 36–41 Sample actuarial problems