Hassett, chapter 8: Commonly-used continuous distributions

$ %%%%%%%%%%%%%%%% Helpful KaTeX macros. \gdef\bar#1{\overline{ #1} } \gdef\and{\cap} \gdef\or{\cup} \gdef\set#1{\left\{ #1\right\} } \gdef\mean#1{\left\langle#1\right\rangle} $

excluding 8.6, 8.7, Pareto and Weibull

8.1 the Uniform distribution

Given $X$ uniform on the interval $[a,b]$, we have the density functions $$\begin{aligned} \text{p.d.f.} && f(x) &= \begin{cases} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases} \\ \text{c.d.f.} && F(x) &= \begin{cases} 0 & \phantom{x<{}} x < a \\ \frac{x-a}{b-a} & a \leq x < b \\ 1 & b \leq x \end{cases} \\ \text{mean} && E(X) &= \frac{a+b}{2} \\ \text{variance} && V(X) &= \frac{(b-a)^2}{12} \end{aligned}$$
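As a quick numerical check (a minimal sketch; the interval $[2,5]$ is an arbitrary choice), `scipy.stats.uniform` reproduces all four formulas. Note that scipy parameterizes the uniform by `loc` $=a$ and `scale` $=b-a$.

```python
from scipy import stats

a, b = 2.0, 5.0                        # arbitrary interval, for illustration
X = stats.uniform(loc=a, scale=b - a)  # scipy: loc = a, scale = b - a

print(X.pdf(3.0), 1 / (b - a))          # p.d.f. inside [a, b]
print(X.cdf(3.0), (3.0 - a) / (b - a))  # c.d.f.
print(X.mean(), (a + b) / 2)            # mean
print(X.var(), (b - a) ** 2 / 12)       # variance
```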

8.2 the Exponential distribution

Integration by parts tells us that, for integer $n$ and real $a>0$, $$\begin{aligned} \int_0^\infty x^n e^{-ax}\mathrm dx &= \frac{n!}{a^{n+1}}. \end{aligned}$$

For noninteger $n$, the factorial generalizes to $n! \to \Gamma(n+1)$, but the integral only converges for $n > -1$: you can't go past the pole of $\Gamma$ at zero.
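A numeric spot-check of the integral for a noninteger exponent (a sketch; $n=2.5$ and $a=1.7$ are invented values), using `math.gamma` for $\Gamma$ and `scipy.integrate.quad` for the integral:

```python
import math
from scipy.integrate import quad

n, a = 2.5, 1.7  # invented noninteger exponent and rate

integral, _err = quad(lambda x: x**n * math.exp(-a * x), 0, math.inf)
closed_form = math.gamma(n + 1) / a ** (n + 1)  # n! generalized to Gamma(n+1)
print(integral, closed_form)  # agree to quadrature precision
```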

The exponential distribution gives the probability for the waiting time between Poisson-distributed events.

  • An example.

    Accidents at a busy intersection occur at an average rate of $\lambda = 2\,\text{month}^{-1}$, following a Poisson distribution. The time $T$ between accidents is a random variable with density function $$\begin{aligned} f(t) &= 2e^{-2t}, \text{ for } t \geq 0. \end{aligned}$$

    I was bothered here by the dimensionful $\lambda$: the density $f(t)$ carries units of inverse time, so that $f(t)\,\mathrm dt$ is a dimensionless probability.

The general form of the distribution is $$\begin{aligned} \text{p.d.f.:} && f(t) &= \lambda e^{-\lambda t} \\ \text{c.d.f.:} && F(t) &= 1-e^{-\lambda t} \\ \text{survival function:} && S(t) &= 1-F(t) &&= e^{-\lambda t} \\ \text{mean:} && E(T) &= \int_0^\infty \lambda t\, e^{-\lambda t} \mathrm dt &&= \frac{1}{\lambda} \\ \text{mean square:} && E(T^2) &= \int_0^\infty \lambda t^2\, e^{-\lambda t} \mathrm dt &&= \frac{2}{\lambda^2} \\ \text{variance:} && V(T) &= \frac2{\lambda^2} - \frac{1}{\lambda^2} &&= \frac{1}{\lambda^2} \end{aligned}$$
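These are easy to confirm against `scipy.stats.expon` (a minimal sketch; $\lambda = 2$ echoes the accident example). Beware that scipy parameterizes the exponential by `scale` $= 1/\lambda$, not by the rate.

```python
from scipy import stats

lam = 2.0                       # rate, e.g. accidents per month
T = stats.expon(scale=1 / lam)  # scipy uses scale = 1/lambda

t = 0.75
print(T.pdf(t))             # lambda * exp(-lambda t)
print(T.cdf(t))             # 1 - exp(-lambda t)
print(T.sf(t))              # survival function exp(-lambda t)
print(T.mean(), 1 / lam)    # E(T) = 1/lambda
print(T.var(), 1 / lam**2)  # V(T) = 1/lambda^2
```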

Failure (or hazard) rates

Let $T$ be a random variable with density function $f(t)$, cumulative distribution function $F(t) = \int_0^t \mathrm dt' f(t')$, and survival function $S(t) = 1-F(t)$. The "failure rate function" $\lambda(t)$ is defined by $$\begin{aligned} \lambda(t) = \frac{f(t)}{1-F(t)} = \frac{f(t)}{S(t)}. \end{aligned}$$ For exponential distributions this is cute: $$\begin{aligned} \lambda(t) &= \frac{\lambda e^{-\lambda t}}{ e^{-\lambda t}} = \lambda. \end{aligned}$$ This is a sort of conditional probability: $$\begin{aligned} \lambda(t)\delta t & \approx \frac{P(t<T<t+\delta t)}{P(t < T)} \\&= P(t<T<t+\delta t \mid t < T) \end{aligned}$$
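Numerically, the ratio $f/S$ really is flat for the exponential (a sketch with an arbitrary rate):

```python
import numpy as np
from scipy import stats

lam = 2.0                       # arbitrary rate
T = stats.expon(scale=1 / lam)  # scipy uses scale = 1/lambda

for t in np.linspace(0.1, 3.0, 5):
    hazard = T.pdf(t) / T.sf(t)  # f(t) / S(t)
    print(f"t={t:.2f}  hazard={hazard:.6f}")  # constant, equal to lam
```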

Why the waiting time is exponential for Poisson-distributed events

One final assumption:

If the number of events in a time period of unit length is Poisson-distributed with parameter $\lambda$, then the number of events in a time period with length $t$ is Poisson-distributed with parameter $\lambda t$.

This gives the probability of zero events in a time period of length $t$: $$\begin{aligned} P(X=0) = \frac{e^{-\lambda t} (\lambda t)^0}{0!} = e^{-\lambda t}. \end{aligned}$$

But $P(X=0)$ must be the same as the survival function $S(t)$: zero events have occurred by time $t$ exactly when the waiting time $T$ exceeds $t$. From the discovery that $S(t) = e^{-\lambda t}$, we can derive all the other features of the exponential distribution.
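A simulation sketch of this equivalence (the rate and the observation window are invented): build a Poisson process by scattering a Poisson-distributed number of events uniformly over an interval, then check that the gaps between events have the exponential's mean $1/\lambda$ and variance $1/\lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, horizon = 2.0, 10_000.0  # invented rate and observation window

# Poisson process on [0, horizon]: a Poisson-distributed number of
# events, each placed uniformly at random, then sorted into order.
n_events = rng.poisson(lam * horizon)
times = np.sort(rng.uniform(0, horizon, n_events))
gaps = np.diff(times)  # waiting times between consecutive events

print(gaps.mean(), 1 / lam)    # close to 1/lambda
print(gaps.var(), 1 / lam**2)  # close to 1/lambda^2
```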

8.3 the Gamma distribution

If an exponential random variable can model the waiting time before the next independent event, a gamma-distributed random variable can model the waiting time before the $n$th event. Consider modeling the failures of machine parts or the survival time for a disease.

The gamma density function is $$\begin{aligned} f(x) &= \frac{ \beta^\alpha }{ \Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \end{aligned}$$

Here $\beta$ is a rate parameter (events per unit time) and $\alpha$ seems to correspond to the number of events of interest. (Beware that some authors prefer instead the scale parameter $1/\beta$, the mean waiting time between events.) Note that the $\alpha=1$ case,

$$\begin{aligned} f(x) &= \frac{\beta}{\Gamma(1)} e^{-\beta x} = \beta e^{-\beta x} \end{aligned}$$

corresponds to the exponential distribution.

Theorem: sums of independent exponential random variables

Stated without proof:

Let $X_1,X_2,\ldots,X_n$ be independent random variables, each exponentially distributed with the same rate parameter $\beta$. Then the sum $\sum X_i$ is gamma-distributed, with the same $\beta$ and with $\alpha = n$.

Mean and variance

As one might hope, $$\begin{aligned} E(X) &= \frac{\alpha}{\beta} & V(X) &= \frac{\alpha}{\beta^2} \end{aligned}$$
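Both the theorem and the moments are easy to check by simulation (a sketch; $n$, $\beta$, and the sample size are invented, and scipy's gamma takes `scale` $=1/\beta$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, beta = 4, 2.0  # invented: alpha = n summands, rate beta

# Sum n i.i.d. exponentials and compare with the matching gamma.
samples = rng.exponential(scale=1 / beta, size=(100_000, n)).sum(axis=1)
G = stats.gamma(a=n, scale=1 / beta)  # scipy: a = alpha, scale = 1/beta

print(samples.mean(), G.mean(), n / beta)   # E(X) = alpha/beta
print(samples.var(), G.var(), n / beta**2)  # V(X) = alpha/beta^2
```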

8.4 the Normal distribution

The distribution and its friends are

$$\begin{aligned} \text{p.d.f.:} && f(x) &= \frac{1}{\sqrt{2\sigma^2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \\ \text{mean:} && E(X) &= \mu \\ \text{variance:} && V(X) &= \sigma^2 \end{aligned}$$

The cumulative distribution function has no closed form in elementary functions. I happen to know that it's some nastiness involving the error function, but I always have to look it up:

$$\begin{aligned} F(x) & = \frac12\left( 1 + \text{erf}\left( \frac{x-\mu}{\sigma\sqrt 2} \right) \right) \end{aligned}$$
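Since I always have to look it up, it's worth confirming the erf formula against a library implementation (a sketch; $\mu$, $\sigma$, and $x$ are invented):

```python
import math
from scipy import stats

mu, sigma = 1.0, 2.0  # invented parameters
x = 2.5

via_erf = 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
print(via_erf, stats.norm(mu, sigma).cdf(x))  # the two should match
```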

Linear transformations $Y = aX + b$ remain normally distributed, with

$$\begin{aligned} E(aX + b) &= a E(X) + b \\ V(aX + b) &= a^2 V(X) = a^2\sigma^2 \\ \sigma_Y &= \sqrt{V(Y)} = |a|\,\sigma \end{aligned}$$

We can "standardize" a random variable by considering instead the random variable

$$\begin{aligned} Z &= \frac{X-\mu}{\sigma} = \frac1\sigma X - \frac\mu\sigma \end{aligned}$$

which has $E(Z) = 0$ and $V(Z) = 1$.

Ordinarily one consults a computer or a table for values of $F(Z)$.
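The computer-or-table step looks like this (a sketch with invented numbers): standardize, then look up the standard normal c.d.f.

```python
from scipy import stats

mu, sigma = 100.0, 15.0  # invented parameters
c = 120.0

z = (c - mu) / sigma                 # standardize
print(stats.norm.cdf(z))             # "table lookup" of F(z)
print(stats.norm(mu, sigma).cdf(c))  # same probability, unstandardized
```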

The central limit theorem

The central limit theorem says that, if we have a large number $n$ of independent and identically-distributed (i.i.d.) random variables, each with mean $\mu$ and variance $\sigma^2$, their sum is approximately normally-distributed with mean $n\mu$ and variance $n\sigma^2$.
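A simulation sketch (the summand count and trial count are invented): sums of i.i.d. uniforms on $[0,1]$, which have $\mu = 1/2$ and $\sigma^2 = 1/12$ each, land near the predicted mean and variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 100_000  # invented sizes

# Each row is one sum of n i.i.d. uniforms on [0, 1].
sums = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
print(sums.mean(), n * 0.5)  # ~ n * mu
print(sums.var(), n / 12)    # ~ n * sigma^2
```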

The continuity correction

If we're modeling a discrete problem, consider adjusting the limits of the continuous normal distribution by half a unit, to include all the probability associated with real numbers which round to the (included) integer endpoints.
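For example (a sketch; the binomial parameters are invented), approximating a binomial c.d.f. with a normal works noticeably better with the half-unit shift:

```python
import math
from scipy import stats

n, p, k = 100, 0.3, 35  # invented binomial parameters and cutoff
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = stats.binom(n, p).cdf(k)
uncorrected = stats.norm(mu, sigma).cdf(k)
corrected = stats.norm(mu, sigma).cdf(k + 0.5)  # half-unit adjustment
print(exact, uncorrected, corrected)  # corrected is closer to exact
```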

8.5 the Lognormal distribution

The log-normal distribution is nice for one-sided phenomena with a long tail, like insurance claims. It describes a random variable $Y = e^X$ where $X$ is normally distributed. So we have

$$\begin{aligned} f(y) &= \frac{1}{y\sigma\sqrt{2\pi}} \exp\left( -\frac12\left(\frac{\ln y - \mu}{\sigma} \right)^2 \right), & \text{for } y &> 0 \end{aligned}$$

This gives the annoying $$\begin{aligned} E(Y) &= \exp\left( \mu + \frac{\sigma^2}{2}\right) \\ V(Y) &= e^{2\mu + \sigma^2} \left(e^{\sigma^2} - 1\right) \end{aligned}$$

But for cumulative probabilities, we can say $$\begin{aligned} F_Y(c) &= P(Y \leq c) \\&= P(e^X \leq c) \\&= P(X \leq \ln c) \\&= F_X(\ln c) \end{aligned}$$
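Both identities check out numerically (a sketch; $\mu$, $\sigma$, and $c$ are invented). Note that scipy's lognormal takes `s` $=\sigma$ and `scale` $=e^\mu$.

```python
import math
from scipy import stats

mu, sigma = 0.5, 0.8  # invented parameters of the underlying normal
c = 3.0

Y = stats.lognorm(s=sigma, scale=math.exp(mu))  # scipy: s = sigma, scale = e^mu
print(Y.cdf(c))                                 # F_Y(c)
print(stats.norm(mu, sigma).cdf(math.log(c)))   # F_X(ln c): the same number
print(Y.mean(), math.exp(mu + sigma**2 / 2))    # the "annoying" mean formula
```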

Lognormal distribution for stock prices

Stock prices over time grow exponentially, $$\begin{aligned} A(t) &= A_0 e^{rt} \end{aligned}$$ Apparently this makes the lognormal distribution a good fit? That would be the case if the growth rates $r$ were normally distributed. I guess that's pretty reasonable.
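A quick simulation of that reasoning (every number here is invented): draw normally-distributed growth rates, exponentiate, and the resulting prices have the lognormal's mean.

```python
import numpy as np

rng = np.random.default_rng(3)
A0, t = 100.0, 1.0         # invented initial price and horizon
mu_r, sigma_r = 0.05, 0.2  # invented normal growth-rate parameters

r = rng.normal(mu_r, sigma_r, 100_000)
prices = A0 * np.exp(r * t)  # A(t) = A0 * e^{rt}, so prices are lognormal

print(prices.mean(), A0 * np.exp(mu_r + sigma_r**2 / 2))  # lognormal mean
```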

8.6 (exclude) the Pareto distribution

The one-parameter¹ Pareto distribution has density

$$\begin{aligned} f(x) &= \frac{\alpha}{\beta}\left( \frac{\beta}{x}\right)^{\alpha+1} &\text{ with }\begin{cases} 2 < \alpha \\ 0 < \beta \leq x \\ \end{cases} \end{aligned}$$

If the restriction $\alpha>2$ is relaxed, the mean (which needs $\alpha > 1$) and the variance (which needs $\alpha > 2$) aren't guaranteed to exist. The parameter $\beta$ defines the domain of the density function, because the cumulative distribution works out to be

$$\begin{aligned} F(x) &= 1 - \left(\frac \beta x\right)^\alpha, \end{aligned}$$

and $x<\beta$ would have negative cumulative probability. The Pareto distribution is defined for $x$ above some threshold:

A plot of the Pareto density function

The figure shows the graph of a Pareto density function for loss amounts measured in hundreds of dollars, i.e. $\$300$ is represented by $x=3$. This insurance policy has a deductible of $\$300$; claims under this minimum aren't filed.

The mean and variance are

$$\begin{aligned} E(X) &= \frac{\alpha\beta}{\alpha - 1} \\ V(X) &= \frac{\alpha\beta^2}{\alpha-2} - \left(\frac{\alpha\beta}{\alpha-1}\right)^2 \end{aligned}$$
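A check against `scipy.stats.pareto` (a sketch; $\alpha = 3$ and $\beta = 2$ are invented, with $\alpha > 2$ so both moments exist). scipy's shape `b` is our $\alpha$ and its `scale` is our $\beta$.

```python
from scipy import stats

alpha, beta = 3.0, 2.0                 # invented, with alpha > 2
X = stats.pareto(b=alpha, scale=beta)  # scipy: b = alpha, scale = beta

mean = alpha * beta / (alpha - 1)
var = alpha * beta**2 / (alpha - 2) - mean**2
print(X.mean(), mean)  # E(X)
print(X.var(), var)    # V(X)
```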

We can also define the failure rate²:

$$\begin{aligned} \lambda(x) &= \frac{f(x)}{1-F(x)} \\ &= \frac \alpha\beta \frac{ (\beta/x)^{\alpha+1} }{ (\beta/x)^\alpha } \\ &= \frac \alpha x \end{aligned}$$

This failure rate decreases with $x$, which is useful for modeling heavy-tailed things like insurance claim amounts.

8.7 (exclude) the Weibull distribution

If the failure rate isn't constant with time, the Weibull distribution might be useful. It has a failure rate

$$\begin{aligned} \lambda(x) &= \alpha \beta x^{\alpha - 1}, \end{aligned}$$

which comes from a probability density $$\begin{aligned} f(x) &= \alpha\beta x^{\alpha-1} e^{-\beta x^\alpha}, & \text{for } x &\geq 0,\ \alpha > 0,\ \beta > 0 \end{aligned}$$

For $\alpha=1$ we recover the exponential distribution; for $\alpha > 1$ we have a failure rate that increases with time. We also have

$$\begin{aligned} \text{cumulative distribution:} && F(x) &= 1 - e^{-\beta x^\alpha} \\ \text{mean:} && E(X) &= \frac{\Gamma\left( 1 + \frac{1}{\alpha}\right)}{\beta^{1/\alpha}} \\ \text{variance:} && V(X) &= \frac{1}{\beta^{2/\alpha}} \left( \Gamma\left(1 + \frac2\alpha\right) - \Gamma\left(1 + \frac1\alpha\right)^2 \right) \end{aligned}$$
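These match `scipy.stats.weibull_min` (a sketch; $\alpha$, $\beta$, and $x$ are invented) once we translate parameterizations: scipy's shape `c` is our $\alpha$ and its `scale` is $\beta^{-1/\alpha}$.

```python
import math
from scipy import stats

alpha, beta = 2.0, 0.5  # invented shape and rate-like parameter
x = 1.5

# scipy: shape c = alpha, scale = beta**(-1/alpha)
X = stats.weibull_min(c=alpha, scale=beta ** (-1 / alpha))

print(X.cdf(x), 1 - math.exp(-beta * x**alpha))                   # F(x)
print(X.mean(), math.gamma(1 + 1 / alpha) / beta ** (1 / alpha))  # E(X)
var = (math.gamma(1 + 2 / alpha)
       - math.gamma(1 + 1 / alpha) ** 2) / beta ** (2 / alpha)
print(X.var(), var)                                               # V(X)
```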

8.8 the Beta distribution

The beta distribution is defined on $[0,1]$, and is useful for modeling things that can be written as percentages. It has two positive parameters and the probability density

$$\begin{aligned} f(x) &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot x ^ {\alpha - 1} (1-x)^{\beta - 1} \end{aligned}$$

Here's a figure showing $(\alpha,\beta) = (4,3)$:

A beta density function

The $\Gamma$ garbage is just a normalization. But it does imply that integrals of the form

$$\begin{aligned} \int_0^1 x^m (1-x)^n \mathrm dx &= \frac{\Gamma(m+1)\ \Gamma(n+1)}{\Gamma(m+n+2)} \\ &= \frac{m!\ n!}{(m+n+1)!} \end{aligned}$$

result in ratios of factorials.

We have

$$\begin{aligned} \text{mean:} && E(X) &= \frac{\alpha}{\alpha + \beta} \\ \text{variance:} && V(X) &= \frac{\alpha\beta}{ (\alpha+\beta)^2 (\alpha+\beta+1)} \end{aligned}$$
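A check of the moments and of the factorial-ratio integral (a sketch; the $(4,3)$ parameters come from the figure, and the exponents $m=3$, $n=2$ match $\alpha - 1$ and $\beta - 1$):

```python
import math
from scipy import stats
from scipy.integrate import quad

a, b = 4.0, 3.0  # the (alpha, beta) = (4, 3) from the figure
X = stats.beta(a, b)

print(X.mean(), a / (a + b))                          # alpha/(alpha+beta)
print(X.var(), a * b / ((a + b) ** 2 * (a + b + 1)))  # variance

# The integral of x^m (1-x)^n really is a ratio of factorials.
m, n = 3, 2
integral, _err = quad(lambda x: x**m * (1 - x) ** n, 0, 1)
print(integral, math.factorial(m) * math.factorial(n) / math.factorial(m + n + 1))
```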

8.9 Fitting theoretical distributions to real problems

The reader may wonder how to decide which distribution is a good fit. Well, keep wondering, cupcake. We're just going to tell you what to choose.


  1. There's also a two-parameter version. 

  2. The gamma, normal, and lognormal distributions apparently have failure rates which are nontrivial to derive, so they're not in this book. Cue the standard eye-rolling about how "everything in the textbook is trivial."