# Probability Models

### The (a,b,0) and (a,b,1) classes


This post is on two classes of discrete distributions called the (a,b,0) class and the (a,b,1) class. It is a follow-up on two previous posts, summarizing them and giving more examples. The (a,b,0) class is discussed in detail in this post in a companion blog. The (a,b,1) class is discussed in detail in this post in a companion blog.

Practice problems for the (a,b,0) class are found here. The next post is a practice problem set on the (a,b,1) class.

The (a,b,0) Class

A counting distribution is a discrete probability distribution that takes on the non-negative integers (0, 1, 2, …). Counting distributions are useful when we want to model occurrences of certain random events. The three commonly used counting distributions are the Poisson distribution, the binomial distribution and the negative binomial distribution. All three can be generated recursively. For these three distributions, the ratio of two consecutive probabilities, multiplied by the larger of the two integers involved, is a linear function of that integer.

To make the last point in the preceding paragraph clear, let’s set some notation. For any integer $k=0,1,2,\cdots$, let $P_k$ be the probability that the counting distribution in question takes on the value $k$. For example, if we are considering the counting random variable $N$, then $P_k=P[N=k]$. Let’s look at the situation where the ratio of any two consecutive values of $P_k$ can be expressed as follows for some constants $a$ and $b$.

(1)……….$\displaystyle \frac{P_k}{P_{k-1}}=a + \frac{b}{k} \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

Multiplying (1) by $k$ gives the following.

(1a)……….$\displaystyle k \ \frac{P_k}{P_{k-1}}=a \ k+ b \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

Note that the right-hand side of (1a) is a linear expression of $k$. This provides a way to fit observations to (a,b,0) distributions.
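To illustrate the fitting idea, here is a minimal Python sketch that evaluates the left-hand side of (1a) for a Poisson and a binomial distribution. The function names and the parameter values ($\lambda=2$, $n=4$, $p=1/4$) are our own choices for illustration. With observed data, one would presumably replace $P_k$ by relative frequencies and look at how $k \ P_k / P_{k-1}$ behaves as $k$ increases.

```python
from math import comb, exp, factorial

def ratio_times_k(pmf, kmax):
    """Return the values k * P_k / P_{k-1} for k = 1, ..., kmax."""
    return [k * pmf(k) / pmf(k - 1) for k in range(1, kmax + 1)]

def poisson_pmf(k, lam=2.0):
    # Poisson probability function with mean lam (illustrative value)
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(k, n=4, p=0.25):
    # Binomial probability function (illustrative parameters)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

poisson_values = ratio_times_k(poisson_pmf, 4)    # flat in k, since a = 0
binomial_values = ratio_times_k(binomial_pmf, 4)  # decreasing in k, since a < 0
```

A flat line points to Poisson ($a=0$), a downward slope to binomial ($a<0$) and an upward slope to negative binomial ($a>0$).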

Any counting distribution that satisfies the recursive relation (1) is said to be a member of the (a,b,0) class of distributions. Note that the recursion starts at $k=1$. Does that mean $P_0$ can be any probability value we assign? The value of $P_0$ is fixed because all the $P_k$ must sum to 1.

The three counting distributions mentioned above – Poisson, binomial and negative binomial – are all members of the (a,b,0) class. In fact the (a,b,0) class essentially has three distributions. In other words, any member of the (a,b,0) class must be one of the three distributions – Poisson, binomial and negative binomial.

An (a,b,0) distribution has its usual parameters, e.g. Poisson has a parameter $\lambda$, which is its mean. So we need a way to translate the usual parameters to and from the parameters $a$ and $b$. This is shown in the table below.

Table 1

| Distribution | Usual Parameters | Probability at Zero | Parameter a | Parameter b |
|---|---|---|---|---|
| Poisson | $\lambda$ | $e^{-\lambda}$ | 0 | $\lambda$ |
| Binomial | $n$ and $p$ | $(1-p)^n$ | $\displaystyle -\frac{p}{1-p}$ | $\displaystyle (n+1) \ \frac{p}{1-p}$ |
| Negative binomial | $r$ and $p$ | $p^r$ | $1-p$ | $(r-1) \ (1-p)$ |
| Negative binomial | $r$ and $\theta$ | $\displaystyle \biggl(\frac{1}{1+\theta} \biggr)^r$ | $\displaystyle \frac{\theta}{1+\theta}$ | $\displaystyle (r-1) \ \frac{\theta}{1+\theta}$ |
| Geometric | $p$ | $p$ | $1-p$ | 0 |
| Geometric | $\theta$ | $\displaystyle \frac{1}{1+\theta}$ | $\displaystyle \frac{\theta}{1+\theta}$ | 0 |

Table 1 provides the mapping to translate between the usual parameters and the recursive parameters $a$ and $b$.
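The mapping in Table 1 can be written down as a few small helper functions. The following is only a sketch. The function names are hypothetical, and each function returns the triple $(a, b, P_0)$ for the given usual parameters.

```python
from fractions import Fraction
from math import exp

def ab0_poisson(lam):
    """Poisson row of Table 1: returns (a, b, P_0)."""
    return 0, lam, exp(-lam)

def ab0_binomial(n, p):
    """Binomial row of Table 1: returns (a, b, P_0)."""
    q = 1 - p
    return -p / q, (n + 1) * p / q, q**n

def ab0_negbin_theta(r, theta):
    """Negative binomial (r, theta) row of Table 1: returns (a, b, P_0)."""
    a = theta / (1 + theta)
    return a, (r - 1) * a, (1 / (1 + theta)) ** r

# Exact arithmetic with fractions, for a binomial with n = 4 and p = 1/4
a, b, p0 = ab0_binomial(4, Fraction(1, 4))
```

Setting $r=1$ in the negative binomial helper gives $b=0$, which is the geometric row of the table.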

Example 1
Let $a=-1/3$ and $b=5/3$. Let the initial probability be $P_0=81/256$. Generate the next 4 probabilities according to the recursion formula (1).

$\displaystyle P_0=\frac{81}{256}$

$\displaystyle P_1=\biggl(-\frac{1}{3}+\frac{5}{3} \biggr) \ \frac{81}{256}=\frac{108}{256}$

$\displaystyle P_2=\biggl(-\frac{1}{3}+\frac{5}{3} \cdot \frac{1}{2} \biggr) \ \frac{108}{256}=\frac{54}{256}$

$\displaystyle P_3=\biggl(-\frac{1}{3}+\frac{5}{3} \cdot \frac{1}{3} \biggr) \ \frac{54}{256}=\frac{12}{256}$

$\displaystyle P_4=\biggl(-\frac{1}{3}+\frac{5}{3} \cdot \frac{1}{4} \biggr) \ \frac{12}{256}=\frac{1}{256}$

Note that the sum of $P_0$ to $P_4$ is 1. So this has to be a binomial distribution and not Poisson or negative binomial. The binomial parameters are $n=4$ and $p=1/4$. According to Table 1, this translates to $a=-1/3$ and $b=5/3$. The initial probability is $P_0=(1-p)^4=(3/4)^4=81/256$.
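The recursion in Example 1 is easy to mechanize. Here is a minimal Python sketch (the function name `ab0_probabilities` is our own) that uses exact fractions so the results can be compared directly with the values above.

```python
from fractions import Fraction

def ab0_probabilities(a, b, p0, kmax):
    """Generate P_0, ..., P_kmax from recursion (1): P_k = (a + b/k) * P_{k-1}."""
    probs = [p0]
    for k in range(1, kmax + 1):
        probs.append((a + b / k) * probs[-1])
    return probs

# Example 1, carried one step further to k = 5
probs = ab0_probabilities(Fraction(-1, 3), Fraction(5, 3), Fraction(81, 256), 5)
```

Running the recursion one step past $k=4$ gives $P_5=0$ because $a+b/5=0$, which is consistent with the binomial distribution with $n=4$ having no mass beyond 4.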

Example 2
This example generates several probabilities recursively for the negative binomial distribution with $r=2$ and $\theta=4$. According to Table 1, this translates to $a=4/5$ and $b=4/5$. The following shows the probabilities up to $P_6$.

$\displaystyle P_0=\biggl(\frac{1}{5}\biggr)^2=\frac{1}{25}=0.04$

$\displaystyle P_1=\biggl(\frac{4}{5}+\frac{4}{5} \biggr) \ \frac{1}{25}=\frac{8}{125}=0.064$

$\displaystyle P_2=\biggl(\frac{4}{5}+\frac{4}{5} \cdot \frac{1}{2} \biggr) \ \frac{8}{125}=\frac{48}{625}=0.0768$

$\displaystyle P_3=\biggl(\frac{4}{5}+\frac{4}{5} \cdot \frac{1}{3} \biggr) \ \frac{48}{625}=\frac{256}{3125}=0.08192$

$\displaystyle P_4=\biggl(\frac{4}{5}+\frac{4}{5} \cdot \frac{1}{4} \biggr) \ \frac{256}{3125}=\frac{256}{3125}=0.08192$

$\displaystyle P_5=\biggl(\frac{4}{5}+\frac{4}{5} \cdot \frac{1}{5} \biggr) \ \frac{256}{3125}=\frac{6144}{78125}=0.0786432$

$\displaystyle P_6=\biggl(\frac{4}{5}+\frac{4}{5} \cdot \frac{1}{6} \biggr) \ \frac{6144}{78125}=\frac{28672}{390625}=0.07340032$

The above probabilities can also be computed using the probability function given below.

$\displaystyle P_k=(1+k) \ \frac{1}{25} \ \biggl(\frac{4}{5} \biggr)^k \ \ \ \ \ \ \ \ k=0,1,2,3,\cdots$
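As a quick check, the following Python sketch compares the recursive values in Example 2 with the probability function just given, using exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

a = b = Fraction(4, 5)       # from Table 1 with r = 2 and theta = 4
p = Fraction(1, 25)          # P_0
recursive = [p]
for k in range(1, 7):
    p = (a + b / k) * p      # recursion (1)
    recursive.append(p)

# Probability function P_k = (1 + k) (1/25) (4/5)^k
closed_form = [(1 + k) * Fraction(1, 25) * Fraction(4, 5) ** k for k in range(7)]
```

The two lists agree term by term for $k=0,1,\cdots,6$.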

For the (a,b,0) class, it is not just about calculating probabilities recursively. The parameters $a$ and $b$ also give information about other distributional quantities such as moments and variance. For a more detailed discussion of the (a,b,0) class, refer to this post in a companion blog.

The (a,b,1) Class

If the (a,b,0) class is just another name for the three distributions of Poisson, binomial and negative binomial, what is the point of the (a,b,0) class? Why not just work with these three distributions individually? Sure, generating the probabilities recursively is a useful concept. But the probability functions of the three distributions already give us a clear and precise way to calculate probabilities. The notion of the (a,b,0) class leads to the notion of the (a,b,1) class, which gives a great deal more flexibility in modeling counting distributions. It is possible that the (a,b,0) distributions do not adequately describe a random counting phenomenon being observed. For example, the sample data may indicate that the probability at zero is larger than is indicated by the distributions in the (a,b,0) class. One alternative is to assign a larger value for $P_0$ and recursively generate the subsequent probabilities $P_k$ for $k=2,3,\cdots$. This recursive relation is the defining characteristic of the (a,b,1) class.

A counting distribution is a member of the (a,b,1) class of distributions if the following recursive relation holds for some constants $a$ and $b$.

(2)……….$\displaystyle \frac{P_k}{P_{k-1}}=a + \frac{b}{k} \ \ \ \ \ \ \ \ \ \ \ \ \ k=2,3,4, \cdots$

Note that the recursion begins at $k=2$. Can the values for $P_0$ and $P_1$ be arbitrary? The initial probability $P_0$ is an assumed value. The probability $P_1$ is the value such that the sum $P_1+P_2+P_3+\cdots$ is $1-P_0$.

The (a,b,1) class gives more flexibility in modeling. For example, the initial probability is $P_0=0.04$ in the negative binomial distribution in Example 2. If this $P_0$ is felt to be too small, then a larger value for $P_0$ can be assigned and then let the remaining probabilities be generated by recursion. We demonstrate how this is done using the same (a,b,0) distribution in Example 2.

Before we continue with Example 2, we comment that there are two subclasses in the (a,b,1) class. The subclasses are distinguished by whether $P_0=0$ or $P_0>0$. The (a,b,1) distributions are called zero-truncated distributions in the first case and are called zero-modified distributions in the second case.

Because there are three related distributions, we need to establish notation to keep track of the different distributions. We use the notation established in this post. The notation $P_k$ refers to the probabilities for an (a,b,0) distribution. From this (a,b,0) distribution, we can derive a zero-truncated distribution whose probabilities are denoted by $P_k^T$. From this zero-truncated distribution, we can derive a zero-modified distribution whose probabilities are denoted by $P_k^M$. For example, for the negative binomial distribution in Example 2, we derive a zero-truncated negative binomial distribution (Example 3) and from it we derive a zero-modified negative binomial distribution (Example 4).

Example 3
In Example 2, we calculated the (a,b,0) probabilities $P_k$ up to $k=6$. We now calculate the probabilities $P_k^T$ for the corresponding zero-truncated negative binomial distribution. For a zero-truncated distribution, the value of zero is not recorded. So $P_k^T$ is simply $P_k$ divided by $1-P_0$.

(3)……….$\displaystyle P_k^T=\frac{1}{1-P_0} \ P_k \ \ \ \ \ \ \ \ \ \ k=1,2,3,4,\cdots$

The sum of $P_k^T$, $k=1,2,3,\cdots$, must be 1 since $P_0,P_1,P_2,\cdots$ is a probability distribution. The (a,b,0) $P_0$ is $1/25$. Then $1-P_0=24/25$, which means $1/(1-P_0)=25/24$. The following shows the zero-truncated probabilities.

$\displaystyle P_1^T=\frac{8}{125} \cdot \frac{25}{24}=\frac{8}{120}$

$\displaystyle P_2^T=\frac{48}{625} \cdot \frac{25}{24}=\frac{48}{600}$

$\displaystyle P_3^T=\frac{256}{3125} \cdot \frac{25}{24}=\frac{256}{3000}$

$\displaystyle P_4^T=\frac{256}{3125} \cdot \frac{25}{24}=\frac{256}{3000}$

$\displaystyle P_5^T=\frac{6144}{78125} \cdot \frac{25}{24}=\frac{6144}{75000}$

$\displaystyle P_6^T=\frac{28672}{390625} \cdot \frac{25}{24}=\frac{28672}{375000}$

The above are the first 6 probabilities of the zero-truncated negative binomial distribution with $a=4/5$ and $b=4/5$ or with the usual parameters $r=2$ and $\theta=4$. The above $P_k^T$ can also be calculated recursively by using (2). Just calculate $P_1^T$ and the rest of the probabilities can be generated using the recursion relation (2).
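The truncation step (3) can be sketched in a few lines of Python. The helper name `zero_truncate` is hypothetical, and the (a,b,0) probabilities are taken from the probability function in Example 2.

```python
from fractions import Fraction

def zero_truncate(probs):
    """Apply (3): drop P_0 and rescale P_1, P_2, ... by 1 / (1 - P_0)."""
    p0 = probs[0]
    return [p / (1 - p0) for p in probs[1:]]

# (a,b,0) probabilities P_0, ..., P_6 from Example 2
ab0 = [(1 + k) * Fraction(1, 25) * Fraction(4, 5) ** k for k in range(7)]
truncated = zero_truncate(ab0)   # truncated[0] is P_1^T, truncated[1] is P_2^T, ...
```

Since every $P_k$ with $k \ge 1$ is scaled by the same constant, the ratios $P_k^T/P_{k-1}^T$ for $k \ge 2$ are unchanged, which is why recursion (2) holds with the same $a$ and $b$.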

Example 4
From the zero-truncated negative binomial distribution in Example 3, we generate a zero-modified negative binomial distribution. If the original $P_0=0.04$ is considered too small, e.g. not big enough to account for the probability of zero claims, then we can assign a larger value to the zero probability. Let’s say 0.10 is more appropriate. So we set $P_0^M=0.10$. Then the rest of the $P_k^M$ must sum to $1-P_0^M$, or 0.9 in this example. The following shows how the zero-modified probabilities are related to the zero-truncated probabilities.

(4)……….$\displaystyle P_k^M=(1-P_0^M) \ P_k^T \ \ \ \ \ \ \ \ \ \ k=1,2,3,4,\cdots$

The following gives the probabilities for the zero-modified negative binomial distribution.

$\displaystyle P_0^M=0.1$ (assumed value)

$\displaystyle P_1^M=0.9 \cdot \frac{8}{120}=\frac{7.2}{120}$

$\displaystyle P_2^M=0.9 \cdot \frac{48}{600}=\frac{43.2}{600}$

$\displaystyle P_3^M=0.9 \cdot \frac{256}{3000}=\frac{230.4}{3000}$

$\displaystyle P_4^M=0.9 \cdot \frac{256}{3000}=\frac{230.4}{3000}$

$\displaystyle P_5^M=0.9 \cdot \frac{6144}{75000}=\frac{5529.6}{75000}$

$\displaystyle P_6^M=0.9 \cdot \frac{28672}{375000}=\frac{25804.8}{375000}$

The same probabilities can also be obtained by using the original (a,b,0) probabilities $P_k$ directly as follows:

(5)……….$\displaystyle P_k^M=\frac{1-P_0^M}{1-P_0} \ P_k \ \ \ \ \ \ \ \ \ \ k=1,2,3,4,\cdots$
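Formula (5) translates directly into code. The following Python sketch (with the hypothetical helper name `zero_modify`) produces the zero-modified probabilities of Example 4 from the (a,b,0) probabilities of Example 2.

```python
from fractions import Fraction

def zero_modify(probs, p0_modified):
    """Apply (5): P_k^M = (1 - P_0^M) / (1 - P_0) * P_k for k >= 1."""
    scale = (1 - p0_modified) / (1 - probs[0])
    return [p0_modified] + [scale * p for p in probs[1:]]

# (a,b,0) probabilities P_0, ..., P_6 from Example 2
ab0 = [(1 + k) * Fraction(1, 25) * Fraction(4, 5) ** k for k in range(7)]
modified = zero_modify(ab0, Fraction(1, 10))   # modified[k] is P_k^M
```

For instance, `modified[1]` is $0.9 \cdot 8/120 = 0.06$, matching $P_1^M=7.2/120$ above.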

ETNB Distribution

Examples 2, 3 and 4 show, starting with an (a,b,0) distribution, how to derive a zero-truncated distribution and from it a zero-modified distribution. In these examples, we start with a negative binomial distribution and the derived distributions are zero-truncated negative binomial distribution and zero-modified negative binomial distribution. If the starting distribution is a Poisson distribution, then the same process would produce a zero-truncated Poisson distribution and a zero-modified Poisson distribution (with a particular assumed value of $P_0^M$).

There are members of the (a,b,1) class that do not originate from a member of the (a,b,0) class. Three such distributions are discussed in this post on the (a,b,1) class. We give an example discussing one of them.

Example 5
This example demonstrates how to work with the extended truncated negative binomial distribution (ETNB). The usual negative binomial distribution has two parameters $r$ and $\theta$ in one version ($r$ and $p$ in another version). Both parameters are positive real numbers. To define an ETNB distribution, we relax the $r$ parameter to include the possibility of $-1<r<0$ in addition to $r>0$. Of course if $r>0$, then we just have the usual negative binomial distribution. So we focus on the new situation of $-1<r<0$.

Let’s say $r=-0.2$ and $\theta=1$. We take these two parameters and generate the “negative binomial” probabilities, from which we generate the zero-truncated probabilities $P_k^T$ as shown in Example 3. Now the parameters $r=-0.2$ and $\theta=1$ do not belong to a legitimate negative binomial distribution. In fact the resulting $P_0$ exceeds 1 and the resulting $P_k$ for $k \ge 1$ are negative values. So this “negative binomial” distribution is just a device to get things going.

According to Table 1, $r=-0.2$ and $\theta=1$ translate to $a=1/2$ and $b=-3/5$. We generate the “negative binomial” probabilities using the recursive relation (1). Don’t be alarmed that the probabilities are negative.

$\displaystyle P_0=\biggl(\frac{1}{2}\biggr)^{-0.2}=2^{0.2}=1.148698355$

$\displaystyle P_1=\biggl(\frac{1}{2}-\frac{3}{5} \biggr) \ 2^{0.2}=\frac{-1}{10} \ 2^{0.2}$

$\displaystyle P_2=\biggl(\frac{1}{2}-\frac{3}{5} \ \frac{1}{2} \biggr) \ \frac{-1}{10} \ 2^{0.2}=\frac{-2}{100} \ 2^{0.2}$

$\displaystyle P_3=\biggl(\frac{1}{2}-\frac{3}{5} \ \frac{1}{3} \biggr) \ \frac{-2}{100} \ 2^{0.2}=\frac{-6}{1000} \ 2^{0.2}$

$\displaystyle P_4=\biggl(\frac{1}{2}-\frac{3}{5} \ \frac{1}{4} \biggr) \ \frac{-6}{1000} \ 2^{0.2}=\frac{-21}{10000} \ 2^{0.2}$

$\displaystyle P_5=\biggl(\frac{1}{2}-\frac{3}{5} \ \frac{1}{5} \biggr) \ \frac{-21}{10000} \ 2^{0.2}=\frac{-399}{500000} \ 2^{0.2}$

The initial $P_0$ is greater than 1 and the other so-called probabilities are negative. But they are just a device to get the ETNB probabilities. Using the formula stated in (3) gives the following zero-truncated ETNB probabilities.

$\displaystyle P_1^T=\frac{1}{1-2^{0.2}} \ \biggl( \frac{-1}{10} \ 2^{0.2}\biggr)=0.7725023959$

$\displaystyle P_2^T=\frac{1}{1-2^{0.2}} \ \biggl( \frac{-2}{100} \ 2^{0.2}\biggr)=0.1545004792$

$\displaystyle P_3^T=\frac{1}{1-2^{0.2}} \ \biggl( \frac{-6}{1000} \ 2^{0.2}\biggr)=0.0463501438$

$\displaystyle P_4^T=\frac{1}{1-2^{0.2}} \ \biggl( \frac{-21}{10000} \ 2^{0.2}\biggr)=0.0162225503$

$\displaystyle P_5^T=\frac{1}{1-2^{0.2}} \ \biggl( \frac{-399}{500000} \ 2^{0.2}\biggr)=0.0061645691$

The above gives the first 5 probabilities of the zero-truncated ETNB distribution with parameters $a=1/2$ and $b=-3/5$. It is an (a,b,1) distribution that does not originate from any (legitimate) (a,b,0) distribution.
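The calculation in Example 5 can be reproduced with a short Python sketch. The function name `etnb_zero_truncated` is our own, and the code simply follows the device described above: run recursion (1) on the formal "probabilities" and then apply (3).

```python
def etnb_zero_truncated(r, theta, kmax):
    """Return [P_1^T, ..., P_kmax^T] for the ETNB with -1 < r < 0 and theta > 0."""
    a = theta / (1 + theta)
    b = (r - 1) * theta / (1 + theta)
    p0 = (1 + theta) ** (-r)     # formal "P_0"; greater than 1 when r < 0
    p = p0
    truncated = []
    for k in range(1, kmax + 1):
        p = (a + b / k) * p      # formal "P_k"; negative when r < 0
        truncated.append(p / (1 - p0))   # formula (3); the two negatives cancel
    return truncated

etnb = etnb_zero_truncated(-0.2, 1.0, 5)
```

The resulting values are all positive, since both the numerator and the denominator $1-P_0$ in (3) are negative.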

Practice Problems

The next post is a practice problem set on the (a,b,1) class.


$\copyright$ 2019 – Dan Ma

### Several versions of negative binomial distribution

This post shows how to work with the negative binomial distribution from an actuarial modeling perspective. The negative binomial distribution is introduced as a Poisson-gamma mixture. Then other versions of the negative binomial distribution follow. Specific attention is paid to the thought processes that facilitate calculations involving the negative binomial distribution.

Negative Binomial Distribution as a Poisson-Gamma Mixture

Here’s the setting for the Poisson-gamma mixture. Suppose that $X \lvert \Lambda$ has a Poisson distribution with mean $\Lambda$ and that $\Lambda$ is a random variable that varies according to a gamma distribution with parameters $\alpha$ (shape parameter) and $\rho$ (rate parameter). Then the following is the unconditional probability function of $X$.

$\displaystyle (1) \ \ \ \ P[X=k]=\frac{\Gamma(\alpha+k)}{k! \ \Gamma(\alpha)} \ \biggl( \frac{\rho}{1+\rho} \biggr)^\alpha \ \biggl(\frac{1}{1+\rho} \biggr)^k \ \ \ \ k=0,1,2,3,\cdots$

The distribution described in (1) is one parametrization of the negative binomial distribution (derived here). It has two parameters $\alpha$ and $\rho$ (coming from the gamma mixing weights). The following is another parametrization.

$\displaystyle (2) \ \ \ \ P[X=k]=\frac{\Gamma(\alpha+k)}{k! \ \Gamma(\alpha)} \ \biggl( \frac{1}{1+\theta} \biggr)^\alpha \ \biggl(\frac{\theta}{1+\theta} \biggr)^k \ \ \ \ k=0,1,2,3,\cdots$

The distribution described in (2) is obtained when the gamma mixing weight $\Lambda$ has a shape parameter $\alpha$ and a scale parameter $\theta$. Since the gamma scale parameter and rate parameter are related by $\rho=1/ \theta$, (2) can be derived from (1) by setting $\rho=1/ \theta$.
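To spell out the substitution, setting $\rho=1/\theta$ turns the two factors in (1) into the two factors in (2) as follows.

$\displaystyle \frac{\rho}{1+\rho}=\frac{1/\theta}{1+1/\theta}=\frac{1}{1+\theta} \ \ \ \ \ \ \ \ \ \ \frac{1}{1+\rho}=\frac{1}{1+1/\theta}=\frac{\theta}{1+\theta}$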

Both (1) and (2) contain the ratio $\frac{\Gamma(\alpha+k)}{k! \ \Gamma(\alpha)}$ that is expressed using the gamma function. The next task is to simplify the ratio using a general notion of binomial coefficient.

The Poisson-gamma mixture is discussed in this blog post in a companion blog called Topics in Actuarial Modeling.

General Binomial Coefficient

The familiar binomial coefficient is the following:

$(3) \ \ \ \ \displaystyle \binom{n}{j}=\frac{n!}{j! (n-j)!}$

where the top number $n$ is a positive integer and the bottom number $j$ is a non-negative integer such that $n \ge j$. Other notations for binomial coefficient are $C(n,j)$, $_nC_j$ and $C_{n,j}$. The right hand side of the above expression can be simplified by canceling out $(n-j)!$.

$(4) \ \ \ \ \displaystyle \binom{n}{j}=\frac{n (n-1) (n-2) \cdots (n-(j-1))}{j!}$

Note that $n$ does not have to be an integer for the calculation in (4) to work. The bottom number $j$ has to be a non-negative integer since $j!$ is involved. However, $n$ can be any positive real number as long as $n>j-1$.

Thus the expression in (4) gives a new meaning to the binomial coefficient where $n$ is a positive real number and $j$ is a non-negative integer such that $n>j-1$.

$\displaystyle (5) \ \ \ \ \binom{n}{j}=\left\{ \begin{array}{ll} \displaystyle \frac{n (n-1) (n-2) \cdots (n-(j-1))}{j!} &\ n>j-1, j=1,2,3,\cdots \\ \text{ } & \text{ } \\ \displaystyle 1 &\ j=0 \\ \text{ } & \text{ } \\ \displaystyle \text{undefined} &\ \text{otherwise} \end{array} \right.$

For example, $\binom{2.3}{1}=2.3$ and $\binom{5.1}{3}=(5.1 \times 4.1 \times 3.1) / 3!=10.8035$. The thought process is that the numerator is obtained by subtracting 1 $j-1$ times from $n$. If $j=0$, this thought process would not work. For convenience, $\binom{n}{0}=1$ when $n$ is a positive real number.
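The definition in (5) can be coded directly. The following is a small Python sketch. The function name `general_binom` is hypothetical, and the "undefined" case is signalled with an exception.

```python
from math import factorial

def general_binom(n, j):
    """Binomial coefficient as defined in (5): real n, non-negative integer j."""
    if j == 0:
        return 1.0
    if n <= j - 1:
        raise ValueError("undefined: (5) requires n > j - 1")
    numerator = 1.0
    for i in range(j):
        numerator *= n - i       # n (n-1) ... (n-(j-1))
    return numerator / factorial(j)

value = general_binom(5.1, 3)    # (5.1 * 4.1 * 3.1) / 3!
```

When $n$ is a positive integer with $n \ge j$, this agrees with the familiar binomial coefficient in (3).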

We now use the binomial coefficient defined in (5) to simplify the ratio $\frac{\Gamma(\alpha+k)}{k! \ \Gamma(\alpha)}$ where $\alpha$ is a positive real number and $k$ is a non-negative integer. We use a key fact about gamma function: $\Gamma(1+w)=w \Gamma(w)$. Then for any integer $k \ge 1$, we have the following derivation.

$\displaystyle \begin{aligned} \Gamma(\alpha+k)&=\Gamma(1+\alpha+k-1)=(\alpha+k-1) \ \Gamma(\alpha+k-1) \\&=(\alpha+k-1) \ (\alpha+k-2) \ \Gamma(\alpha+k-2) \\&\ \ \ \vdots \\&=(\alpha+k-1) \ (\alpha+k-2) \cdots (\alpha+1) \ \alpha \ \Gamma(\alpha) \end{aligned}$

$\displaystyle \frac{\Gamma(\alpha+k)}{k! \ \Gamma(\alpha)}=\frac{(\alpha+k-1) \ (\alpha+k-2) \cdots (\alpha+1) \ \alpha}{k!}$

The right hand side of the above expression is precisely the binomial coefficient $\binom{\alpha+k-1}{k}$ when $k \ge 1$. Thus we have the following relation.

$\displaystyle (6) \ \ \ \ \frac{\Gamma(\alpha+k)}{k! \ \Gamma(\alpha)}=\frac{(\alpha+k-1) \ (\alpha+k-2) \cdots (\alpha+1) \ \alpha}{k!}=\binom{\alpha+k-1}{k}$

where $k$ is an integer with $k \ge 1$.
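Relation (6) can be checked numerically with the gamma function in Python's standard library. This is only a sanity check under illustrative values of $\alpha$ and $k$, and the two helper names are our own.

```python
from math import factorial, gamma

def gamma_ratio(alpha, k):
    """Left-hand side of (6): Gamma(alpha + k) / (k! Gamma(alpha))."""
    return gamma(alpha + k) / (factorial(k) * gamma(alpha))

def product_form(alpha, k):
    """Middle expression of (6): (alpha+k-1)(alpha+k-2)...(alpha+1) alpha / k!."""
    product = 1.0
    for i in range(k):
        product *= alpha + i
    return product / factorial(k)

# Largest relative difference over a few illustrative (alpha, k) pairs
max_rel_diff = max(
    abs(gamma_ratio(al, k) / product_form(al, k) - 1)
    for al in (0.2, 1.5, 3.0)
    for k in range(1, 7)
)
```

The relative difference stays at the level of floating-point rounding error.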

Negative Binomial Distribution

With relation (6), the two versions of Poisson-gamma mixture stated in (1) and (2) are restated as follows:

$\displaystyle (7) \ \ \ \ P[X=k]=\binom{\alpha+k-1}{k} \ \biggl( \frac{\rho}{1+\rho} \biggr)^\alpha \ \biggl(\frac{1}{1+\rho} \biggr)^k \ \ \ \ k=0,1,2,3,\cdots$

$\displaystyle (8) \ \ \ \ P[X=k]=\binom{\alpha+k-1}{k} \ \biggl( \frac{1}{1+\theta} \biggr)^\alpha \ \biggl(\frac{\theta}{1+\theta} \biggr)^k \ \ \ \ k=0,1,2,3,\cdots$

The above two parametrizations of negative binomial distribution are used if information about the Poisson-gamma mixture is known. In (7), the gamma distribution in the Poisson-gamma mixture has shape parameter $\alpha$ and rate parameter $\rho$. In (8), the gamma distribution has shape parameter $\alpha$ and scale parameter $\theta$. The following is a standalone version of the negative binomial distribution.

$\displaystyle (9) \ \ \ \ P[X=k]=\binom{\alpha+k-1}{k} \ p^\alpha \ (1-p)^k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k=0,1,2,3,\cdots$

In (9), the negative binomial distribution has two parameters $\alpha>0$ and $p$ where $0<p<1$. In this parametrization, the parameter $p$ is simply a real number between 0 and 1. It can be viewed as a probability. In fact, this is the case when the parameter $\alpha$ is an integer. Version (9) can be restated as follows when $\alpha$ is an integer.

$\displaystyle \begin{aligned} (10) \ \ \ \ P[X=k]&=\binom{\alpha+k-1}{k} \ p^\alpha \ (1-p)^k \\&=\frac{(\alpha+k-1)!}{k! \ (\alpha-1)!} \ p^\alpha \ (1-p)^k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}$

In version (10), the parameters are $\alpha$ (a positive integer) and a real number $p$ with $0<p<1$. Since $\alpha$ is an integer, the usual binomial coefficient appears in the probability function.

Version (10) has a natural interpretation. A Bernoulli trial is a random experiment that results in one of two distinct outcomes, success or failure. Suppose that the probability of success is $p$ in each trial. Perform a series of independent Bernoulli trials until exactly $\alpha$ successes occur where $\alpha$ is a fixed positive integer. Let the random variable $X$ be the number of failures before the occurrence of the $\alpha$th success. Then (10) is the probability function for the random variable $X$.

A special case of (10). When the parameter $\alpha$ is 1, the negative binomial distribution has a special name.

$\displaystyle (11) \ \ \ \ P[X=k]=p \ (1-p)^k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k=0,1,2,3,\cdots$

The distribution in (11) is said to be a geometric distribution with parameter $p$. The random variable $X$ defined by (11) can be interpreted as the number of failures before the occurrence of the first success when performing a series of independent Bernoulli trials. Another important property of the geometric distribution is that it is the only discrete distribution with the memoryless property. As a result, the survival function of the geometric distribution is $P[X>k]=(1-p)^{k+1}$ where $k=0,1,2,\cdots$.

The probability functions of various versions of the negative binomial distribution have been developed in (1), (2), (7), (8), (9), (10) and (11). Other distributional quantities can be derived from the Poisson-gamma mixture. We derive the mean and variance of the negative binomial distribution.

Suppose that the negative binomial distribution is that of version (8). The conditional random variable $X \lvert \Lambda$ has a Poisson distribution with mean $\Lambda$ and the random variable $\Lambda$ has a gamma distribution with a shape parameter $\alpha$ and a scale parameter $\theta$. Note that $E(\Lambda)=\alpha \theta$ and $Var(\Lambda)=\alpha \theta^2$. Furthermore, since $X \lvert \Lambda$ is Poisson, we have the conditional mean $E(X \lvert \Lambda)=\Lambda$ and the conditional variance $Var(X \lvert \Lambda)=\Lambda$.

The following derives the mean and variance of $X$.

$E(X)=E[E(X \lvert \Lambda)]=E[\Lambda]=\alpha \theta$

$\displaystyle \begin{aligned} Var(X)&=E[Var(X \lvert \Lambda)]+Var[E(X \lvert \Lambda)] \\&=E[\Lambda]+Var[\Lambda] \\&=\alpha \theta+\alpha \theta^2 \\&=\alpha \theta (1+\theta) \end{aligned}$

The above mean and variance are for parametrization in (8). To obtain the mean and variance for the other parametrizations, make the necessary translation. For example, to get (7), plug $\theta=\frac{1}{\rho}$ into the above mean and variance. For (9), let $p=\frac{1}{1+\theta}$. Then solve for $\theta$ and plug that into the above mean and variance. Version (10) should have the same formulas as for (9). To get (11), set $\alpha=1$. The following table lists the negative binomial mean and variance.

| Version | Mean | Variance |
|---|---|---|
| (7) | $\displaystyle E(X)=\frac{\alpha}{\rho}$ | $\displaystyle Var(X)=\frac{\alpha}{\rho} \ \biggl(1+\frac{1}{\rho} \biggr)$ |
| (8) | $E(X)=\alpha \ \theta$ | $Var(X)=\alpha \ \theta \ (1+\theta)$ |
| (9) and (10) | $\displaystyle E(X)=\frac{\alpha (1-p)}{p}$ | $\displaystyle Var(X)=\frac{\alpha (1-p)}{p^2}$ |
| (11) | $\displaystyle E(X)=\frac{1-p}{p}$ | $\displaystyle Var(X)=\frac{1-p}{p^2}$ |

The table shows that the variance of the negative binomial distribution is greater than its mean (regardless of the version). This stands in contrast with the Poisson distribution whose mean and the variance are equal. Thus the negative binomial distribution would be a suitable model in situations where the variability of the empirical data is greater than the sample mean.
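The formulas for version (8) can be checked numerically by summing the probability function directly. The following Python sketch does this for the illustrative values $\alpha=2$ and $\theta=4$, truncating the infinite sums at a large index (the cutoff `kmax=2000` is an arbitrary choice that leaves a negligible tail for these parameters).

```python
from math import exp, lgamma, log

def negbin_pmf(k, alpha, theta):
    """Probability function (8), computed on the log scale for stability."""
    log_coef = lgamma(alpha + k) - lgamma(k + 1) - lgamma(alpha)
    log_prob = log_coef - alpha * log(1 + theta) + k * (log(theta) - log(1 + theta))
    return exp(log_prob)

def moments(alpha, theta, kmax=2000):
    """Approximate mean and variance by summing over k = 0, ..., kmax."""
    probs = [negbin_pmf(k, alpha, theta) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean**2

mean, variance = moments(alpha=2.0, theta=4.0)
```

For these values the table gives $E(X)=\alpha \theta=8$ and $Var(X)=\alpha \theta (1+\theta)=40$, so the variance is five times the mean.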

Modeling Claim Count

The negative binomial distribution is a discrete probability distribution that takes on the non-negative integers $0,1,2,3,\cdots$. Thus it can be used as a counting distribution, i.e. a model for the number of events of interest that occur at random. For example, the $X$ described above can be a good model for the frequency of loss, i.e. the random variable of the number of losses, either arising from a portfolio of insureds or from a particular insured in a given period of time.

The Poisson-gamma model has a great deal of flexibility. Consider a large population of individual insureds. The number of losses (or claims) in a year for each insured has a Poisson distribution with mean $\Lambda$. From insured to insured, there is uncertainty in the mean annual claim frequency $\Lambda$. However, the random variable $\Lambda$ varies according to a gamma distribution. As a result, the annual number of claims for an “average” insured or a randomly selected insured from the population will follow a negative binomial distribution.

Thus in a Poisson-gamma model, the claim frequency for an individual in the population follows a Poisson distribution with unknown gamma mean. The weighted average of these conditional Poisson claim frequencies is a negative binomial distribution. Thus the average claim frequency over all individuals has a negative binomial distribution.

The table in the preceding section shows that the variance of the negative binomial distribution is greater than the mean. This is in contrast to the fact that the variance and the mean of a Poisson distribution are equal. Thus the unconditional claim frequency $X$ is more dispersed than its conditional distributions. The increased variance of the negative binomial distribution reflects the uncertainty in the parameter of the Poisson mean across the population of insureds. The uncertainty in the parameter variable $\Lambda$ has the effect of increasing the unconditional variance of the mixture distribution of $X$. Recall that the variance of a mixture distribution has two components, the weighted average of the conditional variances and the variance of the conditional means. The second component represents the additional variance introduced by the uncertainty in the parameter $\Lambda$.

We present two examples. More examples to come at the end of the post.

Example 1
For a given insured driver in a large portfolio of insured drivers, the number of collision claims in a year has a Poisson distribution with mean $\Lambda$. The Poisson mean $\Lambda$ follows a gamma distribution with mean 4 and variance 80. For a randomly selected insured driver from this portfolio,

• what is the probability of having exactly 2 collision claims in the next year?
• what is the probability of having at most one collision claim in the next year?

The number of collision claims in a year is a Poisson-gamma mixture and thus is a negative binomial distribution. From the given gamma mean and variance, we can determine the parameters of the gamma distribution. In this example, we use the parametrization of (8). Expressing the gamma mean and variance in terms of the shape and scale parameters, we have $\alpha \theta=4$ and $\alpha \theta^2=80$. These two equations give $\alpha=0.2$ and $\theta=20$. The probabilities are calculated based on (8).

$\displaystyle P[X=0]=\biggl( \frac{1}{21} \biggr)^{0.2}=0.5439$

$\displaystyle P[X=1]=\binom{0.2}{1} \ \biggl( \frac{1}{21} \biggr)^{0.2} \ \biggl( \frac{20}{21} \biggr)=0.2 \ \biggl( \frac{1}{21} \biggr)^{0.2} \ \biggl( \frac{20}{21} \biggr)=0.1036$

$\displaystyle P[X=2]=\binom{1.2}{2} \ \biggl( \frac{1}{21} \biggr)^{0.2} \ \biggl( \frac{20}{21} \biggr)^2=\frac{1.2 (0.2)}{2!} \ \biggl( \frac{1}{21} \biggr)^{0.2} \ \biggl( \frac{20}{21} \biggr)^2=0.0592$

The answer for the first question is $P[X=2]=0.0592$. The answer for the second question is $P[X \le 1]=P[X=0]+P[X=1]=0.6475$. Thus there is close to a 65% chance that an insured driver has at most one claim in a year.
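The arithmetic in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch based on version (8), with the binomial coefficient computed as the product in (6). The function name is our own.

```python
from math import factorial

alpha, theta = 0.2, 20.0   # from alpha * theta = 4 and alpha * theta**2 = 80

def negbin_pmf(k):
    """Probability function (8) with the coefficient written as in (6)."""
    coef = 1.0
    for i in range(k):
        coef *= alpha + i          # alpha (alpha+1) ... (alpha+k-1)
    coef /= factorial(k)
    return coef * (1 / (1 + theta)) ** alpha * (theta / (1 + theta)) ** k

p_exactly_two = negbin_pmf(2)
p_at_most_one = negbin_pmf(0) + negbin_pmf(1)
```

The computed values agree with the answers above up to rounding.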

Example 2
For an automobile insurance company, the distribution of the annual number of claims for a policyholder chosen at random is modeled by a negative binomial distribution that is a Poisson-gamma mixture. The gamma distribution in the mixture has a shape parameter of $\alpha=1$ and a scale parameter of $\theta=3$. What is the probability that a randomly selected policyholder has more than two claims in a year?

Since the gamma shape parameter is 1, the unconditional number of claims in a year is a geometric distribution with parameter $p=1/4$. The desired probability follows from the geometric survival function $P[X>k]=(1-p)^{k+1}$ with $k=2$.

$\displaystyle P[X>2]=\biggl( \frac{3}{4} \biggr)^3=\frac{27}{64}=0.4219$

A Recursive Formula

The probability functions described in (1), (2), (7), (8), (9), (10) and (11) describe clearly how the negative binomial probabilities are calculated based on the two given parameters. The probabilities can also be calculated recursively. Let $P_k=P[X=k]$ where $k=0,1,2,\cdots$. We introduce a recursive formula that allows us to compute the value $P_k$ if $P_{k-1}$ is known. The following is the form of the recursive formula.

$\displaystyle (12) \ \ \ \ \frac{P_k}{P_{k-1}}=a+\frac{b}{k} \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

In (12), the numbers $a$ and $b$ are constants. Note that the formula (12) calculates probabilities $P_k$ for all $k \ge 1$. It turns out that the initial probability $P_0$ is determined by the constants $a$ and $b$. Thus the constants $a$ and $b$ completely determine the probability distribution represented by $P_k$. Any discrete probability distribution that satisfies this recursive relation is said to be a member of the (a,b,0) class of distributions.

We show that the negative binomial distribution is a member of the (a,b,0) class of distributions. First, assume that the negative binomial distribution conforms to the parametrization in (8) with parameters $\alpha$ and $\theta$. Then let $a$ and $b$ be defined as follows:

$\displaystyle a=\frac{\theta}{1+\theta}$

$\displaystyle b=\frac{(\alpha-1) \theta}{1+\theta}$.

Let the initial probability be $P_0=(1+\theta)^{-\alpha}$. We claim that the probabilities generated by the formula (12) are identical to the ones calculated from (8). To see this, let’s calculate a few probabilities using the formula.

$\displaystyle P_0=\biggl(\frac{1}{1+\theta} \biggr)^\alpha$

$\displaystyle \begin{aligned} P_1&=(a+b) P_0 \\&=\biggl(\frac{\theta}{1+\theta}+ \frac{(\alpha-1) \theta}{1+\theta} \biggr) \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \\&=\alpha \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \frac{\theta}{1+\theta} \\&=\binom{\alpha}{1} \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \frac{\theta}{1+\theta}=P[X=1] \end{aligned}$

$\displaystyle \begin{aligned} P_2&=\biggl(a+\frac{b}{2} \biggr) P_1 \\&=\biggl(\frac{\theta}{1+\theta}+ \frac{(\alpha-1) \theta}{2(1+\theta)} \biggr) \ \alpha \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \frac{\theta}{1+\theta} \\&=\frac{(\alpha+1) \alpha}{2!} \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \biggl( \frac{\theta}{1+\theta} \biggr)^2 \\&=\binom{\alpha+1}{2} \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \biggl( \frac{\theta}{1+\theta} \biggr)^2=P[X=2] \end{aligned}$

$\displaystyle \begin{aligned} P_3&=\biggl(a+\frac{b}{3} \biggr) P_2 \\&=\biggl(\frac{\theta}{1+\theta}+ \frac{(\alpha-1) \theta}{3(1+\theta)} \biggr) \ \frac{(\alpha+1) \alpha}{2!} \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \biggl( \frac{\theta}{1+\theta} \biggr)^2 \\&=\frac{(\alpha+2) (\alpha+1) \alpha}{3!} \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \biggl( \frac{\theta}{1+\theta} \biggr)^3 \\&=\binom{\alpha+2}{3} \ \biggl(\frac{1}{1+\theta} \biggr)^\alpha \ \biggl( \frac{\theta}{1+\theta} \biggr)^3=P[X=3] \end{aligned}$

The above derivation demonstrates that formula (12) generates the same probabilities as (8). By adjusting the constants $a$ and $b$, the recursive formula can also generate the probabilities in the other versions of the negative binomial distribution. For the negative binomial version (9) with parameters $\alpha$ and $p$, the $a$ and $b$ should be defined as follows:

$a=1-p$

$b=(\alpha-1) \ (1-p)$

With the initial probability $P_0=p^\alpha$, the recursive formula (12) will generate the same probabilities as those from version (9).
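The agreement between the recursion (12) and the closed form can also be checked numerically. The following is a minimal sketch, assuming the parametrization used in the derivation above (shape $\alpha$, scale $\theta$); the function names and the parameter values $\alpha=2.5$, $\theta=1.2$ are illustrative choices, not part of the original discussion.

```python
from math import gamma

def nb_pmf_closed(k, alpha, theta):
    """Closed-form negative binomial probability P[X=k] (shape alpha, scale theta)."""
    # Generalized binomial coefficient binom(k+alpha-1, k) via the gamma function.
    coeff = gamma(alpha + k) / (gamma(alpha) * gamma(k + 1))
    return coeff * (1 / (1 + theta)) ** alpha * (theta / (1 + theta)) ** k

def nb_pmf_recursive(n_terms, alpha, theta):
    """Return [P_0, ..., P_{n_terms-1}] generated by P_k = (a + b/k) P_{k-1}."""
    a = theta / (1 + theta)
    b = (alpha - 1) * theta / (1 + theta)
    probs = [(1 + theta) ** (-alpha)]  # initial probability P_0
    for k in range(1, n_terms):
        probs.append((a + b / k) * probs[-1])
    return probs

probs = nb_pmf_recursive(6, alpha=2.5, theta=1.2)
for k, p in enumerate(probs):
    # The two columns should agree up to floating-point rounding.
    print(k, round(p, 6), round(nb_pmf_closed(k, 2.5, 1.2), 6))
```

The recursion needs only one multiplication per term, which is why it is often preferred when many consecutive probabilities are required.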

More Examples

Example 3
Suppose that the probability that an insured will produce $n$ claims during the next exposure period is

$\displaystyle \frac{e^{-\lambda} \ \lambda^n}{n!}$

where $n=0,1,2,3,\cdots$. Furthermore, the parameter $\lambda$ varies according to a distribution with the following density function:

$\displaystyle g(\lambda)=\frac{9.261}{2} \ \lambda^2 \ e^{-2.1 \lambda} \ \ \ \ \ \ \lambda>0$

What is the probability that a randomly selected insured will produce more than 2 claims during the next exposure period?

Note that the claim frequency for an individual insured has a Poisson distribution with mean $\lambda$. The given density function for the parameter $\lambda$ is that of a gamma distribution with shape parameter $\alpha=3$ and rate parameter $\rho=2.1$. Thus the number of claims in an exposure period for a randomly selected (or “average”) insured will have a negative binomial distribution. In this case the parametrization (7) is the most useful one to use.

$\displaystyle \begin{aligned} P(X=k)&=\binom{k+2}{k} \ \biggl(\frac{2.1}{3.1} \biggr)^3 \ \biggl(\frac{1}{3.1} \biggr)^k \\&=\frac{(k+2) (k+1)}{2} \ \biggl(\frac{21}{31} \biggr)^3 \ \biggl(\frac{10}{31} \biggr)^k \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}$

The following calculation gives the relevant probabilities to answer the question.

$\displaystyle P(X=0)=\biggl(\frac{21}{31} \biggr)^3$

$\displaystyle P(X=1)=3 \ \biggl(\frac{21}{31} \biggr)^3 \ \biggl(\frac{10}{31} \biggr)$

$\displaystyle P(X=2)=6 \ \biggl(\frac{21}{31} \biggr)^3 \ \biggl(\frac{10}{31} \biggr)^2$

Summing the three probabilities gives $P(X \le 2)=0.805792355$. Then $P(X>2)=0.1942$. There is a 19.42% chance that a randomly selected insured will have more than 2 claims in an exposure period.
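As a cross-check of Example 3, the sketch below computes the same probabilities in two ways: from the negative binomial closed form, and by numerically integrating the Poisson probability against the gamma density. It is an illustration only; the function names, the midpoint rule, and the integration range are assumptions made for this sketch.

```python
from math import exp, factorial, gamma

ALPHA, RATE = 3, 2.1  # gamma shape and rate parameters from Example 3

def nb_pmf(k):
    """Closed form: binom(k+2, k) (2.1/3.1)^3 (1/3.1)^k."""
    coeff = gamma(ALPHA + k) / (gamma(ALPHA) * factorial(k))
    return coeff * (RATE / (1 + RATE)) ** ALPHA * (1 / (1 + RATE)) ** k

def mixed_pmf(k, upper=20.0, steps=20_000):
    """Midpoint-rule integral of the Poisson probability times the gamma density."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        poisson = exp(-lam) * lam ** k / factorial(k)
        density = RATE ** ALPHA / gamma(ALPHA) * lam ** (ALPHA - 1) * exp(-RATE * lam)
        total += poisson * density * h
    return total

p_le_2 = sum(nb_pmf(k) for k in range(3))
print(round(p_le_2, 6), round(1 - p_le_2, 4))
print([round(mixed_pmf(k), 6) for k in range(3)])
print([round(nb_pmf(k), 6) for k in range(3)])
```

The last two printed lists should match, which illustrates that mixing Poisson distributions with a gamma weight produces the negative binomial probabilities used in the example.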

Example 4
The number of claims in a year for each insured in a large portfolio has a Poisson distribution with mean $\lambda$. The parameter $\lambda$ follows a gamma distribution with mean 0.75 and variance 0.5625.

Determine the proportion of insureds that are expected to have less than 1 claim in a year.

Setting $\alpha \theta=0.75$ and $\alpha \theta^2=0.5625$ gives $\alpha=1$ and $\theta=0.75$. Thus the parameter $\lambda$ follows a gamma distribution with shape parameter $\alpha=1$ and scale parameter $\theta=0.75$. This is an exponential distribution with mean 0.75. The problem asks for the proportion of insureds with $\lambda<1$. Thus the answer is $1-e^{-1/0.75}=0.7364$. About 74% of the insured population is expected to have less than 1 claim in a year.

Example 5
Suppose that the number of claims in a year for an insured has a Poisson distribution with mean $\Lambda$. The random variable $\Lambda$ follows a gamma distribution with shape parameter $\alpha=2.5$ and scale parameter $\theta=1.2$.

One thousand insureds are randomly selected and are to be observed for a year. Determine the number of selected insureds expected to have exactly 3 claims by the end of the one-year observed period.

With this being a Poisson-gamma mixture, the number of claims in a year for a randomly selected insured has a negative binomial distribution. Using (8) and based on the gamma parameters given, the following is the probability function of negative binomial distribution.

$\displaystyle P(X=k)=\binom{k+1.5}{k} \ \biggl(\frac{1}{2.2} \biggr)^{2.5} \ \biggl(\frac{1.2}{2.2} \biggr)^k \ \ \ \ \ k=0,1,2,3,\cdots$

The following gives the calculation for $P(X=3)$.

$\displaystyle \begin{aligned} P(X=3)&=\binom{4.5}{3} \ \biggl(\frac{1}{2.2} \biggr)^{2.5} \ \biggl(\frac{1.2}{2.2} \biggr)^3 \\&=\frac{4.5 (3.5) (2.5)}{3!} \ \biggl(\frac{1}{2.2} \biggr)^{2.5} \ \biggl(\frac{1.2}{2.2} \biggr)^3 \\&=6.5625 \ \biggl(\frac{1}{2.2} \biggr)^{2.5} \ \biggl(\frac{1.2}{2.2} \biggr)^3 \\&=0.148350259 \end{aligned}$

With $1000 \times 0.148350259=148.35$, about 148 of the randomly selected insureds are expected to have exactly 3 claims in the observed period.

Example 6
Suppose that the annual claims frequency for an insured in a large portfolio of insureds has a distribution that is in the (a,b,0) class. Let $P_k$ be the probability that an insured has $k$ claims in a year.

Given that $P_1=0.3072$, $P_2=0.12288$ and $P_3=0.04096$, determine the probability that an insured has no claims in a one-year period.

The task is to find $P_0$ given $P_1$, $P_2$ and $P_3$. Based on the recursive relation (12), we have the following two equations in $a$ and $b$.

$\displaystyle \frac{P_2}{P_1}=\frac{0.12288}{0.3072}=0.4=a+\frac{b}{2}$

$\displaystyle \frac{P_3}{P_2}=\frac{0.04096}{0.12288}=\frac{1}{3}=a+\frac{b}{3}$

Solving these two equations gives $a=0.2$ and $b=0.4$. Plugging $a$ and $b$ into the recursive relation gives the answer.

$\displaystyle \frac{P_1}{P_0}=\frac{0.3072}{P_0}=0.6$

$\displaystyle P_0=\frac{0.3072}{0.6}=0.512$.
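The arithmetic in Example 6 can be sketched in a few lines. The helper name `fit_ab` is hypothetical and is introduced here only for illustration.

```python
def fit_ab(p1, p2, p3):
    """Solve P2/P1 = a + b/2 and P3/P2 = a + b/3 for (a, b)."""
    r2, r3 = p2 / p1, p3 / p2
    b = 6 * (r2 - r3)   # subtracting the two equations gives b/6 = r2 - r3
    a = r2 - b / 2
    return a, b

a, b = fit_ab(0.3072, 0.12288, 0.04096)
p0 = 0.3072 / (a + b)   # from P1/P0 = a + b
print(round(a, 4), round(b, 4), round(p0, 4))
```

As a consistency check, $a=0.2$ and $b=0.4$ correspond to a negative binomial distribution with $\theta=0.25$ and $\alpha=3$ under the definitions of $a$ and $b$ given earlier, and $(1+\theta)^{-\alpha}=0.8^3=0.512$ agrees with the value of $P_0$ found above.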


$\copyright$ 2017 – Dan Ma

Revised Nov 2, 2018.

### Practice Problem Set 5 – Exercises for Severity Models


This problem set has exercises to reinforce the various parametric continuous probability models discussed in the companion blog on actuarial modeling. Links are given below for the models involved.

This blog post in Topics in Actuarial Modeling has a catalog for continuous models.

Practice Problem 5A Claim amounts for collision damages to insured cars are mutually independent random variables with common probability density function $\displaystyle f(x)=\left\{ \begin{array}{ll} \displaystyle \frac{1}{1500} \ e^{-x/1500} &\ x > 0 \\ \text{ } & \text{ } \\ \displaystyle 0 &\ \text{otherwise} \end{array} \right.$ For three claims that are expected to be made, calculate the expected value of the largest of the three claims.
 Practice Problem 5B The lifetime of an electronic device is modeled using the random variable $Y=20 X^{3}$ where $X$ is an exponential random variable with mean 0.5. Determine the variance of $Y$.
 Practice Problem 5C The lifetime (in years) of an electronic device is $2X +Y$ where $X$ and $Y$ are independent exponentially distributed random variables with mean 3.5. Determine the probability density function of the lifetime of the electronic device.
Practice Problem 5D The time (in years) until the failure of a machine that is brand new is modeled by a Weibull distribution with shape parameter 1.5 and scale parameter 4. Calculate the 95th percentile of times to failure of the machines that are 2 years old.
Practice Problem 5E The size of a bodily injury claim for an auto insurance policy follows a Pareto Type II Lomax distribution with shape parameter 2.28 and scale parameter 200. Calculate the proportion of claims that are within one-fourth of a standard deviation of the mean claim size.
 Practice Problem 5F Suppose that the size of a claim has the following density function. $\displaystyle f(x)=\left\{ \begin{array}{ll} \displaystyle \frac{1}{2.5 \ x \ \sqrt{2 \pi}} \ e^{- z^2/2} &\ x > 0 \\ \text{ } & \text{ } \\ \displaystyle 0 &\ \text{otherwise} \end{array} \right.$ where $\displaystyle z=\frac{\text{ln}(x)-1.1}{2.5}$. A coverage pays claims subject to an ordinary deductible of 20. Determine the expected amount paid by the coverage per claim.
 Practice Problem 5G An actuary determines that sizes of claims from a large portfolio of insureds are exponentially distributed. For about 60% of the claims, the claim sizes are modeled by an exponential distribution with mean 1.2. For about 30% of the claims, the claim sizes are modeled by an exponential distribution with mean 2.8. For the remaining 10% of the claims, the claim sizes are considered high claim sizes and are modeled by an exponential distribution with mean 7.5. Determine the variance of the size of a claim that is randomly selected from this portfolio.
Practice Problem 5H Losses are modeled by a loglogistic distribution with shape parameter $\gamma=2$ and scale parameter $\theta=10$. When a loss occurs, an insurance policy reimburses the loss in excess of a deductible of 5. Determine the 75th percentile of the insurance company reimbursements over all losses.
Practice Problem 5I Losses are modeled by a loglogistic distribution with shape parameter $\gamma=2$ and scale parameter $\theta=10$. When a loss occurs, an insurance policy reimburses the loss in excess of a deductible of 5. Determine the 75th percentile of the insurance company reimbursements over all payments.
 Practice Problem 5J Claim sizes for a certain class of auto accidents are modeled by a uniform distribution on the interval $(0, 10)$. Five accidents are randomly selected. Determine the expected value of the median of the five accident claims.


Problem Links for the relevant distributions
5A Exponential distribution
5B Exponential distribution
5C Hypoexponential distribution
5D Weibull distribution
5E Pareto distribution
5F Lognormal distribution and limited expectation
5G Hyperexponential distribution
5H Loglogistic distribution
5I Loglogistic distribution
5J Order statistics

Answers

5A 2,750
5B 4,275
5C $\displaystyle \frac{2}{7} \ e^{-x/7}-\frac{1}{3.5} \ e^{-x/3.5}$
5D 8.954227
5E 0.4867
5F 61.106
5G 12.3459
5H 12.32
5I 15
5J 5


### Practice Problem Set 4 – Pareto Distribution


The previous post is a discussion of the Pareto distribution as well as a side-by-side comparison of the two types of Pareto distribution. This post has several practice problems to reinforce the concepts in the previous post.

Practice Problem 4A The random variable $X$ is an insurer’s annual hurricane-related loss. Suppose that the density function of $X$ is: $\displaystyle f(x)=\left\{ \begin{array}{ll} \displaystyle \frac{2.2 \ (250)^{2.2}}{x^{3.2}} &\ x > 250 \\ \text{ } & \text{ } \\ \displaystyle 0 &\ \text{otherwise} \end{array} \right.$ Calculate the inter-quartile range of annual hurricane-related loss. Note that the inter-quartile range of a random variable is the third quartile (75th percentile) minus the first quartile (25th percentile).
 Practice Problem 4B Claim size for an auto insurance coverage follows a Pareto Type II Lomax distribution with mean 7.5 and variance 243.75. Determine the probability that a randomly selected claim will be greater than 10.
 Practice Problem 4C Losses follow a Pareto Type II distribution with shape parameter $\alpha>1$ and scale parameter $\theta$. The value of the mean excess loss function at $x=8$ is 32. The value of the mean excess loss function at $x=16$ is 48. Determine the value of the mean excess loss function at $x=32$.
Practice Problem 4D For a large portfolio of insurance policies, the underlying distribution for losses in the current year has a Pareto Type II distribution with shape parameter $\alpha=2.9$ and scale parameter $\theta=12.5$. All losses in the next year are expected to increase by 5%. For the losses in the next year, determine the value-at-risk at the security level 95%.
Practice Problem 4E (Continuation of 4D) For a large portfolio of insurance policies, the underlying distribution for losses in the current year has a Pareto Type II distribution with shape parameter $\alpha=2.9$ and scale parameter $\theta=12.5$. All losses in the next year are expected to increase by 5%. For the losses in the next year, determine the tail-value-at-risk at the security level 95%.
 Practice Problem 4F For a large portfolio of insurance policies, losses follow a Pareto Type II distribution with shape parameter $\alpha=3.5$ and scale parameter $\theta=5000$. An insurance policy covers losses subject to an ordinary deductible of 500. Given that a loss has occurred, determine the average amount paid by the insurer.
 Practice Problem 4G The claim severity for an auto liability insurance coverage is modeled by a Pareto Type I distribution with shape parameter $\alpha=2.5$ and scale parameter $\theta=1000$. The insurance coverage pays up to a limit of 1200 per claim. Determine the expected insurance payment under this coverage for one claim.
 Practice Problem 4H For an auto insurance company, liability losses follow a Pareto Type I distribution. Let $X$ be the random variable for these losses. Suppose that $\text{VaR}_{0.90}(X)=3162.28$ and $\text{VaR}_{0.95}(X)=4472.14$. Determine $\text{VaR}_{0.99}(X)$.
 Practice Problem 4I For a property and casualty insurance company, losses follow a mixture of two Pareto Type II distributions with equal weights, with the first Pareto distribution having parameters $\alpha=1$ and $\theta=500$ and the second Pareto distribution having parameters $\alpha=2$ and $\theta=500$. Determine the value-at-risk at the security level of 95%.
 Practice Problem 4J The claim severity for a line of property liability insurance is modeled as a mixture of two Pareto Type II distributions with the first distribution having $\alpha=1$ and $\theta=2500$ and the second distribution having $\alpha=2$ and $\theta=1250$. These two distributions have equal weights. Determine the limited expected value of claim severity at claim size 1000.

Answers

4A 184.54
4B 0.20681035
4C 80
4D 23.7499
4E 43.1577
4F 1575.97
4G 1159.51615
4H 10,000
4I 4,958.04
4J 698.3681


### Pareto Type I versus Pareto Type II


This post complements an earlier discussion of the Pareto distribution in a companion blog (found here). This post gives a side-by-side comparison of the Pareto Type I distribution and the Pareto Type II Lomax distribution. We discuss the calculations of the mathematical properties shown in the comparison. Several of the properties in the comparison indicate that Pareto distributions (both Type I and Type II) are heavy tailed distributions. The properties presented in the comparison (and the thought processes behind them) are a good resource for studying for actuarial exams.

The following table gives a side-by-side comparison for Pareto Type I and Pareto Type II.

$\displaystyle \begin{array}{llllllll} \text{ } &\text{ } &\text{ } & \text{Pareto Type I} & \text{ } & \text{ } & \text{Pareto Type II} & \text{ } \\ \text{ } & \text{ } \\ 1 &\text{PDF} &f(x) & \displaystyle \frac{\alpha \theta^\alpha}{x^{\alpha+1}} & x>\theta & \text{ } & \displaystyle \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} & x>0 \\ \text{ } & \text{ } \\ 2 &\text{CDF} &F(x) & \displaystyle 1-\biggl(\frac{\theta}{x}\biggr)^\alpha & x>\theta & \text{ } & \displaystyle 1-\biggl(\frac{\theta}{x+\theta}\biggr)^\alpha & x>0 \\ \text{ } & \text{ } \\ 3 &\text{Survival} &S(x) & \displaystyle \biggl(\frac{\theta}{x}\biggr)^\alpha & x>\theta & \text{ } & \displaystyle \biggl(\frac{\theta}{x+\theta}\biggr)^\alpha & x>0 \\ \text{ } & \text{ } \\ 4 & \text{Hazard Rate} &h(x) & \displaystyle \frac{\alpha}{x} & x>\theta & \text{ } & \displaystyle \frac{\alpha}{x+ \theta} & x>0 \\ \text{ } & \text{ } \\ 5 &\text{Moments} &E(X^k) & \displaystyle \frac{\alpha \theta^k}{\alpha-k} & k<\alpha & \text{ } & \displaystyle \frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} & -1<k<\alpha \\ \text{ } & \text{ } \\ \text{ } &\text{ } &E(X^k) & \text{ } & \text{ } & \text{ } & \displaystyle \frac{\theta^k \ k!}{(\alpha-1) \cdots (\alpha-k)} & k \text{ integer}, \ k<\alpha \\ \text{ } & \text{ } \\ 6 &\text{Variance} &Var(X) & \displaystyle \frac{\alpha \theta^2}{(\alpha-2) (\alpha-1)^2} & \alpha>2 & \text{ } & \displaystyle \frac{\alpha \theta^2}{(\alpha-2) (\alpha-1)^2} &\alpha>2 \\ \text{ } & \text{ } \\ 7 & \text{Mean Excess Loss} &E(X-d \lvert X >d) & \displaystyle \frac{d}{\alpha-1} & \alpha>1 & \text{ } & \displaystyle \frac{d+\theta}{\alpha-1} &\alpha>1 \\ \text{ } & \text{ } \\ \end{array}$

$\displaystyle \begin{array}{llllllll} 8a &\text{Limited Expectation} &E[X \wedge d] & \text{ } & \text{ } & \text{ } & \displaystyle \frac{\theta}{\alpha-1} \biggl[1-\biggl(\frac{\theta}{d+\theta} \biggr)^{\alpha-1} \biggr] &\alpha \ne 1 \\ \text{ } & \text{ } \\ 8b & \text{Limited Expectation} &E[X \wedge d] & \text{ } & \text{ } & \text{ } & \displaystyle -\theta \ \text{ln} \biggl(\frac{\theta}{d+\theta} \biggr) &\alpha = 1 \\ \text{ } & \text{ } \\ 8c & \text{Limited Expectation} &E[(X \wedge d)^k] & \displaystyle \frac{\alpha \theta^k}{\alpha-k}-\frac{k \theta^\alpha}{(\alpha-k) d^{\alpha-k}} & \alpha>k & \text{ } & \text{See below} &\text{all } k \\ \text{ } & \text{ } \\ 9 & \text{VaR} &VaR_p(X) & \displaystyle \frac{\theta}{(1-p)^{1/\alpha}} & \text{ } & \text{ } & \displaystyle \frac{\theta}{(1-p)^{1/\alpha}}-\theta &\text{ } \\ \text{ } & \text{ } \\ 10 &\text{TVaR} &TVaR_p(X) & \displaystyle \frac{\alpha}{\alpha-1} VaR_p(X) & \alpha>1 & \text{ } & \displaystyle \frac{\alpha}{\alpha-1} VaR_p(X)+\frac{\theta}{\alpha-1} &\alpha>1 \\ \text{ } & \text{ } \\ \end{array}$

One item that is not indicated in the table is $E[(X \wedge d)^k]$ for Pareto Type II, which is given below.

$\displaystyle E[(X \wedge d)^k]=\frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \beta[k+1, \alpha-k; \frac{d}{d+\theta}]+d^k \biggl(\frac{\theta}{d+\theta} \biggr)^\alpha$

where $\beta(\text{ }, \text{ }; \text{ })$ is the incomplete beta function, which is defined as follows:

$\displaystyle \beta(a, b; x)=\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \int_0^x t^{a-1} \ (1-t)^{b-1} \ dt$

for any $a>0$, $b>0$ and $0<x<1$.

Discussion

The above table describes two distributions that are called Pareto (Type I and Type II Lomax). Each of them has two parameters: $\alpha$ (shape parameter) and $\theta$ (scale parameter). The support of Pareto Type I is the interval $(\theta,\infty)$. In other words, a Pareto Type I distribution can only take on real numbers greater than the scale parameter $\theta$. On the other hand, the support of Pareto Type II is the interval $(0,\infty)$. So a Pareto Type II distribution can take on any positive real number.

The two distributions are mathematically related. Judging from the PDF, it is clear that the PDF of Pareto Type II is the result of shifting the Type I PDF to the left by the magnitude of $\theta$ (the same can be said about the CDF and survival function). More specifically, let $X_1$ be a random variable that follows a Pareto Type I distribution with parameters $\alpha$ and $\theta$. Let $X_2=X_1-\theta$. It is straightforward to verify that $X_2$ has a Pareto Type II distribution, i.e. its CDF and other distributional quantities are the same as the ones shown in the above table under Pareto Type II. When they have the same parameters, the two distributions are essentially the same, in that each one is the result of shifting the other one by the amount $\theta$.

A further indication that the two types are of the same distributional shape is that the variances are identical. Note that shifting a distribution to the left (or right) by a constant does not change the variance.

Since the two Pareto Types are the same distribution (except for the shifting), they share similar mathematical properties. For example, both distributions are heavy tailed distributions. In other words, they put significantly more probability on larger values. This point is discussed in more detail below.

Calculation

First, the calculations. For Pareto Type I, the moments are determined by the integral $\int_\theta^\infty x^k \ f(x) \ dx$ where $f(x)$ is the PDF of the distribution. Because the PDF for Pareto Type I is easy to work with, almost all the items under Pareto Type I are quite accessible. For example, the item 8c for Pareto Type I is calculated by the following integral.

$\displaystyle E[(X \wedge d)^k]=\int_\theta^d x^k \ \frac{\alpha \theta^\alpha}{x^{\alpha+1}} \ dx+d^k \ \biggl(\frac{\theta}{d}\biggr)^\alpha$

In the remaining discussion, the focus is on Pareto Type II calculations.

The Pareto $k$th moment $E(X^k)$ is by definition the integral $\int_0^\infty x^k f(x) \ dx$ where $f(x)$ is the Pareto Type II PDF. However, it is difficult to evaluate this integral directly. The best way to evaluate the moments in row 5 in the above table is to use the fact that the Pareto Type II distribution is a mixture of exponential distributions with gamma mixing weight (see Example 2 here). Thus the moments of Pareto Type II can be obtained by integrating the conditional $k$th moment of the exponential distribution with gamma weight. The following shows the calculation.

$\displaystyle \begin{aligned} E(X^k)&=\int_0^\infty E(X^k \lvert \lambda) \ \frac{1}{\Gamma(\alpha)} \ \theta^\alpha \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ d \lambda \\&=\int_0^\infty \frac{\Gamma(k+1)}{\lambda^k} \ \frac{1}{\Gamma(\alpha)} \ \theta^\alpha \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\Gamma(k+1)}{\Gamma(\alpha)} \int_0^\infty \theta^\alpha \ \lambda^{\alpha-k-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha-k)} \ \theta^{\alpha-k} \ \lambda^{\alpha-k-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \end{aligned}$

In the above derivation, the conditional $X \lvert \Lambda$ is assumed to have an exponential distribution with rate $\Lambda$ (i.e. with mean $1/\Lambda$), so that $E(X^k \lvert \lambda)=\Gamma(k+1)/\lambda^k$. The random variable $\Lambda$ in turn has a gamma distribution with shape parameter $\alpha$ and rate parameter $\theta$. The integrand in the integral in the second to the last step is a gamma density, making the value of the integral 1.0. When $k$ is a positive integer, $E(X^k)$ simplifies to $\theta^k \ k!/[(\alpha-1) \cdots (\alpha-k)]$.
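The moment formula can be sketched in code as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the parameter values $\alpha=3.5$ and $\theta=5000$ are borrowed from Practice Problem 4F purely as sample inputs.

```python
from math import factorial, gamma

def pareto2_moment(k, alpha, theta):
    """E(X^k) for Pareto Type II via the gamma-function formula, -1 < k < alpha."""
    if not -1 < k < alpha:
        raise ValueError("the moment exists only for -1 < k < alpha")
    return theta ** k * gamma(k + 1) * gamma(alpha - k) / gamma(alpha)

def pareto2_moment_integer(k, alpha, theta):
    """Same moment for a positive integer k: theta^k k! / ((alpha-1)...(alpha-k))."""
    denom = 1.0
    for j in range(1, k + 1):
        denom *= alpha - j
    return theta ** k * factorial(k) / denom

alpha, theta = 3.5, 5000
mean = pareto2_moment(1, alpha, theta)
variance = pareto2_moment(2, alpha, theta) - mean ** 2
print(mean, variance)
print(pareto2_moment_integer(1, alpha, theta), pareto2_moment_integer(2, alpha, theta))
```

The variance computed from the first two moments should agree with the closed form $\alpha \theta^2/[(\alpha-2)(\alpha-1)^2]$, and the two functions should agree whenever $k$ is a positive integer less than $\alpha$.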

The next calculation is the mean excess loss. It is the conditional expected value $E(X-d \lvert X>d)$. If $X$ is an insurance loss and $d$ is some kind of threshold (e.g. the deductible in an insurance policy that covers this loss), then $E(X-d \lvert X>d)$ is the expected loss in excess of the threshold $d$ given that the loss exceeds $d$. If $X$ is the lifetime of an individual, then $E(X-d \lvert X>d)$ is the expected remaining lifetime given that the individual has survived to age $d$.

The expected value $E(X-d \lvert X>d)$ can be calculated by the integral $\frac{1}{S(d)} \int_d^\infty (x-d) \ f(x) \ dx$. This integral is not easy to evaluate when $f(x)$ is a Pareto Type II PDF. Fortunately, there is another way to handle this calculation. The key idea is that if $X$ has a Pareto Type II distribution with parameters $\alpha$ and $\theta$ (as described in the table), the conditional random variable $X-d \lvert X>d$ also has a Pareto Type II distribution, this time with parameters $\alpha$ and $d+\theta$. The mean of a Pareto Type II distribution is always the ratio of the scale parameter to the shape parameter less one. Thus the mean of $X-d \lvert X>d$ is as indicated in row 7 of the table.

The limited loss $X \wedge d$ is defined as follows.

$\displaystyle X \wedge d=\left\{ \begin{array}{ll} \displaystyle X &\ X \le d \\ \text{ } & \text{ } \\ \displaystyle d &\ X > d \end{array} \right.$

One interpretation is that it is the insurance payment when the insurance policy has an upper cap on benefit. If the loss is below the cap $d$, the insurance policy pays the loss in full. If the loss exceeds the cap $d$, the policy only pays for the loss up to the limit $d$. The expected insurance payment $E(X \wedge d)$ is said to be the limited expectation. For Pareto Type II, the first moment $E(X \wedge d)$ can be evaluated by the following integral.

$\displaystyle E(X \wedge d)=\int_0^d x \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx+d \ \biggl(\frac{\theta}{d+\theta}\biggr)^\alpha$

Integrating using a change of variable $u=x+\theta$ will yield the results in row 8a and row 8b in the table, i.e. the cases for $\alpha \ne 1$ and $\alpha=1$. A more interesting result is 8c, which is the $k$th moment of the variable $X \wedge d$. The integral for this expectation can be expressed using the incomplete beta function. The following is the starting point for evaluating $E[(X \wedge d)^k]$.

$\displaystyle E[(X \wedge d)^k]=\int_0^d x^k \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx +d^k \ \biggl(\frac{\theta}{d+\theta}\biggr)^\alpha$

Further transform the integral in the above calculation by the change of variable using $u=\frac{x}{x+\theta}$.

$\displaystyle \begin{aligned} \int_0^d x^k \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx&=\int_0^{\frac{d}{d+\theta}} \alpha \theta^\alpha \ \biggl(\frac{u \theta}{1-u}\biggr)^k \ \biggl( \frac{\theta}{1-u}\biggr)^{-\alpha-1} \ \frac{\theta}{(1-u)^2} \ du\\&=\int_0^{\frac{d}{d+\theta}} \alpha \theta^k \ u^k \ (1-u)^{\alpha-k-1} \ du \\&=\frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \int_0^{\frac{d}{d+\theta}} \frac{\Gamma(\alpha+1)}{\Gamma(k+1) \Gamma(\alpha-k)} \ u^{k+1-1} \ (1-u)^{\alpha-k-1} \ du \end{aligned}$

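The integrand in the last integral is the probability density function of the beta distribution with parameters $a=k+1$ and $b=\alpha-k$, so the last integral is the incomplete beta function $\beta(k+1, \alpha-k; \frac{d}{d+\theta})$. Thus $E[(X \wedge d)^k]$ for Pareto Type II is the formula given below the table (the entry marked "See below" in row 8c).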
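For the first moment, the formulas in rows 8a and 8b are simple enough to sketch directly. The function name below is hypothetical; the inputs are taken from Practice Problem 4J (an equal-weight mixture of two Pareto Type II distributions with a limit of 1000), used here only as a check against the listed answer.

```python
from math import log

def pareto2_limited_mean(d, alpha, theta):
    """E[X ∧ d] for Pareto Type II: row 8b when alpha = 1, row 8a otherwise."""
    if alpha == 1:
        return -theta * log(theta / (d + theta))
    return theta / (alpha - 1) * (1 - (theta / (d + theta)) ** (alpha - 1))

# The limited expectation of a mixture is the weighted average of the
# limited expectations of the component distributions.
lev = 0.5 * pareto2_limited_mean(1000, 1, 2500) + 0.5 * pareto2_limited_mean(1000, 2, 1250)
print(round(lev, 4))
```

The same function also gives the expected payment per loss under an ordinary deductible $d$, since $E[(X-d)_+]=E(X)-E(X \wedge d)$ whenever the mean exists.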

Now we consider two risk measures: value-at-risk (VaR) and tail-value-at-risk (TVaR). The value-at-risk at security level $p$ for a random variable $X$, denoted by $VaR_p(X)$, is the $(100p)$th percentile of $X$. Thus VaR is a fancy name for percentiles. Setting the Pareto Type II CDF equal to $p$ gives the VaR indicated in row 9 of the table. In other words, solving the following equation for $x$ gives the $(100p)$th percentile for Pareto Type II.

$\displaystyle 1-\biggl(\frac{\theta}{x+\theta}\biggr)^\alpha=p$

The tail-value-at-risk of a random variable $X$ at the security level $p$, denoted by $TVaR_p(X)$, is the expected value of $X$ given that it exceeds $VaR_p(X)$. Thus $TVaR_p(X)=E[X \lvert X>VaR_p(X)]$. Letting $\pi_p=\frac{\theta}{(1-p)^{1/\alpha}}-\theta$, the following integral gives the tail-value-at-risk for Pareto Type II. The factor $\frac{1}{1-p}$ is the reciprocal of $P[X>\pi_p]$. The integral is evaluated by the change of variable $u=x+\theta$.

$\displaystyle E[X \lvert X>VaR_p(X)]=\frac{1}{1-p} \int_{\pi_p}^\infty x \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx=\frac{\alpha}{\alpha-1} VaR_p(X)+\frac{\theta}{\alpha-1}$
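Rows 9 and 10 of the table for Pareto Type II can be sketched as two small functions. The function names are hypothetical; the inputs come from Practice Problems 4D and 4E, where a 5% increase in all losses multiplies the scale parameter $\theta$ by 1.05 and leaves $\alpha$ unchanged.

```python
def pareto2_var(p, alpha, theta):
    """VaR_p(X): the solution x of 1 - (theta / (x + theta))^alpha = p."""
    return theta / (1 - p) ** (1 / alpha) - theta

def pareto2_tvar(p, alpha, theta):
    """TVaR_p(X) from row 10 of the table; requires alpha > 1."""
    if alpha <= 1:
        raise ValueError("TVaR requires alpha > 1")
    return alpha / (alpha - 1) * pareto2_var(p, alpha, theta) + theta / (alpha - 1)

alpha, theta = 2.9, 12.5 * 1.05
print(round(pareto2_var(0.95, alpha, theta), 4))
print(round(pareto2_tvar(0.95, alpha, theta), 4))
```

The two printed values can be compared with the listed answers for 4D and 4E.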

Tail Weight

Several properties in the above table show that the Pareto distribution (both types) is a heavy-tailed distribution. When a distribution puts significantly more probability on larger values, the distribution is said to be a heavy tailed distribution (or said to have a larger tail weight). There are four ways to look for indications that a distribution is heavy tailed.

1. Existence of moments.
2. Hazard rate function.
3. Mean excess loss function.
4. Speed of decay of the survival function to zero.

Tail weight is a relative concept – distribution A has a heavier tail than distribution B. The first three points are ways to tell heavy tails without a reference distribution. Point number 4 is comparative.

Existence of moments
For a given random variable $Z$, the existence of all moments $E(Z^k)$, for all positive integers $k$, indicates a light (right) tail for the distribution of $Z$. If the positive moments exist only up to a certain value of $k$, that is an indication that the distribution has a heavy right tail.

Note that the existence of the Pareto higher moments $E(X^k)$ is capped by the shape parameter $\alpha$ (both Type I and Type II). Thus if $\alpha=3$, $E(X^k)$ only exists for $0<k<3$. In particular, the Pareto Type II mean $E(X)=\frac{\theta}{\alpha-1}$ does not exist for $0<\alpha \le 1$. If the Pareto distribution is to model a random loss, and if the mean is infinite (e.g. when $\alpha=1$), the risk is uninsurable! On the other hand, when $\alpha \le 2$, the Pareto variance does not exist. This shows that for a heavy tailed distribution, the variance may not be a good measure of risk.

As compared with Pareto, the exponential distribution, the Gamma distribution, the Weibull distribution, and the lognormal distribution are considered to have light tails since all moments exist.

Hazard rate function
The hazard rate function $h(x)$ of a random variable $X$ is defined as the ratio of the density function and the survival function.

$\displaystyle h(x)=\frac{f(x)}{S(x)}$

The hazard rate is called the force of mortality in a life contingency context and can be interpreted as the rate that a person aged $x$ will die in the next instant. The hazard rate is called the failure rate in reliability theory and can be interpreted as the rate that a machine will fail at the next instant given that it has been functioning for $x$ units of time. It follows that the hazard rate of Pareto Type I is $h(x)=\alpha/x$ and is $h(x)=\alpha/(x+\theta)$ for Type II. They are both decreasing functions of $x$.

Another indication of heavy tail weight is that the distribution has a decreasing hazard rate function. Thus the Pareto distribution (both types) is considered to be a heavy tailed distribution based on its decreasing hazard rate function.

One key characteristic of hazard rate function is that it can generate the survival function.

$\displaystyle S(x)=e^{\displaystyle -\int_0^x h(t) \ dt}$

Thus if the hazard rate function is decreasing in $x$, then the survival function will decay more slowly to zero. To see this, let $H(x)=\int_0^x h(t) \ dt$, which is called the cumulative hazard rate function. As indicated above, the survival function can be generated by $e^{-H(x)}$. If $h(x)$ is decreasing in $x$, the cumulative hazard $H(x)$ grows more slowly than the cumulative hazard of a distribution whose hazard rate is constant or increasing in $x$. Consequently $e^{-H(x)}$ decays to zero more slowly than the survival function of such a distribution. Thus a decreasing hazard rate leads to a slower speed of decay to zero for the survival function (a point discussed below).

In contrast, the exponential distribution has a constant hazard rate function, making it a medium tailed distribution. As explained above, any distribution having an increasing hazard rate function is a light tailed distribution.

The mean excess loss function
Suppose that a property owner is exposed to a random loss $X$. The property owner buys an insurance policy with a deductible $d$ such that the insurer will pay a claim in the amount of $X-d$ if a loss occurs with $X>d$. The insurer will pay nothing if the loss is below the deductible. Whenever a loss is above $d$, what is the average claim the insurer will have to pay? This is one way to look at the mean excess loss function, which represents the expected excess loss over a threshold conditional on the event that the threshold has been exceeded. Thus the mean excess loss function is $e_X(d)=E(X-d \lvert X>d)$, a function of the deductible $d$.

According to row 7 in the above table, the mean excess loss for Pareto Type I is $e_X(d)=d/(\alpha-1)$ and for Type II is $e_X(d)=(d+\theta)/(\alpha-1)$. They are both increasing functions of the deductible $d$! This means that the larger the deductible, the larger the expected claim if such a large loss occurs! If a random loss is modeled by such a distribution, it is a catastrophic risk situation.

In general, an increasing mean excess loss function is an indication of a heavy tailed distribution. On the other hand, a decreasing mean excess loss function indicates a light tailed distribution. The exponential distribution has a constant mean excess loss function and is considered a medium tailed distribution.
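As a sketch of where the Pareto Type II formula comes from, assume $\alpha>1$ so that the mean exists, and use the fact that the expected excess over $d$ is the integral of the survival function from $d$ to $\infty$ divided by $S(d)$.

$\displaystyle E(X-d \lvert X>d)=\frac{\displaystyle \int_d^\infty S(x) \ dx}{S(d)}=\frac{\displaystyle \frac{\theta^\alpha}{\alpha-1} \ (d+\theta)^{-(\alpha-1)}}{\displaystyle \theta^\alpha \ (d+\theta)^{-\alpha}}=\frac{d+\theta}{\alpha-1}$

The result is linear and increasing in $d$. The same calculation for an exponential survival function $e^{-\lambda x}$ gives the constant $1/\lambda$.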

Speed of decay of the survival function to zero
The survival function $S(x)=P(X>x)$ captures the probability of the tail of a distribution. If the survival function of a distribution decays slowly to zero (equivalently the cdf goes slowly to one), it is another indication that the distribution is heavy tailed. This point is touched on when discussing the hazard rate function.

The following is a comparison of a Pareto Type II survival function and an exponential survival function. The Pareto survival function has parameters ($\alpha=2$ and $\theta=2$). The two survival functions are set to have the same 75th percentile, which is $x=2$. The following table is a comparison of the two survival functions.

$\displaystyle \begin{array}{llllllll} \text{ } &x &\text{ } & \text{Pareto } S_X(x) & \text{ } & \text{Exponential } S_Y(x) & \text{ } & \displaystyle \frac{S_X(x)}{S_Y(x)} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{ } &2 &\text{ } & 0.25 & \text{ } & 0.25 & \text{ } & 1 \\ \text{ } &10 &\text{ } & 0.027777778 & \text{ } & 0.000976563 & \text{ } & 28 \\ \text{ } &20 &\text{ } & 0.008264463 & \text{ } & 9.54 \times 10^{-7} & \text{ } & 8666 \\ \text{ } &30 &\text{ } & 0.00390625 & \text{ } & 9.31 \times 10^{-10} & \text{ } & 4194304 \\ \text{ } &40 &\text{ } & 0.002267574 & \text{ } & 9.09 \times 10^{-13} & \text{ } & 2.49 \times 10^{9} \\ \text{ } &60 &\text{ } & 0.001040583 & \text{ } & 8.67 \times 10^{-19} & \text{ } & 1.20 \times 10^{15} \\ \text{ } &80 &\text{ } & 0.000594884 & \text{ } & 8.27 \times 10^{-25} & \text{ } & 7.19 \times 10^{20} \\ \text{ } &100 &\text{ } & 0.000384468 & \text{ } & 7.89 \times 10^{-31} & \text{ } & 4.87 \times 10^{26} \\ \text{ } &120 &\text{ } & 0.000268745 & \text{ } & 7.52 \times 10^{-37} & \text{ } & 3.57 \times 10^{32} \\ \text{ } &140 &\text{ } & 0.000198373 & \text{ } & 7.17 \times 10^{-43} & \text{ } & 2.76 \times 10^{38} \\ \text{ } &160 &\text{ } & 0.000152416 & \text{ } & 6.84 \times 10^{-49} & \text{ } & 2.23 \times 10^{44} \\ \text{ } &180 &\text{ } & 0.000120758 & \text{ } & 6.53 \times 10^{-55} & \text{ } & 1.85 \times 10^{50} \\ \text{ } & \text{ } \\ \end{array}$

Note that at the large values, the Pareto right tail retains much more probability. This is also confirmed by the ratio of the two survival functions, with the ratio approaching infinity. If a random loss is a heavy tailed phenomenon that is described by the above Pareto survival function ($\alpha=2$ and $\theta=2$), then the above exponential survival function is woefully inadequate as a model for this phenomenon even though it may be a good model for describing the loss up to the 75th percentile. It is the large right tail that is problematic (and catastrophic)!
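The table can be reproduced with a few lines of code. This is a minimal sketch: the exponential rate is not stated in the text, so it is derived here from the requirement that both distributions have 75th percentile $x=2$, which gives $\lambda=\ln(4)/2$. The function names are illustrative.

```python
import math

ALPHA, THETA = 2.0, 2.0      # Pareto Type II parameters from the text
LAM = math.log(4) / 2        # derived: solves exp(-2 * LAM) = 0.25

def pareto_survival(x):
    # S_X(x) = (theta / (x + theta)) ** alpha
    return (THETA / (x + THETA)) ** ALPHA

def exponential_survival(x):
    # S_Y(x) = exp(-lambda * x)
    return math.exp(-LAM * x)

for x in (2, 10, 20, 30, 40, 60, 80, 100, 120, 140, 160, 180):
    s_pareto = pareto_survival(x)
    s_expon = exponential_survival(x)
    print(f"{x:>4}  {s_pareto:.9f}  {s_expon:.3e}  {s_pareto / s_expon:.3e}")
```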

Since the Pareto survival function and the exponential survival function have closed forms, we can also look at their ratio.

$\displaystyle \frac{\text{pareto survival}}{\text{exponential survival}}=\frac{\displaystyle \frac{\theta^\alpha}{(x+\theta)^\alpha}}{e^{-\lambda x}}=\frac{\theta^\alpha e^{\lambda x}}{(x+\theta)^\alpha} \longrightarrow \infty \ \text{ as } x \longrightarrow \infty$

In the above ratio, the numerator has an exponential function with a positive quantity in the exponent, while the denominator has a polynomial in $x$. This ratio goes to infinity as $x \rightarrow \infty$.

In general, whenever the ratio of two survival functions diverges to infinity, it is an indication that the distribution in the numerator of the ratio has a heavier tail. When the ratio goes to infinity, the survival function in the numerator is said to decay slowly to zero as compared to the denominator.

The Pareto distribution has many economic applications. Since it is a heavy tailed distribution, it is a good candidate for modeling income above a theoretical value and the distribution of insurance claims above a threshold value.

$\text{ }$

$\text{ }$

$\text{ }$

$\copyright$ 2017 – Dan Ma

### Integrating survival function to calculate the mean


For a continuous random variable $X$ that takes on non-negative values, the mean $E(X)$ is obtained by integrating $x f(x)$ from $0$ to $\infty$ where $f(x)$ is the probability density function of $X$. In some cases, this integral is hard (or impossible) to do. In these cases, it may be possible to find the mean and higher moments by integrating the survival function.

Here’s the usual way to calculate moments.
$\displaystyle (1) \ \ \ \ E(X)=\int_0^\infty x f(x) \ dx$
$\text{ }$
$\displaystyle (2) \ \ \ \ E(X^k)=\int_0^\infty x^k f(x) \ dx$

Moments can be calculated by integrating the survival function.
$\displaystyle (3) \ \ \ \ E(X)=\int_0^\infty S(x) \ dx$
$\text{ }$
$\displaystyle (4) \ \ \ \ E(X^k)=\int_0^\infty k x^{k-1} S(x) \ dx$

In fact, the mean and moments of many of the distributions in the Exam C table are calculated using the survival function method, e.g. the Pareto distribution and the exponential distribution.

The integrals in (3) and (4) are derived from (1) and (2) by performing integration by parts. The derivation is not very hard. You will find out that some assumptions are needed to make the integration by parts work. Specifically, the assumption is that the following limit is 0.

$\displaystyle \lim_{x \rightarrow \infty} x^k \ S(x)= 0$

The integrals in (3) and (4) work only if this limit is zero. Essentially this limit is zero if the expectation $E(X^k)$ exists.
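A sketch of the integration by parts behind (3), taking $u=x$ and $dv=f(x) \ dx$ so that $v=-S(x)$ (since $S'(x)=-f(x)$):

$\displaystyle E(X)=\int_0^\infty x f(x) \ dx=\biggl[-x \ S(x) \biggr]_0^\infty+\int_0^\infty S(x) \ dx=\int_0^\infty S(x) \ dx$

The boundary term vanishes at $0$ and, by the limit assumption with $k=1$, at $\infty$. Replacing $u=x$ with $u=x^k$ (so that $du=k x^{k-1} \ dx$) gives (4) in the same way.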

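The same idea can be applied to a discrete distribution that takes on non-negative integer values. Formulas (5) and (6) are the usual definitions, while (7) and (8) use the survival probabilities $P(X>x)$.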

$\displaystyle (5) \ \ \ \ E(X)=\sum_{x=0}^\infty x P(X=x)$
$\text{ }$
$\displaystyle (6) \ \ \ \ E(X^k)=\sum_{x=0}^\infty x^k P(X=x)$

$\displaystyle (7) \ \ \ \ E(X)=\sum_{x=0}^\infty P(X>x)$
$\text{ }$
$\displaystyle (8) \ \ \ \ E(X^k)=\sum_{x=0}^\infty [(x+1)^k-x^k] P(X>x)$
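Formulas (7) and (8) can be checked against (5) and (6) on a small example. The sketch below uses a fair six-sided die, which is a hypothetical example not taken from the post, and exact fractions so that the two methods can be compared for equality. The function names are illustrative.

```python
from fractions import Fraction

def moment_direct(pmf, k):
    # Formulas (5)/(6): sum of x^k * P(X = x)
    return sum(Fraction(x) ** k * p for x, p in pmf.items())

def moment_from_survival(pmf, k):
    # Formulas (7)/(8): sum of [(x+1)^k - x^k] * P(X > x)
    largest = max(pmf)
    total = Fraction(0)
    for x in range(0, largest):  # P(X > x) = 0 once x >= largest
        tail = sum(p for y, p in pmf.items() if y > x)
        total += ((x + 1) ** k - x ** k) * tail
    return total

# Hypothetical example: a fair die taking values 1 through 6
die = {x: Fraction(1, 6) for x in range(1, 7)}

for k in (1, 2, 3):
    print(k, moment_direct(die, k), moment_from_survival(die, k))
```

Both columns agree for each $k$. For $k=1$ the common value is $7/2$ and for $k=2$ it is $91/6$.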

$\text{ }$

$\text{ }$

Examples

Example 1
Suppose that the useful life (in months) of a device is modeled by the CDF $F(x)=1-(1- x/120)^{1/2}$ for $0 \le x \le 120$. Calculate the expected useful life of a brand new device. Calculate the variance of the lifetime of such a device.

Note that the survival function is $S(x)=(1- x/120)^{1/2}$. The following shows the calculation.

$\displaystyle E(X)=\int_0^{120} (1- x/120)^{1/2} \ dx=80$

$\displaystyle \begin{aligned} E(X^2)&=\int_0^{120} 2x \ (1- x/120)^{1/2} \ dx \\&=\int_1^{0} 2 \cdot 120(1-U) \ U^{1/2} (-120) \ dU \\&=\int_0^{1} 28800 \ (U^{1/2}-U^{3/2}) \ dU \\&=28800 \ \biggl[\frac{2}{3} U^{3/2}-\frac{2}{5} U^{5/2} \biggr]_0^1 \\&=7680 \end{aligned}$

$Var(X)=E(X^2)-E(X)^2=7680-80^2=1280$

Note that the integral for $E(X^2)$ uses the substitution $U=1-x/120$, so that $x=120(1-U)$ and $dx=-120 \ dU$.
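The results of Example 1 can be sanity checked numerically. The following is a rough sketch using a plain midpoint rule, which is a choice made here for simplicity and not part of the original solution. The function names and the number of subintervals are illustrative.

```python
def survival(x):
    # S(x) = (1 - x/120)^(1/2) on [0, 120]
    return (1 - x / 120) ** 0.5

def midpoint_integral(g, a, b, n=200_000):
    # Midpoint rule: evaluates g only at interior points of [a, b]
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

mean = midpoint_integral(survival, 0, 120)
second_moment = midpoint_integral(lambda x: 2 * x * survival(x), 0, 120)
variance = second_moment - mean ** 2

print(mean, second_moment, variance)
```

The printed values should be close to 80, 7680 and 1280, up to the error of the numerical integration.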

Example 2
Suppose that the lifetime of a machine follows a distribution with the following CDF.

$\displaystyle F(x)=1+e^{-x/6}-2 e^{-x/12} \ \ \ \ \ x>0$

Calculate the mean and variance of the lifetime.

The survival function is $\displaystyle S(x)=2 e^{-x/12}-e^{-x/6}$. The first step is to calculate the first two moments.

$\displaystyle \begin{aligned} E(X)&=\int_0^\infty (2 e^{-x/12}-e^{-x/6}) \ dx \\&=\int_0^\infty (2 \cdot 12 \frac{1}{12} e^{-x/12}-6 \ \frac{1}{6} e^{-x/6}) \ dx \\&=24 \int_0^\infty \frac{1}{12} e^{-x/12} \ dx- 6 \int_0^\infty \frac{1}{6} e^{-x/6} \ dx=24-6=18 \end{aligned}$

$\displaystyle \begin{aligned} E(X^2)&=\int_0^\infty 2x (2 e^{-x/12}-e^{-x/6}) \ dx \\&=\int_0^\infty (4x e^{-x/12}-2x e^{-x/6}) \ dx \\&=\int_0^\infty (4 \cdot 12 \ x \frac{1}{12} e^{-x/12}-2 \cdot 6 \ x \ \frac{1}{6} e^{-x/6}) \ dx \\&=48 \int_0^\infty x \frac{1}{12} e^{-x/12} \ dx -12 \int_0^\infty x \ \frac{1}{6} e^{-x/6} \ dx=48(12)-12(6)=504 \end{aligned}$

$Var(X)=E(X^2)-E(X)^2=504-18^2=180$

Note that the integral for $E(X)$ is manipulated to be in terms of integrals of exponential density functions while the integral for $E(X^2)$ is manipulated to be in terms of integrals of $x$ times exponential density functions (each becoming the mean of that exponential distribution).
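The manipulation in Example 2 works for any survival function that is a linear combination of exponential terms $c \ e^{-x/\theta}$, since $\int_0^\infty e^{-x/\theta} \ dx=\theta$ and $\int_0^\infty 2x \ e^{-x/\theta} \ dx=2 \theta^2$. The helper below is a small sketch of that idea. Its name and its input format, a list of (coefficient, scale) pairs, are made up for illustration.

```python
def moments_from_exponential_terms(terms):
    # terms: list of (c, theta) pairs describing
    #   S(x) = sum of c * exp(-x / theta)
    # Mean:          integral of S(x)      -> sum of c * theta
    # Second moment: integral of 2x * S(x) -> sum of 2 * c * theta^2
    mean = sum(c * theta for c, theta in terms)
    second_moment = sum(2 * c * theta ** 2 for c, theta in terms)
    variance = second_moment - mean ** 2
    return mean, second_moment, variance

# Example 2: S(x) = 2 exp(-x/12) - exp(-x/6)
mean, second_moment, variance = moments_from_exponential_terms([(2, 12), (-1, 6)])
print(mean, second_moment, variance)
```

For the survival function of Example 2 this reproduces the mean 18, the second moment 504 and the variance 180.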

Example 3
The two-parameter Pareto distribution (Type II Lomax) is discussed here in a companion blog. Its survival function is

$\displaystyle S(x)=\biggl(\frac{\theta}{x+\theta} \biggr)^\alpha \ \ \ \ \ x>0$

Derive the mean of the Pareto distribution using the survival function approach. What assumption is made on the shape parameter $\alpha$?

$\displaystyle \begin{aligned} E(X)&=\int_0^\infty \biggl(\frac{\theta}{x+\theta} \biggr)^\alpha \ dx \\&=\int_0^\infty \theta^\alpha (x+\theta)^{- \alpha} \ dx \\&=\frac{\theta^\alpha}{-\alpha+1} \ (x+\theta)^{-\alpha+1} \biggr|_0^\infty \\&=\frac{\theta^\alpha}{-\alpha+1} \ \frac{1}{(x+\theta)^{\alpha-1}} \biggr|_0^\infty =\frac{\theta}{\alpha-1} \end{aligned}$

In order for the integral to converge, we need to assume $\alpha>1$.
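The role of the assumption $\alpha>1$ can be seen by integrating the survival function only up to a finite upper limit and letting that limit grow. The sketch below uses the closed-form antiderivative from the derivation above. The function name and the parameter values are illustrative assumptions.

```python
import math

def truncated_survival_integral(upper, alpha, theta):
    # Integral from 0 to `upper` of (theta / (x + theta)) ** alpha dx
    if alpha == 1:
        return theta * math.log((upper + theta) / theta)
    return (theta ** alpha
            * ((upper + theta) ** (1 - alpha) - theta ** (1 - alpha))
            / (1 - alpha))

# alpha > 1: the truncated integral settles at theta / (alpha - 1)
for upper in (1e2, 1e4, 1e6):
    print(upper, truncated_survival_integral(upper, alpha=3.0, theta=2.0))

# alpha <= 1: the truncated integral keeps growing, so the mean does not exist
for upper in (1e2, 1e4, 1e6):
    print(upper, truncated_survival_integral(upper, alpha=0.5, theta=2.0))
```

With $\alpha=3$ and $\theta=2$ the values approach $\theta/(\alpha-1)=1$, while with $\alpha=0.5$ they increase without bound as the upper limit increases.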

The following two practice problems are left as exercises.

Practice Problems

Practice Problem 1
The survival function for the distribution of a lifetime of a type of electronic devices is $\displaystyle S(x)=\frac{1}{40} \ e^{-x/10}+\frac{1}{20} \ e^{-x/15} \ \ \ \ \ x>0$.

Calculate the mean and variance of the lifetime of such devices.

Practice Problem 2
The probability that the size of a randomly selected auto collision loss is greater than $x$ is $\displaystyle S(x)=\biggl(1-\frac{x}{10} \biggr)^6 \ \ \ \ 0<x<10$.

Calculate the mean and variance of the loss size.

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

Practice Problem 1

$\displaystyle E(X)=1$

$\displaystyle Var(X)=\frac{53}{2}=26.5$

Practice Problem 2

$\displaystyle E(X)=\frac{10}{7}$

$\displaystyle Var(X)=\frac{75}{49}$

$\text{ }$

$\text{ }$

$\text{ }$


### Practice Problem Set 3 – basic lognormal problems


This post has several practice problems to go with this previous discussion on lognormal distribution.

 Practice Problem 3A The amount of annual losses from an insured follows a lognormal distribution with parameters $\mu$ and $\sigma$ = 0.6 and with mode = 2.5. Calculate the mean annual loss for a randomly selected insured.
 Practice Problem 3B Claim size for an auto insurance coverage follows a lognormal distribution with mean 149.157 and variance 223.5945. Determine the probability that a randomly selected claim will be greater than 170.
 Practice Problem 3C For x-ray machines produced by a certain manufacturer, the following is known. Lifetime in years follows a lognormal distribution with $\mu$ = 0.9 and $\sigma$. The expected lifetime of such machines is 15 years. Calculate the probability that an x-ray machine produced by this manufacturer will last at least 12 years.
 Practice Problem 3D Claim sizes expressed in US dollars follow a lognormal distribution with parameters $\mu$ = 5 and $\sigma$ = 0.25. One Canadian dollar is currently worth \$0.75 US dollars. Calculate the 75th percentile of a claim in Canadian dollars.
 Practice Problem 3E For a commercial fire coverage, the size of a loss follows a lognormal distribution with parameters $\mu$ = 2.75 and $\sigma$ = 0.75. Determine $y-x$ where $y$ is the 75th percentile of a loss and $x$ is the 25th percentile of a loss. Note that $y-x$ is known as the interquartile range.
 Practice Problem 3F Claim sizes in the current year follow a lognormal distribution with $\mu$ = 4.75 and $\sigma$ = 0.25. In the next year, all claims are expected to be inflated uniformly by 25%. One claim is expected in the next year for an insured. Determine $y-x$ where $y$ is the 80th percentile of the size of this claim and $x$ is the 40th percentile of the size of this claim.
 Practice Problem 3G In the current year, losses follow a lognormal distribution with $\mu$ = 1.6 and $\sigma$ = 1.35. In the next year, inflation of 20% will impact all losses uniformly. Determine the median of the portion of next year’s loss distribution that is above 10.
 Practice Problem 3H Losses follow a lognormal distribution with mean 17 and variance 219. Determine the skewness of the loss distribution.
 Practice Problem 3I Losses from an insurance coverage follow a lognormal distribution with parameters $\mu$ and $\sigma$ = 2. The 80th percentile of the losses is 5884. Determine the probability that a loss is less than 5000.
 Practice Problem 3J Losses from an insurance coverage follow a lognormal distribution. The 25th percentile of the losses is 133.62. The 75th percentile of the losses is 997.25. Determine the mean of the losses.

All normal probabilities are obtained by using the normal distribution table found here.

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

3A 4.29
3B 0.0869
3C 0.2033
3D 233.9675
3E 16.39085
3F 42.5155
3G 21.143268
3H 3.271185
3I 0.7764
3J 1124.394559
