# Pareto Distribution

### Calculating the variance of insurance payment


This post supplements a three-part discussion of the mathematical models of insurance payments: part 1, part 2 and part 3. It focuses on the calculation of the variance of insurance payments.

There are three practice problem sets for the three-part discussion of the mathematical models of insurance payments: problem set 7, problem set 8 and problem set 9. The problems in these sets focus on the calculation of expected payments. This post presents several examples on the variance of insurance payments. A practice problem set will soon follow.

In contrast, the next post is a discussion on the insurance payment per payment.

Coverage with an Ordinary Deductible

To simplify the calculation, we assume that the only limit on benefits is a deductible. Suppose that the loss amount is the random variable $X$. The deductible is $d$. Given that a loss has occurred, the insurance policy pays nothing if the loss is below $d$ and pays $X-d$ if the loss exceeds $d$. The payment random variable is denoted by $Y_L$ or $(X-d)_+$ and is explicitly described as follows:

(1)……$\displaystyle Y_L=(X-d)_+=\left\{ \begin{array}{ll} \displaystyle 0 &\ X \le d \\ \text{ } & \text{ } \\ \displaystyle X-d &\ X > d \end{array} \right.$

The subscript L in $Y_L$ indicates that this variable is the payment per loss. This means that its mean, $E(Y_L)$, is the average payment over all losses. A related payment variable is $Y_P$, which is defined as follows:

(2)……$\displaystyle Y_P=X-d \ \lvert X > d$

The variable $Y_P$ is a truncated variable (any loss that is less than the deductible is not considered) and is also shifted (the payment is the loss less the deductible). As a result, $Y_P$ is a conditional random variable: it is conditional on the loss exceeding the deductible. The subscript P in $Y_P$ indicates that the payment variable is the payment per payment. This means that its mean, $E(Y_P)$, is the average payment over all payments that are made, i.e. the average payment over all losses that are eligible for a claim payment.

The focus of this post is on the calculation of $E(Y_L)$ (the average payment over all losses) and $Var(Y_L)$ (the variance of the payment per loss). These two quantities are important in the actuarial pricing of insurance. If the policy were to pay each loss in full, the average amount paid would be $E(X)$, the mean of the loss distribution. With a deductible, the average amount paid is $E(Y_L)$, which is less than $E(X)$. Likewise, $Var(Y_L)$, the variance of the payment per loss, is smaller than $Var(X)$, the variance of the loss distribution. Thus imposing a deductible not only reduces the amount paid by the insurer but also reduces the variability of the amount paid.

The calculation of $E(Y_L)$ and $Var(Y_L)$ can be done by using the pdf $f(x)$ of the original loss random variable $X$.

(3)……$\displaystyle E(Y_L)=\int_d^\infty (x-d) \ f(x) \ dx$

(4)……$\displaystyle E(Y_L^2)=\int_d^\infty (x-d)^2 \ f(x) \ dx$

(5)……$\displaystyle Var(Y_L)=E(Y_L^2)-E(Y_L)^2$

The above calculation assumes that the loss $X$ is a continuous random variable. If the loss is discrete, replace the integrals with sums. The calculation in (3) and (4) can also be done using the distribution of the payment variable $Y_L$ itself. This distribution has a probability mass of $F(d)=P(X \le d)$ at $y=0$ and density $f(y+d)$ for $y>0$. The mass at 0 contributes nothing to the integrals in (7) and (8).

(6)……$\displaystyle f_{Y_L}(y)=\left\{ \begin{array}{ll} \displaystyle F(d) &\ y=0 \\ \text{ } & \text{ } \\ \displaystyle f(y+d) &\ y > 0 \end{array} \right.$

(7)……$\displaystyle E(Y_L)=\int_0^\infty y \ f_{Y_L}(y) \ dy$

(8)……$\displaystyle E(Y_L^2)=\int_0^\infty y^2 \ f_{Y_L}(y) \ dy$

It will be helpful to also consider the pdf of the payment per payment variable $Y_P$.

(9)……$\displaystyle f_{Y_P}(y)=\frac{f(y+d)}{P[X > d]} \ \ \ \ \ \ \ y>0$
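Formulas (3) to (5) can be evaluated numerically when no closed form is handy. The following is a minimal Python sketch that uses only the standard library. The helper names `simpson` and `payment_per_loss_moments` are illustrative, and the sketch assumes that the pdf is negligible beyond a user-chosen cutoff `upper`. It is checked on the exponential loss of Example 1 below.

```python
import math


def simpson(g, a, b, n=20_000):
    """Composite Simpson's rule for g on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def payment_per_loss_moments(pdf, d, upper):
    """E(Y_L), E(Y_L^2) and Var(Y_L) for Y_L = (X - d)_+, following (3)-(5).

    The integral over (d, infinity) is truncated at `upper`; this sketch
    assumes the pdf is negligible beyond that point.
    """
    m1 = simpson(lambda x: (x - d) * pdf(x), d, upper)
    m2 = simpson(lambda x: (x - d) ** 2 * pdf(x), d, upper)
    return m1, m2, m2 - m1 ** 2


def exp_pdf(x, mean=50.0):
    """Exponential density with the given mean."""
    return math.exp(-x / mean) / mean


# Exponential loss with mean 50 and a deductible of 25 (Example 1 below).
m1, m2, var = payment_per_loss_moments(exp_pdf, 25, 25 + 50 * 40)
print(round(m1, 2), round(var, 2))  # 30.33 2112.95
```

Simpson's rule is used here only because it is short. Any quadrature routine would do. For heavy-tailed losses, the cutoff has to be pushed much further out.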

Three Approaches

We show that there are three different ways to calculate $E(Y_L)$ and $Var(Y_L)$.

1. Using basic principles.
2. Considering $Y_L$ as a mixture.
3. Considering $Y_L$ as a compound distribution.

Using basic principles refers to using (3) and (4) or (7) and (8). The second approach is to treat $Y_L$ as a mixture of a point mass of 0 with weight $P(X \le d)$ and the payment per payment $Y_P$ with weight $P(X >d)$. The third approach is to treat $Y_L$ as a compound distribution where the number of claims $N$ has a Bernoulli distribution with $p=P(X >d)$ and the severity is the payment per payment $Y_P$. We demonstrate these approaches with a series of examples.
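The second and third approaches need only three inputs: $p=P(X>d)$, $E(Y_P)$ and $Var(Y_P)$. Below is a minimal Python sketch of both approaches; the function names are hypothetical. It is fed with the Example 1 inputs derived below, where $Y_P$ is exponential with mean 50 and $p=e^{-1/2}$.

```python
import math


def mixture_moments(p, mean_p, var_p):
    """Approach 2: point mass at 0 (weight 1 - p) mixed with Y_P (weight p).

    Returns (E(Y_L), Var(Y_L)).
    """
    first = p * mean_p                    # E(Y_L)   = 0*(1-p) + p*E(Y_P)
    second = p * (var_p + mean_p ** 2)    # E(Y_L^2) = 0*(1-p) + p*E(Y_P^2)
    return first, second - first ** 2


def compound_variance(p, mean_p, var_p):
    """Approach 3: N ~ Bernoulli(p) claims, each with severity Y_P."""
    e_n, var_n = p, p * (1 - p)           # Bernoulli mean and variance
    return e_n * var_p + var_n * mean_p ** 2


# Example 1 inputs: p = P(X > 25) = e^{-1/2}; Y_P is exponential with mean 50.
p = math.exp(-0.5)
mean_l, var_mix = mixture_moments(p, 50.0, 50.0 ** 2)
var_cmp = compound_variance(p, 50.0, 50.0 ** 2)
print(round(mean_l, 2), round(var_mix, 6), round(var_cmp, 6))
# 30.33 2112.954696 2112.954696
```

The two functions always agree, since $p \, E(Y_P^2)-p^2 E(Y_P)^2=p \, Var(Y_P)+p(1-p) E(Y_P)^2$.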

Examples

Example 1
The random loss $X$ has an exponential distribution with mean 50. A coverage with a deductible of 25 is purchased to cover this loss. Calculate the mean and variance of the insurance payment per loss.

We demonstrate the calculation using the three approaches discussed above. The following gives the calculation based on basic principles.

$\displaystyle \begin{aligned} E(Y_L)&=\int_{25}^\infty (x-25) \ \frac{1}{50} \ e^{-x/50} \ dx \\&=\int_{0}^\infty \frac{1}{50} \ u \ e^{-u/50} \ e^{-1/2} \ du \\&=50 \ e^{-1/2} \int_{0}^\infty \frac{1}{50^2} \ u \ e^{-u/50} \ du \\&=50 \ e^{-1/2}=30.33 \end{aligned}$

$\displaystyle \begin{aligned} E(Y_L^2)&=\int_{25}^\infty (x-25)^2 \ \frac{1}{50} \ e^{-x/50} \ dx \\&=\int_{0}^\infty \frac{1}{50} \ u^2 \ e^{-u/50} \ e^{-1/2} \ du \\&=2 \cdot 50^2 \ e^{-1/2} \int_{0}^\infty \frac{1}{2} \ \frac{1}{50^3} \ u^2 \ e^{-u/50} \ du \\&=2 \cdot 50^2 \ e^{-1/2} \end{aligned}$

$\displaystyle Var(Y_L)=2 \cdot 50^2 \ e^{-1/2}-\biggl( 50 \ e^{-1/2} \biggr)^2=2112.954696$

In the above calculation, we perform a change of variable via $u=x-25$. We now do the second approach. Note that the variable $Y_P=X-25 \lvert X >25$ also has an exponential distribution with mean 50 (this is due to the memoryless property of the exponential distribution). The point mass of 0 has weight $P(X \le 25)=1-e^{-1/2}$ and the variable $Y_P$ has weight $P(X > 25)=e^{-1/2}$.

$\displaystyle \begin{aligned} E(Y_L)&=0 \cdot (1-e^{-1/2})+E(Y_P) \cdot e^{-1/2}=50 \ e^{-1/2} \end{aligned}$

$\displaystyle \begin{aligned} E(Y_L^2)&=0 \cdot (1-e^{-1/2})+E(Y_P^2) \cdot e^{-1/2} \\&=(50^2+50^2) \cdot e^{-1/2} =2 \cdot 50^2 \ e^{-1/2} \end{aligned}$

$\displaystyle Var(Y_L)=2 \cdot 50^2 \ e^{-1/2}-\biggl( 50 \ e^{-1/2} \biggr)^2=2112.954696$

In the third approach, the frequency variable $N$ is Bernoulli with $P(N=0)=1-e^{-1/2}$ and $P(N=1)=e^{-1/2}$. The severity variable is $Y_P$. The following calculates the compound variance.

$\displaystyle \begin{aligned} Var(Y_L)&=E(N) \cdot Var(Y_P)+Var(N) \cdot E(Y_P)^2 \\&=e^{-1/2} \cdot 50^2+e^{-1/2} (1-e^{-1/2}) \cdot 50^2 \\&=2 \cdot 50^2 \ e^{-1/2}-50^2 \ e^{-1} \\&=2112.954696 \end{aligned}$

Note that the average payment per loss is $E(Y_L)=30.33$. This is a substantial reduction from the mean $E(X)=50$ that would apply if the policy paid each loss in full. The standard deviation of $Y_L$ is $\sqrt{2112.954696}=45.97$, which is a reduction from 50, the standard deviation of the original loss distribution. Clearly, imposing a deductible (or other limits on benefits) has the effect of reducing risk for the insurer.

When the loss distribution is exponential, approaches 2 and 3 are quite easy to implement. This is because the payment per payment variable $Y_P$ has the same distribution as the original loss distribution. Among continuous loss distributions, only the exponential distribution has this (memoryless) property. For any other loss distribution, we must first determine the distribution of $Y_P$ before carrying out the second or the third approach.

We now work two more examples in which the loss distribution is not exponential.

Example 2
The loss distribution is a uniform distribution on the interval $(0,100)$. The insurance coverage has a deductible of 20. Calculate the mean and variance of the payment per loss.

The following gives the basic calculation.

$\displaystyle \begin{aligned} E(Y_L)&=\int_{20}^{100} (x-20) \ \frac{1}{100} \ dx \\&=\int_0^{80} \frac{1}{100} \ u \ du =32 \end{aligned}$

$\displaystyle \begin{aligned} E(Y_L^2)&=\int_{20}^{100} (x-20)^2 \ \frac{1}{100} \ dx \\&=\int_0^{80} \frac{1}{100} \ u^2 \ du =\frac{5120}{3} \end{aligned}$

$\displaystyle Var(Y_L)=\frac{5120}{3}-32^2=\frac{2048}{3}=682.67$

The mean and variance of the loss distribution are 50 and $\frac{100^2}{12}=833.33$. These would be the mean and variance of the payment if the coverage paid each loss in full. With a deductible of 20, the mean payment per loss is 32 and the variance of the payment per loss is 682.67. The effect is a reduction of risk, since part of the risk is shifted to the policyholder.

We now perform the calculation using the other two approaches. Note that the payment per payment $Y_P=X-20 \lvert X > 20$ has a uniform distribution on the interval $(0,80)$. The following calculates according to the second approach.

$\displaystyle \begin{aligned} E(Y_L)&=0 \cdot (0.2)+E[Y_P] \cdot 0.8=40 \ \cdot 0.8=32 \end{aligned}$

$\displaystyle \begin{aligned} E(Y_L^2)&=0 \cdot (0.2)+E[Y_P^2] \cdot 0.8=\biggl(\frac{80^2}{12}+40^2 \biggr) \ \cdot 0.8=\frac{5120}{3} \end{aligned}$

$\displaystyle Var(Y_L)=\frac{5120}{3}-32^2=\frac{2048}{3}=682.67$

For the third approach, the frequency $N$ is a Bernoulli variable with $p=0.8$ and the severity variable is $Y_P$, which is uniform on $(0,80)$.

$\displaystyle \begin{aligned} Var(Y_L)&=E(N) \cdot Var(Y_P)+Var(N) \cdot E(Y_P)^2 \\&=0.8 \cdot \frac{80^2}{12} +0.8 \cdot 0.2 \cdot 40^2 \\&=\frac{2048}{3} \\&=682.67 \end{aligned}$
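Because the uniform density is constant, the basic-principles integrals are integrals of polynomials and can be carried out exactly. A small sketch using Python's `fractions.Fraction` reproduces $32$, $\frac{5120}{3}$ and $\frac{2048}{3}$ without rounding. The helper name is hypothetical.

```python
from fractions import Fraction


def uniform_payment_moments(b, d):
    """Exact E(Y_L), E(Y_L^2), Var(Y_L) for X ~ Uniform(0, b) and deductible d < b.

    With u = x - d, the k-th moment is int_0^{b-d} u^k / b du
    = (b - d)^(k + 1) / ((k + 1) b).
    """
    top = Fraction(b - d)
    m1 = top ** 2 / (2 * Fraction(b))
    m2 = top ** 3 / (3 * Fraction(b))
    return m1, m2, m2 - m1 ** 2


m1, m2, var = uniform_payment_moments(100, 20)
print(m1, m2, var)  # 32 5120/3 2048/3
```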

Example 3
In this example, the loss distribution is a Pareto distribution with parameters $\alpha=3$ and $\theta=1000$. The deductible of the coverage is 500. Calculate the mean and variance of the payment per loss.

Note that the payment per payment $Y_P=X-500 \lvert X > 500$ also has a Pareto distribution, with parameters $\alpha=3$ and $\theta=1500$. This information is useful for implementing the second and third approaches. First, the calculation based on basic principles.

$\displaystyle \begin{aligned} E(Y_L)&=\int_{500}^{\infty} (x-500) \ \frac{3 \cdot 1000^3}{(x+1000)^4} \ dx \\&=\int_{0}^{\infty} u \ \frac{3 \cdot 1000^3}{(u+1500)^4} \ du \\&=\frac{1000^3}{1500^3} \ \int_{0}^{\infty} u \ \frac{3 \cdot 1500^3}{(u+1500)^4} \ du\\&=\frac{8}{27} \ \frac{1500}{2}\\&=\frac{2000}{9}=222.22 \end{aligned}$

$\displaystyle \begin{aligned} E(Y_L^2)&=\int_{500}^{\infty} (x-500)^2 \ \frac{3 \cdot 1000^3}{(x+1000)^4} \ dx \\&=\int_{0}^{\infty} u^2 \ \frac{3 \cdot 1000^3}{(u+1500)^4} \ du \\&=\frac{1000^3}{1500^3} \ \int_{0}^{\infty} u^2 \ \frac{3 \cdot 1500^3}{(u+1500)^4} \ du\\&=\frac{8}{27} \ \frac{2 \cdot 1500^2}{2 \cdot 1}\\&=\frac{2000000}{3} \end{aligned}$

$\displaystyle Var(Y_L)=\frac{2000000}{3}-\biggl(\frac{2000}{9} \biggr)^2=\frac{50000000}{81}=617283.95$

Now, the mixture approach (the second approach). Note that $P(X > 500)=\frac{8}{27}$.

$\displaystyle \begin{aligned} E(Y_L)&=0 \cdot \biggl(1-\frac{8}{27} \biggr)+E(Y_P) \cdot \frac{8}{27}=\frac{1500}{2} \ \cdot \frac{8}{27}=\frac{2000}{9} \end{aligned}$

$\displaystyle \begin{aligned} E(Y_L^2)&=0 \cdot \biggl(1-\frac{8}{27} \biggr)+E(Y_P^2) \cdot \frac{8}{27}=\frac{2 \cdot 1500^2}{2 \cdot 1} \ \cdot \frac{8}{27}=\frac{2000000}{3} \end{aligned}$

$\displaystyle Var(Y_L)=\frac{2000000}{3}-\biggl(\frac{2000}{9} \biggr)^2=\frac{50000000}{81}=617283.95$

Now the third approach, which is to calculate the compound variance.

$\displaystyle \begin{aligned} Var(Y_L)&=E(N) \cdot Var(Y_P)+Var(N) \cdot E(Y_P)^2 \\&=\frac{8}{27} \cdot 1687500 +\frac{8}{27} \cdot \biggl(1-\frac{8}{27} \biggr) \cdot 750^2 \\&=\frac{50000000}{81} \\&=617283.95 \end{aligned}$
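Two things can be checked in exact rational arithmetic: the claim that $Y_P$ is Pareto with $\theta=1500$, and the agreement of the second and third approaches. The minimal sketch below assumes the Pareto Type II survival function $S(x)=\bigl(\frac{\theta}{x+\theta}\bigr)^\alpha$ and the moments $E(Y)=\frac{\theta}{\alpha-1}$ and $E(Y^2)=\frac{2\theta^2}{(\alpha-1)(\alpha-2)}$. The names are illustrative.

```python
from fractions import Fraction

ALPHA, THETA, D = 3, Fraction(1000), Fraction(500)


def pareto2_survival(x, alpha, theta):
    """Pareto Type II survival function (theta / (x + theta))^alpha."""
    return (theta / (x + theta)) ** alpha


# Y_P = X - 500 | X > 500 should be Pareto Type II with theta = 1500:
# S(x + d) / S(d) == (1500 / (x + 1500))^3, checked exactly at a few points.
for x in (0, 1, 250, 10_000):
    lhs = pareto2_survival(x + D, ALPHA, THETA) / pareto2_survival(D, ALPHA, THETA)
    assert lhs == pareto2_survival(x, ALPHA, THETA + D)

p = pareto2_survival(D, ALPHA, THETA)                      # P(X > 500) = 8/27
theta_p = THETA + D                                        # 1500
mean_p = theta_p / (ALPHA - 1)                             # E(Y_P) = 750
second_p = 2 * theta_p ** 2 / ((ALPHA - 1) * (ALPHA - 2))  # E(Y_P^2)
var_p = second_p - mean_p ** 2                             # 1687500

var_mixture = p * second_p - (p * mean_p) ** 2             # approach 2
var_compound = p * var_p + p * (1 - p) * mean_p ** 2       # approach 3
print(p, p * mean_p, var_mixture, var_compound)
# 8/27 2000/9 50000000/81 50000000/81
```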

Remarks

For some loss distributions, the variance of $Y_L$, the payment per loss, can be mathematically difficult to calculate. The integrals required by the first approach may not have closed forms. For the second and third approaches to work, we need a handle on the payment per payment $Y_P$. In many cases, the pdf of $Y_P$ is not easy to obtain, or its mean and variance are hard to come by (or do not even exist). In such cases, we may have to find the variance numerically. The three examples presented here use loss distributions that are mathematically tractable under all three approaches. In these examples, the second and third approaches are shortcuts for finding the variance of $Y_L$, because $Y_P$ has a known form and requires minimal extra calculation. For other distributions, the second or third approach may be doable but not a shortcut. In that case, any one of the approaches can be used.
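When none of the approaches gives a closed form, simulation is one way to find the variance numerically. The following is a minimal Monte Carlo sketch using Python's `random` module. The `sample_loss` callback interface and the Weibull parameters are illustrative assumptions. The uniform case of Example 2 serves as a sanity check.

```python
import math
import random


def simulate_payment_per_loss(sample_loss, d, n=100_000, seed=2019):
    """Monte Carlo estimates of E(Y_L) and Var(Y_L) for Y_L = (X - d)_+.

    `sample_loss(rng)` must return one draw of the loss X; this callback
    interface is an assumption of the sketch.
    """
    rng = random.Random(seed)                       # seeded for reproducibility
    payments = [max(sample_loss(rng) - d, 0.0) for _ in range(n)]
    mean = math.fsum(payments) / n
    var = math.fsum((y - mean) ** 2 for y in payments) / n
    return mean, var


# Sanity check against Example 2: X ~ Uniform(0, 100), deductible 20,
# where the exact answers are E(Y_L) = 32 and Var(Y_L) = 2048/3 = 682.67.
mean_u, var_u = simulate_payment_per_loss(lambda rng: rng.uniform(0, 100), 20)

# A loss model without the shortcuts: Weibull with scale 100 and shape 0.5
# (random.weibullvariate(alpha=scale, beta=shape)); parameters illustrative.
mean_w, var_w = simulate_payment_per_loss(
    lambda rng: rng.weibullvariate(100, 0.5), 50)

print(round(mean_u, 2), round(var_u, 2), round(mean_w, 2), round(var_w, 2))
```

The Monte Carlo error shrinks like $1/\sqrt{n}$. For heavy-tailed losses, such as a Pareto distribution with small $\alpha$, the sample variance can converge very slowly or not at all.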



$\copyright$ 2019 – Dan Ma

### Practice Problem Set 4 – Pareto Distribution


The previous post is a discussion of the Pareto distribution as well as a side-by-side comparison of the two types of Pareto distribution. This post has several practice problems to reinforce the concepts in the previous post.

 Practice Problem 4A The random variable $X$ is an insurer’s annual hurricane-related loss. Suppose that the density function of $X$ is: $\displaystyle f(x)=\left\{ \begin{array}{ll} \displaystyle \frac{2.2 \ (250)^{2.2}}{x^{3.2}} &\ x > 250 \\ \text{ } & \text{ } \\ \displaystyle 0 &\ \text{otherwise} \end{array} \right.$ Calculate the inter-quartile range of annual hurricane-related loss. Note that the inter-quartile range of a random variable is the difference between the third quartile (75th percentile) and the first quartile (25th percentile).
 Practice Problem 4B Claim size for an auto insurance coverage follows a Pareto Type II Lomax distribution with mean 7.5 and variance 243.75. Determine the probability that a randomly selected claim will be greater than 10.
 Practice Problem 4C Losses follow a Pareto Type II distribution with shape parameter $\alpha>1$ and scale parameter $\theta$. The value of the mean excess loss function at $x=8$ is 32. The value of the mean excess loss function at $x=16$ is 48. Determine the value of the mean excess loss function at $x=32$.
 Practice Problem 4D For a large portfolio of insurance policies, the underlying distribution for losses in the current year has a Pareto Type II distribution with shape parameter $\alpha=2.9$ and scale parameter $\theta=12.5$. All losses in the next year are expected to increase by 5%. For the losses in the next year, determine the value-at-risk at the security level 95%.
 Practice Problem 4E (Continuation of 4D) For a large portfolio of insurance policies, the underlying distribution for losses in the current year has a Pareto Type II distribution with shape parameter $\alpha=2.9$ and scale parameter $\theta=12.5$. All losses in the next year are expected to increase by 5%. For the losses in the next year, determine the tail-value-at-risk at the security level 95%.
 Practice Problem 4F For a large portfolio of insurance policies, losses follow a Pareto Type II distribution with shape parameter $\alpha=3.5$ and scale parameter $\theta=5000$. An insurance policy covers losses subject to an ordinary deductible of 500. Given that a loss has occurred, determine the average amount paid by the insurer.
 Practice Problem 4G The claim severity for an auto liability insurance coverage is modeled by a Pareto Type I distribution with shape parameter $\alpha=2.5$ and scale parameter $\theta=1000$. The insurance coverage pays up to a limit of 1200 per claim. Determine the expected insurance payment under this coverage for one claim.
 Practice Problem 4H For an auto insurance company, liability losses follow a Pareto Type I distribution. Let $X$ be the random variable for these losses. Suppose that $\text{VaR}_{0.90}(X)=3162.28$ and $\text{VaR}_{0.95}(X)=4472.14$. Determine $\text{VaR}_{0.99}(X)$.
 Practice Problem 4I For a property and casualty insurance company, losses follow a mixture of two Pareto Type II distributions with equal weights, with the first Pareto distribution having parameters $\alpha=1$ and $\theta=500$ and the second Pareto distribution having parameters $\alpha=2$ and $\theta=500$. Determine the value-at-risk at the security level of 95%.
 Practice Problem 4J The claim severity for a line of property liability insurance is modeled as a mixture of two Pareto Type II distributions with the first distribution having $\alpha=1$ and $\theta=2500$ and the second distribution having $\alpha=2$ and $\theta=1250$. These two distributions have equal weights. Determine the limited expected value of claim severity at claim size 1000.


4A 184.54
4B 0.20681035
4C 80
4D 23.7499
4E 43.1577
4F 1575.97
4G 1159.51615
4H 10,000
4I 4,958.04
4J 698.3681
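Three of the answers (4B, 4C and 4I) require solving for parameters or a percentile rather than plugging into a single formula. The minimal Python sketch below carries out those steps. It assumes the Pareto Type II formulas from the post "Pareto Type I versus Pareto Type II" (moments, mean excess loss and survival function). The bisection helper is an illustrative choice for the mixture percentile in 4I.

```python
def bisect(f, lo, hi, tol=1e-10):
    """Root of a continuous f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    while hi - lo > tol * max(1.0, abs(hi)):
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2


# 4B: for Pareto Type II, Var / mean^2 = alpha / (alpha - 2).
mean, var = 7.5, 243.75
r = var / mean ** 2
alpha = 2 * r / (r - 1)                 # 2.6
theta = mean * (alpha - 1)              # 12
ans_4b = (theta / (10 + theta)) ** alpha

# 4C: e(d) = (d + theta) / (alpha - 1) is linear in d.
slope = (48 - 32) / (16 - 8)            # 1 / (alpha - 1) = 2
theta_c = 32 / slope - 8                # 8
ans_4c = (32 + theta_c) * slope


# 4I: VaR_0.95 of an equal mixture solves 0.5 S1(x) + 0.5 S2(x) = 0.05.
def mixture_survival(x):
    return 0.5 * (500 / (x + 500)) + 0.5 * (500 / (x + 500)) ** 2


ans_4i = bisect(lambda x: mixture_survival(x) - 0.05, 0.0, 1e6)
print(round(ans_4b, 8), ans_4c, round(ans_4i, 2))
```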

$\copyright$ 2017 – Dan Ma

### Pareto Type I versus Pareto Type II


This post complements an earlier discussion of the Pareto distribution in a companion blog (found here). This post gives a side-by-side comparison of the Pareto Type I distribution and the Pareto Type II Lomax distribution. We discuss the calculations of the mathematical properties shown in the comparison. Several of the properties in the comparison indicate that Pareto distributions (both Type I and Type II) are heavy tailed distributions. The properties presented in the comparison (and the thought processes behind them) are a good resource for actuarial exam preparation.

The following table gives a side-by-side comparison for Pareto Type I and Pareto Type II.

| Row | Quantity | Pareto Type I | Condition | Pareto Type II | Condition |
|---|---|---|---|---|---|
| 1 | PDF $f(x)$ | $\displaystyle \frac{\alpha \theta^\alpha}{x^{\alpha+1}}$ | $x>\theta$ | $\displaystyle \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}}$ | $x>0$ |
| 2 | CDF $F(x)$ | $\displaystyle 1-\biggl(\frac{\theta}{x}\biggr)^\alpha$ | $x>\theta$ | $\displaystyle 1-\biggl(\frac{\theta}{x+\theta}\biggr)^\alpha$ | $x>0$ |
| 3 | Survival $S(x)$ | $\displaystyle \biggl(\frac{\theta}{x}\biggr)^\alpha$ | $x>\theta$ | $\displaystyle \biggl(\frac{\theta}{x+\theta}\biggr)^\alpha$ | $x>0$ |
| 4 | Hazard rate $h(x)$ | $\displaystyle \frac{\alpha}{x}$ | $x>\theta$ | $\displaystyle \frac{\alpha}{x+ \theta}$ | $x>0$ |
| 5 | Moments $E(X^k)$ | $\displaystyle \frac{\alpha \theta^k}{\alpha-k}$ | $k<\alpha$ | $\displaystyle \frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)}$ | $-1<k<\alpha$ |
| 6 | Variance $Var(X)$ | $\displaystyle \frac{\alpha \theta^2}{(\alpha-2) (\alpha-1)^2}$ | $\alpha>2$ | $\displaystyle \frac{\alpha \theta^2}{(\alpha-2) (\alpha-1)^2}$ | $\alpha>2$ |
| 7 | Mean excess loss $E(X-d \lvert X >d)$ | $\displaystyle \frac{d}{\alpha-1}$ | $\alpha>1$ | $\displaystyle \frac{d+\theta}{\alpha-1}$ | $\alpha>1$ |
| 8a | Limited expectation $E[X \wedge d]$ | | | $\displaystyle \frac{\theta}{\alpha-1} \biggl[1-\biggl(\frac{\theta}{d+\theta} \biggr)^{\alpha-1} \biggr]$ | $\alpha \ne 1$ |
| 8b | Limited expectation $E[X \wedge d]$ | | | $\displaystyle -\theta \ \text{ln} \biggl(\frac{\theta}{d+\theta} \biggr)$ | $\alpha = 1$ |
| 8c | Limited expectation $E[(X \wedge d)^k]$ | $\displaystyle \frac{\alpha \theta^k}{\alpha-k}-\frac{k \theta^\alpha}{(\alpha-k) d^{\alpha-k}}$ | $\alpha>k$ | See below | all $k$ |
| 9 | VaR $VaR_p(X)$ | $\displaystyle \frac{\theta}{(1-p)^{1/\alpha}}$ | | $\displaystyle \frac{\theta}{(1-p)^{1/\alpha}}-\theta$ | |
| 10 | TVaR $TVaR_p(X)$ | $\displaystyle \frac{\alpha}{\alpha-1} VaR_p(X)$ | $\alpha>1$ | $\displaystyle \frac{\alpha}{\alpha-1} VaR_p(X)+\frac{\theta}{\alpha-1}$ | $\alpha>1$ |
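The closed-form rows of the table translate directly into code. Below is a minimal Python sketch of rows 8a to 10, checked against several answers in Practice Problem Set 4; the function names are hypothetical. For 4D and 4E, the sketch assumes that a uniform 5% increase in losses multiplies the scale parameter $\theta$ by 1.05. This holds because both Pareto types are scale families.

```python
import math


def value_at_risk_type1(p, alpha, theta):
    """Row 9, Pareto Type I: theta / (1 - p)^(1 / alpha)."""
    return theta / (1 - p) ** (1 / alpha)


def value_at_risk_type2(p, alpha, theta):
    """Row 9, Pareto Type II: theta / (1 - p)^(1 / alpha) - theta."""
    return value_at_risk_type1(p, alpha, theta) - theta


def tvar_type2(p, alpha, theta):
    """Row 10, Pareto Type II (requires alpha > 1)."""
    var_p = value_at_risk_type2(p, alpha, theta)
    return alpha / (alpha - 1) * var_p + theta / (alpha - 1)


def limited_ev_type1(d, alpha, theta):
    """Row 8c with k = 1, Pareto Type I (assumes d >= theta, alpha > 1)."""
    return (alpha * theta / (alpha - 1)
            - theta ** alpha / ((alpha - 1) * d ** (alpha - 1)))


def limited_ev_type2(d, alpha, theta):
    """Rows 8a (alpha != 1) and 8b (alpha == 1), Pareto Type II."""
    if alpha == 1:
        return -theta * math.log(theta / (d + theta))
    return theta / (alpha - 1) * (1 - (theta / (d + theta)) ** (alpha - 1))


# 4D and 4E: 5% inflation turns theta = 12.5 into 12.5 * 1.05.
ans_4d = value_at_risk_type2(0.95, 2.9, 12.5 * 1.05)
ans_4e = tvar_type2(0.95, 2.9, 12.5 * 1.05)

# 4F: E(X) - E(X ^ 500), where E(X) = theta / (alpha - 1) for Type II.
ans_4f = 5000 / (3.5 - 1) - limited_ev_type2(500, 3.5, 5000)

# 4G: Type I limited expected value at the policy limit 1200.
ans_4g = limited_ev_type1(1200, 2.5, 1000)

# 4H: VaR_0.95 / VaR_0.90 = 2^(1/alpha) gives alpha; VaR_0.90 then gives theta.
alpha_h = math.log(2) / math.log(4472.14 / 3162.28)
theta_h = 3162.28 * (1 - 0.90) ** (1 / alpha_h)
ans_4h = value_at_risk_type1(0.99, alpha_h, theta_h)

# 4J: equal-weight mixture of two Type II limited expected values at 1000.
ans_4j = (0.5 * limited_ev_type2(1000, 1, 2500)
          + 0.5 * limited_ev_type2(1000, 2, 1250))

print([round(v, 4) for v in (ans_4d, ans_4e, ans_4f, ans_4g, ans_4h, ans_4j)])
```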

One item that is not indicated in the table is $E[(X \wedge d)^k]$ for Pareto Type II, which is given below.

$\displaystyle E[(X \wedge d)^k]=\frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \beta[k+1, \alpha-k; \frac{d}{d+\theta}]+d^k \biggl(\frac{\theta}{d+\theta} \biggr)^\alpha$

where $\beta(\text{ }, \text{ }; \text{ })$ is the incomplete beta function, which is defined as follows:

$\displaystyle \beta(a, b; x)=\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \int_0^x t^{a-1} \ (1-t)^{b-1} \ dt$

for any $a>0$, $b>0$ and $0<x<1$.
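The incomplete-beta expression can be sanity-checked numerically using only Python's standard library. The sketch below evaluates $\beta(a,b;x)$ by Simpson's rule. This is an illustrative choice rather than a production method, and it assumes $a \ge 1$ so that the integrand is finite at $t=0$. The $k=1$ case is then compared with the closed form in row 8a for the illustrative values $\alpha=3$, $\theta=1000$ and $d=500$.

```python
import math


def simpson(g, a, b, n=10_000):
    """Composite Simpson's rule for g on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def incomplete_beta(a, b, x):
    """Regularized incomplete beta beta(a, b; x) as defined above (a >= 1)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * simpson(lambda t: t ** (a - 1) * (1 - t) ** (b - 1), 0.0, x)


def limited_moment_type2(k, d, alpha, theta):
    """E[(X ^ d)^k] for Pareto Type II via the incomplete beta formula."""
    coef = theta ** k * math.gamma(k + 1) * math.gamma(alpha - k) / math.gamma(alpha)
    return (coef * incomplete_beta(k + 1, alpha - k, d / (d + theta))
            + d ** k * (theta / (d + theta)) ** alpha)


alpha, theta, d = 3.0, 1000.0, 500.0
via_beta = limited_moment_type2(1, d, alpha, theta)
row_8a = theta / (alpha - 1) * (1 - (theta / (d + theta)) ** (alpha - 1))
print(round(via_beta, 4), round(row_8a, 4))  # 277.7778 277.7778
```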

Discussion

The above table describes two distributions that are called Pareto (Type I and Type II Lomax). Each of them has two parameters: $\alpha$ (shape parameter) and $\theta$ (scale parameter). The support of Pareto Type I is the interval $(\theta,\infty)$. In other words, a Pareto Type I distribution can only take on real numbers greater than the scale parameter $\theta$. On the other hand, the support of Pareto Type II is the interval $(0,\infty)$. So a Pareto Type II distribution can take on any positive real number.

The two distributions are mathematically related. Judging from the PDF, it is clear that the PDF of Pareto Type II is the result of shifting the Type I PDF to the left by the magnitude of $\theta$ (the same can be said about the CDF and survival function). More specifically, let $X_1$ be a random variable that follows a Pareto Type I distribution with parameters $\alpha$ and $\theta$. Let $X_2=X_1-\theta$. It is straightforward to verify that $X_2$ has a Pareto Type II distribution, i.e. its CDF and other distributional quantities are the same as the ones shown in the above table under Pareto Type II. When they have the same parameters, the two distributions are essentially the same, in that each one is the result of shifting the other by the amount $\theta$.

A further indication that the two types are of the same distributional shape is that the variances are identical. Note that shifting a distribution to the left (or right) by a constant does not change the variance.

Since the two Pareto types are the same distribution (except for the shift), they share similar mathematical properties. For example, both are heavy tailed distributions. In other words, they put significantly more probability on large values. This point is discussed in more detail below.

Calculation

First, the calculations. For Pareto Type I, the moments are determined by the integral $\int_\theta^\infty x^k \ f(x) \ dx$, where $f(x)$ is the Type I PDF. Because the PDF for Pareto Type I is easy to work with, almost all the items under Pareto Type I are quite accessible. For example, item 8c for Pareto Type I is calculated by the following integral.

$\displaystyle E[(X \wedge d)^k]=\int_\theta^d x^k \ \frac{\alpha \theta^\alpha}{x^{\alpha+1}} \ dx+d^k \ \biggl(\frac{\theta}{d}\biggr)^\alpha$

In the remaining discussion, the focus is on Pareto Type II calculations.

The Pareto $k$th moment $E(X^k)$ is by definition the integral $\int_0^\infty x^k f(x) \ dx$, where $f(x)$ is the Pareto Type II PDF. However, this integral is difficult to evaluate directly. The best way to evaluate the moments in row 5 of the above table is to use the fact that the Pareto Type II distribution is a mixture of exponential distributions with a gamma mixing weight (see Example 2 here). Thus the moments of Pareto Type II can be obtained by integrating the conditional $k$th moment of the exponential distribution against the gamma weight. The following shows the calculation.

$\displaystyle \begin{aligned} E(X^k)&=\int_0^\infty E(X^k \lvert \lambda) \ \frac{1}{\Gamma(\alpha)} \ \theta^\alpha \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ d \lambda \\&=\int_0^\infty \frac{\Gamma(k+1)}{\lambda^k} \ \frac{1}{\Gamma(\alpha)} \ \theta^\alpha \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\Gamma(k+1)}{\Gamma(\alpha)} \int_0^\infty \theta^\alpha \ \lambda^{\alpha-k-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha-k)} \ \theta^{\alpha-k} \ \lambda^{\alpha-k-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \end{aligned}$

In the above derivation, the conditional $X \lvert \Lambda$ is assumed to have an exponential distribution with mean $1/\Lambda$ (i.e. rate $\Lambda$), so that $E(X^k \lvert \lambda)=\Gamma(k+1)/\lambda^k$. The random variable $\Lambda$ in turn has a gamma distribution with shape parameter $\alpha$ and rate parameter $\theta$. The integrand in the integral in the second to last step is a gamma density, making the value of the integral 1.0. When $k$ is a positive integer, $E(X^k)$ simplifies to $\displaystyle \frac{\theta^k \ k!}{(\alpha-1)(\alpha-2) \cdots (\alpha-k)}$.
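The mixing argument can also be checked numerically. The idea is to integrate $E(X^k \lvert \lambda)=\Gamma(k+1)/\lambda^k$ against the gamma weight, then compare the result with row 5 and with the integer-$k$ simplification $\theta^k \, k!/[(\alpha-1)\cdots(\alpha-k)]$. Below is a minimal standard-library sketch. It assumes $0 \le k<\alpha-1$, so that the integrand stays finite at $\lambda=0$. The parameter values are arbitrary.

```python
import math


def simpson(g, a, b, n=20_000):
    """Composite Simpson's rule for g on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def moment_by_mixing(k, alpha, theta):
    """E(X^k) as the integral of E(X^k | lambda) = Gamma(k+1) / lambda^k
    against a gamma(shape alpha, rate theta) density.

    The factor lambda^(-k) is merged into lambda^(alpha - 1), giving
    lambda^(alpha - k - 1); this assumes 0 <= k < alpha - 1.
    """
    def integrand(lam):
        return (math.gamma(k + 1) * theta ** alpha * lam ** (alpha - k - 1)
                * math.exp(-theta * lam) / math.gamma(alpha))
    # The gamma weight is negligible beyond lambda = 60 / theta.
    return simpson(integrand, 0.0, 60.0 / theta)


def moment_row5(k, alpha, theta):
    """Row 5 of the table, Pareto Type II."""
    return theta ** k * math.gamma(k + 1) * math.gamma(alpha - k) / math.gamma(alpha)


def moment_integer_k(k, alpha, theta):
    """Integer-k form theta^k k! / ((alpha - 1)(alpha - 2)...(alpha - k))."""
    return theta ** k * math.factorial(k) / math.prod(alpha - i for i in range(1, k + 1))


alpha, theta = 4.5, 1000.0
for k in (1, 2, 3):
    print(k, moment_by_mixing(k, alpha, theta), moment_row5(k, alpha, theta),
          moment_integer_k(k, alpha, theta))
```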

The next calculation is the mean excess loss. It is the conditional expected value $E(X-d \lvert X>d)$. If $X$ is an insurance loss and $d$ is some kind of threshold (e.g. the deductible in an insurance policy that covers this loss), then $E(X-d \lvert X>d)$ is the expected loss in excess of the threshold $d$ given that the loss exceeds $d$. If $X$ is the lifetime of an individual, then $E(X-d \lvert X>d)$ is the expected remaining lifetime given that the individual has survived to age $d$.

The expected value $E(X-d \lvert X>d)$ can be calculated by the integral $\frac{1}{S(d)} \int_d^\infty (x-d) \ f(x) \ dx$. This integral is not easy to evaluate when $f(x)$ is a Pareto Type II PDF. Fortunately, there is another way to handle this calculation. The key idea is that if $X$ has a Pareto Type II distribution with parameters $\alpha$ and $\theta$ (as described in the table), the conditional random variable $X-d \lvert X>d$ also has a Pareto Type II distribution, this time with parameters $\alpha$ and $d+\theta$. The mean of a Pareto Type II distribution is the scale parameter divided by the shape parameter less one, i.e. $\frac{\theta}{\alpha-1}$ for $\alpha>1$. Applying this with scale parameter $d+\theta$ gives the mean of $X-d \lvert X>d$ indicated in row 7 of the table.
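The key step is that $X-d \lvert X>d$ is again Pareto Type II with scale $d+\theta$. This is equivalent to the identity $\frac{S(x+d)}{S(d)}=\bigl(\frac{d+\theta}{x+d+\theta}\bigr)^\alpha$. The short Python sketch below checks this identity at a few points for illustrative parameter values and then applies row 7.

```python
import math


def pareto2_survival(x, alpha, theta):
    """Pareto Type II survival function (theta / (x + theta))^alpha."""
    return (theta / (x + theta)) ** alpha


def mean_excess_type2(d, alpha, theta):
    """Row 7: X - d | X > d is Pareto Type II(alpha, d + theta), so its mean
    is (d + theta) / (alpha - 1), valid for alpha > 1."""
    return (d + theta) / (alpha - 1)


alpha, theta, d = 2.9, 12.5, 40.0        # illustrative parameter values
for x in (0.0, 1.0, 10.0, 250.0, 1e4):
    conditional = pareto2_survival(x + d, alpha, theta) / pareto2_survival(d, alpha, theta)
    shifted = pareto2_survival(x, alpha, d + theta)
    print(x, math.isclose(conditional, shifted, rel_tol=1e-12))

print(mean_excess_type2(d, alpha, theta))   # equals (40 + 12.5) / 1.9
```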

The limited loss $X \wedge d$ is defined as follows.

$\displaystyle X \wedge d=\left\{ \begin{array}{ll} \displaystyle X &\ X \le d \\ \text{ } & \text{ } \\ \displaystyle d &\ X > d \end{array} \right.$

One interpretation is that it is the insurance payment when the insurance policy has an upper cap on benefit. If the loss is below the cap $d$, the insurance policy pays the loss in full. If the loss exceeds the cap $d$, the policy only pays for the loss up to the limit $d$. The expected insurance payment $E(X \wedge d)$ is said to be the limited expectation. For Pareto Type II, the first moment $E(X \wedge d)$ can be evaluated by the following integral.

$\displaystyle E(X \wedge d)=\int_0^d x \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx+d \ \biggl(\frac{\theta}{d+\theta}\biggr)^\alpha$

Integrating with the change of variable $u=x+\theta$ yields the results in rows 8a and 8b of the table, i.e. the cases $\alpha \ne 1$ and $\alpha=1$. A more interesting result is 8c, which is the $k$th moment of the variable $X \wedge d$. The integral for this expectation can be expressed using the incomplete beta function. The following evaluates $E[(X \wedge d)^k]$.

$\displaystyle E[(X \wedge d)^k]=\int_0^d x^k \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx +d^k \ \biggl(\frac{\theta}{d+\theta}\biggr)^\alpha$

Next, transform the integral in the above expression using the change of variable $u=\frac{x}{x+\theta}$.

$\displaystyle \begin{aligned} \int_0^d x^k \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx&=\int_0^{\frac{d}{d+\theta}} \alpha \theta^\alpha \ \biggl(\frac{u \theta}{1-u}\biggr)^k \ \biggl( \frac{\theta}{1-u}\biggr)^{-\alpha-1} \ \frac{\theta}{(1-u)^2} \ du\\&=\int_0^{\frac{d}{d+\theta}} \alpha \theta^k \ u^k \ (1-u)^{\alpha-k-1} \ du \\&=\frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \int_0^{\frac{d}{d+\theta}} \frac{\Gamma(\alpha+1)}{\Gamma(k+1) \Gamma(\alpha-k)} \ u^{k+1-1} \ (1-u)^{\alpha-k-1} \ du \end{aligned}$

The integrand in the last integral is the probability density function of the beta distribution with parameters $a=k+1$ and $b=\alpha-k$. Thus $E[(X \wedge d)^k]$ is as indicated in 8c.

Now we consider two risk measures: value-at-risk (VaR) and tail-value-at-risk (TVaR). The value-at-risk of a random variable $X$ at security level $p$, denoted by $VaR_p(X)$, is the $(100p)$th percentile of $X$. Thus VaR is a fancy name for percentiles. Setting the Pareto Type II CDF equal to $p$ gives the VaR indicated in row 9 of the table. In other words, solving the following equation for $x$ gives the $(100p)$th percentile for Pareto Type II.

$\displaystyle 1-\biggl(\frac{\theta}{x+\theta}\biggr)^\alpha=p$

The tail-value-at-risk of a random variable $X$ at the security level $p$, denoted by $TVaR_p(X)$, is the expected value of $X$ given that it exceeds $VaR_p(X)$. Thus $TVaR_p(X)=E[X \lvert X>VaR_p(X)]$. Letting $\pi_p=\frac{\theta}{(1-p)^{1/\alpha}}-\theta$, the following integral gives the tail-value-at-risk for Pareto Type II. The integral is evaluated by the change of variable $u=x+\theta$.

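$\displaystyle E[X \lvert X>VaR_p(X)]=\frac{1}{1-p} \int_{\pi_p}^\infty x \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx=\frac{\alpha}{\alpha-1} VaR_p(X)+\frac{\theta}{\alpha-1}$

Here the factor $\frac{1}{1-p}=\frac{1}{P(X>\pi_p)}$ turns the integral into a conditional expectation.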

Tail Weight

Several properties in the above table show that the Pareto distribution (both types) is a heavy tailed distribution. A distribution that puts significantly more probability on large values is said to be heavy tailed (or to have a larger tail weight). There are four ways to look for an indication that a distribution is heavy tailed.

1. Existence of moments.
2. Hazard rate function.
3. Mean excess loss function.
4. Speed of decay of the survival function to zero.

Tail weight is a relative concept – distribution A has a heavier tail than distribution B. The first three points are ways to tell heavy tails without a reference distribution. Point number 4 is comparative.

Existence of moments
For a given random variable $Z$, the existence of all moments $E(Z^k)$, for all positive integers $k$, indicates a light (right) tail for the distribution of $Z$. If the positive moments exist only up to a certain value of $k$, that is an indication that the distribution has a heavy right tail.

Note that the existence of the higher Pareto moments $E(X^k)$ is capped by the shape parameter $\alpha$ (both Type I and Type II). Thus if $\alpha=3$, the positive moments $E(X^k)$ exist only for $0<k<3$. In particular, the Pareto Type II mean $E(X)=\frac{\theta}{\alpha-1}$ does not exist for $0<\alpha \le 1$. If the Pareto distribution is used to model a random loss and the mean is infinite (when $\alpha \le 1$), the risk is uninsurable! Likewise, when $\alpha \le 2$, the Pareto variance does not exist. This shows that for a heavy tailed distribution, the variance may not be a good measure of risk.

As compared with Pareto, the exponential distribution, the Gamma distribution, the Weibull distribution, and the lognormal distribution are considered to have light tails since all moments exist.

Hazard rate function
The hazard rate function $h(x)$ of a random variable $X$ is defined as the ratio of the density function and the survival function.

$\displaystyle h(x)=\frac{f(x)}{S(x)}$

In a life contingency context, the hazard rate is called the force of mortality and can be interpreted as the rate at which a person aged $x$ will die in the next instant. In reliability theory, the hazard rate is called the failure rate and can be interpreted as the rate at which a machine will fail in the next instant, given that it has been functioning for $x$ units of time. Dividing the PDF by the survival function in the table shows that the hazard rate of Pareto Type I is $h(x)=\alpha/x$ and that of Type II is $h(x)=\alpha/(x+\theta)$. Both are decreasing functions of $x$.

Another indication of heavy tail weight is that the distribution has a decreasing hazard rate function. Thus the Pareto distribution (both types) is considered to be a heavy tailed distribution based on its decreasing hazard rate function.

One key property of the hazard rate function is that it determines the survival function.

$\displaystyle S(x)=e^{\displaystyle -\int_0^x h(t) \ dt}$

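Thus if the hazard rate function is decreasing in $x$, then the survival function will decay more slowly to zero. To see this, let $H(x)=\int_0^x h(t) \ dt$, which is called the cumulative hazard rate function. As indicated above, the survival function can be generated by $e^{-H(x)}$. To compare two distributions, let $H_1(x)$ be the cumulative hazard of a distribution whose hazard rate is decreasing in $x$, and let $H_2(x)$ be that of a distribution whose hazard rate is constant or increasing in $x$. Once the decreasing hazard rate falls below the other one, $H_1(x)$ grows more slowly than $H_2(x)$ and eventually becomes much smaller. For example, the Pareto Type II cumulative hazard $\alpha \ln(1+x/\theta)$ grows only logarithmically, while the exponential cumulative hazard $\lambda x$ grows linearly. Consequently $e^{-H_1(x)}$ decays to zero much more slowly than $e^{-H_2(x)}$. Thus a decreasing hazard rate leads to a slower speed of decay to zero for the survival function (a point discussed below).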
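The cumulative hazard argument can also be seen numerically. The sketch below integrates each hazard rate with a simple midpoint rule and exponentiates the result. It assumes Pareto Type II with $\alpha=\theta=2$ and an exponential with rate $\lambda=\ln 2$ (the same pair compared in the survival-function table later in this post); `cumulative_hazard`, `h_pareto` and `h_expo` are hypothetical helpers, not library functions.

```python
import math

def cumulative_hazard(h, x, n=10_000):
    """Midpoint-rule approximation of H(x) = integral of h(t) dt over [0, x]."""
    step = x / n
    return step * sum(h((i + 0.5) * step) for i in range(n))

alpha, theta = 2.0, 2.0
lam = math.log(2.0)

def h_pareto(t):
    return alpha / (t + theta)  # Pareto Type II hazard: decreasing

def h_expo(t):
    return lam                  # exponential hazard: constant

for x in (2.0, 10.0, 40.0):
    H_par, H_exp = cumulative_hazard(h_pareto, x), cumulative_hazard(h_expo, x)
    # exp(-H) should match (theta / (x + theta))**alpha and exp(-lam * x)
    print(x, H_par, alpha * math.log(1 + x / theta), H_exp,
          math.exp(-H_par), (theta / (x + theta)) ** alpha, math.exp(-H_exp))
```

The numerical Pareto cumulative hazard stays close to $\alpha\ln(1+x/\theta)$ and falls further and further behind $\lambda x$, so $e^{-H(x)}$ gives Pareto survival probabilities far larger than the exponential ones.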

In contrast, the exponential distribution has a constant hazard rate function, making it a medium tailed distribution. By the same reasoning, a distribution with an increasing hazard rate function is considered a light tailed distribution.

The mean excess loss function
Suppose that a property owner is exposed to a random loss $X$. The property owner buys an insurance policy with a deductible $d$ such that the insurer will pay a claim in the amount of $X-d$ if a loss occurs with $X>d$. The insurer will pay nothing if the loss is below the deductible. Whenever a loss is above $d$, what is the average claim the insurer will have to pay? This is one way to look at the mean excess loss function, which represents the expected excess loss over a threshold, conditional on the event that the threshold has been exceeded. Thus the mean excess loss function is $e_X(d)=E(X-d \lvert X>d)$, a function of the deductible $d$.

According to row 7 in the above table, the mean excess loss for Pareto Type I is $e_X(d)=d/(\alpha-1)$ (for $d \ge \theta$) and for Type II is $e_X(d)=(d+\theta)/(\alpha-1)$, both valid for $\alpha>1$. They are both increasing functions of the deductible $d$! This means that the larger the deductible, the larger the expected claim if such a large loss occurs! If a random loss is modeled by such a distribution, it is a catastrophic risk situation.
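The Type II closed form can be checked against the definition, using $e_X(d)=\int_d^\infty S(x)\,dx \,/\, S(d)$. This Python sketch is an illustration only: it assumes Pareto Type II with $\alpha=3$ and $\theta=2$, and maps the infinite integration range onto $[0,1)$ so that a plain midpoint rule can be used. The function names are hypothetical.

```python
def pareto2_sf(x, alpha, theta):
    return (theta / (x + theta)) ** alpha

def mean_excess_numeric(d, alpha, theta, n=100_000):
    """e(d) = (integral from d to infinity of S(x) dx) / S(d).

    The infinite range is mapped to [0, 1) with x = d + t/(1 - t),
    dx = dt/(1 - t)^2, and integrated with the midpoint rule.
    """
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * step
        total += pareto2_sf(d + t / (1.0 - t), alpha, theta) / (1.0 - t) ** 2
    return total * step / pareto2_sf(d, alpha, theta)

alpha, theta = 3.0, 2.0
for d in (0.0, 5.0, 20.0, 100.0):
    # numerical value next to the closed form (d + theta) / (alpha - 1)
    print(d, mean_excess_numeric(d, alpha, theta), (d + theta) / (alpha - 1))
```

The numerical values track $(d+\theta)/(\alpha-1)$, which for $\alpha=3$ and $\theta=2$ is $1$, $3.5$, $11$ and $51$ at $d=0, 5, 20, 100$: the expected claim keeps growing with the deductible.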

In general, an increasing mean excess loss function is an indication of a heavy tailed distribution. On the other hand, a decreasing mean excess loss function indicates a light tailed distribution. The exponential distribution has a constant mean excess loss function and is considered a medium tailed distribution.

Speed of decay of the survival function to zero
The survival function $S(x)=P(X>x)$ captures the probability in the right tail of a distribution. A survival function that decays slowly to zero (equivalently, a cdf that rises slowly to one) is another indication that the distribution is heavy tailed. This point was touched on in the discussion of the hazard rate function.

The following table compares a Pareto Type II survival function with an exponential survival function. The Pareto survival function has parameters $\alpha=2$ and $\theta=2$. The two survival functions are set to have the same 75th percentile, which is $x=2$; this forces the exponential rate to be $\lambda=\ln 2$, so that $S_Y(x)=e^{-x \ln 2}=2^{-x}$.

$\displaystyle \begin{array}{llllllll} \text{ } &x &\text{ } & \text{Pareto } S_X(x) & \text{ } & \text{Exponential } S_Y(x) & \text{ } & \displaystyle \frac{S_X(x)}{S_Y(x)} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{ } &2 &\text{ } & 0.25 & \text{ } & 0.25 & \text{ } & 1 \\ \text{ } &10 &\text{ } & 0.027777778 & \text{ } & 0.000976563 & \text{ } & 28 \\ \text{ } &20 &\text{ } & 0.008264463 & \text{ } & 9.54 \times 10^{-7} & \text{ } & 8666 \\ \text{ } &30 &\text{ } & 0.00390625 & \text{ } & 9.31 \times 10^{-10} & \text{ } & 4194304 \\ \text{ } &40 &\text{ } & 0.002267574 & \text{ } & 9.09 \times 10^{-13} & \text{ } & 2.49 \times 10^{9} \\ \text{ } &60 &\text{ } & 0.001040583 & \text{ } & 8.67 \times 10^{-19} & \text{ } & 1.20 \times 10^{15} \\ \text{ } &80 &\text{ } & 0.000594884 & \text{ } & 8.27 \times 10^{-25} & \text{ } & 7.19 \times 10^{20} \\ \text{ } &100 &\text{ } & 0.000384468 & \text{ } & 7.89 \times 10^{-31} & \text{ } & 4.87 \times 10^{26} \\ \text{ } &120 &\text{ } & 0.000268745 & \text{ } & 7.52 \times 10^{-37} & \text{ } & 3.57 \times 10^{32} \\ \text{ } &140 &\text{ } & 0.000198373 & \text{ } & 7.17 \times 10^{-43} & \text{ } & 2.76 \times 10^{38} \\ \text{ } &160 &\text{ } & 0.000152416 & \text{ } & 6.84 \times 10^{-49} & \text{ } & 2.23 \times 10^{44} \\ \text{ } &180 &\text{ } & 0.000120758 & \text{ } & 6.53 \times 10^{-55} & \text{ } & 1.85 \times 10^{50} \\ \text{ } & \text{ } \\ \end{array}$
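The table can be regenerated with a few lines of Python. This sketch assumes, as stated above, Pareto Type II with $\alpha=2$, $\theta=2$ and an exponential whose 75th percentile is also $x=2$, which gives $\lambda=\ln 4/2=\ln 2$. The function names are hypothetical.

```python
import math

ALPHA, THETA = 2.0, 2.0
# Exponential rate forced by the matching 75th percentile: e^{-2*lam} = 0.25, so lam = ln 2
LAM = math.log(4.0) / 2.0

def pareto2_sf(x):
    return (THETA / (x + THETA)) ** ALPHA

def exp_sf(x):
    return math.exp(-LAM * x)

def survival_table(xs):
    """Rows of (x, Pareto S_X(x), exponential S_Y(x), ratio S_X/S_Y)."""
    return [(x, pareto2_sf(x), exp_sf(x), pareto2_sf(x) / exp_sf(x)) for x in xs]

for x, sx, sy, ratio in survival_table([2, 10, 20, 30, 40, 60, 80, 100, 120, 140, 160, 180]):
    print(f"{x:>4}  {sx:.9f}  {sy:.3g}  {ratio:.3g}")
```

The printed rows agree with the table up to rounding.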

Note that at large values of $x$, the Pareto right tail retains much more probability. This is also confirmed by the ratio of the two survival functions, which grows without bound. If a random loss is a heavy tailed phenomenon described by the above Pareto survival function ($\alpha=2$ and $\theta=2$), then the above exponential survival function is woefully inadequate as a model for this phenomenon, even though it may be a good model for describing the loss up to the 75th percentile. It is the large right tail that is problematic (and catastrophic)!

Since the Pareto survival function and the exponential survival function have closed forms, we can also look at their ratio directly.

$\displaystyle \frac{\text{pareto survival}}{\text{exponential survival}}=\frac{\displaystyle \frac{\theta^\alpha}{(x+\theta)^\alpha}}{e^{-\lambda x}}=\frac{\theta^\alpha e^{\lambda x}}{(x+\theta)^\alpha} \longrightarrow \infty \ \text{ as } x \longrightarrow \infty$

In the above ratio, the numerator contains an exponential function with a positive exponent $\lambda x$, while the denominator is the power $(x+\theta)^\alpha$. Since exponential growth eventually dominates any power of $x$, the ratio goes to infinity as $x \rightarrow \infty$.
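Because $e^{\lambda x}$ overflows double-precision floats once $\lambda x$ exceeds about 709, it is safer to examine the logarithm of the ratio, $\alpha\ln\theta+\lambda x-\alpha\ln(x+\theta)$, where the linear term $\lambda x$ visibly outgrows the logarithmic term. Here is a minimal Python sketch, assuming the same parameters as the table ($\alpha=\theta=2$, $\lambda=\ln 2$); `log_survival_ratio` is a hypothetical name.

```python
import math

def log_survival_ratio(x, alpha, theta, lam):
    """Natural log of (theta^alpha / (x + theta)^alpha) / e^(-lam x).

    Working on the log scale avoids overflow: the ratio itself exceeds
    the float range long before its logarithm does.
    """
    return alpha * math.log(theta) + lam * x - alpha * math.log(x + theta)

alpha, theta, lam = 2.0, 2.0, math.log(2.0)
for x in (2, 100, 1_000, 10_000):
    lr = log_survival_ratio(x, alpha, theta, lam)
    # base-10 order of magnitude of the ratio, and lr / x, which approaches lam
    print(x, lr / math.log(10), lr / x)
```

The base-10 exponent in the second column is the order of magnitude of the ratio (about 26.7 at $x=100$, consistent with $4.87\times 10^{26}$ in the table), and $\ln(\text{ratio})/x$ settles toward $\lambda$.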

In general, whenever the ratio of two survival functions diverges to infinity, it is an indication that the distribution in the numerator of the ratio has a heavier tail. In that case, the survival function in the numerator is said to decay to zero more slowly than the one in the denominator.

The Pareto distribution has many economic applications. Since it is a heavy tailed distribution, it is a good candidate for modeling income above a theoretical value and the distribution of insurance claims above a threshold value.


$\copyright$ 2017 – Dan Ma