Pareto Type I versus Pareto Type II

This post complements an earlier discussion of the Pareto distribution in a companion blog (found here). It gives a side-by-side comparison of the Pareto Type I distribution and the Pareto Type II (Lomax) distribution, and discusses how the mathematical properties shown in the comparison are calculated. Several of these properties indicate that Pareto distributions (both Type I and Type II) are heavy tailed distributions. The properties presented in the comparison (and the thought processes behind them) are a good resource for studying for actuarial exams.

The following table gives a side-by-side comparison for Pareto Type I and Pareto Type II.

\displaystyle \begin{array}{llllllll} \text{ } &\text{ } &\text{ } & \text{Pareto Type I} & \text{ } & \text{ } & \text{Pareto Type II} & \text{ } \\  \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\  1 &\text{PDF} &f(x) & \displaystyle \frac{\alpha \theta^\alpha}{x^{\alpha+1}} & x>\theta & \text{ } & \displaystyle \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} & x>0  \\     \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\  2 &\text{CDF} &F(x) & \displaystyle 1-\biggl(\frac{\theta}{x}\biggr)^\alpha & x>\theta & \text{ } & \displaystyle 1-\biggl(\frac{\theta}{x+\theta}\biggr)^\alpha & x>0 \\    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\  3 &\text{Survival} &S(x) & \displaystyle \biggl(\frac{\theta}{x}\biggr)^\alpha & x>\theta & \text{ } & \displaystyle \biggl(\frac{\theta}{x+\theta}\biggr)^\alpha & x>0 \\  \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\  4 & \text{Hazard Rate} &h(x) & \displaystyle \frac{\alpha}{x}  & x>\theta & \text{ } & \displaystyle \frac{\alpha}{x+ \theta} & x>0  \\   \text{ } & \text{ } \\  5 &\text{Moments} &E(X^k) & \displaystyle \frac{\alpha \theta^k}{\alpha-k} & k<\alpha & \text{ } & \displaystyle \frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} & -1<k<\alpha \\           \text{ } & \text{ } \\  \text{ } & \text{ } &\text{ } & \text{ } & \text{ } & \text{ } & \displaystyle \frac{\theta^k k!}{(\alpha-1) \cdots (\alpha-k)} & k \text{ integer} \\           \text{ } & \text{ } \\    6 & \text{Variance} &Var(X) & \displaystyle \frac{\alpha \theta^2}{(\alpha-2) (\alpha-1)^2}  & \alpha>2  & \text{ } & \displaystyle \frac{\alpha \theta^2}{(\alpha-2) (\alpha-1)^2}  &\alpha>2 \\           \text{ } & \text{ } \\  7 & \text{Mean} &E(X-d \lvert X >d) & \displaystyle \frac{d}{\alpha-1}  & \alpha>1  & \text{ } & \displaystyle \frac{d+\theta}{\alpha-1}  &\alpha>1 \\   \text{ } &        \text{Excess Loss} & \text{ } \\  \text{ } & \text{ } \\    \end{array}\displaystyle \begin{array}{llllllll} \text{ } 
&\text{.} & \text{ } & \text{ }  & \text{ }  & \text{ } & \text{ }  &\text{ } \\ 8a &\text{Limited} &E[X \wedge d] & \text{ }  & \text{ }  & \text{ } & \displaystyle \frac{\theta}{\alpha-1} \biggl[1-\biggl(\frac{\theta}{d+\theta} \biggr)^{\alpha-1} \biggr]  &\alpha \ne 1 \\  \text{ } & \text{Expectation} & \text{ } \\  \text{ } & \text{ } \\      8b & \text{Limited} &E[X \wedge d] & \text{ }  & \text{ }  & \text{ } & \displaystyle -\theta \ \text{ln} \biggl(\frac{\theta}{d+\theta} \biggr)  &\alpha = 1 \\          \text{ } & \text{Expectation} & \text{ } \\  \text{ } & \text{ } \\      8c & \text{Limited} &E[(X \wedge d)^k] & \displaystyle \frac{\alpha \theta^k}{\alpha-k}-\frac{k \theta^\alpha}{(\alpha-k) d^{\alpha-k}}  & \alpha>k  & \text{ } & \text{See below}  &\text{all } k \\       \text{ } &    \text{Expectation} & \text{ } \\    \text{ } & \text{ } \\    9 & \text{VaR} &VaR_p(X) & \displaystyle \frac{\theta}{(1-p)^{1/\alpha}}  & \text{ }  & \text{ } & \displaystyle \frac{\theta}{(1-p)^{1/\alpha}}-\theta  &\text{ } \\           \text{ } & \text{ } \\    10 &\text{TVaR} &TVaR_p(X) & \displaystyle \frac{\alpha}{\alpha-1} VaR_p(X)  & \alpha>1  & \text{ } & \displaystyle \frac{\alpha}{\alpha-1} VaR_p(X)+\frac{\theta}{\alpha-1}   &\alpha>1 \\           \text{ } & \text{ } \\    \end{array}

One item that is not indicated in the table is E[(X \wedge d)^k] for Pareto Type II, which is given below.

    \displaystyle E[(X \wedge d)^k]=\frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \beta[k+1, \alpha-k; \frac{d}{d+\theta}]+d^k \biggl(\frac{\theta}{d+\theta} \biggr)^\alpha

where \beta(\text{ }, \text{ }; \text{ }) is the incomplete beta function, which is defined as follows:

    \displaystyle \beta(a, b; x)=\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \int_0^x t^{a-1} \ (1-t)^{b-1} \ dt

for any a>0, b>0 and 0<x<1.


The above table describes two distributions that are called Pareto (Type I and Type II Lomax). Each of them has two parameters – \alpha (shape parameter) and \theta (scale parameter). The support of Pareto Type I is the interval (\theta,\infty). In other words, a Pareto Type I distribution can only take on real numbers greater than the scale parameter \theta. On the other hand, the support of Pareto Type II is the interval (0,\infty). So a Pareto Type II distribution can take on any positive real number.

The two distributions are mathematically related. Judging from the PDF, it is clear that the PDF of Pareto Type II is the result of shifting the Type I PDF to the left by the magnitude of \theta (the same can be said about the CDF and survival function). More specifically, let X_1 be a random variable that follows a Pareto Type I distribution with parameters \alpha and \theta. Let X_2=X_1-\theta. It is straightforward to verify that X_2 has a Pareto Type II distribution, i.e. its CDF and other distributional quantities are the same as the ones shown in the above table under Pareto Type II. When the two distributions have the same parameters, they are essentially the same, in that each one is the result of shifting the other one by the amount \theta.
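The shift relationship can be checked numerically. The following is a minimal Python sketch, not part of the original post; the function names and the parameter values \alpha=3 and \theta=10 are chosen here purely for illustration. It evaluates the Type I CDF at x+\theta and compares it with the Type II CDF at x.

```python
def pareto1_cdf(x, alpha, theta):
    """CDF of Pareto Type I (row 2 of the table); zero at or below theta."""
    return 1.0 - (theta / x) ** alpha if x > theta else 0.0


def pareto2_cdf(x, alpha, theta):
    """CDF of Pareto Type II / Lomax (row 2 of the table); zero at or below 0."""
    return 1.0 - (theta / (x + theta)) ** alpha if x > 0 else 0.0


alpha, theta = 3.0, 10.0  # illustrative values, not from the post
for x in (0.5, 5.0, 50.0):
    # P(X_1 - theta <= x) = P(X_1 <= x + theta)
    shifted_type1 = pareto1_cdf(x + theta, alpha, theta)
    type2 = pareto2_cdf(x, alpha, theta)
    print(x, shifted_type1, type2)
```

The two printed columns should agree at every x, which is the statement that X_1-\theta has the Type II CDF.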

A further indication that the two types are of the same distributional shape is that the variances are identical. Note that shifting a distribution to the left (or right) by a constant does not change the variance.

Since the two Pareto types are the same distribution (except for the shifting), they share similar mathematical properties. For example, both distributions are heavy tailed distributions. In other words, they put significantly more probability on larger values. This point is discussed in more detail below.


First, the calculations. For Pareto Type I, the moments are determined by the integral \int_\theta^\infty x^k \ f(x) \ dx where f(x) is the PDF of the distribution. Because the PDF for Pareto Type I is easy to work with, almost all the items under Pareto Type I are quite accessible. For example, the item 8c for Pareto Type I is calculated by the following integral.

    \displaystyle E[(X \wedge d)^k]=\int_\theta^d x^k \ \frac{\alpha \theta^\alpha}{x^{\alpha+1}} \ dx+d^k \ \biggl(\frac{\theta}{d}\biggr)^\alpha
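As a sanity check on row 8c for Pareto Type I, the following Python sketch compares the closed form in the table with a direct numerical evaluation of the integral plus the term d^k S(d). The function names, the midpoint rule and the sample values (k=1, d=2, \alpha=3, \theta=1) are assumptions made for this illustration only.

```python
def pareto1_limited_moment(k, d, alpha, theta):
    """Closed form in row 8c for Pareto Type I (assumes d >= theta, alpha != k)."""
    return (alpha * theta ** k / (alpha - k)
            - k * theta ** alpha / ((alpha - k) * d ** (alpha - k)))


def pareto1_limited_moment_numeric(k, d, alpha, theta, steps=100_000):
    """Midpoint-rule integral of x^k f(x) over [theta, d], plus d^k * S(d)."""
    h = (d - theta) / steps
    integral = 0.0
    for i in range(steps):
        x = theta + (i + 0.5) * h
        integral += x ** k * alpha * theta ** alpha / x ** (alpha + 1)
    return integral * h + d ** k * (theta / d) ** alpha


print(pareto1_limited_moment(1, 2.0, 3.0, 1.0))
print(pareto1_limited_moment_numeric(1, 2.0, 3.0, 1.0))
```

For these sample values the closed form works out to 3/2-1/8=1.375, and the numerical value should agree to several decimal places.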

In the remaining discussion, the focus is on Pareto Type II calculations.

The Pareto Type II kth moment E(X^k) is by definition the integral \int_0^\infty x^k f(x) \ dx where f(x) is the Pareto Type II PDF. However, it is difficult to evaluate this integral directly. The best way to evaluate the moments in row 5 in the above table is to use the fact that the Pareto Type II distribution is a mixture of exponential distributions with gamma mixing weight (see Example 2 here). Thus the moments of Pareto Type II can be obtained by integrating the conditional kth moment of the exponential distribution with gamma weight. The following shows the calculation.

    \displaystyle \begin{aligned} E(X^k)&=\int_0^\infty E(X^k \lvert \lambda) \ \frac{1}{\Gamma(\alpha)} \ \theta^\alpha \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ d \lambda \\&=\int_0^\infty \frac{\Gamma(k+1)}{\lambda^k} \ \frac{1}{\Gamma(\alpha)} \ \theta^\alpha \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\Gamma(k+1)}{\Gamma(\alpha)} \int_0^\infty \theta^\alpha \ \lambda^{\alpha-k-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha-k)} \ \theta^{\alpha-k} \ \lambda^{\alpha-k-1} \ e^{-\theta \lambda} \ d \lambda \\&=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \end{aligned}

In the above derivation, the conditional X \lvert \Lambda is assumed to have an exponential distribution with rate \Lambda (i.e. mean 1/\Lambda), so that E(X^k \lvert \lambda)=\Gamma(k+1)/\lambda^k. The random variable \Lambda in turn has a gamma distribution with shape parameter \alpha and rate parameter \theta. The integrand in the integral in the second to last step is a gamma density, making the value of the integral 1. When k is a positive integer, E(X^k) can be simplified as indicated in row 5.
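The two forms of the Type II moment in row 5 can be compared numerically. The sketch below uses only Python's standard math module; the function names and the parameter values (\alpha=4.5, \theta=100) are illustrative assumptions, not taken from the post.

```python
import math


def lomax_moment(k, alpha, theta):
    """E(X^k) for Pareto Type II via the gamma-function form (row 5)."""
    if not -1 < k < alpha:
        raise ValueError("the moment exists only for -1 < k < alpha")
    return theta ** k * math.gamma(k + 1) * math.gamma(alpha - k) / math.gamma(alpha)


def lomax_moment_integer(k, alpha, theta):
    """Same moment for a positive integer k < alpha, using the product form."""
    denom = 1.0
    for j in range(1, k + 1):
        denom *= alpha - j  # builds (alpha-1)(alpha-2)...(alpha-k)
    return theta ** k * math.factorial(k) / denom


alpha, theta = 4.5, 100.0  # illustrative values
for k in (1, 2, 3):
    print(k, lomax_moment(k, alpha, theta), lomax_moment_integer(k, alpha, theta))
```

The two columns should match up to floating-point rounding. Asking for k \ge \alpha raises an error, mirroring the fact that those moments do not exist.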

The next calculation is the mean excess loss. It is the conditional expected value E(X-d \lvert X>d). If X is an insurance loss and d is some kind of threshold (e.g. the deductible in an insurance policy that covers this loss), then E(X-d \lvert X>d) is the expected loss in excess of the threshold d given that the loss exceeds d. If X is the lifetime of an individual, then E(X-d \lvert X>d) is the expected remaining lifetime given that the individual has survived to age d.

The expected value E(X-d \lvert X>d) can be calculated by the integral \frac{1}{S(d)} \int_d^\infty (x-d) \ f(x) \ dx. This integral is not easy to evaluate when f(x) is a Pareto Type II PDF. Fortunately, there is another way to handle this calculation. The key idea is that if X has a Pareto Type II distribution with parameters \alpha and \theta (as described in the table), the conditional random variable X-d \lvert X>d also has a Pareto Type II distribution, this time with parameters \alpha and d+\theta. The mean of a Pareto Type II distribution is always the ratio of the scale parameter to the shape parameter less one. Thus the mean of X-d \lvert X>d is as indicated in row 7 of the table.
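The key idea can be verified directly from the survival function in row 3. The following short derivation is added as a sketch, with y>0 denoting a value of the excess loss X-d (a symbol introduced here for illustration).

    \displaystyle P(X-d>y \lvert X>d)=\frac{S(d+y)}{S(d)}=\frac{\biggl(\displaystyle \frac{\theta}{d+y+\theta}\biggr)^\alpha}{\biggl(\displaystyle \frac{\theta}{d+\theta}\biggr)^\alpha}=\biggl(\frac{d+\theta}{y+(d+\theta)}\biggr)^\alpha

The last expression is the Pareto Type II survival function with shape parameter \alpha and scale parameter d+\theta, evaluated at y.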

The limited loss X \wedge d is defined as follows.

    \displaystyle  X \wedge d=\left\{ \begin{array}{ll}                     \displaystyle  X &\ X \le d \\           \text{ } & \text{ } \\           \displaystyle  d &\ X > d           \end{array} \right.

One interpretation is that it is the insurance payment when the insurance policy has an upper cap on benefit. If the loss is below the cap d, the insurance policy pays the loss in full. If the loss exceeds the cap d, the policy only pays for the loss up to the limit d. The expected insurance payment E(X \wedge d) is said to be the limited expectation. For Pareto Type II, the first moment E(X \wedge d) can be evaluated by the following integral.

    \displaystyle E(X \wedge d)=\int_0^d x \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx+d \ \biggl(\frac{\theta}{d+\theta}\biggr)^\alpha

Integrating using a change of variable u=x+\theta will yield the results in row 8a and row 8b in the table, i.e. the cases for \alpha \ne 1 and \alpha=1. A more interesting result is 8c, which is the kth moment of the variable X \wedge d. The integral for this expectation can be expressed using the incomplete beta function. The following evaluates E[(X \wedge d)^k].

    \displaystyle E[(X \wedge d)^k]=\int_0^d x^k \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx +d^k \ \biggl(\frac{\theta}{d+\theta}\biggr)^\alpha

Further transform the integral in the above calculation by the change of variable using u=\frac{x}{x+\theta}.

    \displaystyle \begin{aligned} \int_0^d x^k \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx&=\int_0^{\frac{d}{d+\theta}} \alpha \theta^\alpha \ \biggl(\frac{u \theta}{1-u}\biggr)^k \ \biggl( \frac{\theta}{1-u}\biggr)^{-\alpha-1} \ \frac{\theta}{(1-u)^2} \ du\\&=\int_0^{\frac{d}{d+\theta}} \alpha \theta^k \ u^k \ (1-u)^{\alpha-k-1} \ du \\&=\frac{\theta^k \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \int_0^{\frac{d}{d+\theta}} \frac{\Gamma(\alpha+1)}{\Gamma(k+1) \Gamma(\alpha-k)} \ u^{k+1-1} \ (1-u)^{\alpha-k-1} \ du \end{aligned}

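The integrand in the last integral is the probability density function of the beta distribution with parameters a=k+1 and b=\alpha-k, so the integral is the incomplete beta function \beta(k+1, \alpha-k; \frac{d}{d+\theta}). Thus E[(X \wedge d)^k] for Pareto Type II is as given in the formula that follows the table (see 8c).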
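The first-moment results in rows 8a and 8b can be checked with a short Python sketch. The check relies on the standard identity that E(X \wedge d) equals the integral of the survival function over (0,d) for a nonnegative X; the post does not use this identity, so it serves here only as an independent cross-check. The function names, the midpoint rule and the sample parameters are assumptions for illustration.

```python
import math


def lomax_limited_mean(d, alpha, theta):
    """E[X ^ d] for Pareto Type II: row 8a (alpha != 1) or row 8b (alpha == 1)."""
    ratio = theta / (d + theta)
    if alpha == 1:
        return -theta * math.log(ratio)
    return theta / (alpha - 1) * (1.0 - ratio ** (alpha - 1))


def lomax_limited_mean_numeric(d, alpha, theta, steps=100_000):
    """Midpoint-rule approximation of the integral of S(x) over [0, d]."""
    h = d / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (theta / (x + theta)) ** alpha
    return total * h


for alpha in (1, 2.0):
    closed = lomax_limited_mean(2.0, alpha, 2.0)
    numeric = lomax_limited_mean_numeric(2.0, alpha, 2.0)
    print(alpha, closed, numeric)
```

With \theta=2 and d=2, the closed forms give 2 \ \text{ln} \ 2 for \alpha=1 and exactly 1 for \alpha=2, and the numerical integral should agree closely in both cases.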

Now we consider two risk measures – value-at-risk (VaR) and tail-value-at-risk (TVaR). The value-at-risk at security level p for a random variable X, denoted by VaR_p(X), is the (100p)th percentile of X. Thus VaR is a fancy name for percentiles. Setting the Pareto Type II CDF equal to p gives the VaR indicated in row 9 of the table. In other words, solving the following equation for x gives the (100p)th percentile for Pareto Type II.

    \displaystyle 1-\biggl(\frac{\theta}{x+\theta}\biggr)^\alpha=p

The tail-value-at-risk of a random variable X at the security level p, denoted by TVaR_p(X), is the expected value of X given that it exceeds VaR_p(X). Thus TVaR_p(X)=E[X \lvert X>VaR_p(X)]. Letting \pi_p=\frac{\theta}{(1-p)^{1/\alpha}}-\theta, the following integral gives the tail-value-at-risk for Pareto Type II. The integral is evaluated by the change of variable u=x+\theta.

    \displaystyle E[X \lvert X>VaR_p(X)]=\frac{1}{1-p} \int_{\pi_p}^\infty x \ \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}} \ dx=\frac{\alpha}{\alpha-1} VaR_p(X)+\frac{\theta}{\alpha-1}
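Rows 9 and 10 translate directly into code. The following Python sketch implements the Pareto Type II VaR and TVaR and confirms that TVaR equals VaR plus the mean excess loss (row 7) evaluated at the VaR. The function names are hypothetical, and the parameters \alpha=2, \theta=2, p=0.75 are chosen to match the survival function comparison later in the post.

```python
def lomax_var(p, alpha, theta):
    """(100p)th percentile of Pareto Type II (row 9)."""
    return theta / (1.0 - p) ** (1.0 / alpha) - theta


def lomax_tvar(p, alpha, theta):
    """Tail-value-at-risk of Pareto Type II (row 10); requires alpha > 1."""
    if alpha <= 1:
        raise ValueError("TVaR requires alpha > 1")
    return alpha / (alpha - 1.0) * lomax_var(p, alpha, theta) + theta / (alpha - 1.0)


def lomax_mean_excess(d, alpha, theta):
    """Mean excess loss of Pareto Type II (row 7); requires alpha > 1."""
    return (d + theta) / (alpha - 1.0)


alpha, theta, p = 2.0, 2.0, 0.75
var_p = lomax_var(p, alpha, theta)
tvar_p = lomax_tvar(p, alpha, theta)
print(var_p, tvar_p, var_p + lomax_mean_excess(var_p, alpha, theta))
```

For these parameters the 75th percentile is 2 and the TVaR is 6, and the last printed value (VaR plus mean excess loss at the VaR) should equal the TVaR.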

Tail Weight

Several properties in the above table show that the Pareto distribution (both types) is a heavy-tailed distribution. When a distribution puts significantly more probability on larger values, the distribution is said to be a heavy tailed distribution (or said to have a larger tail weight). There are four ways to look for indications that a distribution is heavy tailed.

  1. Existence of moments.
  2. Hazard rate function.
  3. Mean excess loss function.
  4. Speed of decay of the survival function to zero.

Tail weight is a relative concept – distribution A has a heavier tail than distribution B. The first three points are ways to tell heavy tails without a reference distribution. Point number 4 is comparative.

Existence of moments
For a given random variable Z, the existence of all moments E(Z^k), for all positive integers k, indicates a light (right) tail for the distribution of Z. If the positive moments exist only up to a certain value of k, it is an indication that the distribution has a heavy right tail.

Note that the existence of the Pareto higher moments E(X^k) is capped by the shape parameter \alpha (both Type I and Type II). Thus if \alpha=3, E(X^k) only exists for 0<k<3. In particular, the Pareto Type II mean E(X)=\frac{\theta}{\alpha-1} does not exist for 0<\alpha \le 1. If the Pareto distribution is to model a random loss, and if the mean is infinite (when \alpha \le 1), the risk is uninsurable! On the other hand, when \alpha \le 2, the Pareto variance does not exist. This shows that for a heavy tailed distribution, the variance may not be a good measure of risk.

As compared with Pareto, the exponential distribution, the Gamma distribution, the Weibull distribution, and the lognormal distribution are considered to have light tails since all moments exist.

Hazard rate function
The hazard rate function h(x) of a random variable X is defined as the ratio of the density function and the survival function.

    \displaystyle h(x)=\frac{f(x)}{S(x)}

The hazard rate is called the force of mortality in a life contingency context and can be interpreted as the rate that a person aged x will die in the next instant. The hazard rate is called the failure rate in reliability theory and can be interpreted as the rate that a machine will fail at the next instant given that it has been functioning for x units of time. It follows that the hazard rate of Pareto Type I is h(x)=\alpha/x and is h(x)=\alpha/(x+\theta) for Type II. They are both decreasing functions of x.

Another indication of heavy tail weight is that the distribution has a decreasing hazard rate function. Thus the Pareto distribution (both types) is considered to be a heavy tailed distribution based on its decreasing hazard rate function.

One key characteristic of hazard rate function is that it can generate the survival function.

    \displaystyle S(x)=e^{\displaystyle -\int_0^x h(t) \ dt}

Thus if the hazard rate function is decreasing in x, then the survival function will decay more slowly to zero. To see this, let H(x)=\int_0^x h(t) \ dt, which is called the cumulative hazard rate function. As indicated above, the survival function can be generated by e^{-H(x)}. If h(x) is decreasing in x, then H(x) grows more slowly than the cumulative hazard rate function of a distribution whose hazard rate is constant or increasing in x. Consequently e^{-H(x)} decays to zero more slowly than the survival function of such a distribution. Thus a decreasing hazard rate leads to a slower speed of decay to zero for the survival function (a point discussed below).
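As a worked instance using the Pareto Type II hazard rate in row 4, the cumulative hazard rate function and the resulting survival function are

    \displaystyle H(x)=\int_0^x \frac{\alpha}{t+\theta} \ dt=\alpha \ \text{ln} \biggl(\frac{x+\theta}{\theta}\biggr) \ \ \ \ \text{and} \ \ \ \ e^{-H(x)}=\biggl(\frac{\theta}{x+\theta}\biggr)^\alpha

which recovers the survival function in row 3. For comparison, a distribution with a constant hazard rate \lambda has H(x)=\lambda x, which grows linearly in x rather than logarithmically, so its survival function e^{-\lambda x} decays to zero much faster.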

In contrast, the exponential distribution has a constant hazard rate function, making it a medium tailed distribution. As explained above, any distribution having an increasing hazard rate function is a light tailed distribution.

The mean excess loss function
Suppose that a property owner is exposed to a random loss X. The property owner buys an insurance policy with a deductible d such that the insurer will pay a claim in the amount of X-d if a loss occurs with X>d. The insurer will pay nothing if the loss is below the deductible. Whenever a loss is above d, what is the average claim the insurer will have to pay? This is one way to look at the mean excess loss function, which represents the expected excess loss over a threshold conditional on the event that the threshold has been exceeded. Thus the mean excess loss function is e_X(d)=E(X-d \lvert X>d), a function of the deductible d.

According to row 7 in the above table, the mean excess loss for Pareto Type I is e_X(d)=d/(\alpha-1) and for Type II is e_X(d)=(d+\theta)/(\alpha-1). They are both increasing functions of the deductible d! This means that the larger the deductible, the larger the expected claim if such a large loss occurs! If a random loss is modeled by such a distribution, it is a catastrophic risk situation.

In general, an increasing mean excess loss function is an indication of a heavy tailed distribution. On the other hand, a decreasing mean excess loss function indicates a light tailed distribution. The exponential distribution has a constant mean excess loss function and is considered a medium tailed distribution.

Speed of decay of the survival function to zero
The survival function S(x)=P(X>x) captures the probability of the tail of a distribution. If the survival function of a distribution decays slowly to zero (equivalently the CDF goes slowly to one), it is another indication that the distribution is heavy tailed. This point was touched on in the discussion of the hazard rate function.

The following is a comparison of a Pareto Type II survival function and an exponential survival function. The Pareto survival function has parameters (\alpha=2 and \theta=2). The two survival functions are set to have the same 75th percentile, which is x=2. The following table is a comparison of the two survival functions.

    \displaystyle \begin{array}{llllllll} \text{ } &x &\text{ } & \text{Pareto } S_X(x) & \text{ } & \text{Exponential } S_Y(x) & \text{ } & \displaystyle \frac{S_X(x)}{S_Y(x)} \\  \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\  \text{ } &2 &\text{ } & 0.25 & \text{ } & 0.25 & \text{ } & 1  \\    \text{ } &10 &\text{ } & 0.027777778 & \text{ } & 0.000976563 & \text{ } & 28  \\  \text{ } &20 &\text{ } & 0.008264463 & \text{ } & 9.54 \times 10^{-7} & \text{ } & 8666  \\   \text{ } &30 &\text{ } & 0.00390625 & \text{ } & 9.31 \times 10^{-10} & \text{ } & 4194304  \\  \text{ } &40 &\text{ } & 0.002267574 & \text{ } & 9.09 \times 10^{-13} & \text{ } & 2.49 \times 10^{9}  \\  \text{ } &60 &\text{ } & 0.001040583 & \text{ } & 8.67 \times 10^{-19} & \text{ } & 1.20 \times 10^{15}  \\  \text{ } &80 &\text{ } & 0.000594884 & \text{ } & 8.27 \times 10^{-25} & \text{ } & 7.19 \times 10^{20}  \\  \text{ } &100 &\text{ } & 0.000384468 & \text{ } & 7.89 \times 10^{-31} & \text{ } & 4.87 \times 10^{26}  \\  \text{ } &120 &\text{ } & 0.000268745 & \text{ } & 7.52 \times 10^{-37} & \text{ } & 3.57 \times 10^{32}  \\  \text{ } &140 &\text{ } & 0.000198373 & \text{ } & 7.17 \times 10^{-43} & \text{ } & 2.76 \times 10^{38}  \\  \text{ } &160 &\text{ } & 0.000152416 & \text{ } & 6.84 \times 10^{-49} & \text{ } & 2.23 \times 10^{44}  \\  \text{ } &180 &\text{ } & 0.000120758 & \text{ } & 6.53 \times 10^{-55} & \text{ } & 1.85 \times 10^{50}  \\  \text{ } & \text{ } \\    \end{array}

Note that at the large values, the Pareto right tail retains much more probability. This is also confirmed by the ratio of the two survival functions, with the ratio approaching infinity. If a random loss is a heavy tailed phenomenon that is described by the above Pareto survival function (\alpha=2 and \theta=2), then the above exponential survival function is woefully inadequate as a model for this phenomenon even though it may be a good model for describing the loss up to the 75th percentile. It is the large right tail that is problematic (and catastrophic)!
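The comparison table can be reproduced with a few lines of Python. The post does not state the exponential rate explicitly; the sketch below assumes it is \lambda=\text{ln}(4)/2=\text{ln} \ 2, which is the value implied by requiring the exponential survival function to equal 0.25 at x=2. The function names are chosen for illustration.

```python
import math

ALPHA, THETA = 2.0, 2.0          # Pareto Type II parameters from the post
LAM = math.log(4.0) / 2.0        # assumed rate: solves exp(-2 * LAM) = 0.25


def pareto2_survival(x):
    """Pareto Type II survival function (row 3 of the first table)."""
    return (THETA / (x + THETA)) ** ALPHA


def exponential_survival(x):
    """Exponential survival function with the matched 75th percentile."""
    return math.exp(-LAM * x)


for x in (2, 10, 20, 30, 40, 60, 80, 100):
    s_pareto = pareto2_survival(x)
    s_expon = exponential_survival(x)
    print(f"{x:>4} {s_pareto:.9f} {s_expon:.3e} {s_pareto / s_expon:.3e}")
```

The printed columns correspond to x, the Pareto survival function, the exponential survival function and their ratio, and should line up with the rows of the table above.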

Since the Pareto survival function and the exponential survival function have closed forms, we can also look at their ratio.

    \displaystyle \frac{\text{pareto survival}}{\text{exponential survival}}=\frac{\displaystyle \frac{\theta^\alpha}{(x+\theta)^\alpha}}{e^{-\lambda x}}=\frac{\theta^\alpha e^{\lambda x}}{(x+\theta)^\alpha} \longrightarrow \infty \ \text{ as } x \longrightarrow \infty

In the above ratio, the numerator has an exponential function with a positive quantity in the exponent, while the denominator is only a power of x+\theta. Since exponential growth dominates any power, this ratio goes to infinity as x \rightarrow \infty.

In general, whenever the ratio of two survival functions diverges to infinity, it is an indication that the distribution in the numerator of the ratio has a heavier tail. When the ratio goes to infinity, the survival function in the numerator is said to decay slowly to zero as compared to the denominator.

The Pareto distribution has many economic applications. Since it is a heavy tailed distribution, it is a good candidate for modeling income above a theoretical value and the distribution of insurance claims above a threshold value.


\copyright 2017 – Dan Ma
