Principles of Financial Economics: course teaching resources (reference literature). Implied risk aversion and pricing kernel in the FTSE 100 index.pdf

Contents lists available at ScienceDirect

North American Journal of Economics and Finance
journal homepage: www.elsevier.com/locate/najef

Implied risk aversion and pricing kernel in the FTSE 100 index

Wen Ju Liao ᵃ, Hao-Chang Sung ᵇ,⁎

ᵃ Department of Finance, Fujian Business University, 19, Huang Pu, Gulou District, Fuzhou, Fujian, People's Republic of China
ᵇ Department of Finance, College of Economics, Jinan University, No. 601 Huangpu Avenue West, Guangzhou, Guangdong 510632, People's Republic of China

ARTICLE INFO

Keywords: Pricing kernel; Risk aversion; Risk neutral density; Positive convolution approximation; Volatility smile; Pricing kernel puzzle
JEL: C14; G12

ABSTRACT

This paper studies the estimation of the pricing kernel and explains the pricing kernel puzzle found in the FTSE 100 index. We use prices of options and futures on the FTSE 100 index to derive the risk neutral density (RND). The option-implied RND is inverted by using two nonparametric methods: the implied-volatility surface interpolation method and the positive convolution approximation (PCA) method. The actual density distribution is estimated from the historical data of the FTSE 100 index by using the threshold GARCH (TGARCH) model. The results show that the RNDs derived from the two methods are negatively skewed and fat-tailed relative to the actual probability density, which is consistent with the phenomenon of the “volatility smile.” The derived risk aversion is found to be locally increasing at the center, but decreasing at both tails asymmetrically. This is the so-called pricing kernel puzzle. The simulation results based on a representative agent model with two state variables show that the pricing kernel is locally increasing in wealth around a wealth level of 1 and is consistent with the empirical pricing kernel in shape and magnitude.

1. Introduction

A representative agent’s risk aversion can be inferred from the pricing kernel.
In asset pricing theory, it has been shown that the pricing kernel summarizes the representative investor’s preferences over different states of the world.¹ If we know the pricing kernel, we can infer the risk attitude of market participants toward unknown future prices. Early empirical research on deriving the pricing kernel relies on investors’ portfolio holdings or on consumption data. Nevertheless, the conclusions of studies based on consumption data are inconsistent, particularly regarding the magnitude and characteristics of relative risk aversion. On the other hand, several studies that invert the pricing kernel from actual trading prices of options have established a better theoretical underpinning (Jackwerth, 2000; Rosenberg & Engle, 2002). Empirically, the pricing kernel can be evaluated as the ratio of risk-neutral probabilities to actual probabilities. The risk-neutral density (RND) can be derived from option prices, and the actual probability density can be estimated from historical prices of the underlying asset. Cross-sectional option price data offer an informational advantage because plenty of daily cross-sections of option prices are available.² Using option price data also helps avoid the estimation errors associated with consumption data.

https://doi.org/10.1016/j.najef.2018.08.009
Received 4 May 2018; Received in revised form 2 August 2018; Accepted 8 August 2018
⁎ Corresponding author. E-mail address: frrg4125@hotmail.com (H.-C. Sung).

¹ Under the no-arbitrage condition, knowing the pricing kernel helps find the true value of an asset, that is, $S = E^{Q}(\zeta X)$, where $S$ is the asset price, $E^{Q}(\cdot)$ refers to the expectation under the risk-neutral density $Q$, $\zeta$ is the pricing kernel or discount factor, and $X$ is the asset payoff. The pricing kernel aggregates the risk preferences of the representative agent in different states.
² Daily and intradaily cross-sectional option prices have specific expiration dates, at which payoffs are realized, and different strike prices; futures contracts also have finite-horizon maturities. However, to incorporate option price data into the estimation of the RND, we need to impose the arbitrage-free assumption on observed option prices.

North American Journal of Economics and Finance 54 (2020) 100826
Available online 22 August 2018
1062-9408/ © 2018 Elsevier Inc. All rights reserved.

As argued in Rosenberg and Engle (2002), the empirical pricing kernel is the preference function that best fits asset prices given the forecasted payoff density. With a more general framework for utility functions that admits richer risk preferences (e.g., prospect theory)³ and by estimating the pricing kernel at a sequence of points in time, the pricing kernel can be extended to a dynamic structure. Bliss and Panigirtzoglou (2004) infer risk aversion from both power-form and exponential-form utility functions and from RNDs embedded in cross-sections of options on the FTSE 100 and S&P 500 indexes,⁴ and ignore the pricing kernel by design.

To derive the empirical, or implied, pricing kernel, we first need to estimate the implied RND and the actual probability density for the FTSE 100 index returns. To obtain the RND, we use put options on the FTSE 100 index with moneyness ranging from 0.82 to 1.16. Specifically, we compare two nonparametric methods: the implied-volatility surface fitting method (Derman, 1998) and the PCA method (Bondarenko, 2003b). We evaluate the actual probability density by a Monte Carlo simulation based on the estimation results from the threshold GARCH (TGARCH) model of Glosten, Jagannathan, and Runkle (1993).⁵

The results show that the average empirical pricing kernel implied by the FTSE 100 index options exhibits a tilde-shaped pattern. This is the so-called “pricing kernel puzzle,” which has been documented in Ait-Sahalia and Lo (2000), Jackwerth (2000), Rosenberg and Engle (2002), Hill (2013), and Fengler and Hin (2015) based on S&P 500 index options.⁶ We also find that the implied risk aversion and relative risk aversion are U-shaped, as found by Jackwerth (2000) and Ait-Sahalia and Lo (2000), an anomaly that conflicts with economic theory.
The pricing kernel puzzle arises from the fact that, theoretically, the pricing kernel should be monotonically decreasing but, empirically, it increases locally over some range of wealth levels. A locally increasing pricing kernel implies that the representative investor is risk seeking rather than risk averse over that range. Some studies attempt to explain the pricing kernel puzzle by estimating the RND and the subjective probability density simultaneously and by working with demand-based models (Ziegler, 2007; Chabi-Yo, Garcia, & Renault, 2008; Christoffersen, Heston, & Jacobs, 2013; Song & Xiu, 2016). However, in these studies the analytical pricing kernels do not fit the corresponding empirical pricing kernels well in shape and magnitude.

To explain the pricing kernel puzzle found in our study of the FTSE 100 index options traded on LIFFE (London International Financial Futures and Options Exchange), we simulate a pricing kernel function using the approach of Brown and Jackwerth (2012). Under the framework of a representative agent model, by specifying volatility as an additional momentum state variable, we can capture the empirical patterns in the pricing kernel, which is consistent with the locally increasing pricing kernel and the volatility smile.

The remainder of this paper is organized as follows. In Section 2, we explain the relation between the actual probability density and the RND and how to extract the pricing kernel and risk aversion from them. Section 3 documents methods of inverting implied pricing kernels from option price data. Section 4 discusses the pricing kernel puzzle and possible explanations that resolve it. The empirical results from the FTSE 100 option data are presented and analyzed in Section 5. Section 6 concludes.

2.
Pricing kernel and risk aversion

We consider a complete-market economy; thus, we can derive the stock price by solving the optimization problem of a representative agent (Constantinides, 1982). Suppose that, initially at date $t$, the investor is endowed with one share of the stock and can consume only at date $t$ and at a future date $T$. Between dates $t$ and $T$, this representative investor invests a fraction $\omega_\tau$ of wealth in the stock at each date $\tau$, $t \le \tau \le T$. At date $T$, the investor’s wealth is $W_T$. Suppose the stock price $S_t$ is a stochastic process that follows $dS_t = \mu S_t\,dt + \sigma S_t\,dZ_t$, where $\mu$ and $\sigma$ are the mean and volatility of $S_t$, respectively, and $Z$ is a standard Brownian motion. Suppose $U(\cdot)$ denotes the investor’s utility function, which is twice continuously differentiable, concave, and increasing in wealth. The investor solves for the optimal asset holding at date $\tau$, $\omega_\tau$, by maximizing the discounted expected utility:

$$\max_{\omega_\tau}\; E_t\left[U(W_T)\right] \tag{1}$$

$$\text{s.t.}\quad dW_\tau = \left(rW_\tau + \omega_\tau W_\tau(\mu - r)\right)d\tau + \omega_\tau W_\tau \sigma\,dZ_\tau, \qquad W_\tau \ge 0,\; t \le \tau \le T, \tag{2}$$

where $W_\tau$ denotes the investor’s wealth at date $\tau$, $t \le \tau \le T$, and $r$ is the risk-free interest rate. The first-order condition for the optimization problem above can be rewritten as

$$\frac{\partial J(W_\tau, S_\tau, \tau)}{\partial W} = e^{-r(\tau - t)}\,\frac{\partial J(W_t, S_t, t)}{\partial W}\,\zeta_\tau, \tag{3}$$

where $J(W_\tau, S_\tau, \tau)$ denotes the indirect utility function of this investor. The terminal condition at date $\tau = T$ is

³ Gemmill and Shackleton (2005) examine whether cumulative prospect theory can explain the extraordinary steepness of the volatility smile in the loss domain, which is equivalent to a risk-neutral distribution with a fat left tail.
⁴ The FTSE 100 Index is a share index of the 100 largest companies listed on the London Stock Exchange, starting January 3, 1984. FTSE is the abbreviation of Financial Times Stock Exchange.
⁵ The TGARCH model allows for an asymmetric relation between volatility and squared past errors, that is, the “leverage effect” evident in stock markets.
⁶ Bliss and Panigirtzoglou (2004) do not find the “pricing kernel puzzle” evident in their study.

W.J. Liao and H.-C. Sung, North American Journal of Economics and Finance 54 (2020) 100826

$$U'(W_T) = e^{-r(T-t)}\,U'(W_t)\,\zeta_T, \tag{4}$$

where

$$\zeta_T = \exp\left\{-\int_t^T \frac{\mu - r}{\sigma}\,dZ_s - \frac{1}{2}\int_t^T \left(\frac{\mu - r}{\sigma}\right)^2 ds\right\}. \tag{5}$$

Under a complete and frictionless market, the theoretical relation between the risk-neutral density, $Q(S_T)$, and the actual density, $P(S_T)$, is established as follows (see Ait-Sahalia & Lo, 1998; Jackwerth, 2000):

$$\frac{Q(S_T)}{P(S_T)} = e^{r(T-t)}\,\frac{U'(S_T)}{U'(S_t)} \equiv \zeta_T(S_T). \tag{6}$$

The function $\zeta_T(S_T)$ is the pricing kernel. Thus, if we know $P(S_T)$ and $Q(S_T)$, we can recover the pricing kernel. Taking the derivative of $\zeta_T$ with respect to $S_T$, we obtain

$$\zeta_T'(S_T) = e^{r(T-t)}\,\frac{U''(S_T)}{U'(S_t)}. \tag{7}$$

In addition, taking the negative of the ratio of Eq. (7) to Eq. (6) and multiplying by $S_T$, we obtain exactly the Arrow-Pratt measure of relative risk aversion, $A_{rt}(S_T)$:

$$A_{rt}(S_T) = -\frac{S_T\,\zeta_T'(S_T)}{\zeta_T(S_T)} = -\frac{S_T\,U''(S_T)}{U'(S_T)} = S_T\,\frac{P'(S_T)}{P(S_T)} - S_T\,\frac{Q'(S_T)}{Q(S_T)}.$$

Hence, we obtain a computable expression for the implied relative risk aversion. This also yields an implied absolute risk aversion that is computable from the data (Ait-Sahalia & Lo, 1998; Jackwerth, 2000). The expression shows that the relative risk aversion can be obtained by combining the actual probability density, the RND, and their first derivatives. With the implied risk aversion, we can understand how the investor’s risk preference varies across investment horizons.⁷

3. Methods of estimating implied pricing kernels

In a complete-market economy, there exists a representative investor, and a market index, e.g., the FTSE 100 index, can serve as a proxy for the aggregate wealth held by this investor (see, for example, Lucas, 1978). In a representative-agent economy, equilibrium asset prices reflect the agent’s preferences and beliefs. The pricing kernel can be derived from optimization problems if we assume concave utility functions. However, empirical pricing kernels derived from macroeconomic consumption data are not consistent with the concavity of the utility function.
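To make the relative risk aversion expression of Section 2 concrete, the following Python sketch evaluates $S_T\,(P'/P - Q'/Q)$ on a price grid with finite differences. The lognormal densities and all parameter values are hypothetical choices for illustration, not the paper's estimates. They are convenient because, when both densities are lognormal with the same volatility, the implied relative risk aversion should be the constant $(\mu - r)/\sigma^2$, which gives a simple sanity check.

```python
import numpy as np

def implied_rra(s, p, q):
    """Implied relative risk aversion S * (P'/P - Q'/Q) on a grid of terminal prices."""
    dp = np.gradient(p, s)  # numerical first derivative of the actual density
    dq = np.gradient(q, s)  # numerical first derivative of the risk-neutral density
    return s * (dp / p - dq / q)

def lognormal_pdf(s, s0, drift, sigma, tau):
    """Density of S_T when log S_T is normal with mean log(s0) + (drift - sigma^2/2) * tau."""
    mean = np.log(s0) + (drift - 0.5 * sigma**2) * tau
    var = sigma**2 * tau
    return np.exp(-(np.log(s) - mean) ** 2 / (2 * var)) / (s * np.sqrt(2 * np.pi * var))

# Hypothetical inputs (assumptions, not taken from the paper)
s0, mu, r, sigma, tau = 100.0, 0.08, 0.03, 0.2, 0.25
s = np.linspace(70.0, 140.0, 2001)
p = lognormal_pdf(s, s0, mu, sigma, tau)  # stand-in for the actual density P(S_T)
q = lognormal_pdf(s, s0, r, sigma, tau)   # stand-in for the risk-neutral density Q(S_T)
rra = implied_rra(s, p, q)
```

With estimated densities in place of the lognormal stand-ins, the same function would trace out a non-constant risk aversion profile; the finite-difference derivatives become noisy in the tails, where both densities are close to zero.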
Instead, we can derive the implied pricing kernel or risk aversion by first finding the RND and the actual probability density. Below we introduce the two methods used in this paper to derive the RNDs: the implied-volatility surface fitting method of Derman (1998), hereafter the IV method, and the PCA method of Bondarenko (2003b). To obtain the actual probability density for the FTSE 100 index returns, we use the historical returns of the FTSE 100 index to estimate a TGARCH model and use its estimated parameters to construct the actual probability density.

3.1. Option prices and risk-neutral densities

Breeden and Litzenberger (1978) show that the RND, $Q(S_T)$, is related to European call option prices by

$$Q(S_T) = e^{r(T-t)}\left.\frac{\partial^2 C(S_t, K, T-t)}{\partial K^2}\right|_{K = S_T}, \tag{8}$$

where $S_t$ is the current value of the underlying asset, $K$ is the option strike price, and $T - t$ is the time remaining before the expiration date. However, the available option prices do not provide a continuous call price function. Hence, we have to construct a continuous function of option prices by fitting a smoothing function to the available data. According to Jackwerth (2004), instead of picking a few parameters of a parametric risk-neutral probability distribution, a more efficient approach is to fit the risk-neutral probability distribution either point-wise or to build it up from linear segments (or even from nonlinear pieces or surfaces). Jackwerth’s (2004) survey distinguishes three groups of nonparametric methods: maximum entropy, curve-fitting, and kernel methods. In this paper, we compare two nonparametric methods: the IV method (Derman, 1998), from the group of curve-fitting methods, and the PCA method (Bondarenko, 2003b), which belongs to the group of kernel methods. Nonparametric methods avoid imposing parametric restrictions on either the underlying asset price process or the family of distributions to which the RNDs belong. Nor do they require any prior knowledge of the RNDs.
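As an illustration of Eq. (8), the sketch below applies a central second difference to a grid of call prices. The call prices are generated from the Black (1976) futures-option formula with a flat volatility, which is an assumption made only so that the recovered density is a known lognormal; the function names and all parameter values are hypothetical.

```python
import math
import numpy as np

def black_call(f, k, sigma, tau, r):
    """Black (1976) call price on a futures price f (used here only to generate test data)."""
    d1 = (math.log(f / k) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return math.exp(-r * tau) * (f * cdf(d1) - k * cdf(d2))

def breeden_litzenberger(strikes, calls, r, tau):
    """RND on the interior strikes: e^{r*tau} times the second difference of call prices."""
    dk = strikes[1] - strikes[0]  # assumes equally spaced strikes
    second = (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / dk**2
    return strikes[1:-1], math.exp(r * tau) * second

# Hypothetical inputs (assumptions, not taken from the paper)
f, sigma, tau, r = 4500.0, 0.2, 0.25, 0.04
strikes = np.linspace(3000.0, 6500.0, 701)
calls = np.array([black_call(f, k, sigma, tau, r) for k in strikes])
grid, rnd = breeden_litzenberger(strikes, calls, r, tau)

dk = strikes[1] - strikes[0]
mass = float(np.sum(rnd) * dk)         # should be close to one
mean = float(np.sum(grid * rnd) * dk)  # should be close to the futures price
```

The second difference amplifies small errors in the call prices, which is why the text smooths the price (or implied-volatility) function before differentiating.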
⁷ Besides, we know that the class of representative agent utility functions implied by the Black and Scholes (1973) model belongs to the class of constant relative risk aversion (CRRA). Hence, using our empirical risk-aversion functions estimated from the FTSE 100 option prices, we can verify whether the CRRA assumption accurately fits the empirical pattern.

Among the three groups of

nonparametric methods, maximum entropy methods are designed to find RNDs that fit the option data and that presume the least information relative to a prior probability distribution. However, as argued in Jackwerth (2004), the main problem with entropy methods stems from the use of the logarithm in their objective function. For small probabilities, the logarithm takes large negative values. Hence, the maximization procedure is dominated by those large negative values and yields a misleading result.

An improved approach to fitting the RND is to fit a function of option prices across strike prices and then use the Breeden-Litzenberger (1978) result: take the second derivative of the option price function with respect to the strike price and obtain the RND after appropriate scaling. Within the class of such methods, kernel methods are often used (see Ait-Sahalia & Lo, 1998, 2000; Perignon & Villa, 2002; Song & Xiu, 2016).

In the recent literature, curve fitting of the implied volatility has become the most popular starting point for backing out RNDs. Thus, we can instead fit the function of implied volatilities across strike prices. Once the implied volatilities are fitted, we can calculate the function of option prices from the implied volatilities, and then apply the Breeden-Litzenberger result to evaluate the RND. The advantage of such implied-volatility fitting methods is that implied volatilities are much more similar in magnitude across strike prices than option prices are. When the fitted implied volatilities do not vary rapidly across strike prices, such methods produce arbitrage-free RNDs straightforwardly.

Next, we introduce the IV and PCA methods. The IV method is used to find a smoothed implied volatility, and belongs to the group of curve-fitting methods.
The PCA approach proposed by Bondarenko (2003b) aims to construct a flexible admissible set of candidate densities. This approach belongs to the kernel methods.

3.1.1. Implied-volatility surface fitting (IV) method

The most straightforward way to calculate the RNDs would be to interpolate or smooth the observed option prices directly. However, the curvature of the option pricing formula is difficult to approximate with commonly used methods. Moreover, small errors in the fitted prices have a large effect on the RNDs, especially in the tails. Prices can be interpolated, but it is numerically more stable to interpolate implied volatilities. We may interpolate the implied volatility smile by using cubic splines (Shimko, 1993; Campa, Chang, & Reider, 1998) or fit the implied volatility surface by using linear interpolation (Derman, 1998). In this study, we fit an implied-volatility surface and use it to derive the implied RND.⁸ We let the surface be linear across moneyness with a slope that decreases exponentially in maturity, and estimate the implied-volatility surface as follows:

$$\sigma_{IV} = \beta_0 + \beta_1 (T - t) + \beta_2\, e^{-\beta_3 (T - t)}\, m,$$

where $T - t$ is the time to maturity, $m$ is a moneyness measure based on $\log(F_t/K)$ and the time to maturity $T - t$, $F_t$ is the futures price, and $K$ is the exercise price. We use 100 points to produce equally spaced strike prices over the range and then fit the implied volatilities. The fitted implied volatilities are substituted into the Black-Scholes model⁹ to derive option prices that can be expressed as a continuous function of the strike price. With the function of call option prices, we then use the Breeden-Litzenberger result to extract the RND. Since the range of available strike prices is limited, the implied RND extends only between the lowest and highest strike prices. As suggested by Shimko (1993), we fit a lognormal distribution at each tail so that the total distribution sums to one.
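The surface specification above is linear in three of its four coefficients once the decay rate is fixed, so one simple way to fit it is to profile the decay rate on a grid and solve an ordinary least-squares problem at each grid value. The sketch below follows that idea; it is not the authors' Matlab routine. The coefficient names `b0` to `b3`, the grid for `b3`, and the synthetic data are all assumptions, and the moneyness `m` is treated as a given input.

```python
import numpy as np

def fit_iv_surface(tau, m, iv, b3_grid):
    """Least-squares fit of iv = b0 + b1*tau + b2*exp(-b3*tau)*m, profiling b3 on a grid."""
    best_sse, best_beta = np.inf, None
    for b3 in b3_grid:
        # For a fixed b3 the model is linear in (b0, b1, b2)
        X = np.column_stack([np.ones_like(tau), tau, np.exp(-b3 * tau) * m])
        coef, *_ = np.linalg.lstsq(X, iv, rcond=None)
        sse = float(np.sum((X @ coef - iv) ** 2))
        if sse < best_sse:
            best_sse, best_beta = sse, np.append(coef, b3)
    return best_beta

def iv_surface(beta, tau, m):
    """Evaluate the fitted implied-volatility surface."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * tau + b2 * np.exp(-b3 * tau) * m

# Synthetic, noise-free quotes from hypothetical coefficients
true_beta = np.array([0.18, 0.02, 0.30, 2.0])
tau_grid, m_grid = np.meshgrid([0.1, 0.25, 0.5, 1.0], np.linspace(-0.15, 0.15, 15))
tau_obs, m_obs = tau_grid.ravel(), m_grid.ravel()
iv_obs = iv_surface(true_beta, tau_obs, m_obs)

beta_hat = fit_iv_surface(tau_obs, m_obs, iv_obs, np.linspace(0.0, 5.0, 501))
```

The fitted `iv_surface` can then be evaluated on a dense strike grid and passed through an option pricing formula before differentiating twice in the strike.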
Fitting lognormal distributions at the tails of the implied RND function is equivalent to assuming that the volatility smile is flat outside the range of observations.

3.1.2. Positive convolution approximation (PCA) method

Bondarenko (2003b) proposes the PCA method and finds that it outperforms several popular methods.¹⁰ PCA is a nonparametric approach to estimating the RND with four properties: (1) it is completely agnostic about the data-generating process; (2) it controls against overfitting and remains valid for small samples; (3) it ensures arbitrage-free estimators; (4) it is computationally simple.¹¹ The basic idea of PCA is to build a special set of admissible densities from which the optimal density is selected. The optimal density is the one that gives the best fit to the option prices. The admissible set includes functions that can be expressed as a convolution of a fixed positive kernel and another density. With the PCA approach, we do not need information about the tail distribution of asset prices. Moreover, it provides a flexible admissible set of candidate densities and lets us select an optimal bandwidth that resolves the trade-off between smoothness and fit.

Denote by $L_d$ the set of all probability density functions. We begin by fixing a basis density, or kernel function, $\phi(x) \in L_d$. We rescale $\phi(x)$ with the bandwidth $h$ to construct a new density $\phi_h(x)$ such that $\phi_h(x) = \frac{1}{h}\,\phi\!\left(\frac{x}{h}\right)$. For a fixed $\phi_h(x)$, let the approximation set $\mathcal{W}_h$ be the set of convolutions of $\phi_h$ with another density. $\mathcal{W}_h$ contains all admissible, or candidate, densities, and we search for the density in $\mathcal{W}_h$ that best fits the given cross-section of option prices. In practice, we obtain the PCA estimator of the RND, $\hat f$, by minimizing the sum of squared pricing errors between observed option prices and theoretical prices.
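The PCA idea described above can be sketched numerically by restricting the candidate densities to nonnegative mixtures of Gaussian kernels on a grid and choosing the mixture weights that minimize squared put pricing errors. The sketch below is not Bondarenko's implementation: it assumes undiscounted put prices quoted in moneyness units, uses hypothetical values for the bandwidth `h` and the grid, and solves the constrained least-squares problem with an accelerated projected-gradient loop, which is only one of several possible solvers.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def kernel_put(x, z, h):
    """Put price at strike x under a Gaussian kernel centred at z with bandwidth h,
    i.e. the second integral of the kernel: h * (u * Phi(u) + phi(u)), u = (x - z) / h."""
    u = (x - z) / h
    cdf = 0.5 * (1.0 + _erf(u / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)
    return h * (u * cdf + pdf)

def project_simplex(v):
    """Euclidean projection onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def pca_weights(B, puts, n_iter=10_000):
    """Minimise ||B a - puts||^2 over the simplex by accelerated projected gradient."""
    lipschitz = np.linalg.norm(B, 2) ** 2
    a = y = np.full(B.shape[1], 1.0 / B.shape[1])
    t = 1.0
    for _ in range(n_iter):
        a_new = project_simplex(y - B.T @ (B @ y - puts) / lipschitz)
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        y = a_new + (t - 1.0) / t_new * (a_new - a)
        a, t = a_new, t_new
    return a

# Hypothetical setup: bandwidth h, kernel centres spaced 0.5*h apart, 21 strikes
h = 0.05
z = np.arange(0.7, 1.3 + 1e-9, 0.5 * h)
x = np.linspace(0.8, 1.2, 21)
B = kernel_put(x[:, None], z[None, :], h)

w_true = np.exp(-0.5 * ((z - 1.0) / 0.08) ** 2)
w_true /= w_true.sum()
puts = B @ w_true  # synthetic "observed" put prices

a_hat = pca_weights(B, puts)
fit_error = float(np.max(np.abs(B @ a_hat - puts)))
```

The estimated density is then the mixture of kernels with weights `a_hat`; because the weights are nonnegative and sum to one, the result is a proper density by construction. The recovered weights need not equal `w_true`, since different mixtures can price the same strikes almost identically.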
⁸ The Matlab code is available at www.essex.ac.uk/ccfea/.
⁹ Note that this method does not require the Black-Scholes model to be correct. The Black-Scholes model is simply used to transform the data from one space to another.
¹⁰ Bondarenko (2003b) compares RNDs derived from the PCA, mixture of log-normals, Hermite polynomials, and sigma-shape polynomials methods and concludes that PCA is a promising alternative to its competitors.
¹¹ The Matlab code for the PCA method is available at tigger.uic.edu/ olegb/.

For example, suppose $\{P_i\}$ denotes cross-sectional observations of put prices with strike prices

$x_1 < x_2 < \cdots < x_n$ corresponding to the RND $f(x)$. In this case, we can obtain $\hat f$ by

$$\min_{f \in \mathcal{W}_h}\; \sum_{i=1}^{n} \left(P_i - D^{-2} f(x_i)\right)^2, \tag{9}$$

where $D^{-2} f(x_i)$ is the second integral of $f(x)$. To obtain the PCA estimator, we first have to construct the admissible set $\mathcal{W}_h$ and the basis density $\phi(x)$. The choice of the bandwidth for constructing the admissible set $\mathcal{W}_h$ is essential for the PCA estimator, whereas the specific choice of the basis density $\phi(x)$ is less important (Bondarenko, 2003b). For the kernel function and the optimal bandwidth $h_0$, Bondarenko (2003b) suggests the use of a Gaussian kernel. We also use a discrete version of the admissible set to solve the PCA optimization problem, i.e., Eq. (9), numerically:

$$\mathcal{W}_h^{\Delta z} = \left\{ g \in L_d \;\middle|\; g(x) = \sum_{k=1}^{K} a_k\, \phi_h(x - z_k),\ \text{for } a_k \ge 0,\ \sum_{k=1}^{K} a_k = 1 \right\}. \tag{10}$$

Let $z_k = k\,\Delta z$ form an equally spaced grid with grid size $\Delta z$. Here, $\mathcal{W}_h^{\Delta z}$ is a subset of $\mathcal{W}_h$. With a sufficiently small grid size $\Delta z$, the two sets can be made arbitrarily close. The optimal $h$ and $\Delta z$ are set as $h = 0.95\,h_0$ and $\Delta z = 0.5\,h$ so that any density $g$ in the continuous set $\mathcal{W}_{h_0}$ can be approximated closely by some density $\tilde g$ in the discrete set $\mathcal{W}_h^{\Delta z}$.

The PCA method is useful in inverting the RND from traded option prices for two reasons. First, it is straightforward and easy to implement. We only need to choose a basis density and the bandwidth to construct the admissible set, and the method produces smooth, arbitrage-free estimators of the RND. Second, PCA is constrained against overfitting and copes with the curse of differentiation and the curse of dimensionality.

3.2. Actual probability density distribution

In the risk-neutral world, all expected returns are equal to the risk-free rate. Risk-neutral investors do not require a risk premium to bear risk. Therefore, in this setting, prices of derivatives can be inferred without reference to market risk preferences.
The actual, or real-world, probability density for returns, by contrast, describes the real-world economy rather than the risk-neutral one. It reflects risk factors and investors’ sentiment about the uncertainty of future prices. Three salient features of the equity index return process have been discussed in the literature (see Ghysels, Harvey, & Renault, 1996): (1) return volatility is stochastic and mean-reverting; (2) return volatility responds asymmetrically to positive and negative returns (namely, the leverage effect); (3) return innovations are non-normally distributed.¹² Thus, a stochastic volatility model of the equity index should take these features of the data into consideration.

In a discrete-time setting, stochastic volatility is commonly modeled using extensions of the autoregressive conditional heteroskedasticity (ARCH) model of Engle (1982) and the generalized autoregressive conditional heteroskedasticity (GARCH) model of Bollerslev (1986).¹³ In the presence of time-varying volatility and structural breaks in equity index returns, more recent studies have turned to GARCH models, such as Giacomini and Haerdle (2008) and Christoffersen et al. (2013). GARCH models have also been employed in option pricing (e.g., Duan, 1996; Heston & Nandi, 2000) and in deposit insurance pricing (e.g., Duan & Yu, 1999). These studies find that the GARCH model significantly outperforms its Black-Scholes counterpart and show that the GARCH model can capture the risk premium embedded in the underlying assets. GARCH models have also been used to forecast the VIX and to estimate the variance risk premium (see Liu, Guo, & Qiao, 2015).¹⁴

In this paper, the estimation of the historical density is based on daily returns on the FTSE 100 index from January 17, 1986 through March 19, 2004.
Based on the Akaike information criterion (AIC) and the Schwarz information criterion (SIC), we choose the MA(1)-TGARCH(1,1) specification (Glosten et al., 1993) to fit the historical returns on the FTSE 100 index. In contrast to ordinary GARCH models, the TGARCH model can account for the leverage effect by treating positive and negative shocks differently. Accordingly, Rosenberg and Engle (2002) and Barone-Adesi, Engle, and Mancini (2008) also fit the TGARCH model to historical returns. We estimate the parameters by maximizing the quasi log-likelihood function under the assumption of conditional normality.¹⁵ The quasi maximum likelihood estimates remain consistent even if the true density is not Gaussian (Bollerslev & Wooldridge, 1992).

To obtain actual density distributions that are compatible with the RNDs, the horizons of the equity index returns are chosen to match the maturity of the underlying options, and the returns are then smoothed through a kernel density estimator. Thus, after estimating a parametric MA(1)-TGARCH(1,1) model based on historical returns on the FTSE 100 index, we calculate standardized innovations by dividing the innovations by the estimated volatility, and then run 10,000 simulations to obtain the actual density by using a Gaussian kernel density estimator with bandwidth $h = 1.8\,\sigma / \sqrt[5]{10{,}000}$, where $\sigma$ is the standard deviation of the annualized return based on simulated index levels and 10,000 is the number of replications. The bandwidth is chosen according to Silverman’s (1986) rule of thumb and is similar to the one used in Jackwerth (2000).¹⁶

¹² For example, there could be jumps in index returns. Jumps could cause the index returns to follow a mixture of normal distributions.
¹³ On the other hand, in a continuous-time setting, a stochastic volatility diffusion is often used (Shephard, 1996).
¹⁴ For a complete survey of ARCH and GARCH related models, please refer to Bollerslev, Chou, and Kroner (1992) or Bollerslev, Engle, and Nelson (1994).
¹⁵ Removing the assumption of conditional normality may affect the shape of the simulated density. This is left for further tests.
¹⁶ Users of nonparametric regressions face a trade-off between smoothness and overfitting. The bandwidth controls the balance between fit and smoothness. Jackwerth (2000) explains that this choice could lead to slight oversmoothing but removes spurious multimodalities. We have also tried the optimized bandwidth and the results are similar.
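The simulation step of Section 3.2 can be sketched as follows: simulate horizon returns from an MA(1)-TGARCH(1,1) recursion and smooth them with a Gaussian kernel whose bandwidth is $1.8\,\sigma/n^{1/5}$. Everything numerical here is an assumption made for illustration: the parameter values are hypothetical rather than the paper's estimates, standard normal draws stand in for the standardized historical innovations, the horizon is set to 21 trading days, and $\sigma$ is taken as the standard deviation of the simulated horizon returns rather than of annualized returns.

```python
import numpy as np

def simulate_tgarch_returns(params, z, h0, eps0=0.0):
    """Cumulative log returns over the horizon; z has shape (n_paths, n_days)."""
    c, theta, omega, alpha, gamma, beta = params
    n_paths, n_days = z.shape
    h = np.full(n_paths, h0)            # conditional variance
    eps_prev = np.full(n_paths, eps0)   # previous innovation
    total = np.zeros(n_paths)
    for d in range(n_days):
        # TGARCH variance: negative shocks add gamma to the ARCH coefficient
        h = omega + (alpha + gamma * (eps_prev < 0)) * eps_prev**2 + beta * h
        eps = np.sqrt(h) * z[:, d]
        total += c + eps + theta * eps_prev  # MA(1) mean equation
        eps_prev = eps
    return total

def gaussian_kde(x, sample, bandwidth):
    """Gaussian kernel density estimate of the sample evaluated at the points x."""
    u = (x[:, None] - sample[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2 * np.pi))

# Hypothetical parameters: (c, theta, omega, alpha, gamma, beta)
params = (0.0003, 0.05, 1e-6, 0.03, 0.08, 0.90)
h0 = params[2] / (1.0 - params[3] - 0.5 * params[4] - params[5])  # unconditional variance

rng = np.random.default_rng(0)
n_paths, n_days = 10_000, 21
z = rng.standard_normal((n_paths, n_days))  # stand-in for resampled standardized innovations
sample = simulate_tgarch_returns(params, z, h0)

bandwidth = 1.8 * sample.std() / n_paths ** 0.2
grid = np.linspace(sample.min() - 4 * bandwidth, sample.max() + 4 * bandwidth, 401)
density = gaussian_kde(grid, sample, bandwidth)
mass = float(np.sum(density) * (grid[1] - grid[0]))  # should be close to one
```

Replacing the normal draws with innovations resampled from the fitted model's standardized residuals would bring the sketch closer to the procedure described in the text.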