Stock Trading Using PE Ratio Based on Bayesian Inference

Abstract: The Price Earnings (PE) ratio is one of the most widely applied tools for firm valuation in security markets. Unfortunately, recent academic developments in financial econometrics and machine learning have rarely examined this tool. In this paper, we formalize the process of fundamental PE ratio estimation using the Dynamic Bayesian Network (DBN) methodology. Forward-backward inference and Expectation Maximization (EM) parameter estimation algorithms are derived for our proposed DBN structure. A simple but practical trading strategy is built on the result of Bayesian inference. We run stock trading experiments on Thai stocks and American stocks. Extensive experiments show that our trading strategy statistically outperforms the buy-and-hold strategy.


Introduction
With the rapid advancement of machine learning technology, recent works attempt to incorporate machine learning techniques into trading systems that support investors' decisions in security markets (Yeh et al., 2011; Lu et al., 2009; Wen et al., 2010; Hassan, 2009; Kao et al., 2013; Kazem et al., 2013).
Existing works, however, share common limitations. Firstly, the discovered patterns are complicated (highly non-linear) and lack financial interpretation. Secondly, each financial time series has to be trained separately, resulting in a distinct set of patterns for each security. In other words, there is no common pattern across the securities of interest. Thirdly, because of the complexity of the patterns, practical trading implementations are not easy for some investors; in fact, users have to build sophisticated trading programs themselves. In fairness, despite these limitations, the core philosophy of existing research matches that of one particular group of investors, called technical analysts (Murphy, 1999; Shannon, 2008). Technical analysts believe in price patterns and do not pay much attention to the economic interpretation of the patterns. Therefore, this line of research may benefit this group of investors.
On the other side of investment practice, there is a group called fundamentalists, whose trading strategies have clear financial interpretations and are based on well-defined financial information (Mark, 2011; Damodaran, 2012; Lynch and Rothchild, 2000). The Price Earnings (PE) ratio is one of the most widely applied valuation tools that fundamentalists use to make their investment decisions (Damodaran, 2012; Henry et al., 2010). Also, investment recommendations by security analysts are often based on the PE ratio (Carvell et al., 1989). Nevertheless, recent academic advances in financial econometrics and machine learning have rarely looked at this tool.
In this research, we apply the framework of dynamic Bayesian networks (Bishop, 2006; Murphy, 2012) to model the valuation process using the PE ratio. The main contributions of our work are threefold. Firstly, we apply a machine learning framework to formalize the PE ratio valuation process, which has rarely received attention from academic researchers. In contrast to the existing machine learning frameworks for price pattern discovery mentioned above, where the discovered patterns have no meaning in finance, the interpretation of our model is well justified by behavioral finance (Szyszka, 2013), as explained in Section 2. Secondly, since our proposed dynamic Bayesian network has a non-standard structure compared to the literature (Bishop, 2006; Murphy, 2012), we derive new inference formulas by applying the forward-backward methodology and a new parameter estimation algorithm based on Expectation-Maximization (Bishop, 2006; Murphy, 2012); see Section 3. Thirdly, based on the result of Bayesian inference, we propose a trading strategy that can be applied to every security. Using this trading strategy, we run experiments at the individual firm level and show statistical significance at the portfolio level; see Section 4.

Background of Fundamental Investment Based on PE Ratio and Motivation of Statistical Modeling
The core idea of the PE ratio valuation method is simply that the value of a firm is directly proportional to its annual earnings, i.e., for each firm $i$:
$$P_i^* = PE_i^* \times E_i \qquad (1)$$
where $P_i^*$ denotes the value of firm $i$, $E_i$ denotes the firm's current annual earnings, which can be observed in the stock market, and $PE_i^*$ is the firm's appropriate PE ratio, usually assumed to be constant over a period.
Only fundamentalists believe that the firm value can be calculated by Equation (1). Therefore, they usually call $P_i^*$ the fundamental price or fundamental value, and $PE_i^*$ the fundamental PE ratio. The goal of our modelling is to help this group of investors determine the fundamental PE ratio systematically.
We can observe the market price $P_i$ and the annual earnings $E_i$ in a stock market. Then we can calculate the observed $PE_i$ ratio, that is:
$$PE_i = \frac{P_i}{E_i} \qquad (2)$$
It is important to distinguish between the observed $PE_i$ ratio (which changes every day due to changes in $P_i$) and the fundamental $PE_i^*$ ratio. A simple trading strategy is to compare the value of the firm with its market price.

Strategy
If the firm value is higher than its market price by some threshold, the stock is considered to be at a low price, so we buy it. We expect to sell it later, when its market price is higher than the firm's intrinsic value by some threshold.
It is important to note that the philosophy of this trading strategy is that the market price is not always equal to the value of the firm. The price of the firm changes almost every working day. In contrast, by Equation (1), the firm's value will not change in a short time period provided that there is no new announcement on annual earnings in that period.
Why does a stock price deviate from its fundamental price? Works on behavioral finance (Szyszka, 2013) offer much evidence on this question. For example, researchers argue that there are noise traders in the market who tend to act irrationally, so that the price moves away from its value (Black, 1986; De Long et al., 1990; Hommes, 2013). One work found that some investors cannot process new information correctly and so overreact to it (Werner et al., 1986). What is worse, the information that investors overreact to may be unconfirmed (Bloomfield et al., 2000), unreliable (Pound and Zeckhauser, 1990; Tumarkin and Whitelaw, 2001) or even unimportant (Rashes, 2001; Cooper et al., 2001). Also, investors who consult experts may not get much helpful advice, since security analysts tend to be overoptimistic (Dechow et al., 2000) and to have conflicts of interest (Cowen et al., 2006). Finally, it is well known that even rational investors cannot immediately eliminate this irrational pricing, due to the limits of arbitrage (Shleifer and Vishny, 1997). All the effects mentioned here can temporarily move a stock price away from its value. The effects continue until either they cancel out or rational investors finally eliminate the mispricing. This reversion phenomenon is called mean reversion in the literature.

Dynamic Bayesian Network of Stock Price Movement
Our model simplifies and formalizes the observations described in Subsection 2.1. We divide the temporary effects that cause mispricing into two categories: (1) short-term effects, i.e., mispricing effects that last a few days, e.g., effects caused by noise trading or overreaction to unreliable information; and (2) medium-term effects, i.e., mispricing effects that last several weeks or months, e.g., effects caused by reaction to unconfirmed information, which may take time to confirm, or by overoptimistic predictions of analysts, which may take time to verify. Mathematically, the relation between the market price and its fundamental value can be described by the following equation. Since we consider only one firm at a time, we replace the firm-index subscript $i$ with a time-index subscript $t$ to emphasize the dynamic relationship between the price and its fundamental value:
$$P_t = P_t^* (1 + z_t)(1 + \varepsilon_t) \qquad (3)$$
where:
$z_t$ = a random variable modeling the medium-term noisy effects. To make its effects persist for a period of time, we model $z_t$ as a Markov chain.
$\varepsilon_t$ = a random variable for the short-term noisy effects, modeled as a Gaussian, $\varepsilon_t \sim N(0, \sigma^2)$.
Assuming $PE^*$ is constant over the observed period and following Equation (1), we have:
$$P_t^* = PE^* \times E_t \qquad (4)$$
and, therefore, we get the relationship between the fundamental PE and the observed PE:
$$\frac{P_t}{E_t} = PE^* (1 + z_t)(1 + \varepsilon_t) \qquad (5)$$
Note that our model is suitable only for a firm with positive earnings, $E_t > 0$. Equation (5) is central to our idea and can be visualized as shown in Fig. 1.
We can simplify Equation (5) further by taking logarithms:
$$\ln \frac{P_t}{E_t} = \ln PE^* + \ln(1 + z_t) + \ln(1 + \varepsilon_t) \qquad (6)$$
Since $\varepsilon_t$ is usually small, we can approximate $\ln(1 + \varepsilon_t) \approx \varepsilon_t$. Denoting $y_t = \ln(P_t / E_t)$, we then have:
$$y_t = \ln PE^* + \ln(1 + z_t) + \varepsilon_t \qquad (7)$$
Note that $y_t$ is an observable quantity, while $PE^*$ and $z_t$ are unobservable, i.e., they are hidden states or latent variables. Note also that these are two different types of latent variables: $PE^*$ is constant and $z_t$ is time-varying. Thus, Equation (7) differs from standard state-space and graphical models such as Hidden Markov Models or Linear State Space Models (Bishop, 2006). The graphical model of our proposed stock price dynamic has three layers, as represented in Fig. 2. In our case, where the model is temporal, the graphical model framework is also called a Dynamic Bayesian Network (DBN). The main advantage of a DBN is its ability to encode conditional independence properties and hence simplify probabilistic inference (Murphy, 2012). Another advantage of this framework is that expert knowledge can be integrated into the model naturally, as shown in Section 3.
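To make the log-PE relation concrete, the following sketch simulates observations under the assumption that the model reads y_t = ln PE* + ln(1 + z_t) + ε_t, with z_t a discrete Markov chain and ε_t Gaussian. The function name, the "stay or re-draw uniformly" transition rule and every numeric value are illustrative assumptions, not settings taken from the paper.

```python
import math
import random

def simulate_log_pe(pe_star, z_states, stay_prob, sigma, n_days, seed=0):
    """Simulate y_t = ln(PE*) + ln(1 + z_t) + eps_t (hypothetical helper)."""
    rng = random.Random(seed)
    m = rng.randrange(len(z_states))      # initial state index, drawn uniformly
    ys, zs = [], []
    for _ in range(n_days):
        # With probability 1 - stay_prob the state is re-drawn uniformly
        # (it may land on the same state again); otherwise it persists.
        if rng.random() > stay_prob:
            m = rng.randrange(len(z_states))
        eps = rng.gauss(0.0, sigma)       # short-term Gaussian noise
        ys.append(math.log(pe_star) + math.log(1.0 + z_states[m]) + eps)
        zs.append(z_states[m])
    return ys, zs

ys, zs = simulate_log_pe(15.0, [-0.1, 0.0, 0.1], stay_prob=0.95, sigma=0.01, n_days=250)
```

A high `stay_prob` makes the medium-term effect persist for weeks, which is the behavior the Markov-chain assumption is meant to capture.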
To derive mathematical equations for inference and parameter estimation in the DBN framework, we shall assume that all latent random variables are discrete: $z_t \in \{a_1, ..., a_M\}$ and $PE^* \in \{b_1, ..., b_N\}$. Furthermore, we have to set up the conditional probability distribution function for each node given its parents. We define the conditional probability distribution functions of all nodes as follows.
The transition probability distribution function. Let $i, m \in \{1, ..., M\}$ and $t \in \{2, 3, ...\}$:
$$p(z_t = a_m \mid z_{t-1} = a_i) = w_{im} \qquad (8)$$
The emission probability distribution function. By Equations (6) and (7), for $m \in \{1, ..., M\}$ and $n \in \{1, ..., N\}$:
$$p(y_t \mid z_t = a_m, PE^* = b_n) = N\left(y_t;\ \ln b_n + \ln(1 + a_m),\ \sigma^2\right) \equiv \phi_{mn}(y_t) \qquad (9)$$
The initial probability distribution functions:
$$p(z_1 = a_m) = u_m \qquad (10)$$
$$p(PE^* = b_n) = \upsilon_n \qquad (11)$$
If we know all parameters, we can derive inference equations based on the forward-backward algorithm, as shown in Subsection 3.1. If the parameters are unknown, we have to estimate them first. In this study, we derive the estimation procedures based on the Maximum a Posteriori (MAP) and Expectation-Maximization (EM) algorithms, as shown in Subsection 3.2.
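As an illustration of the emission term, the sketch below tabulates an M-by-N emission matrix for one observation, assuming the emission density is Gaussian in y_t with mean ln b_n + ln(1 + a_m) and variance σ². The function names and the grid values are hypothetical choices made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def emission_matrix(y, a, b, sigma2):
    """Phi[m][n] = N(y; ln b[n] + ln(1 + a[m]), sigma2) (assumed emission form)."""
    return [[gaussian_pdf(y, math.log(bn) + math.log(1.0 + am), sigma2) for bn in b]
            for am in a]

a = [-0.1, 0.0, 0.1]        # candidate medium-term effects (illustrative)
b = [10.0, 15.0, 20.0]      # candidate fundamental PE values (illustrative)
Phi = emission_matrix(math.log(15.0), a, b, sigma2=0.01)
```

With an observed PE of exactly 15, the largest entry of `Phi` sits at the state pair (z = 0, PE* = 15), as one would expect.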

Bayesian Inference on the DBN of Stock Price Dynamic
As explained in previous sections, our goal is to make an inference on the $PE^*$ ratio so that we can estimate the fundamental price of a stock. In Section 4, we will show that estimates of $\{z_t\}$ are also useful in investment. To infer the values of these two latent variables, as with Hidden Markov Models (HMM) and Linear State Space Models (LSSM) (Bishop, 2006; Murphy, 2012), we need to derive equations in two steps: (1) the inference algorithms with known parameters and (2) the parameter estimation algorithms when the parameters are unknown. However, because there are two types of latent states, as explained in the previous section, our graphical model shown in Fig. 2 is more sophisticated than HMM and LSSM. In this section, we show the new equations for both inference tasks. To simplify the notation, we write $y_1^t = \{y_1, ..., y_t\}$ for the observations up to date $t$ and $\theta = \{W, u, v, \sigma^2\}$ for the set of parameters.

Inference with Known Parameters
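Suppose $\theta$ is known, together with the observed data $y_1^T$. The filtering probability, i.e., the conditional joint probability of the medium-term noisy effect $z_t = a_m$ and $PE^* = b_n$ given all the observed variables up to date $t$, is given by the following recurrent formula:
$$p(z_t = a_m, PE^* = b_n \mid y_1^t) \propto \phi_{mn}(y_t) \sum_{i=1}^{M} p(z_t = a_m \mid z_{t-1} = a_i)\, p(z_{t-1} = a_i, PE^* = b_n \mid y_1^{t-1})$$
where $\phi_{mn}(y_t) = p(y_t \mid z_t = a_m, PE^* = b_n)$ is the emission probability. In this derivation, Bayes' rule, the conditional independence properties (Murphy, 2012) of the DBN shown in Fig. 2 and the sum rule are applied consecutively. The initial equation of the recurrent formula is: $p(z_1 = a_m, PE^* = b_n \mid y_1) \propto \phi_{mn}(y_1)\, u_m \upsilon_n$.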
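The filtering recursion can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the emission matrices are precomputed as `phis[t][m][n]`, that `W[i][m]` holds p(z_t = a_m | z_{t-1} = a_i), and it renormalizes at every step instead of tracking the normalizing constants explicitly. All names are hypothetical.

```python
def normalize(mat):
    """Scale a matrix so that all of its entries sum to one."""
    total = sum(sum(row) for row in mat)
    return [[x / total for x in row] for row in mat]

def forward_filter(phis, W, u, v):
    """Return alphas[t][m][n] = p(z_t = a_m, PE* = b_n | y_1..y_t)."""
    M, N = len(u), len(v)
    # Initial step: emission times the two independent priors.
    alpha = normalize([[phis[0][m][n] * u[m] * v[n] for n in range(N)]
                       for m in range(M)])
    alphas = [alpha]
    for phi in phis[1:]:
        # Predict: propagate z through the transition; PE* stays fixed (same n).
        pred = [[sum(W[i][m] * alpha[i][n] for i in range(M)) for n in range(N)]
                for m in range(M)]
        # Update: weight by the emission of the new observation.
        alpha = normalize([[phi[m][n] * pred[m][n] for n in range(N)]
                           for m in range(M)])
        alphas.append(alpha)
    return alphas
```

Summing `alphas[-1]` over the first index gives a filtered distribution over the candidate PE* values, and summing over the second index gives one over the current medium-term effect.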
Next, we shall calculate the smoothing formula, which is the conditional joint probability of the medium-term noisy effect $z_t = a_m$ and $PE^* = b_n$ given all the observed variables, at any date $t \in \{1, ..., T-1\}$.
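For all $t \in \{1, ..., T-1\}$ (note that $y_1^T = \{y_1^t, y_{t+1}^T\}$):
$$p(z_t = a_m, PE^* = b_n \mid y_1^T) \propto p(y_{t+1}^T \mid z_t = a_m, PE^* = b_n)\, p(z_t = a_m, PE^* = b_n \mid y_1^t)$$
Note that the conditional independence properties of our DBN are applied in the first step. Also note that $p(z_t = a_m, PE^* = b_n \mid y_1^t)$ is in fact a filtering probability. Therefore, we need to concentrate only on $p(y_{t+1}^T \mid z_t, PE^*, \theta)$, $t \in \{1, ..., T-1\}$. To implement these formulas in a computer program, we also need to solve the formulas for the constants that appear in the above derivations. For this task, a matrix reformulation of the above recurrent equations is the most convenient and efficient way. Below, we give only the end results because of space limitations. Derivation details can be found in my Ph.D. thesis (Haizhen, 2017). Denote the matrix $A_t = (\alpha_{tmn})_{M \times N}$ with $\alpha_{tmn} = p(z_t = a_m, PE^* = b_n \mid y_1^t)$; we can show that:
$$A_t \propto \Phi_t \circ (W^{\top} A_{t-1})$$
where $\circ$ denotes the entrywise (or Hadamard) product, $\propto$ denotes equality up to the constant that makes the entries of $A_t$ sum to one, and $\Phi_t = (\phi_{mn}(y_t))_{M \times N}$ and $W$ denote the emission matrix and transition matrix, as described in Equations (9) and (8), respectively; the $(i, m)$ entry of $W$ is $p(z_t = a_m \mid z_{t-1} = a_i)$. For the initial case, we have:
$$A_1 \propto \Phi_1 \circ (u v^{\top})$$
where $u$ and $v$ are as defined in Equations (10) and (11). To get a matrix formula for a smoothing density, we first define: And: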
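A backward pass that turns filtered probabilities into smoothed ones can be sketched as follows. This is the textbook forward-backward construction adapted to the two-latent-variable structure, offered as an assumption-laden illustration: the names `beta`, `gammas` and `smooth` are ours, `W[m][i]` is taken to be p(z_{t+1} = a_i | z_t = a_m), and the backward messages are rescaled by a scalar at each step purely for numerical stability.

```python
def normalize(mat):
    """Scale a matrix so that all of its entries sum to one."""
    total = sum(sum(row) for row in mat)
    return [[x / total for x in row] for row in mat]

def smooth(phis, W, alphas):
    """Return gammas[t][m][n] = p(z_t = a_m, PE* = b_n | y_1..y_T).

    phis[t][m][n] are emission values and alphas[t][m][n] are filtered
    probabilities, e.g. produced by any forward filter.
    """
    T, M, N = len(phis), len(W), len(alphas[0][0])
    beta = [[1.0] * N for _ in range(M)]   # message at the last date is all ones
    gammas = [None] * T
    gammas[T - 1] = alphas[T - 1]          # at date T smoothing equals filtering
    for t in range(T - 2, -1, -1):
        nxt = phis[t + 1]
        # beta_t[m][n] is proportional to p(y_{t+1..T} | z_t = a_m, PE* = b_n).
        beta = [[sum(W[m][i] * nxt[i][n] * beta[i][n] for i in range(M))
                 for n in range(N)] for m in range(M)]
        beta = normalize(beta)             # scalar rescaling, cancels below
        gammas[t] = normalize([[alphas[t][m][n] * beta[m][n] for n in range(N)]
                               for m in range(M)])
    return gammas
```

Because PE* is constant in time, summing any `gammas[t]` over the first index should give the same posterior over PE* for every t, which is a handy sanity check.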

Inference with Unknown Parameters
In general situations, $\theta$ is unknown, so only the observed data $y_1^T$ is available. In this case, $\theta$ must be estimated first. Expectation Maximization (EM) is a general method for estimating the parameters $\theta$ of probabilistic models with latent variables. Here, we formulate our parameter estimation in the Maximum a Posteriori (MAP) setting so that experts' prior knowledge can be incorporated into the model. Formally, we would like to solve the following problem of maximizing the posterior probability distribution function of $\theta$:
$$\theta^* = \arg\max_{\theta}\ p(\theta \mid y_1^T) \qquad (22)$$
where:
$$p(\theta \mid y_1^T) \propto p(y_1^T \mid \theta)\, p(\theta) \qquad (23)$$
Starting from an initial estimate $\theta^{(0)}$, the E-step computes the smoothing probabilities of Subsection 3.1 under the current estimate $\theta^{(j)}$, and the M-step updates the estimate:
$$\theta^{(j+1)} = \arg\max_{\theta}\ E_{z_1^T, PE^* \mid y_1^T, \theta^{(j)}}\left[\ln p(y_1^T, z_1^T, PE^* \mid \theta)\right] + \ln p(\theta) \qquad (24)$$
Then, EM repeats the two steps until $\theta^{(j)}$ converges. Note that EM is guaranteed to find a local maximum of Equation (22). The argument in the expectation of Equation (24) decomposes into three groups of terms. According to the DBN, they are simply the logarithms of the emission pdf, transition pdf and initial pdf, respectively. By manipulating the equations, the expectation in Equation (24) can be calculated from the smoothing probabilities already obtained in the E-step. As a result, we get a closed form of Equation (24). Combined with the $\ln p(\theta)$ term described below, the constrained maximization in Equation (24) is well defined and can readily be solved by the method of Lagrange multipliers. The full derivations, which have the same mathematical structure as in the simpler case of HMM, are quite long and can be found in my Ph.D. thesis (Haizhen, 2017).
Experts can put their knowledge into the parameter estimation procedure via $p(\theta)$ in Equation (23). Here, we assume that all parameters are independent, i.e., $p(\theta) = p(\sigma)\,p(u)\,p(v)\,p(W)$. Often, experts may be able to estimate the range of the appropriate $PE^*$ ratio by analyzing a firm's business strategy together with the competition in its industry. The prior $p(v)$ for the vector $v = (\upsilon_n)_{N \times 1}$ can be represented via the Dirichlet distribution:
$$p(v) = \mathrm{Dir}(v \mid k_1, ..., k_N) \propto \prod_{n=1}^{N} \upsilon_n^{k_n - 1}$$
Intuitively, $k_n$, $n \in \{1, ..., N\}$, is a degree of belief in each possible $PE^*$ ratio value $b_n$. Experts can express their belief that some value of the $PE^*$ ratio, e.g., $b_i$, is more probable than the others by giving $k_i$ a higher value than the other $k_n$, $n \ne i$.
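To illustrate how a Dirichlet prior enters an M-step, the sketch below applies the standard MAP update for a categorical parameter: with posterior weights γ_n = p(PE* = b_n | data) from the E-step and prior counts k_n, maximizing Σ γ_n ln υ_n + Σ (k_n − 1) ln υ_n under Σ υ_n = 1 gives υ_n proportional to γ_n + k_n − 1. This is the textbook result, stated here as an assumption about what the update looks like; the paper's own derivation is in the cited thesis, and the function name is hypothetical.

```python
def map_update_v(post_pe, k):
    """MAP update of the prior vector v over PE* values under a Dirichlet(k) prior.

    post_pe[n] is the smoothed posterior p(PE* = b_n | all observations).
    """
    raw = [g + kn - 1.0 for g, kn in zip(post_pe, k)]
    if min(raw) < 0.0:
        # k_n >= 1 for all n is sufficient to avoid this case.
        raise ValueError("prior counts too small for a valid MAP estimate")
    total = sum(raw)
    return [r / total for r in raw]

# A flat prior (all k_n = 1) returns the E-step posterior unchanged,
# while a larger k_2 pulls mass toward the second candidate value.
v_flat = map_update_v([0.2, 0.5, 0.3], [1.0, 1.0, 1.0])
v_expert = map_update_v([0.2, 0.5, 0.3], [1.0, 3.0, 1.0])
```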

Experiments
In this section, we illustrate the benefits of our methodology in real-world applications. To do this, we conduct comprehensive trading simulations and compare the performance of our method with a benchmark. The buy-and-hold strategy is simple and widely used, so we use it as the benchmark. We test our method against buy-and-hold at the individual stock level and show statistical significance at the portfolio level.

The Data
We collected data sets from the Stock Exchange of Thailand (SET) and from the NYSE in the US. The data sets are daily stock prices of 10 firms from each country. Each selected firm is well established and has at least 5 years of historical trading data. The criterion for our model is that the historical yearly earnings are positive. The historical data for each firm run from Jan 1, 2012 to Sep 30, 2016, consisting of 1160 closing prices for stocks in SET and 1195 closing prices for stocks in NYSE. The difference in the number of data points is due to different working days in the two countries. The historical prices and the historical earnings are adjusted for stock splits.
Fig. 3. Example of long-term strategy trading of CPALL with threshold Tr = 5%, where our model's profit is 58.43% while the "buy & hold" profit is 44.46%. "Green circle" denotes "buy" and "Black cross" denotes "sell". The figure shows trading from the "PE" perspective, where the red line denotes $PE^*$. Here, it is easy to see our strategy in action: when the observed PE is lower or higher than the threshold level, a buy or a sell is triggered, respectively.
Fig. 4. Example of medium-term strategy trading of CPALL with threshold Tr = 5%. In addition to the elements explained in Fig. 3, the purple dashed line denotes $PE^*(1 + z_t)$, which becomes the base line of this trading strategy. Note that our Bayesian method estimates the purple line by the method of "filtering", which tracks the observed PE movements with some delay. In this example, the "buy & hold" method beats ours by a small margin because of the commission fees caused by our frequent trading.

Experiment Setting
We run trading simulations in the markets of two different countries. To avoid repetition, we explain only the experiment settings for stocks in SET, with historical prices $P_1, ..., P_{1160}$ and historical earnings $E_1, ..., E_{1160}$, where each earnings value is defined as the sum of the most recent 4 quarterly earnings.
The experiment settings for stocks in NYSE are done similarly.
The first 3 years of historical data (Jan 1, 2012 to Dec 31, 2014), $P_1, ..., P_{735}$ and $E_1, ..., E_{735}$, are used as training data for our Bayesian methodology to learn the parameters $\theta = \{W, u, v, \sigma^2\}$ and to estimate the most probable values of $PE^*$ and $\{z_1, ..., z_{735}\}$, as explained in Section 3. The remaining 2 years of historical data are used to measure the performance of both our method and the benchmark.
The performance metric is, as used by practitioners, the profit generated by each method. For each trading simulation, the same initial amount of cash $I$ is given and a commission fee is taken into account.
Using the benchmark, we calculate the profit for each stock in 3 steps:
• Buy the stock with all cash $I$ and get $C \cdot I / P_{736}$ shares, where $C \approx 0.9987$ represents the value of assets after taking SET's commission fee into account
• Do nothing until the end, when the market value of the holding is $P_{1160} \cdot C \cdot I / P_{736}$
• The profit is $P_{1160} \cdot C \cdot I / P_{736} - I$
Based on the results of our model, we propose two versions of the trading strategy, inspired by our model's main idea (Fig. 1) and the Strategy (buy low, sell high) described in Section 2. The first version, called the long-term strategy, is simply to "buy low, sell high" with respect to the static value of $PE^*$. The second version, called the medium-term strategy, is to "buy low, sell high" with respect to the dynamic values of $PE^*(1 + z_t)$, where each $z_t$ is dynamically estimated by the method of filtering described in Subsection 3.1. Both versions can be formally described as follows.
Let $I_t$ and $N_t$ be the available cash and total shares at date $t$, respectively. Initially, $I_{736} = I$ and $N_{736} = 0$. Both trading versions can be defined by the following procedure: for each date $t$, exactly one of the following cases holds:
• (i) $P_t / E_t \le A_t (1 - Tr)$ and $I_t > 0$ (buy-low case), where $Tr \in (0, 1)$ is a threshold, $A_t = PE^*$ for the long-term strategy and $A_t = PE^*(1 + z_t)$ for the medium-term strategy. In this case, buy the stock with all cash, so that $N_{t+1} = C \cdot I_t / P_t$ and $I_{t+1} = 0$
• (ii) $P_t / E_t \ge A_t (1 + Tr)$ and $I_t = 0$ (sell-high case). In this case, sell all the holding stock to get cash $I_{t+1} = P_t \cdot N_t \cdot C$ and $N_{t+1} = 0$
• (iii) If neither case (i) nor case (ii) is satisfied, do nothing, so that $I_{t+1} = I_t$ and $N_{t+1} = N_t$
At the end of a trading simulation, $t = 1160$, the total profit is simply $I_{1160} + P_{1160} \cdot N_{1160} - I_{736}$, which we can compare with the buy-and-hold profit.
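The benchmark and the threshold rule can be sketched as a small simulation. This is an illustrative reading of the procedure, assuming a sell is triggered when the observed PE rises above the base line by the threshold; the function names, argument layout and toy numbers are hypothetical, and only the commission factor 0.9987 comes from the text.

```python
def buy_and_hold_profit(prices, cash, c=0.9987):
    """Buy on the first date with all cash, value the holding on the last date."""
    shares = c * cash / prices[0]
    return prices[-1] * shares - cash

def strategy_profit(prices, earnings, baselines, tr, cash, c=0.9987):
    """Threshold trading around baselines[t] (PE* or PE*(1 + z_t))."""
    start, shares = cash, 0.0
    # Decisions are taken on every date except the last, which is valuation only.
    for p, e, a in zip(prices[:-1], earnings[:-1], baselines[:-1]):
        pe = p / e
        if pe <= a * (1.0 - tr) and cash > 0:       # buy-low case
            shares, cash = c * cash / p, 0.0
        elif pe >= a * (1.0 + tr) and shares > 0:   # sell-high case
            cash, shares = p * shares * c, 0.0
        # otherwise: do nothing, cash and shares carry over
    return cash + prices[-1] * shares - start

# Toy data, commission ignored (c = 1.0) to keep the arithmetic easy to follow.
prices = [10.0, 8.0, 12.0, 12.0]
earnings = [1.0, 1.0, 1.0, 1.0]
ours = strategy_profit(prices, earnings, [10.0] * 4, tr=0.1, cash=1000.0, c=1.0)
bench = buy_and_hold_profit(prices, 1000.0, c=1.0)
```

Passing a constant list as `baselines` corresponds to the long-term strategy, while a list of filtered PE*(1 + z_t) values corresponds to the medium-term one.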
Illustrations of our trading in action are shown in Fig. 3 and 4. Figure 3 is an example of long-term trading of CPALL with threshold 0.05 and Fig. 4 is an example of medium-term trading of CPALL with threshold 0.05.

Experimental Results and Statistical Significance
Individual Level Experiments
To avoid bias in our experimental results, we test 4 different thresholds for each trading strategy. Note that the thresholds in medium-term trading are smaller than those in long-term trading. This is due to the nature of the medium-term strategy, where $PE_t$ deviates less from its base line $PE^*(1 + z_t)$ than from the long-term strategy's base line $PE^*$, which does not contain the effect of $z_t$. The experimental results for Thai stocks and US stocks are shown in Tables 1 and 2, respectively.
From the tables, we can see that in the total of 80 trading simulations on SET firms, our method achieves better performance 41 times, while the buy-and-hold strategy achieves better performance 19 times (the remaining 20 times are draws). Similarly, in the total of 80 trading simulations on NYSE and NASDAQ firms, our method achieves better performance 36 times, while the buy-and-hold strategy achieves better performance 20 times (the remaining 24 times are draws). Summing up the results of the markets in the two countries, our method outperforms the buy-and-hold strategy 77 times and underperforms only 39 times. We analyze the statistical significance of the results in the next subsection.
From Tables 1 and 2, it can be seen that there are 44 draws, which occur only in the cases of the long-term trading strategy. Disregarding the draws, our long-term trading strategy still beats the benchmark with 22 wins versus 14 losses. This is mainly due to the volatility of the observed PE in most stocks, which makes our strategy of buying at an undervalued price and selling at an overvalued price with respect to $PE^*$ possible.
On the other hand, the results of our method with the medium-term trading strategy show clear superiority, with 55 wins versus 25 losses against the benchmark.
The key factor in this success is the ability of our filtering algorithm, presented in Section 3, to track the medium-term noisy effect $z_t$. When the new base line $PE^*(1 + z_t)$ is predicted accurately, undervalued and overvalued prices are also detected accurately, and so the probability of profitable trading increases.
Table 3. Experimental results in portfolio level testing. X denotes a random variable representing the difference in % profit between our model and the benchmark. The distribution of X is estimated using the method of bootstrap resampling. Bold face denotes a case where there is more than 80% confidence that our method is superior or equal to the benchmark. Columns are grouped into long-term thresholds and medium-term thresholds.

Statistical Significance: Portfolio Level Experiments
In this subsection, to analyze the statistical significance of the results more formally, we construct a portfolio of stocks and test its performance against the benchmark. Here, we use a rule of thumb commonly employed in practice, which says that a good portfolio should consist of around 15 stocks.
To test the performance of a 15-stock portfolio of our method against the benchmark, we employ the method of bootstrap resampling. For each bootstrap sample, a set of 15 stocks is selected randomly from Tables 1 and 2 to form an equally-weighted portfolio. We are interested in the difference in performance between our method and the buy-and-hold strategy on each bootstrap sample. After all bootstrap samples are drawn, we can also estimate the average difference in performance between the two methods. More precisely, let X be a random variable representing the difference in % profit between our model and the benchmark (our % profit minus the benchmark's % profit). By repeating the bootstrap resampling 10,000 times, we construct an empirical distribution of X. This empirical distribution allows us to calculate E[X], the average % profit difference between the two methods, and Pr(X≥0), the probability that our method has superior or equal performance to the benchmark. The results are shown in Table 3.
From Table 3, our method beats the benchmark on average in every case (since E[X] > 0 for all cases). Pr(X≥0) increases significantly, and most cases have confidence levels of superiority greater than 80%. This statistically confirms the superiority of our method over the benchmark on the selected stocks. The increase in confidence level is due to the diversification effect of a portfolio with a higher number of stocks.
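The resampling procedure can be sketched as below. It is a minimal illustration that assumes stocks are drawn with replacement (the usual bootstrap convention, which the text does not spell out) and that each stock's result is summarized by one number, its % profit difference against buy-and-hold. The function name and the sample figures are hypothetical and are not results from the paper.

```python
import random

def bootstrap_portfolio(diffs, size=15, n_boot=10000, seed=0):
    """Estimate E[X] and Pr(X >= 0) for an equally weighted portfolio.

    diffs[i] is the % profit of our method minus the benchmark's for stock i.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        picks = [rng.choice(diffs) for _ in range(size)]  # draw with replacement
        samples.append(sum(picks) / size)                 # equal weights
    mean_x = sum(samples) / n_boot
    prob_nonneg = sum(1 for s in samples if s >= 0) / n_boot
    return mean_x, prob_nonneg

# Made-up per-stock differences, for illustration only.
example_diffs = [4.0, -2.5, 7.1, 0.0, -1.2, 3.3, 5.8, -0.4, 2.2, 1.0]
mean_x, prob_nonneg = bootstrap_portfolio(example_diffs, n_boot=2000)
```

Averaging over 15 stocks narrows the distribution of X compared with a single stock, which is the diversification effect that raises Pr(X ≥ 0) when the average difference is positive.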

Conclusion
In this study, we apply the Dynamic Bayesian Network (DBN) methodology to model stock price dynamics with two latent variables, namely the fundamental PE ratio and the medium-term noisy effect. We have derived both inference and parameter estimation algorithms. Based on the results of our model, we propose two versions of a stock trading strategy. Experiments at both the individual firm level and the portfolio level show the statistically significant superiority of our method.