Comparative Analysis of the Artificial Neural Networks Options Pricing Model Under Constant and Time-Variant Volatilities

: Option pricing using artificial neural networks (ANN) model while relaxing the assumption of constant volatility still remains a challenge. The conventional practice for pure ANN models has been to either model volatility using the very ANN model and have the model output fed as an input to the ANN option pricing model, or to make allowances for a large number of lags directly as inputs to the option pricing model with the belief that the ability of ANN to incorporate flexibility and redundancy creates a more robust model. This has been done in spite of a well-known fact-that financial time series data harbors a set of characteristics such as volatility clustering, leptokurtosis and leverage effects-features that ANNs in their pure forms have proved inadequate in capturing. Consequently, this study sought to follow the conventional methods employed by other studies and developed two pure ANN option pricing models-one with constant volatility and the other while violating the assumption of constant volatility with an aim of establishing whether significant differences exist in the outputs of the two models. The intraday data for the AAPL stock option for the period between December 2016 and March 2017 with 56,238 data points was used in validating the developed models. Results indicate that the ANN model (with varying volatility) makes better predictions than the model with constant volatility. However, the difference between the performance of the two models is not significant at 0.05 level of significance.


Introduction
For the past two decades, the Black-Scholes Model (BSM) has continuously received considerable attention especially in underlying probability attributes of an European call option on a non-dividend stock (Al Saedi and Tularam, 2018) and has been identified as the basic building block of the financial derivatives theory (Wilmott et al., 1995). However, through numerous studies, inconsistencies have consistently been proved to exist between the market prices and the BSM output-a vice that has been associated with the highly questionable assumptions that underpin the development and use of the BSM, including but not limited to: Constant volatility, existence of efficient markets, non-dividend paying structures, known and constant interest rates, log-normally distributed returns, commission free transactions and a perfectly liquid market.
In a bid to contain this discrepancies, advances have been made both in the model itself and its respective solution methods-both analytical and numerical. The advances made on the analytical solutions include but are not limited to: the use of the generalization technique in which parabolic partial differential equations were reduced to canonical form (Harper, 1994); the finitedifference methods to provide the exact solution of the Black-Scholes Equation (BSE) (Forsyth et al., 1999), Mellin transformation , application of time-varying parameters technique in coming up with a simple derivation of the explicit formula of an option in time-dependent parameters of the Black-Scholes PDE (Rodrigo and Mamon, 2006), use of the Adomain approximate decomposition technique (Bohner and Zheng, 2009), application of the Projected Differential Transformation Method (PDTM), a modification of the Differential Transformation Method (DTM) on the BSE for European option valuation (Edeki et al., 2015) and finally the use of Laplace transform to provide a solution to the Black-Scholes terminal value (Shin and Kim, 2016), all as reviewed in detail by Al Saedi and Tularam (2018).
On the other hand, numerous advances have similarly been made on the numerical solution including the use of the: Mellin transform , semidiscretization technique (Company et al., 2008), Crank-Niolcon method (Ankudinova and Ehrhardt, 2008), Cubic spline wavelets and multi-wavelet bases method (Černá et al., 2016), the two-step backward differentiation formula in the temporal discretization and a Higher-Order Difference approximation with Identity Expansion (HODIE) scheme (Rao, 2016), among others. A detailed review on the use of these techniques can be found on Al Saedi and Tularam (2018).
In spite of such and many more concerted efforts having been employed to improve the performance of the BSM, findings from numerous studies still show that these techniques have not been able to very well capture more realistic aspects that characterize day-to-day transactions such as the presence of transaction costs, high volatility, illiquid markets and large investor preference. In addition, market participants change their option pricing attitudes from time to time and thus, many researchers hold the view that existing complexity issues require non-linear solutions.
It is against this background that a good number of researchers shifted focus to non-parametric models flexible enough and capable of capturing the non-linear patterns that characterize the observed market data. One such model is the Artificial Neural Network (ANN) model. ANNs have increasingly become promising alternatives to the BSM and other parametric option pricing models. This is due to the fact that unlike the BSM, they do not require or rely on any underlying assumptions (Malliaris and Salchenberg, 1993) and are trained inductively using historical or implied input variables and option transaction data (Andreou et al., 2006). Their popularity as well as use in option pricing is constantly increasing as reflected in studies by (Enke and Dagli, 2007;Mostafa and Dillon, 2008;Hajizadeha and Seifia, 2011) among others. Malliaris and Salchenberg (1993), are among the pioneers of ANN option pricing models. The duo sought not only to develop an ANN model that processes financial input data to estimate the market option prices for European options at closing, but also to compare this network's ability to the BSM.
Comparisons revealed that the MSE for the ANN was less than that of the BSM in more than half the cases examined and for both in-the-money and out-of-the money prices. Based on these results, the study concludes that the ANN methodology offers a valuable alternative to estimating option prices to the traditional BSM.
Another immediate attempt was by Hutchnison et al. (1994). They used three different network architectures namely: Radial Basis Function (RBF), Multilayer Perception (MLP) and Projection Pursuit Regression (PPR) to fit both Monte-Carlo simulated Brownian underlier and Black-Scholes option data and S and P 500 futures thereof. The authors however used a minimalist approach in the selection of their inputs and restricted the network inputs to time to maturity (T-t) and Moneyness. Interest rate and volatility were also assumed to be constant. Since then, a good number of studies have been conducted on option pricing using ANN (Amilon, 2003;Mostafa And Dillon, 2008;Hajizedah and Seifi, 2011;Gradojevic et al., 2007;Mitra, 2012).
We however note that a good number of these studies have ended up curing the linearity aspect that existed in the parametric models while still clinging to the same assumptions on the BSM. Hutchnison et al. (1994) for instance, by using a minimalist approach in the selection of their inputs and restricting the network inputs to time to maturity (T−t) and Moneyness, ends up holding interest rate and volatility constant-an assumption that draws criticism in the BSM. Secondly, majority of the studies using pure ANN models have handled the time variant volatility component by pre-modelling it using ANN, before using it as an input to the ANN option pricing model. However, it may not pass unnoticed that financial data is characterized by some features such as volatility clustering, leptokurtosis among others that ANN have proved inadequate in capturing. This study therefore seeks to determine whether this technique has any significance in modelling option prices, by comparing purely ANN option pricing models with timevariant volatility and ANN models with constant volatility, so asto establish whether the former has any impact on the option prices vis-à-vis the later model.

Data
The study used intraday data for the AAPL stock option for the period between December 2016 and March 2017 with 56,238 data points.

The Case of Constant Volatility
Since our objective in this subsection is to price under the no arbitrage assumptions, with constant interest rate and constant volatility, the model inputs consists of time to maturity and moneyness, λ, which is the ratio of the strike price to the underlying asset price defined by: There is no celebrated theory for the determination of an optimal ANN architecture. Rather, this is generally realized by fixing the number of layers and choosing the number of nodes in each layer. There are always two layers representing the input and output variables in any neural network and therefore choosing the number of hidden layers is always a puzzle in ANN structure. The number of hidden nodes allow neural networks to capture non-linear patterns and detect the complex relationships in the data. Networks with too voluminous hidden nodes may cause over-fitting glitches leading to poor forecasting capacity. Previous research e.g., by Shahin et al. (2002) has shown that one hidden layer is sufficient to approximate any continuous function, provided that sufficient connection weights are given.
The very first step in ANN modelling involves the transformation of input variables to values between 0 and 1. As Najjar et al. (1997) pointed out, this adjustment is effective in ensuring faster training by preventing larger values from overriding smaller ones. As a result, the inputs to this model were first transformed as follows: For moneyness: 1,2,..., and for time to maturity we had: The normalized inputs are then multiplied by weights (to be determined), after which they act as inputs to each of the hidden nodes. As a result, the input to the j^th hidden node is obtained as follows: For moneyness, we have the transformation: Where: I αjM = The normalized moneyness input to the j th hidden node ω jM = Weights connecting the normalized moneyness input to the j th hidden node M = Index notation for moneyness j = The hidden neuron position for j = 1, 2, …, H On the other hand, for time to maturity, we have the transformation: Where: I αjM = The normalized time to maturity input to the j th hidden node ω jt = Weights connecting the normalized moneyness input to the j th hidden node M = Index notation for moneyness j = The hidden neuron position for j = 1, 2, …, H Ultimately, the net input to the j th hidden node is obtained as: One of the major strengths of neural networks is their ability to capture nonlinear patterns in a given dataset. This is achieved through the use of activation functions which process data in the hidden layers of the model. Activation functions not only capture non-linear relationships existing between inputs and outputs, but also constrain the values of the output layer into a bounded domain. The choice of an activation function is therefore critical and should be sensitive to the nature of data as well as the desired output.
The sigmoid function is the only function that in addition to producing positive outputs, produces outputs whose values range between 0 and 1. This is necessary first, to satisfy the option pricing conditions such as the positivity of option prices and secondly in rescaling the inputs from other neurons back to (0,1). The sigmoid function (defined as in Equation 7) As a result, the input I Hj of the j th hidden node is transformed as follows: Where: The output from the j th hidden node, for j = 1, 2, …, H H = The number of hidden nodes ( ) The input to the j th hidden node ω j0 = The bias to the j th hidden node α = 1, 2,…, H Equation 8 represents the output of the j th hidden node, which needs to be combined with the outputs from the other H-1 hidden nodes into a single value to act as an input to the output node. This is done through a connection of weights linking the hidden neurons to the output neuron as follows: , , , , , Where: ω 0 = The bias to the output node ω j0 = The weight connecting the j th hidden node to the output node From the output neuron, the input I oj defined in Equation 13 is transformed using the sigmoid function defined in Equation 7. This is to ensure that the option prices modelled remain positive and that the values remain in the range of 0 and 1. This transformation implies that the resultant output from the neuron is of the form: Finally, a reverse of expression 2 and 3, i.e.: is used to de-scale the scaled neuron output and produce the target output for each α, for α = 1,2, …, n and where Equation 11 is the target output of our model, whose difference with the real time option prices c P ɶ we try to minimize in training. This output, for each α = 1, 2, …, n can be re-written as in Equation 12: , , , ,

Determining H, the Number of Hidden Nodes
For a single hidden layer network, several rules-ofthumb have been suggested for obtaining the best number of hidden layer nodes. These include: (a) Taking the number of hidden nodes to be 75% of the input units (Salchenberger et al., 1992). In that case, this implies that the H in this study would be obtained as: Taking the number of hidden nodes to be somewhere between the average and the sum of the input and output nodes (Hajela and Berke, 1991), implying that in our case, H would lie in the range: (c) Fixing an upper limit and then working backwards. Hecht-Nielsen (1989) and Caudill (1988) suggested that the upper limit of the number of hidden nodes be twice the number of inputs plus 1. Letting I be the number of inputs, H in this case would be obtained as follows: 2 1 2(2) 1 5 This study proposes and adopts a method for number of hidden neurons (H) determination in which all the three proposals by (Salchenberger et al. (1992), Hajela and Berke (1991) and Hecht-Nielsen (1989) Caudill (1988 Let H denote the number of hidden neuron, I-the number of input nodes and O-the number of output nodes. Then: ,0.75 2 1 2 This is so because, clearly, the minimum number of nodes would fall between seventy five percent of the inputs and the average of the input and output nodes, while the maximum would be the Hecht-Nielsen's and Caudill's 2I+1.

Training to the Network
The SSE was used in training the network, in which case the weights were adjusted in such a way that the SSE between the targets c P ɶ and the models output 1 ( , , , , ) t X S T t γ ω was minimized. Since the study used the bipolar activation function, the Quasi-Newton algorithm was used in minimization.

The ANN Option Pricing Model with Time-Variant Volatility Volatility
Though the Black-Scholes model and some other option pricing models have always assumed constant volatility, empirical evidence strongly suggests otherwise. In incorporating time-variant volatility, ANN forecasts of historical volatility were used as inputs in addition to moneyness t X S and time to maturity (T-t).
The historical volatility was first computed as follows.
Let r t denote returns in the price of the underlying asset (stock) given the asset price at t-1 defined as: Further, define the average of r t over a time T and denote it by r as: Then, the historical volatility is given by: Consequently, a d-days historical volatility is obtained by substituting T by d, for T<d, to obtain moving averages or order d-1 of the form: The conventionally and widely used number of days (d s ) in computing historical volatility are d = 10, 20, 30. Thus to allow for comparison and taking into account the number of trading days for our data, the study took d = 10.
Conventionally, non-constant volatility in option pricing using ANN has been modelled by taking various lags alongside other variables such as lags of an underlying asset as inputs to the ANN and varying this inputs until the optimal model is obtained. Malliaris and Salchenberg (1995) for instance used this approach and ended up with 13 input variables. This study chose to use various volatility lags as inputs to the ANN volatility model. The implication is that we began by expressing historical volatility as a function of its own lags, so that: Consequently, we would have K input nodes representing the K volatility lags and a single output neuron representing the d-day historical volatility. Similar steps explained from equation 2 to 12 are used in modelling volatility so that we end up with a transformed volatility output which we denote by σ dα . The outputs of this ANN volatility model (σ dα ) are then fed as inputs to the ANN option pricing model in addition to moneyness and time to maturity.
Just as before, the now three inputs are first transformed to ranges between 0 and 1. From here, similar steps used in developing the ANN model with constant volatility are used with the only difference being the additional input node representing the time variant volatility. This is summarized diagrammatically and mathematically as in Fig. 1 follows: ( ) ( ) As it can be seen, the resultant output from the output neuron is of the form: A reverse of the scaling expression i.e: descales the scaled neuron output and produces the target output Equation 23 gives the target output of the model whose difference with the real valued option prices C P ɶ we minimize in training using the quasi-newton, just as it were with the constant volatility model.

Results and Discussion
In validating the developed model, the study used intraday data for the AAPL stock option for the period between December 2016 and March 2017 with 56,238 data points. Of these options values, 59.15% were In The Money (ITM) while 40.85% were Out Of The Money (OTM). The data was divided into three sets with 50% (28,119) used in training the model, 25% (14,160) used in testing and the remaining 25% (14,059) used in the validation of the model.
Among the tasks in the development of our ANN models was to determine the optimal number of hidden layers (which structurally had to be the same), the learning difficulty of the models in terms of the training time and the performance of the model in terms of the differences between the predictions generated by it and the target output. For the ANN option pricing model with constant volatility, results indicated no variations between the number of hidden neurons and the SSE. However, variation existed between the number of neurons and the network training time as shown in Fig. 2.
Results further indicate that in spite of the neurons giving similar SSEs, two hidden neurons had the least training time of 0.03s. As a consequence, a model with two hidden neurons and a single layer of hidden nodes was adopted.
Interest was also drawn to how the various learning rates impact on the training time of the model. As shown on Figure 3, the learning rate of 0.3 provided the quickest training time of 0.3s. Other statistics worth noting is that the number of iterations were varied between 1000 and 100000 with the optimal weight being achieved in 59,676 iterations.
The SSE, MSE and the RMSE were used to measure the performance of the ANN model with constant volatility during training, testing and validation. Table 1 summarizes the error statistics obtained in the three phrases.
An analysis of the performance of the model with regards to ITM and OTM options reveals that the model predicts the OTM option prices better than the ITM option prices. This result is consistent with what was obtained by Malliaris and Salchenber (1993), whose conventional ANN model revealed that the ANN model performed better in modelling OTM option prices compared to ITM option prices. Secondly, the pattern in which the validation errors are slightly above the testing ones exhibited by the predicted values for both the ITM and OTM is also exhibited by the ITM option prices but not the OTM option prices, since for OTM, the validation errors are uniquely less than ITM errors. The study therefore concluded that the pattern exhibited by the joint model could be a carryover effect from the ITM option prices which, by the way, formed a greater share (59.15%) of the option contracts analyzed. Further it demonstrates the nature of the data as the training, testing and validation datasets were separated hierarchically. Table 2 summarizes the results. As also noted, in the second part of this study, in addition to moneyness and time to maturity, the study used historical volatility as one of the option pricing variables. The conventional practice adopted by the ANN option price modelers dealing with time-variant volatility is to make allowances for a large number of lags (in the case of volatility) with the belief that the ability of ANN to incorporate flexibility and redundancy creates a more robust model. This is exactly what was done in this section with volatility lags being varied beginning with lag one and increasing the number of lags until the optimal model was obtained. While dealing with volatility, attention was drawn to the effect of the lag variation on the model complexity in terms of the structure (no of hidden layers and nodes therein, learning rate, training time, etc.) and the model performance. The study observed these variables while varying the lags beginning with lag 1. 3-D plots (Fig. 4 to 13) were used to monitor the patterns of change for every lag with respect to the number of hidden nodes, the learning rate and the SSE.
At lag 1, it was observed that there were no outliers in the SSE with its values mostly ranging between 0.84134 and 0.84237, with the number of hidden nodes H ranging between 1 and 3. From these, the minimum SSE (0.84134) is obtained at the combination of 2 hidden nodes and a learning rate of 0.8 with a training time of 0.01s. At the second lag, the number of hidden nodes H ranged between 2 and 7, resulting from equation 3.21 formulated and adopted in the methodology. Nonetheless, the hidden nodes that yielded minimum SSEs across the learning rates (h) ranged from 2 to 5. Lower rates also had higher values of H than higher rates. The SSE also ranged between 0.84488 and 5.09788, with only 3 values out of the 36 being greater than 5. The minimum value of the SSE attained at this point was 0.84488, with H = 2 and training time T = 0.02s. For lag 3, the number of hidden nodes H ranged between 2 and 7 as well. However, unlike in the second lag in which . The SSE ranged between 0.8386 and 5.10316, with the latter values obtained at the learning rate of 0.8 and 0.9. The minimum value of the SSE attained at this point was 0.8386, with H=4 and training time T = 0.03s -all high from the preceding lag. At the fourth lag, the number of hidden nodes H ranged between 3 and 9. Nonetheless, the hidden nodes that yielded minimum SSEs across the learning rates (h) ranged from 3 to 7. Lower rates also had higher values of h than higher rates. The SSE also ranged between 0.83152 and 5.10319, with an increase as well of SSEs greater than 5. The minimum value of the SSE attained at this point was 0.83152, with H = 4 and training time T = 0.02s.   . The SSE also ranged between 0.06955 and 5.10319, with an increase as well of SSEs greater than 5 still beginning from a learning rate of 0.4. The minimum value of the SSE attained at this point was 0.06955, with H = 9 and training time T = 6.78s, a sharp increase in training time.
As of lag 9, the number of hidden nodes H ranged between 5 and 19, with The SSE ranged between 0.01895 and 5.10319, with the latter values having been obtained from the learning rate of 0.3. This is the lag that produced the minimum value of the SSE (0.01895), with H = 10 and the training time T = 11.08s -an all high training time in all the lags. At the tenth lag, the number of hidden nodes H ranged between 6 and 21. The hidden nodes that yielded minimum SSEs across the learning rates (h) had a range ( ) . The SSE also ranged between 0.44025 and 5.10319, with an increase as well of SSEs greater than 5 beginning from a learning rate of as low as 0.1. The minimum value of the SSE attained at this point was 0.44025, with H = 7 and a training time T = 0.08s. From this lag onwards, the minimum SSE began increasing going forward to an all value of greater than 5.
The ultimate and most superior measure of any model is the ability to produce outputs whose differences from the target outputs is as little as possible. This is demonstrated by the value of the errors in the model. As it can be seen in Figure 16, the minimum SSE was obtained at lag 9, a lag whose optimal number of hidden neurons were 10 (Fig. 15), a training time of 11.08s (Fig.  14) and a learning rate of 0.6 (Fig. 17). This implied that the volatility model adopted at this point consisted of 9 input nodes, 10 hidden neurons and a single layer of hidden nodes, with a single output node. It is the output of this model that formed part of the volatility input in the ANN option pricing model, in which case the input nodes were three, a single hidden layer with 1 output node.
The SSE, MSE and the RMSE were used to measure the performance of the ANN option pricing model with non-constant volatility during training, testing and validation. Table 3 summarizes the error statistics obtained in the three phases. The performance of this model was better with very slightly large errors compared to the previous model as the difference between the heteroscedastic and the constant volatility model was in most cases 0.00413 (MSE training), -0.00082 (MSE testing), 0.00319 (MSE validation), 0.0148 (RMSE training), 0.00017 (RMSE testing) and 0.00817 (RMSE validation).
An analysis of the performance of the model with regards to ITM and OTM options reveals that the model also predicts the OTM option prices better than the ITM option prices. Compared to the previous model in which volatility was assumed to be constant, this model performed better in modelling ITM option prices while the constant volatility model performed better in modeling OTM option prices. As shown in Table 4, for the ITM options the difference between the heteroscedastic and the constant volatility model was -0.027914 (MSE validation) and -0.064474 (RMSE validation), while for the OTM options the difference between the heteroscedastic and the constant volatility model was 0.000486 (MSE validation) and 0.002266 (RMSE validation).
It is however worth noting that despite the fact that the ANN model with non-constant volatility performs better than the one with constant volatility, the difference was not significant at 0.05 level of significance.

Conclusion
This study followed the conventional methods employed by other studies and developed two conventional ANN option pricing models -one while holding volatility constant and the other while violating the assumption of constant volatility. In the latter case, in handling volatility inputs, the study used volatility lags just as it were the case with Malliaris and Salchenberg (1993) in which case it started with a few input lags and went on increasing. The performance of the two models developed were mainly measured using the SSE, MSE and RMSE. Results indicate that all the two ANN models modelled the AAPL OTM option prices better than ITM. Secondly, the ANN model (with varying volatility) makes better predictions than the model with constant volatility. However, the difference between the performance of the two was not significant. In terms of model complexity, the study found that 1 layer of hidden nodes was sufficient in developing all the three models, the number of inputs and the value of H notwithstanding. Finally, in terms of learning difficulty, as depicted by the training time, the constant volatility model performed better (0.3 s) compared to the heteroscedastic model with lags (11.08 s). This was attributed to, first, the lesser number of inputs compared to the heteroscedastic model with lags.