Bootstrap Method for Dependent Data Structure and Measure of Statistical Precision

Problem statement: This article emphasizes the construction of valid inferential procedures for an estimator $\hat\theta$ as a measure of its statistical precision for dependent data structure. Approach: The truncated geometric bootstrap estimates of standard error and other measures of statistical precision such as bias, coefficient of variation, ratio and root mean square error are considered. Results: We extend the method to other measures of statistical precision, such as the bootstrap confidence interval for an estimator $\hat\theta$, and illustrate with real geological data. Conclusion/Recommendations: The bootstrap estimates of standard error and other measures of statistical accuracy such as bias, ratio, coefficient of variation and root mean square error reveal the suitability of the method for dependent data structure.


INTRODUCTION
Ever since its introduction by Efron (1979), considerable attention has been given to bootstrap methods and their application to theoretical and methodological problems in statistics. The bootstrap is a method for estimating the distribution of an estimator or test statistic by resampling one's data or a model estimated from the data. Several methods are available for implementing the bootstrap, and the accuracy of bootstrap estimates depends on whether the data are a random sample from a distribution or a time series process.
A typical problem in applied statistics involves the estimation of an unknown parameter θ. The two main questions asked are (i) what estimator $\hat\theta$ should be used? (ii) Having chosen to use a particular $\hat\theta$, how accurate is it as an estimator of θ? (Efron and Tibshirani, 1993). The bootstrap is a general methodology for answering the second question. It is a computer-based method, which substitutes considerable amounts of computation in place of theoretical analysis.
This study is concerned with the application of the bootstrap method to stochastic time series processes. We propose a non-parametric bootstrap method, called the truncated geometric bootstrap method, for stationary time series data. The procedure attempts to mimic the original model by retaining the stationarity property of the original series in the resampled pseudo time series. The pseudo time series is generated by resampling blocks of random size at each truncation, where the length L of each block has a truncated geometric distribution with an appropriate probability attached to it. Like other block methods, this method resamples blocks of observations with replacement to form a pseudo time series of length equal to or less than that of the original series, so that the statistics of interest may be recalculated based on the resampled data set. The method has two major components: the construction of bootstrap samples and the computation of statistics on the bootstrap samples, through some kind of loop.
The procedure provides estimates of different measures of statistical accuracy for an estimator $\hat\theta$, such as standard error, bias, coefficient of variation and root mean square error. We extend it to a further measure of statistical accuracy by applying the bootstrap-t confidence interval, with the goal of improving by an order of magnitude upon the accuracy of the standard intervals $\hat\theta \pm z^{(\alpha)}\hat\sigma$ in a way that allows routine application even to complicated problems; it produces good approximate confidence intervals. Most of the proofs and technical details are omitted; these can be found in the references given, in particular DiCiccio and Efron (1996), Efron (1984), Efron and Gong (1983) and Efron and Tibshirani (1986).
We describe how the bootstrap works by assessing the accuracy or precision of the sample mean. Efron and Tibshirani (1986) described the accuracy of the sample mean for independent data; in this study it is extended to dependent data structure. The resampling algorithm is as follows. Let $B_{i,b} = \{X_i, X_{i+1}, \ldots, X_{i+b-1}\}$ be the block consisting of $b$ observations starting from $X_i$. If $j > N$, $X_j$ is defined to be $X_k$, where $k = j \pmod{N}$ and $X_0 = X_N$. Let $p$ be a fixed number in $[0, 1]$. Independent of $X_1, \ldots, X_N$, let $L_1, L_2, \ldots$ be a sequence of independent and identically distributed (iid) random variables having a truncated geometric distribution, so that the probability of the event $\{L_i = r\}$ is $K(1-p)^{r-1}p$ for $r = 1, 2, \ldots, N$, where $K$ is a constant found, using the condition $\sum_{r=1}^{N} P(L_i = r) = 1$, to be $1/[1-(1-p)^N]$. Independent of the $X_i$ and $L_i$, let $I_1, I_2, \ldots$ be a sequence of iid random variables that have the discrete uniform distribution on $\{1, \ldots, N\}$. A pseudo time series is then generated in the following way.
Sample a sequence of blocks of random length by the prescription $B_{I_1,L_1}, B_{I_2,L_2}, \ldots$. The first $L_1$ observations in the pseudo time series $X_1^*, X_2^*, \ldots, X_N^*$ are determined by the first block $B_{I_1,L_1}$ of observations $X_{I_1}, \ldots, X_{I_1+L_1-1}$; the next $L_2$ observations in the pseudo time series are the observations in the second sampled block $B_{I_2,L_2}$, namely $X_{I_2}, \ldots, X_{I_2+L_2-1}$. This process is stopped once $N$ observations in the pseudo time series have been generated. Once $X_1^*, X_2^*, \ldots, X_N^*$ has been generated, one can compute the quantity of interest for the pseudo time series. Conditionally on the original data $X_1, \ldots, X_N$, this method of resampling and generating $X_1^*, X_2^*, \ldots, X_N^*$ defines a probability measure $P^*$ and the number of blocks $b$ at each truncation. It shares properties with the stationary bootstrap method of Politis and Romano (1994). Since the average length of these blocks is approximately $1/p$, it is expected that the quantity $1/p$ should play a similar role to the parameter $b$ in the moving blocks bootstrap method of Künsch (1989).
Most common statistical methods were developed in the 1920s and 1930s, when computation was slow and expensive. Now that computation is fast and cheap, we can hope for and expect changes in statistical methodology. This study discusses one such potential change and evaluates the statistical accuracy or precision of the estimated parameter.
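The resampling scheme described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of NumPy, the 0-based start positions, the toy input series, and the trimming of the final block so that the pseudo time series has exactly N observations are all assumptions made for the example. It also assumes 0 < p ≤ 1 so that the constant K is well defined.

```python
import numpy as np


def truncated_geometric_pmf(p, n):
    """P(L = r) = K (1 - p)^(r - 1) p for r = 1..n, with K = 1 / (1 - (1 - p)^n)."""
    r = np.arange(1, n + 1)
    k = 1.0 / (1.0 - (1.0 - p) ** n)
    return k * (1.0 - p) ** (r - 1) * p


def truncated_geometric_resample(x, p, rng):
    """Build one pseudo time series of length len(x) from circular blocks."""
    x = np.asarray(x)
    n = len(x)
    pmf = truncated_geometric_pmf(p, n)
    pmf = pmf / pmf.sum()  # guard against floating-point drift in the sum
    lengths = np.arange(1, n + 1)
    pieces, total = [], 0
    while total < n:
        start = rng.integers(0, n)             # uniform start position (0-based)
        length = rng.choice(lengths, p=pmf)    # block length L
        idx = (start + np.arange(length)) % n  # wrap around the end of the series
        pieces.append(x[idx])
        total += length
    return np.concatenate(pieces)[:n]          # trim the last block (assumption)


rng = np.random.default_rng(0)                 # hypothetical seed
series = np.sin(np.arange(50) / 5.0)           # toy data, not the geological series
pseudo = truncated_geometric_resample(series, p=0.25, rng=rng)
```

Smaller values of p favour longer blocks, which preserves more of the dependence within each block at the cost of fewer distinct blocks per pseudo series.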

MATERIALS AND METHODS
We present the bootstrap estimates obtained by applying the algorithm described above to real geological data taken from a Batan well at regular intervals. The data concern the principal oxide of sand or sandstone, which is SiO$_2$ (silicon dioxide). The bulk of oil reservoir rocks in Nigerian sedimentary basins is sandstone and shale, a product of sill stone (Olanrewaju, 2007; Nwachukwu, 2007). Therefore, having chosen to use a particular $\hat\theta$, how accurate is it as an estimator of θ? We present and test how accurate it is for dependent data structure.
Bootstrap method for standard errors: The bootstrap algorithm works by drawing many independent bootstrap samples, evaluating the corresponding bootstrap replications and estimating the standard error of $\hat\theta$ by the empirical standard deviation of the replications. The result is the bootstrap estimate of standard error, denoted by $\widehat{se}_B$, where $B$ is the number of bootstrap samples used. The bootstrap algorithm for estimating standard errors and coefficient of variation is as follows:
• Select $B$ independent bootstrap samples $X^{*1}, X^{*2}, \ldots, X^{*B}$, each consisting of $n$ data values drawn with replacement from $X$
• Evaluate the bootstrap replication corresponding to each bootstrap sample, $\hat\theta^*(b) = s(X^{*b})$, $b = 1, 2, \ldots, B$
• Estimate the standard error $se_F(\hat\theta)$ by the sample standard deviation of the $B$ replications:
$$\widehat{se}_B = \left\{ \frac{1}{B-1} \sum_{b=1}^{B} \left[ \hat\theta^*(b) - \hat\theta^*(\cdot) \right]^2 \right\}^{1/2}$$
Where:
$$\hat\theta^*(\cdot) = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^*(b)$$
The limit of $\widehat{se}_B$ as $B$ goes to infinity is the ideal bootstrap estimate of $se_F(\hat\theta)$:
$$\lim_{B \to \infty} \widehat{se}_B = se_{\hat F}(\hat\theta^*)$$
The non-parametric algorithm has the virtues of avoiding all parametric assumptions, all approximations and in fact all analytical difficulties of any kind. The Coefficient of Variation (CV) of a random variable $X$ is defined to be the ratio of its standard error to the absolute value of its mean:
$$CV(X) = \frac{se(X)}{|E(X)|}$$
This measures the randomness or variability in $X$ relative to the magnitude of its deterministic part $\theta_F$ (its mean). Here variation refers to variation both at the resampling (bootstrap) level and at the population sampling level.
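The standard error and coefficient of variation computations described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the data are simulated rather than the geological series, and `iid_resample` is a stand-in that draws n values with replacement as in the first step of the algorithm. For dependent data one would pass a block resampler with the same `(x, rng)` signature instead.

```python
import numpy as np


def bootstrap_replications(x, statistic, resample, n_boot, rng):
    """Evaluate the statistic on n_boot resampled copies of x."""
    return np.array([statistic(resample(x, rng)) for _ in range(n_boot)])


def bootstrap_se(replications):
    """Sample standard deviation of the B replications (divisor B - 1)."""
    return np.std(np.asarray(replications, dtype=float), ddof=1)


def bootstrap_cv(replications):
    """Standard error divided by the absolute value of the mean replication."""
    reps = np.asarray(replications, dtype=float)
    return bootstrap_se(reps) / abs(reps.mean())


def iid_resample(x, rng):
    """Stand-in resampler: n draws with replacement (ignores dependence)."""
    return rng.choice(x, size=len(x), replace=True)


rng = np.random.default_rng(0)                # hypothetical seed
x = rng.normal(loc=10.0, scale=2.0, size=40)  # toy data (assumption)
reps = bootstrap_replications(x, np.mean, iid_resample, n_boot=200, rng=rng)
se_b, cv_b = bootstrap_se(reps), bootstrap_cv(reps)
```

Passing the resampler as an argument keeps the standard error and CV calculations independent of how the bootstrap samples are built.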

Bootstrap estimates of bias:
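Bias is another measure of statistical accuracy, measuring a different aspect of the behavior of $\hat\theta$. Bias is the difference between the expectation of an estimator $\hat\theta = s(X)$ and the quantity θ being estimated:
$$\mathrm{bias}_F = \mathrm{bias}_F(\hat\theta, \theta) = E_F[s(X)] - \theta \qquad (4)$$
We generate the bootstrap samples, evaluate the bootstrap replications $\hat\theta^*(b) = s(X^{*b})$ and approximate the bootstrap expectation $E_{\hat F}[s(X^*)]$ by the average:
$$\hat\theta^*(\cdot) = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^*(b)$$
The bootstrap estimate of bias based on the $B$ replications is (4) with $\hat\theta^*(\cdot)$ substituted for the expectation and $\hat\theta$ for θ:
$$\widehat{\mathrm{bias}}_B = \hat\theta^*(\cdot) - \hat\theta$$
The ratio of estimated bias to standard error, $\widehat{\mathrm{bias}}_B / \widehat{se}_B$, is also calculated as another measure of statistical accuracy; the smaller the ratio, the higher the efficiency of the estimates. If the bias is large compared to the standard error, then it may be an indication that the statistic $\hat\theta = s(X)$ is not an appropriate estimate of the parameter θ.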
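Root mean square error: This is another measure of statistical accuracy that takes into account both bias and standard error. The root mean square error of an estimator $\hat\theta$ for θ is:
$$\mathrm{RMSE}_F(\hat\theta) = \sqrt{E_F[(\hat\theta - \theta)^2]} = \sqrt{se_F(\hat\theta)^2 + \mathrm{bias}_F(\hat\theta, \theta)^2}$$
If $\mathrm{bias}_F = 0$, then the root mean square error attains its minimum value $se_F$.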
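The bias, ratio and root mean square error described above can be computed from a set of bootstrap replications along the following lines. This is a minimal sketch: the function name and the dictionary it returns are hypothetical, and the replications are assumed to have been produced already by whatever resampling scheme is in use.

```python
import numpy as np


def bias_ratio_rmse(replications, theta_hat):
    """Bootstrap bias, bias-to-se ratio and RMSE from B replications.

    replications: bootstrap replications of the estimator.
    theta_hat: the estimate computed from the original data.
    """
    reps = np.asarray(replications, dtype=float)
    se = reps.std(ddof=1)                 # bootstrap standard error
    bias = reps.mean() - theta_hat        # mean replication minus original estimate
    ratio = bias / se                     # estimated bias relative to standard error
    rmse = np.sqrt(se ** 2 + bias ** 2)   # combines standard error and bias
    return {"se": se, "bias": bias, "ratio": ratio, "rmse": rmse}


# Toy replications (assumption), chosen so the arithmetic is easy to check by hand.
summary = bias_ratio_rmse([1.0, 2.0, 3.0, 4.0, 5.0], theta_hat=2.5)
```

With zero estimated bias the returned RMSE coincides with the standard error, which matches the minimum value noted above.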

RESULTS AND DISCUSSION
The summary of our findings on the performance of the truncated geometric bootstrap method for dependent data structure, based on the implementation of the prescribed algorithm for block sizes of 1, 2, 3 and 4 and bootstrap replicates of B = 50, 100, 250, 500 and 1000, is given in Table 1. Table 1 presents the measures of statistical accuracy of an estimator $\hat\theta$ from the geological data.
From Table 1 it is observed that the bootstrap estimate of θ is nearly unbiased. The standard errors are crude but useful measures of statistical accuracy; if the true sampling distribution F is N(0, 1), then the true standard errors are in the column SE. The Coefficient of Variation (CV) in each set of bootstrap replications at different block sizes is moderate, with small bias and ratio. The bootstrap bias estimates and the ratio of estimated bias to standard error are small, together with the MSE of the estimator $\hat\theta$. These moderate minimum values in each column indicate that in each replication we do not have to worry about the bias of $\hat\theta$.
As a rule of thumb, a bias of less than 0.25 standard errors can be ignored, unless one is trying to do careful confidence interval calculations (Efron and Tibshirani, 1993).
The situation is more complicated when the data are time series, because bootstrap sampling must be carried out in a way that suitably captures the dependence structure of the Data Generation Process (DGP). The block bootstrap is the best known method for implementing the bootstrap with time series data when one does not have a finite dimensional parametric model that reduces the DGP to independent random sampling.
The trouble with standard intervals is that they are based on an asymptotic approximation that can be quite inaccurate in practice. We implemented the bootstrap-t confidence interval to produce good approximate confidence intervals. The goal is to improve by an order of magnitude upon the accuracy of the standard intervals $\hat\theta \pm z^{(\alpha)}\hat\sigma$, in a way that allows routine application even to very complicated problems. The bootstrap-t procedure is a useful and interesting generalization of the usual Student's t method and it is particularly applicable to location statistics like the sample mean. The method was suggested in Efron (1983), but some poor numerical results reduced its appeal. Hall's (1988) study showing the bootstrap-t's good second-order properties has revived interest in its use. Babu and Singh (1983) gave the first proof of second-order accuracy for the bootstrap-t and DiCiccio and Efron (1992) showed that they are also second-order correct.
A practicable bootstrap-t confidence interval for θ at level 1 − α is obtained from the bootstrap distribution of the studentized estimator. Table 2 shows that the bootstrap-t confidence interval at the 95% confidence level is wider than the standard normal confidence interval. The distributions are positively skewed. A confidence interval is desired for the scale parameter θ.
In this case the bootstrap-t confidence interval based on $\hat\theta$ is a definite improvement over the standard interval. Therefore, given the above results for the different measures of statistical accuracy of $\hat\theta$, we can fit a time series model to the available data for effective description and prediction purposes.
Applying these methods to the estimator $\hat\theta$ to determine the bootstrap confidence interval gives the results in Table 2.
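A bootstrap-t interval of the kind discussed above can be sketched as follows. The sketch assumes the usual equal-tailed form, in which the quantiles $\hat t^{(\alpha/2)}$ and $\hat t^{(1-\alpha/2)}$ of the studentized replications $(\hat\theta^* - \hat\theta)/\hat\sigma^*$ give the interval $[\hat\theta - \hat t^{(1-\alpha/2)}\hat\sigma,\ \hat\theta - \hat t^{(\alpha/2)}\hat\sigma]$; this form is an assumption, not a formula taken from the article. The function names, the simulated data, the iid stand-in resampler and the plug-in standard error $s/\sqrt{n}$ for the sample mean are likewise illustrative choices. For dependent data a block resampler and a standard error estimate suited to dependence would be substituted.

```python
import numpy as np


def mean_and_se(x):
    """Sample mean and plug-in standard error s / sqrt(n) (iid formula, assumption)."""
    x = np.asarray(x, dtype=float)
    return x.mean(), x.std(ddof=1) / np.sqrt(len(x))


def iid_resample(x, rng):
    """Stand-in resampler: n draws with replacement (ignores dependence)."""
    return rng.choice(x, size=len(x), replace=True)


def bootstrap_t_interval(x, resample, n_boot, alpha, rng):
    """Equal-tailed bootstrap-t interval for the mean at level 1 - alpha."""
    theta_hat, sigma_hat = mean_and_se(x)
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        theta_b, sigma_b = mean_and_se(resample(x, rng))
        t_star[b] = (theta_b - theta_hat) / sigma_b   # studentized replication
    t_lo, t_hi = np.quantile(t_star, [alpha / 2.0, 1.0 - alpha / 2.0])
    # The upper quantile sets the lower limit and vice versa.
    return theta_hat - t_hi * sigma_hat, theta_hat - t_lo * sigma_hat


rng = np.random.default_rng(0)             # hypothetical seed
x = rng.exponential(scale=2.0, size=40)    # toy positively skewed data (assumption)
lower, upper = bootstrap_t_interval(x, iid_resample, n_boot=500, alpha=0.05, rng=rng)
```

Because the quantiles come from the bootstrap distribution rather than a normal table, the interval need not be symmetric about the estimate, which is why it can adapt to skewed data.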

CONCLUSION
The truncated geometric bootstrap method for dependent data structure is justified by concentrating on basic ideas and applications rather than theoretical considerations. The bootstrap estimates of standard error and other measures of statistical accuracy such as bias, ratio, coefficient of variation and root mean square error reveal the suitability of the method for dependent data structure.
The bootstrap-t method also produces good approximate confidence intervals for the estimator $\hat\theta$, which is suitable for model fitting and predictive purposes.