Sparse Sliced Inverse Quantile Regression

Corresponding Author: Ali Alkenani, Department of Statistics, College of Administration and Economics, University of AlQadisiyah, Iraq. Email: ali.alkenani@qu.edu.iq

Abstract: This paper proposes the Sliced Inverse Quantile Regression (SIQR) method. In addition, it proposes sparse sliced inverse quantile regression with the Lasso (LSIQR) and Adaptive Lasso (ALSIQR) penalties, giving a comprehensive study of SIQR and its sparse variants. Simulations and a real-data analysis have been employed to check the performance of SIQR, LSIQR and ALSIQR. According to the median of the mean squared error and the absolute correlation criteria, we conclude that SIQR, LSIQR and ALSIQR are advantageous approaches in practice.


Introduction
In many statistical applications the number of variables becomes huge, and consequently many statistical data analyses become hard. A familiar way to cope with this issue is to shrink the dimension of the regression model without much loss of information on the regression. This can be achieved via Sufficient Dimension Reduction (SDR) theory.
SDR theory (Cook, 1998) aims to replace the p-dimensional predictor X with a d-dimensional vector β^T X, where β is a p×d matrix with d ≤ p, without much loss of information on the regression and under only a few assumptions. The subspace spanned by the columns of β, Span(β), is called a Dimension Reduction Subspace (D.R.S). The minimal such subspace is usually uniquely defined in practice and coincides with the intersection of all D.R.S's (Cook, 1996). This intersection is a parsimonious population parameter that contains all regression information of Y given X, and it is thus the central object of concern in Dimension Reduction (D.R). It is called the Central Dimension Reduction (CDR) space, written S_{Y|X}, and its dimension, d = dim(S_{Y|X}), is called the structural dimension of the regression (Cook, 1998).
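As a toy numerical illustration of this idea (a sketch with made-up data, not from the paper): if Y depends on X only through a single index b^T X, then Span(b) is a dimension-reduction subspace, and that one projection carries essentially all of the regression information while an orthogonal direction carries almost none.

```python
import numpy as np

# Toy illustration: Y depends on X only through the single index b^T X,
# so Span(b) is a dimension-reduction subspace; a direction orthogonal
# to b carries (essentially) no regression information.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
b = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = np.sin(X @ b) + 0.1 * rng.normal(size=2000)

b_perp = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)  # orthogonal to b
r_signal = abs(np.corrcoef(y, np.sin(X @ b))[0, 1])  # strong association
r_perp = abs(np.corrcoef(y, X @ b_perp)[0, 1])       # near zero
```

Here the correlation of Y with the true index is close to one, while the correlation with the orthogonal projection is close to zero, which is exactly why reducing X to β^T X loses little.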
There have been a number of methods suggested to find the SDR in regression through estimating the Central Subspace (C.S), and one of the best-known methods for estimating the C.S is Sliced Inverse Regression (SIR) (Li, 1991). SIR is especially practical in dealing with high-dimensional covariates, and it has been shown to be a powerful D.R tool in high-dimensional regression problems (Zhu et al., 2006). Li (1991) suggested that an estimate of S_{Y|X} can be achieved by the first d eigenvectors υ_1, ..., υ_d of the eigenvalue problem

M υ_i = ρ_i Σ_x υ_i, i = 1, ..., d, (1)

where ρ_1 ≥ ... ≥ ρ_d > 0 are the eigenvalues, Σ_x = Cov(X) and M = Cov{E(X|Y)}. Aragon and Saracco (1997) studied the finite-sample properties of SIR. The Lasso has been combined with SIR (Ni et al., 2005; Li and Nachtsheim, 2006) to produce sparse estimates. Li (2007) proposed combining a regression-type formulation of SDR methods with shrinkage estimation to produce sparse and accurate solutions; this strategy can be applied to SIR and many other SDR methods. Li and Yin (2008) suggested a penalized SIR based on the Least-Squares (L.S) formulation of SIR. Cook (2004) rewrote the SIR problem in (1) as an L.S minimization, so that SIR estimates can be obtained by minimizing

L(B, C) = \sum_{y=1}^{h} n_y \| \bar{Z}_y - B C_y \|^2, (2)

where \bar{Z}_y is the mean of \hat{Z} = \hat{\Sigma}_x^{-1/2}(X - \bar{X}) within slice y. The inversion of \hat{\Sigma}_x is not possible, and a penalization approach has to be used, in the case of high correlations between the predictors or of small sample sizes compared to the dimension. The L.S formulation of SIR in the original predictor X scale has therefore been derived in (Li and Yin, 2008) in order to avoid the singularity of \hat{\Sigma}_x, as follows:

L(B, C) = \sum_{y=1}^{h} n_y \| (\bar{X}_y - \bar{X}) - \hat{\Sigma}_x B C_y \|^2. (3)

The mechanics of the alternating L.S algorithm suggested by Li and Yin (2008) to minimize (3) can be described as follows.
Given B, the solution for C is obtained slice by slice from the least-squares fit

\hat{C}_y = (B^T \hat{\Sigma}_x^2 B)^{-1} B^T \hat{\Sigma}_x (\bar{X}_y - \bar{X}), y = 1, ..., h. (4)

Thereafter, using the identity vec(\hat{\Sigma}_x B C_y) = (C_y^T \otimes \hat{\Sigma}_x) vec(B), rewrite (3) in the following form:

L(B, C) = \| Y^* - \tilde{A} vec(B) \|^2, (5)

where vec(.) is the operator stacking the columns of a matrix, Y^* stacks the vectors \sqrt{n_y}(\bar{X}_y - \bar{X}) over the slices and \tilde{A} stacks the matrices \sqrt{n_y}(C_y^T \otimes \hat{\Sigma}_x).
Given C, the solution of B in (5) is

vec(\hat{B}) = (\tilde{A}^T \tilde{A})^{-1} \tilde{A}^T Y^*, (6)

and the procedure alternates between minimizing over B and over C until convergence:

(\hat{B}, \hat{C}) = \arg\min_{B, C} L(B, C), (7)

where (\hat{B}, \hat{C}) denotes the SIR estimator that minimizes (7). Also, Li and Yin (2008) proposed a shrinkage version that penalizes the objective with an l_1 term,

\min_{\alpha} \sum_{y=1}^{h} n_y \| (\bar{X}_y - \bar{X}) - \hat{\Sigma}_x \mathrm{diag}(\alpha) \hat{B} \hat{C}_y \|^2 + \lambda \sum_{j=1}^{p} |\alpha_j|, (8)

where λ > 0 is the penalty tuning parameter. The authors rewrote (8) as

\min_{\alpha} \| \tilde{Y} - \tilde{X} \alpha \|^2 + \lambda \sum_{j=1}^{p} |\alpha_j|, (9)

where \tilde{Y} stacks the vectors \sqrt{n_y}(\bar{X}_y - \bar{X}) and \tilde{X} stacks the matrices \sqrt{n_y} \hat{\Sigma}_x \mathrm{diag}(\hat{B}\hat{C}_y). Then α is the Lasso estimator for the regression of \tilde{Y} on the p-column "data matrix" \tilde{X}.
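The alternating least-squares mechanics above can be sketched numerically (a minimal illustration, not the authors' code; the function and variable names are ours, and the slice means, the C-step and the stacked vec(B) step follow the least-squares formulation described above):

```python
import numpy as np

def alternating_ls_sir(X, y, n_slices=10, d=1, n_iter=100, seed=1):
    """Alternating LS sketch for
    L(B, C) = sum_y n_y || (xbar_y - xbar) - Sigma B C_y ||^2."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    Sigma = (X - xbar).T @ (X - xbar) / n
    # Slice on the order statistics of y; collect slice means as columns
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.column_stack([X[idx].mean(axis=0) - xbar for idx in slices])  # p x h
    ny = np.array([len(idx) for idx in slices], dtype=float)
    B = np.linalg.qr(np.random.default_rng(seed).normal(size=(p, d)))[0]
    for _ in range(n_iter):
        # C-step: for fixed B, each C_y is an ordinary LS coefficient
        C = np.linalg.lstsq(Sigma @ B, M, rcond=None)[0]                 # d x h
        # B-step: vec(Sigma B C_y) = (C_y^T kron Sigma) vec(B), so
        # stacking the slices gives one big LS problem in vec(B)
        A = np.vstack([np.sqrt(ny[j]) * np.kron(C[:, j][None, :], Sigma)
                       for j in range(n_slices)])                        # hp x pd
        t = np.concatenate([np.sqrt(ny[j]) * M[:, j]
                            for j in range(n_slices)])
        B = np.linalg.lstsq(A, t, rcond=None)[0].reshape(p, d, order="F")
    return B / np.linalg.norm(B, axis=0)

# Linear single-index model: the recovered column should align with beta
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
beta = np.array([3.0, 4.0, 0.0, 0.0, 0.0]) / 5.0
y = X @ beta + 0.1 * rng.normal(size=500)
B_hat = alternating_ls_sir(X, y)
```

Each step is an exact least-squares solve, so the objective decreases monotonically; the column normalization at the end is only for comparability, since B and C are identified up to scale.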
Quantile Regression (QR) has become a well-known approach for describing the distribution of a response variable given a set of predictors. QR gives a complete analysis of the stochastic relationships among random variables, and it has been used in different areas such as finance, microarrays and many other fields (Yu et al., 2003). While QR has become very attractive as a complete extension of mean regression, it suffers from the 'curse of dimensionality' (C.D).
A number of approaches have tried to reduce the dimension and then estimate the Conditional Quantile (C.Q); see, for example, Chaudhuri (1991), Horowitz and Lee (2005), Dette and Scheder (2011) and Yebin et al. (2011). Wu et al. (2010) suggested modelling the conditional quantile by a single-index function to tackle the dimensionality problem. Alkenani and Yu (2013) proposed penalized single-index QR to reduce the dimensionality. Gannoun et al. (2004) used SIR to tackle the dimensionality of the predictors in order to obtain a more efficient estimator of the C.Q. Specifically, the authors employed SIR as a pre-step to avoid the C.D, and conditional quantile estimators were then obtained by inverting the conditional distribution.
In this study, a one-step Sliced Inverse Quantile Regression (SIQR) is proposed, which inherits the same advantages as SIR. In addition, sparse sliced inverse quantile regression with the Lasso and Adaptive Lasso penalties is suggested. This paper is arranged as follows. The SIQR is proposed in section 2. The LSIQR and ALSIQR are suggested in sections 3 and 4, respectively. Simulation examples and real data are presented in sections 5 and 6, respectively. The conclusions are summarized in section 7.

SIQR
We can write equation (7) in quantile form by replacing the squared-error loss with the check loss:

(\hat{B}, \hat{C}) = \arg\min_{B, C} \sum_{i=1}^{n^*} \rho_\tau( Y^*_i - X^{*T}_i vec(B) ), (10)

where \rho_\tau(.) is the check function defined by \rho_\tau(u) = u(\tau - I(u < 0)). For fixed C, this reduces to a standard linear QR problem in \beta^* = vec(B):

\hat{\beta}^* = \arg\min_{\beta^*} \sum_{i=1}^{n^*} \rho_\tau( Y^*_i - X^{*T}_i \beta^* ). (11)

Then we replace B by \hat{\beta}^* (reshaped to p×d) in Equation (4) to compute \hat{C}. The new value of \hat{C} is substituted into Equation (11) to compute a new \hat{\beta}^*, and this procedure alternates between minimizing over B and C until convergence.
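The check function and its role as a quantile loss can be illustrated with a short sketch (illustrative code with names of our own choosing, not from the paper): the τ-th sample quantile minimizes the average check loss, which is the building block of every QR fit used here.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0.0))

def argmin_check_loss(x, tau):
    """Grid search over the sample points: the minimizer of the average
    check loss is (close to) the tau-th sample quantile."""
    grid = np.sort(x)
    losses = [check_loss(x - c, tau).mean() for c in grid]
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
x = rng.normal(size=2001)
q_hat = argmin_check_loss(x, 0.25)   # close to np.quantile(x, 0.25)
```

The asymmetry of ρ_τ is what targets the τ-th quantile: positive residuals are weighted by τ and negative residuals by 1 − τ.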
The algorithm can be summarized as follows:

• (Initialization step) Obtain an initial β_0 from the SIR method, where β_0 is the p×1 estimated coefficient vector. Set B = β_0
• Given B = β_0, find \hat{C} from Equation (4); \tilde{A} is defined in Equation (5)
• Now we have X^* and Y^*; \hat{\beta}^* can be obtained by solving (11) as a standard linear QR. We can use the rq(Y^* ∼ X^*, tau, method = "fn") function in the quantreg package
• Repeat steps 2 and 3 until convergence is attained

LSIQR

The Lasso (Tibshirani, 1996) has been proposed for simultaneous variable selection and parameter estimation. It minimizes the residual sum of squares subject to a constraint on the l_1 norm of the coefficients. Li and Zhu (2008) extended the Lasso of Tibshirani (1996) to work with QR models.
From Equation (8), we can propose the LSIQR as follows:

\hat{\alpha} = \arg\min_{\alpha} \sum_{i} \rho_\tau( \tilde{Y}_i - \tilde{X}^T_i \alpha ) + \lambda \sum_{j=1}^{p} |\alpha_j|, (12)

where \tilde{Y} and \tilde{X} were defined in Equation (9). The optimization problem in (12) is solved by employing a standard Lasso QR.
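A minimal numerical sketch of this kind of l_1-penalized check-loss objective (hypothetical names; a crude grid search stands in for a proper Lasso QR solver such as quantreg's rq): with a large enough λ the penalty dominates and the minimizer is shrunk exactly to zero, which is what produces variable selection.

```python
import numpy as np

def penalized_qr_objective(alpha, X, y, tau, lam):
    """Check loss plus an l1 penalty, as in the LSIQR objective sketch."""
    u = y - X @ alpha
    return np.sum(u * (tau - (u < 0))) + lam * np.sum(np.abs(alpha))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 0.5 * X[:, 0] + rng.normal(size=200)
grid = np.linspace(-1.0, 1.0, 401)

def minimizer(lam, tau=0.5):
    """1-D grid search for the penalized minimizer (illustration only)."""
    vals = [penalized_qr_objective(np.array([a]), X, y, tau, lam) for a in grid]
    return grid[int(np.argmin(vals))]

a_unpenalized = minimizer(0.0)    # near the true slope 0.5
a_heavy = minimizer(300.0)        # shrunk to (numerically) exactly zero
```

For λ = 300 the subgradient of the check loss (bounded by max(τ, 1−τ)·Σ|x_i| ≈ 80 here) can never offset the penalty, so zero is the global minimizer.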
The algorithm can be summarized as follows:

• Let \hat{B} and \hat{C} denote the convergent values of B and C obtained from the previous algorithm
• \hat{\alpha} can be obtained by solving (12) as a Lasso linear QR. We can use the rq(\tilde{Y} ∼ \tilde{X}, tau, method = "lasso") function in the quantreg package to find \hat{\alpha}

ALSIQR

Fan and Li (2001) proved that the Lasso produces biased estimates and that the oracle properties do not hold for it. The Adaptive Lasso, in which adaptive weights are used for penalizing different coefficients in the l_1 penalty, was suggested by Zou (2006), and it enjoys the oracle properties that the Lasso lacks (Zou, 2006). The Adaptive Lasso QR has been suggested in (Wu and Liu, 2009).
The ALSIQR is proposed as follows:

\hat{\alpha} = \arg\min_{\alpha} \sum_{i} \rho_\tau( \tilde{Y}_i - \tilde{X}^T_i \alpha ) + \lambda \sum_{j=1}^{p} w_j |\alpha_j|, (13)

where the weights are set to w_j = 1/|\tilde{\alpha}_j|^\gamma for some appropriately chosen γ > 0 and \tilde{\alpha} is the sliced inverse quantile regression estimate.
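The adaptive weights can be sketched as follows (a minimal illustration; the function name and the small epsilon guard against division by zero are our own additions): coefficients with a small pilot estimate receive a heavy penalty and are pushed toward zero, while large pilot estimates are penalized only lightly.

```python
import numpy as np

def adaptive_weights(alpha_tilde, gamma=1.0, eps=1e-8):
    """w_j = 1 / |alpha_tilde_j|^gamma: small pilot coefficients get the
    largest penalties, large ones the smallest (eps avoids division by 0)."""
    return 1.0 / (np.abs(alpha_tilde) + eps) ** gamma

# Pilot (SIQR-style) estimates: one strong, one moderate, one near-zero
w = adaptive_weights(np.array([1.5, 0.4, 0.01]))
```

This data-driven weighting is what restores the oracle properties that the plain Lasso lacks (Zou, 2006).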
We can summarize the algorithm as follows:

• Let \hat{B} and \hat{C} denote the convergent values of B and C obtained from the previous algorithm
• \hat{\alpha} can be obtained by solving (13) as an Adaptive Lasso linear QR

The LARS algorithm (Efron et al., 2004; Zou, 2006) has been applied to get the Adaptive Lasso estimate in (13), and it can be described as follows:

Step 1. For any given λ, form the re-scaled predictor matrix X^{**} whose jth column is \tilde{x}_j / w_j, where \tilde{x}_j is the jth column of \tilde{X}.
Step 2. Obtain \alpha^* by solving the standard Lasso QR problem for each λ by using LARS:

\hat{\alpha}^* = \arg\min_{\alpha} \sum_{i} \rho_\tau( \tilde{Y}_i - X^{**T}_i \alpha ) + \lambda \sum_{j=1}^{p} |\alpha_j|, (14)

and then set \hat{\alpha}_j = \hat{\alpha}^*_j / w_j. The minimization problem in (14) can be solved by using the rq(\tilde{Y} ∼ X^{**}, tau, method = "lasso") function in the quantreg package to find \alpha^*.
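The rescaling idea behind Steps 1 and 2 can be checked numerically: dividing the jth column of the design by w_j turns the weighted l_1 penalty into a plain one without changing the fitted values (a small self-contained sketch with made-up numbers, not the paper's data):

```python
import numpy as np

# With weights w, the weighted penalty sum_j w_j |alpha_j| applied to X
# equals the plain penalty sum_j |alpha*_j| applied to the re-scaled
# matrix with columns x_j / w_j, where alpha*_j = w_j * alpha_j.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w = np.array([0.5, 1.0, 4.0])          # made-up adaptive weights
alpha = np.array([2.0, -1.0, 0.25])    # made-up coefficients
X_rescaled = X / w                     # the X** of Step 1
alpha_star = w * alpha                 # Lasso-scale coefficients of Step 2
fit_original = X @ alpha
fit_rescaled = X_rescaled @ alpha_star
penalty_weighted = np.sum(w * np.abs(alpha))
penalty_plain = np.sum(np.abs(alpha_star))
alpha_back = alpha_star / w            # back-transformation recovers alpha
```

Because the fit and the penalty value are both preserved, any off-the-shelf Lasso QR solver can be reused for the adaptive version.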
According to the results, the MMSE values for all considered methods increase as σ moves from 1 to 3 for all τ values. Moreover, the MMSE values for all methods increase as τ decreases.

Table 1. Simulation results for the SIQR and LQR based on the linear model in example 1

In general, the ALSIQR, LSIQR and SIQR produce precise estimates and they are significantly more efficient than the other methods.
It can be observed that the ALSIQR, LSIQR and SIQR give lower MMSE and larger |r_i| than the other methods. The variation in the estimates of the proposed methods is approximately the same in most cases and smaller than the variation in the estimates of the other methods. Most noticeably, when τ = 0.10 and τ = 0.25, the ALSIQR, LSIQR and SIQR are considerably more efficient than the other methods. From Tables 3-9, in terms of variable selection and according to Ave 0's, it is obvious that the Ave 0's for the ALSIQR and LSIQR methods are close to the true number.

Air Pollution (A.R) Data
In this section, the ALSIQR, LSIQR and SIQR are illustrated through an analysis of the A.R data, which are available online at http://lib.stat.cmu.edu/datasets/NO2.dat. The response Y is the hourly value of the logarithm of the NO2 concentration. The p = 7 predictors X are the logarithm of the number of cars per hour (x_1), temperature 2 m above ground (x_2), wind speed (x_3), the difference in temperature between 2 and 25 m above ground (x_4), wind direction (x_5), hour of day (x_6) and day number (x_7). The predictors and the response have been standardized.
Table 10 reports the MAD (median absolute difference between \hat{\beta}^T X and y) and the SD of the prediction errors for the quantiles estimated by all of the studied methods based on the A.R data for τ = (0.10, 0.25, 0.50).
It is clear that the ALSIQR, LSIQR and SIQR have smaller MAD and SD of the prediction errors than the other methods, especially for τ = 0.10 and 0.25. This confirms that the proposed methods do well in the extreme quantiles. The results of the numerical examples and the A.R analysis suggest that the ALSIQR, LSIQR and SIQR perform well.

Conclusion
The current study proposes three methods: SIQR, LSIQR and ALSIQR. The ALSIQR, LSIQR and SIQR have been compared with ALLQR, LLQR, LQR and NQR under different situations. In order to examine the performance of SIQR, LSIQR and ALSIQR, numerical examples were conducted based on the models described in section 5. Based on the simulation studies and the A.R data, it is concluded that SIQR, LSIQR and ALSIQR are more advantageous than ALLQR, LLQR, LQR and NQR, and thus the authors believe that SIQR, LSIQR and ALSIQR are practically useful.
Here \bar{Z}_y denotes the mean of \hat{Z} in the yth slice, n_y is the number of observations within slice y, \hat{p}_y is the fraction of observations in slice y and h is the number of non-overlapping slices. Minimizing L(B, C) over B ∈ R^{p×d} and C = (C_1, ..., C_h) ∈ R^{d×h}, the minimizing values of B form an estimate of the central space S_{Y|X}. In the stacked formulation, X^* is the n^*×pd predictor matrix and \beta^* = vec(B) is the pd×1 coefficient vector.

Example 1: R = 200 data sets of size n = 400 observations have been generated from a linear model with p = 10 predictors, where x_i (i = 1, ..., 10) and ε are independently and identically distributed (i.i.d.) standard normal. We take σ = 1 and σ = 3.

Example 2: R = 200 data sets of size n = 400 observations have been generated from a nonlinear model.

According to the mean and SD of |r_i|, the LSIQR and SIQR have a better performance than the LLQR, ALLQR, LQR and NQR for all studied cases. It is obvious from Tables 1 and 2 that the SIQR is preferable to the LQR and NQR, depending on MMSE, for both σ values and all values of τ. Furthermore, as the values of τ go up, the values of MMSE go down. From Tables 3-8 we find that the ALSIQR and LSIQR give smaller MMSE and SD values than the other methods.

Table 2 .
Simulation results for the SIQR and NQR based on the nonlinear model in example 2

Table 3 .
Simulation results for the ALSIQR, LSIQR, SIQR, ALLQR, LLQR and LQR based on the linear model in example

Table 4 .
Simulation results for the ALSIQR, LSIQR, SIQR, ALLQR, LLQR and LQR based on the linear model in example 3

Table 5 .
Simulation results for the ALSIQR, LSIQR, SIQR, ALLQR, LLQR and LQR based on the linear model in example 3

Table 6 .
Simulation results for the ALSIQR, LSIQR, SIQR, ALLQR, LLQR and LQR based on the linear model in example 3

Table 7 .
Simulation results for the ALSIQR, LSIQR, SIQR, ALLQR, LLQR and LQR based on the linear model in example 3

Table 8 .
Simulation results for the ALSIQR, LSIQR, SIQR, ALLQR, LLQR and LQR based on the linear model in example

Table 9 .
Simulation results for the ALSIQR, LSIQR, SIQR and NQR based on the nonlinear model in example 4