Quantile Regression Estimation Using Non-Crossing Constraints

Email: ilaria.amerise@unical.it Abstract: In this article we are concerned with a collection of multiple linear regressions that enable the researcher to gain an impression of the entire conditional distribution of a response variable given a set of explanatory variables. More specifically, we investigate the advantage of using a new method to estimate a bunch of non-crossing quantile regressions hyperplanes. The main tool is a weighting system of the data elements that aims to reduce the effect of contamination of the sampled population on the estimated parameters by diminishing the effect of outliers. The performances of the new estimators are evaluated on a number of data sets. We had considerable success with avoiding intersections and in the same time improving the global fitting of conditional quantile regressions. We conjecture that in other situations (e.g., data with high level of skewness, non-constant variances, unusual and imputed data) the method of weighted non-crossing quantiles will lead to estimators with good robustness properties.


Methodology and Estimation
A typical investigation in statistical analysis consists of the linear regression of one response variable on one or more predictor variables, all of which are observed on a sample of entities. The rational is that by establishing a linear relationship between them, knowledge of the value of predictor variable enables an approximate value to be predicted for the response variable. However, a richer and more precise understanding can be achieved through quantile regression analysis, which allows us to examine and compare different levels of response given the variation in the independent variables by considering simultaneously the conditional quantile functions for a properly chosen set of quantiles.
Let Q p (Y|x) = inf{Pr(Y≤y|x)≥p} indicate the p-th conditional quantile (0<p<1) of a real valued random variable Y given a vector of m covariates x. In short, y is defined as the smallest real value such that the probability of obtaining smaller values of Y is at least p. The quantiles are the values that divide the total probability into parts. Values of interest are themedian which divides the distribution of Y|x into halves (p = 0.5), the three quartiles which divide the distribution into four equal parts (p = 0.25 h, h = 1,2,3), deciles (p = 0.1 h, h = 1, ⋅⋅⋅, 9) and so on. Quantiles could be considered also at irregularly spaced over the (0,1) interval of probabilities. Conditional quantile functions offer simple and flexible models for the stochastic component of a regression and enables us to obtain reasonable estimates in the presence of a broad range of departures from Gaussianity (Parzen, 1979;Gilchrist, 2006).
For a random sample of observations y = (y 1 , y 2 ,..., y n ) of Y, a linear regression model may be specified as: (1) with x i = (x i,1 ,x i,2 ,⋅⋅⋅, x im ), i = 1,2,..., n, being a sequence of m×1 vectors of known covariates and n>m. For each p, the magnitude and the sign of the effect of a given regressor can be compared with the effect at the other quantiles. The vector β(p) ∈ R m contains p-specific coefficients whose estimate should be obtained from sample data. The ( ) Denote X = (x 1 ,x 2 ,⋅⋅⋅, x n ) the n×m known design matrix with columns x i , i = 1,2,⋅⋅⋅, m. We assume that the covariates include an intercept term; therefore, the first column of X consist entirely of ones and that X has rank m. In addition, let S m ⊂ R m the compact domain of the covariates over which the model holds; this means that, for each experimental condition x, an observation y is available according to the model (1). We assume further that y and X are observed with no error and that different observations are independent. Moreover, we assume that e 1 ,e 2 ,⋅⋅⋅, e n are independent random errors with quantile function Q p (e). The quantile function is left unspecified; we only require the verification of the following constraint: which implies that the conditional p-th quantile of e i is null for each i. It follows that the p-th conditional quantile of y|x i is given by: by the definition of the conditional expectation. It is worth noting that there is no assumption on identical distributions and that model (1) allows the errors to change as a function of X and, thus, various form of and local noise rates can be accomodated. The first thing you need to understand is that there are two classes of optimization problems: Function minimization and mathematical programming. Both seek an optimum and both involve constraints, but mathematical programming begins with the constraints. It emphasizes solving the constraints and then looks for the best solution.
A quantile regression estimate of the parameters ( ) p β is defined as those values of the parameters that minimize the asymmetrical loss function: The minimizing β determine a m dimensional hyperplane given by that best fits the n observations. All points above the best interpolating hyperplane contribute with weight p to the estimates of the parameters; all observations below the hyperplane contribute with weigh (1−p).
The intuition behind the seminal article of Koenker and Bassett (1978) is quite simple. There is a complete equivalence between the computation of a quantile in terms of the order statistics y (i) , i = 1,⋅⋅⋅, n and the minimization of an asymmetrical loss function which, in turn, can be reformulated as the minimization of a linear function subject to linear constraints: n n n n n n n n n n n n n n n n n n n n n p β where: u n = The (n×1) vector of ones I n,n = The identity matrix of order n 0 n,n = The (n×n) matrix of zeros 0 n = The (n×1) vector of zeros The linearity of the objective function and constraints implies that the solution has to lie in one of the vertices of the polyhedron defined in (5). The advent of modern linear programming techniques in the later 1940s lead to fast and efficient algorithm. In fact, it may be the computational complexity of least absolute regression, as well as the analytical intractability in a statistical setting, that forced this approach to take the back seat to least squares in multiple linear regression Seneta and Steiger (1984). Here we will not go deep into the computational aspects of quantile regression, but just recall that the availability of efficient linear programming algorithms (Koenker and D'Orey, 1987;1994). For large data sets the interior point algorithm written by Portnoy (1991) is recommended. Buchinsky (1998) notes that the m×1 vector of firstorder conditions for solving the problem in (4) is given by: where, the sgn(.) (signum) function takes the values −1,0,+1 according to whether its argument is negative, zero or positive. The approximation symbol in (7) emphasizes the fact that since U n (β, p) is discontinuous function of β(p), it may not have an exact solution. However, for n → ∞ the expression on the left hand side of (7) converges to zero. Let M = (r 1 ,r 2 ,⋅⋅⋅, r m ) be a subset of m distinct integers from 1,2,⋅⋅⋅, n determining a combination of rows of the design matrix such that X(M) has rank m. According to Koenker and Bassett (1978) (Theorem 3.1) the solutions to (4) have the form: The residuals of the estimated quantile regression have an interesting structure. Koenker and Bassett (1978) (Theorem 3.4) show that: where n − ,n + indicate, respectively, the number of negative and positive residuals. If the solution of (4) is unique then all inequalities are strict. Furthermore, if the quantile function of the error terms is continuous then there are exactly m residuals with value zero. To clarify this point, consider the formulation of (1) with m = 1 (intercept-only model). The resulting value of ( ) For n → ∞, we could estimate an increasing number of quantile regressions; in practice, there may be at most 3n distinct regression solutions for p unequally spaced on the interval [0,1] (Koenker and D'Orey, 1987). The common practice is that quantile regression is designed to be used in groups rather than singly where the number of elements in a group may increase as the sample size increases. In finite samples, Portnoy (1991) shows that the number of distinct quantile regressions is O(nlogn). This opportunity is particularly useful when the regressors have a different impact on different regions of the design space. For example, pairs of extreme conditional quantiles map out a conditional prediction interval within which one expects a specified fraction of individual points to lie. Also, for unimodal distributions, the analysis of kurtosis focuses on how the covariates affect both the tails and the central parts of the conditional distribution. There will have to be a thorough understanding of the research situations in which quantile regression will be used in order both to understand its own limitations and to exploit its full potential.
Inference and asymptotic theory of quantile regression is not discussed in this paper because of the availability of a vast literature on the subject. The survey in Koenker (2005)[Ch.3 and Ch. 4] is particularly effective in this sense. We shall therefore confine ourselves to highlighting a few issues about de estimated error terms.

Crossing Quantile Regressions
Quantile regression estimates are robust in presence of observations that are far in the direction of the response variable (note that this does not apply to outliers of covariates). This is an attractive property, at least in part attributable to "ordinal" nature of the quantiles, which slows down the leverage from outlying observations. At the other side of the coin, there is the potential drawback that quantile regression estimates are not guaranteed to be unique for the given p. When a regression model is assessed, the two main characteristics that need to be considered are robustness and sensitivity. Robustness is a valuable characteristic because quantile regression does not change greatly when data are changed slightly. However, since robustness is achieved at the cost of a loss in precision, it can become a problem if the gap between percentages p are too narrow. Sensitivity is important, but it probably reduces the reliability of estimation when substantially similar observations are mapped onto very distant conditional values of the response.
Robustness and sensitivity are antithetical requirements because robust procedures give greater stability against random changes in data, whereas more sensitive procedures offer a richer source of information regarding the dependence structure. A balanced solution may be the analysis of the conditional quantile function for an appropriate set of percentages 0<p 1 <p 2 <⋅⋅⋅<p k <1 and the estimation of the parameters separately for each quantile. However, when several conditional quantiles are treated, it is not unusual that the estimated parameters generate non parallel hyperplanes.
Quantile regression hyperplanes in R m are defined by the real coefficients β j (p), j = 1,⋅⋅⋅, m where the set: Note that two hyperplanes equations in R m form the same hyperplane if and only if they differ by a multiplicative factor not equal to zero. In general, a violation of the monotonicity condition occurs if for two quantiles p 1, p 2 and a (m×1) vector x we have: If the vectors β(p 1 ) and β(p 2 ) are linearly independent, then we have two independent linear equations in m unknowns. After solving the first equation for x r , this value can be substituted into the second equation, which can be solved for x r,s 6 = r. At this point there are (m−2) free unknowns. Any two non parallel hyperplanes intersect in one hyperplane of dimension (n−2). When k, the number of quantiles, increases and a limited amount of data is available, the phenomenon of crossing becomes much more likely.
To illustrate, consider the quantile regression model (1) with m = 2. In this case we have x i = (1,x i ) and: If the support of x i is the entire real line, then either β 1 (p) is a constant independent of p or two or more conditional quantile regressions overlap for some value of x i . This simply implies that y|x i is higher at a lower quantile and vice versa. For example, a given point (y,x) might result simultaneously below p 1 = 0.20, but above p 2 = 0.25 leading to an invalid y|x distribution. He (1997) observes that crossing quantiles hyperplanes reflects a paucity of data in the region concerned (a sort of misspecification of the covariate effects). In this sense, Koenker and Geling (2001) suggest introducing additional covariates to avoid crossing. For example, we can vary the specification of the model for each quantile by adding and subtracting a positive covariate: This expression incorporates two quantile regressions which never cross one another and do not cross (13). In the multivariate case, crossing could be avoided if all quantile hyperplanes are parallel. For instance, Zhao (2000) first estimates the slope parameters by the least absolute deviation (p = 0.5). Common slopes guarantees that all the quantile hyperplanes will be parallel with no intersection. Second, the estimates of the intercept are obtained at different quantiles of the residuals determined in the first step. The combined estimates produce a consistent estimator of the theoretical regression quantile. Note that this is the only possible solution when the support of he covariates is R m . Tokdar and Kadane (2012) build a model of quantile regression monotonically increasing in p ∈ [0,1] obtained by reparametrizing the elements of β(p) as linear combinations of two monotonically increasing curves. Bassett and Koenker (1982) (Theorem 2.1) show that the estimated conditional quantile function at the centroid = x x (the vector whose the i-th element i x is the average of x i ) we have x x , which is a monotone jump function of p on the interval [0,1]. Moreover, Q p (y i |x) must be monotonic in p in a neighborhood of = x x . Thus, incidence of crossing generally occur only in outlying regions of the observed covariate space S m . On the other hand, we should ignore points close to the boundary or lying outside S m unless the data set include sufficient observation the extreme regions of the design space to allow a reliable computation of quantiles. Schnabel and Eilers (2013a) point out that, although in many cases crossing is only a visual annoyance, it may jeopardize further analysis, e.g., when studying conditional distributions at specific values of the independent variable.
Convergence to the true conditional quantile functions renders legitimate the expectation that the crossing phenomenon will eventually disappear as the sample size n increases. Machado and Mata (2005) recall the theoretical results of Bassett and Koenker (1982) (Theorem 3.2) and Bassett and Koenker (1986) (Theorem 3.1), which show that the estimated parameters of the quantile regression are consistent for their population counterpart. The theory, therefore, predicts that the potential violations of monotonicity will be smaller the larger the sample size and the sparser the set of p ∈ [0,1]. This is not necessarily true for a general design matrix X and the estimated hyperplanes are not guaranteed to be parallel. On the other hand, because of the phenomenon known as the "course of dimensionality" (which is virtually omnipresent when analyzing data in high-dimensional spaces) even large datasets may become rarefied in certain regions to a degree which favors quantile crossings. Bondell et al. (2010) observe that quantile crossing is a well-known problem, but no simple and general solution currently exists. In order to circumvent this difficulty, many authors have looked for techniques which are capable of fitting the data appropriately and several attempts at this have been made since the late 1990s. The literature on avoid crossings can generally be divided into two major approaches: Semi-parametric techniques, where the underlying error quantile function does not assume any specific form and non-parametric methods where various smoothing techniques (e.g., kernel fitting or polynomial spline fitting) are adapted to the error distribution. Two methods to prevent quantile inversions, one for each approach, were proposed by He (1997): The first, applied the Box-Cox power transformation to restrict regression quantiles (Heagerty and Pepe (1999). The second imposes certain restrictions on the space of possible solutions to conditional quantiles. The restricted regression quantile curves are easy to compute, but do not suffer from the problem of quantile reversal in certain areas of the design space. Yu and Jones (1998) study nonparametric regression quantile estimation by kernel weighted local linear fitting. Specifically, given the current quantile function, the next quantile function is estimated so that it does not cross with the existing quantile. The authors show that local linear conditional quantile estimation is feasible and practical. Results are at the least comparable with those produced by other approaches. Takeuchi and Furuhashi (2004) addressed the problem following a support vector machine approach. With the use of kernel-based estimator, a non-crossing conditional quantile estimator is derived in the form of a constrained maximization of a piecewise quadratic function Takeuchi et al. (2006).
To deal with the potential lack of monotonicity in multiple quantile regressions, Melly (2005)  Neocleous and Portnoy (2007) show that by choosing an appropriate grid of p-values and defining the quantile functions by linear interpolation between grid values, the resulting conditional quantile estimator is strictly monotonic with probability tending to one and is asymptotically equivalent to the usual regression quantile estimator. Dette and Volgushev (2008) proposed noncrossing estimates of quantile curves using a simultaneous inversion and isotonization of an estimate of the conditional distribution function. They also demonstrated that the new estimates are asymptotically normal distributed and asymptotically first order equivalent to quantile estimates obtained by local constant or local linear smoothing of the conditional distribution function. Shim et al. (2009) propose a new non-crossing quantile regression method using doubly penalized kernel machine which uses heteroscedastic location-scale model as basic model and estimates both location and scale simultaneously by kernel function. Wu and Liu (2009) introduce a stepwise estimation scheme. With the current quantile regression function at a particular given level, constraints are added in the estimation procedure to ensure the next quantile regression function does not cross the current one. The procedure continues till quantile regression functions at all desired levels are obtained. One drawback of this algorithm is its dependence on the order that the quantiles are fitted.
The point of departure of Chernozhukov et al. (2009;2010) is that if an original, potentially non-monotonic, estimate is available, then the rearrangement operation from variational analysis can be used to monotonize the original estimate of the quantile regression curve. In this sense, the authors propose monotone rearranging the original estimated curves, which is closer to the true quantile curve than the original curve in finite samples. However, the estimate of the conditional distribution function y|x is modified in a way which makes problematic to quantify effects of the covariates. . The weight is unique to the p-th hyperplane but it is common to all the n observations.
Liu and Wu (2011) employ simple constraints on the kernel coefficients which can guarantee the estimated conditional quantile functions never cross each other. This kernel formulation covers both linear and nonlinear models. Furthermore, the authors demonstrate how that through sharing strength among different quantiles, simultaneous noncrossing quantile regression can produce better estimation than individually estimated quantile functions.
The basic idea of Schnabel and Eilers (2013a; 2013b) is to introduce a surface on a twodimensional domain. One axis is for the covariates, the other is for the probability p. The quantile curve for any probability is found by cutting the surface at that probability. Effectively, all possible quantile curves are estimated at the same time and the crossing problem disappears completely if the sheet is monotonically increasing with p for every covariate value. Rather than directly modeling the level of each individual quantile, begins with a single quantile (usually the median) and then add or subtract nonnegative functions (called quantile spacings) to it in order to find the other quantiles. His approach is analogous to methods for approximating intervals, where one models the midpoint and the range of the interval, rather than try to model the upper and lower bounds directly.

Weighted Quantile Regressions
Crossings of quantile regression hyperplanes is an undesirable inconsistency which undermines the theoretical integrity of the quantile regression method and limits its usefulness in applications where monotonicity is a critical issue, such as prediction intervals for forecast. We therefore attempt to force proper ordering of the quantile curves to ensure that there is no crossings over some relevant region of covariate space.

Non-Crossing Quantile Regressions
If we apply the quantile function model (1) for p ∈ P = (0<p 1 <⋅⋅⋅, p k <1) then we need to estimate k sets of coefficients B = [β(p 1 ), β(p 2 ),⋅⋅⋅, β(p k )]. The corresponding k conditional quantile functions should verify, in a natural way, the monotonicity require-ments with respect to the percentage p: where, S m is the convex hull of a set of the n observed x ∈ R m . More specifically, S m is the intersection of all convex sets containing the observations: An obvious method to obtain the matrix B is the execution of an estimation procedure for each of the k different conditional quantile regressions. In the absence of further restrictions, the estimators to be included in B would be obtained by solving the minimization problem (4) for each p ∈ P. As we have said in the previous section, crossings should never happen in theory because of the properties of the quantile regression estimators. The question remains however how to deal with overlapping hyperplanes when such cases do occur.
Let L = (L 1 ,L 2 ,⋅⋅⋅, L m ) and U = (U 1 ,U 2 ,⋅⋅⋅, U m ) be, respectively, the vector of minimum and the vector of maximum elements observed for each covariate (with the exclusion of the first columns consisting entirely of ones). To simplify the evaluation of constraints (15), we can transform the covariates so that they range into the iinterval [0,1]: , 2, , , ; 1, 2, , . .
Hence a quantile regression estimate of the unknown parameters is given by: where where V is the matrix whose rows are v 1 ,⋅⋅⋅, v n . For simplicity of manipulation, it is convenient to redefine the k solution vectors of (19) for p ∈ P as follows: The restrictions described in (15) are equivalent to: This condition, according to Bondell et al. (2010), is both necessary and sufficient to prevent overlapping hyperplanes. The merit of this approach is that the question of quantile crossings is now reduced to a linear programming problem, which can be solved via standard software.

Non-Crossing Weighted Quantile Regressions
One unrealistic assumption underlying the quantile regression model is that each point of the p-th regression quantile hyperplane provides equally reliable and valid information about the deterministic part of the response variable. We argue that quantile regression crossings are due, at least in part, to the fact that all observations are considered on the same footing although the data might not justify this. Furthermore, we claim that the use of residuals from quantile regression can be of help to avoid such shortcomings. Amerise (2016) for a good review.
Consistent with this premise, we believe that a way to avoid intersections between estimated hyperplanes (over the design space) is to put more emphasis on observations which are more coherent with the model (1) and give less importance to observations thought to be cause of irregularities. Therefore, to deal with the crossing issue, we propose to estimate the quantiles under the noncrossing restrictions (15) by adjusting fit of h-th quantile regression to the following objective function: where, the weighting system w verifies the conditions.
The magnitude of wi;n quantifies the suitability of the information contained in the i-th observation relatively to the k regression hyperplanes fitted to the n data points. Strictly positive weights are strongly recommended by Koenker (2013)[p. 17] since a null weight is ambiguous. Moreover, the system of weights tends to the equal weighting scheme when hyperplane do no cross. Note that the weighted version of the objective function (22) can be solved by applying the unweighted algorithm to the responses and design vectors defined by r i,h = w i,n y i ,z i = w i , n v i i = 1,2,⋅⋅⋅, n. Therefore, problem (22) can be reformulated as: In practice, the computation of non-crossing weighted quantile regression can be efficiently accomplished by exploiting the same software developed for Bondell et al. (2010). Our approach presupposes that the weights are fixed and known in advance. For example, they can hold information about the reliability of imputed values or values derived from previous experience or source known to be polluted by errors. In practice, however, this assumption rarely holds so estimated weights must be used instead. There are many ways to estimate w. We base our choice on the idea that the weight for each observation should be inversely related to the size of the corresponding disturbance ŷi yi − yi where y i is the i-th value of the response in a sample of n points and ŷi is some estimate of y i |x i obtained with an unconstrained quantile regression hyperplane.
Let Ĉ the (n×k) matrix with columns given by the n estimated residuals of the noncrossing quantile regression associated with p h,h = 1,⋅⋅⋅, k. This implies that the unweighted estimators of Bondell et al. (2010) are taken as a benchmark against which to compare there is a range of possibilities for converting distances into weights. An exponential transformation is especially appealing to us because of its simplicity and flexibility: where τ ≥ 0 is a finite-dimensional parameter that may be varied to modify the influence of the distances. Increasing values of t make the observation which is at distance one from the null vector progressively less relevant. For a given τ>0, weights decrease as distances from the null vector increases. Form another point of view, if we consider the resemblance between (26) and the density function of an exponential random variables, then t can be thought to be the inverse of the expected uncertainty of the observations. Constant τ can be chosen arbitrarily in principle. Based on empirical experience with real as well as simulated data we suggest applying the optimize function for onedimensional optimization offered in Base-R. The method used is a combination of golden section search and successive parabolic interpolation which searches a specified interval from lower to upper for a minimum of a For what concerns large sample properties of noncrossing weighted quantile regressions, consider a set of percentages p 1 <p 2 <⋅⋅⋅,<p k such that p h ∈ [ε,1 − ε] for h = 1,⋅⋅⋅, k and 0<ε<0.5 and assume: The conditional densities f yi|x are differentiable with respect to y i for every x and each i = 1,⋅⋅⋅, n C.3 For D a bounded domain and 0<ε<1, there exist constants a>0, b,c<∞ such that: uniformly for x ∈ D, ε ≤ p ≤ (1−ε) and uniformly in i = 1,⋅⋅⋅, n.
Under the above conditions, Bondell et al. (2010) prove that the estimator obtained via (22) is asymptotically equivalent to the typical quantile regression estimator, regardless of the choice of a weighting systems w i,h , i = 1, ⋅⋅⋅, n, h = 1,⋅⋅⋅, k. Furthermore, in another theorem, the authors show that inference for the n -consistent constrained quantile regression can be achieved by using the known asymptotic results for classical quantile regression.

Experimental Results
The examples and experiments presented here look for evidence that incorporation of a weighting systems into the core of the non-crossing quantile regression procedure can lead to an alternative and (at least on specific occasions) better mechanism for fitting multivariate data. In this section, we use three examples to compare three different algorithms: Unconstrained, Unweighted Non-Crossing (UNC), Weighted Non-Crossing (WNC) for the quantile regression and thereby show the advantage of our new algorithm for the WNC.
Iriarte-D´ıaz (2002) discusses the relationship between maximum relative running speed (body length/second) and body mass (kg) concerning n = 142 species of terrestrial mammals, in order to evaluate whether the relative locomotor performance shows a differential scaling depending on the range of mass analyzed. Overall, maximum relative running speed was found to decrease with increasing body mass. Figure 1 illustrates the results of application of the three different techniques considered in the present paper for p ∈ P(0.50 : 0.95, by 0.05).
From graph B, it is apparent that the computation method of non-crossing quantiles proposed by Bondell et al. (2010) avoids the intersections which are present in graph A, at least within the design domain delimited by the vertical lines. Our version with weights inversely related to the residuals of the quantile regressions (graph C) generates regression lines which not only bypass crossings, but also gather near the center of the observed points. It must be noted, in fact, that there is an entity which does not match the general impression: 100 corresponding to the heteromyid rodent (Dipodomys merriami). The bias attributable to the presence of this outlyier can be noticed looking at the highest two lines in graph A) and B. In the former there are crossings clearly due to the carry-over effect from the isolated point. In the latter, the problem of the crossing is solved, but the outlier shares the line with other regular entities. In graph C the influence of the outlier has been entirely removed. The accumulation of lines around the center is presumably due the fact that the relationship between maximum running speed and body mass is curvilinear rather than linear.
To assess the difference in efficacy between different methods of estimation, we evaluate the behavior of the absolute errors affecting the various regression models. In particular, Table (1) compares the mean, the maximum and the minimum sum of absolute errors associated with the k = 10 quantile regression hyperplanes. The findings in Table (1) reveal that our weighted version of the noncrossing regression quantiles attains a better performance than the standard procedure with respect the abso-lute residuals. The unweighted non-crossing technique does not improve, from a fitting point of view, upon unconstrained quantile regressions.
As the secnd example, we analyze the data set sbp included in the package multcomp of R for the percentages (0.10, 0.25, 0.50, 0.75, 0.90). The data set refers to systolic blood pressure (in mmHg), age (in years) and gender of n = 69 people. In Fig. 2 it is shown that, in absence of outliers in the data and non-crossing lines, the three estimation methods behave similarly.     The well-known Housing Data Set which is available online at http://lib.stat.cmu.edu/datasets/boston_corrected.txt is considered for the third example. The data comprises n = 506 observations for 14 non-constant predictor variables and one response variable, Corrected Median Value of owner-occupied homes (CMEDV). For simplicity, we excluded the categorical variable RAD and the Charles River dummy variable (because there are too few on one status) and considered m = 11 predictor variables.
We select virtual random samples without repetition of n ∈ (120, 240, 360, 480) observations from the total data set. The results are reported in Table 2 where each entry is an average across L = 100 experiments of the same type. The weighted non-crossing quantile regressions yield average absolute errors systemically better than those of the other methods. It appears that, the adjustments caused by the unweighted restrictions on the intersection of hyperplanes of the ordinary estimates have resulted in relatively minor modifications to the extremes quantile regressions. The adjustments are more substantial for weighted non-crossing regressions and these seem to be concentrated in the central and higher percentages where the most pronounced reduction of residual reductions is observed.
The quality of the fitting expressed by the columns of Table 2 does not improve with increased sample size.

Discussion and Conclusion
Conditional quantile functions ofer simple and flexible models for the stochastic component of a regression and enable us to obtain reasonable estimates in the presence of a broad range of departures from Gaussianity (Parzen, 1979;Gilchrist, 2006). However, the interpretability of QR estimates deteriorates when conditional quantile functions cross or overlap.
Our aim in this paper is to introduce a new methods of estimation for the parameters of quantile regressions that avoids the problem of crossing quantile curves.
Based upon the work Bondell et al. (2010), a weight is attached to each observation inversely related to the estimated disturbances associated with the unweighted quantile regressions. We are convinced that the in sequence of disturbances corresponding to a given observation decreases exponentially with the Mahalanobis distance from their centroid. This scheme can be particularly effective when the intersection of hyperplanes is most probably due to the presence of outlying entities. The estimation of multiple noncrossing quantile regressions is enforced by requiring nothing more than lower quantile levels do not cross higher quantile levels. This gives rise to a set of inequalities that should be all satisfied. Inequalities can be considered a priori pieces of information about the true parameters that restrict the original parameter space. It is known that, under general conditions, the estimate ( ) h p β has optimal properties for the h-th conditional distribution and this is also true for h = 1,⋅⋅⋅, k. Since, the criterion (4) does not use the fact that β(p h ), h = 1,⋅⋅⋅, k lie in A m , one might wonder if using such conditions in estimation procedures give a gain in efficiency. This is not necessarily so (Rothenberg, 1973[p. 55-57]) for the case of the linear least squares estimator). The question then is to find the best way of satisfying the constraints without worsening the properties of the regression quantile estimators.
We have shown that our method, because of the introduction of an efficient system of weights, is successful at determining quantile regression hyperplanes that do not cross in the convex hull of the explanatory variables. The results presented in this paper support this view. There are still many unknown aspects of our methodology; for example, what is the efficiency of parameter estimates for clean data (absence of outliers or Gaussian disturbances), what is the power function of the test statistics and what is the bias in parameter estimates when data are affected by specific forms of heteroscedastic errors. These problems can be addressed through asymptotics for large samples and via a diffuse Monte Carlo simulation plan evaluation for finite samples. These will be topics for further study. Two other potential directions for future research should be considered: to devise a multistep mechanism for building more effective weights and to establish test statistics which help which help to decide on goodness of fit for systems of quantile regressions on the same data set.