Using Regression to Establish Weights for a Set of Composite Equations through a Numerical Analysis Approach: A Case of Admission Criteria to a College

Problem statement: Mathematically, little is known about college admission criteria, such as high school grade point average, admission test scores or rank in class, and about how these criteria are weighted into a composite equation. Approach: This study presents a method to obtain weights for a “composite admission” equation. The method uses an iterative procedure to build a prediction equation for an optimally weighted admission composite score. The three predictor variables, high school average, entrance exam scores and rank in class, were regressed on college Grade Point Average (GPA). The weights for the composite equation were determined through regression coefficients and a numerical approach that correlates the composite score with college GPA. Results: A set of composite equations was determined, with weights on each criterion in the composite equation. Conclusion: This study details a substantiated algorithm and, based on an optimal composite score, yields an original, structured composite score equation for admissions that can be used by admission officers at colleges and universities.


INTRODUCTION
Institutions of higher education vary considerably in the degree to which they are selective in admissions. Typically, students may get rejected or accepted to a major of their choice based on a set of academic criteria that a university establishes and uses. This variety in selectivity, perhaps, is one reason that admissions practices are not generally well understood by either the applicants or their parents.
Research studies have not tackled the development of a logical, mathematical, interpretive and algorithmic model for the selection and weighting of admission criteria to colleges and universities. Admission decisions commonly rest on two-variable criteria (Hu, 2002), namely High School Grade Point Average (HSGPA) and standardized tests such as the Scholastic Assessment Test (SAT), which predict (directly or indirectly) an applicant's probability of academic success in the first year of college. However, few studies have validated the index on each criterion through an empirical numerical and computational approach. In this study, we fill this gap by proposing a way to establish weights for admissions and validate these weights using numerical methods that build a composite admission score. A common procedure for admission decisions is to select a number of predictors for general performance at college (namely, GPA), assign subjective weights to the criteria that predict performance and then build a composite score (McCormick and Ilgen, 1980). To validate the composite equations, the scores on the equations are correlated with the third-semester GPA. The College Board and the Trends in College Admission Report (Breland et al., 2002) show an overwhelming number of US universities requiring HSGPA, standardized tests, or Rank In Class (RIC) for admission to college.
Since there is a complete absence of a unified and standardized set of criteria for admissions in US-style universities around the world, an operational structured approach that describes and justifies the selection and weighting of admission criteria is fundamental and informative for parents and students. The main goal of this research is to find a statistically valid, mathematically logical and applicable admission method for establishing appropriate weights for university admission criteria.
This study develops a model to weight the admission criteria to a college: High School Average (HSA), Entrance Exam Scores (EES) or SAT, and RIC. The criteria and weights are summarized in a composite equation. The model is algorithmic: it applies a regression analysis and then correlates college GPA, as predicted from ordinally weighted criteria, with observed college GPA. The maximal correlation between the weighted criteria in the composite and college GPA dictates the weight for each criterion. Based on the regression, a decision-tree algorithm iterates the correlation between the observed college GPA and the result of the ordinally weighted composite equation that predicts college GPA. We refer to this repeated reweighting of the criteria, in which the coefficient of each criterion is decreased or increased, together with the repeated correlation, as the numerical analysis that weights the admission criteria. The optimal composite equation is selected from candidate equations with different weights on the criteria, by choosing the one whose composite score correlates most highly with college GPA.

Review of literature:
There is a general trend among universities and colleges to use a reliable basic skills test (developed in-house by a college or university), HSGPA and SAT scores for admittance into college (Breland et al., 2002). Although US universities have some leeway in their use of admission criteria, practice suggests greater support for HSGPA as the predominant criterion for admittance. The most frequent requirement, for both public and private institutions in the US, is HSGPA. The trends report indicates that a little more than one-half of all four-year institutions in 2000 required a minimum HSGPA and a standardized admission test such as the SAT, underlining the diminishing use of RIC for admittance (only about ¼ of the four-year institutions reported minimum standards for high school rank).
A number of methods have been established to determine the weights for an admission index. Earlier studies show that both HSGPA and standardized test scores are the strongest predictors for the institutional selectivity of students (Weizman, 1982). Even with many criteria empirically tested, high school average was a slightly better predictor of first-year college GPAs of 2.00 or higher than were college admission test scores (Noble and Sawyer, 2000). HSGPA also showed significant predictive power for the institutional selectivity of students (Noble and Sawyer, 2002) and is a primary predictor of student success in graduate work. Generally, maximal correlations between the weighted criteria for the composite and college GPA dictate the weight for each criterion (Paolillo, 1982; Youngblood and Martin, 1982). A number of other studies have also confirmed the predictive power and use of variables such as high school rank, standardized test scores (ACT or SAT) and RIC to predict college performance (Price and Kim, 1976; Mouw and Khanna, 1993; Beecher and Fischer, 1999; Noble and Sawyer, 2000; Xiao, 2002).
A variety of studies have provided evidence and criteria for college success (Paolillo, 1982; Sisson and Dizney, 1980), generally showing a correlation between university input measures, such as standardized test scores, national exams, SAT and HSGPA, and college GPA (Eno et al., 1999; Noble and Sawyer, 2000).
Regression is probably one of the most widely used methods for establishing balanced weights for admission criteria (McCormick and Ilgen, 1980). Other compatible weighting methods have also been compared (Fralicx and Raju, 1982), indicating the similarities of these methods. Logistic regression analysis has also been used to predict binomial or discrete outcomes as a valid admission index of college success (Xiao, 2002). However, no study has approached the weighting of criteria using a conservative, operative and logical numerical method to validate the weights for a set of admission criteria. Hence, a composite admission score based on regression provides a balanced weight for each criterion, as well as a standard for and indicator of whether students get admitted or not (Talley, 1989; Talley and Mohr, 1991).

MATERIALS AND METHODS
The data: The data cover all undergraduate enrollees in a private university from 2001 to 2006. The study used the base-semester data starting in the fall of 2001. All identifiers, such as names of students, were removed from the data set. The data were drawn from the student records system. They included the high school average of the last 2 years of secondary school, Entrance Exam Scores (EES) or SAT I-Reasoning scores, and RIC. The third-semester GPA was used as the dependent variable. For example, high school average, EES and RIC from the fall of 2001 were entered in the regression to predict college GPA (third-semester GPA).
To account for irregularity in grading scales, high school grades were standardized within each school: a standard score (z) was calculated using the mean and the standard deviation of the same school. The z score was then converted to a nonstandard normal distribution and then to a percentage score.
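The per-school standardization described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation. The function name `standardize_by_school`, the target mean of 75 and target standard deviation of 10 for the rescaled score, the use of the population standard deviation, and the clipping to the 0-100 range are all assumptions made for the example; the study does not report these details.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_school(records, target_mean=75.0, target_sd=10.0):
    """Rescale grades so that every school shares the same mean and spread.

    records: list of (school_id, grade) pairs.
    target_mean / target_sd: hypothetical parameters of the common scale.
    Returns the rescaled percentage scores in the input order.
    """
    grades_by_school = defaultdict(list)
    for school, grade in records:
        grades_by_school[school].append(grade)

    # Mean and (population) standard deviation of each school's grades.
    stats = {s: (mean(g), pstdev(g)) for s, g in grades_by_school.items()}

    rescaled = []
    for school, grade in records:
        m, sd = stats[school]
        z = 0.0 if sd == 0 else (grade - m) / sd   # standard score within school
        score = target_mean + target_sd * z         # move to the common scale
        rescaled.append(min(100.0, max(0.0, score)))  # keep within 0-100 (assumed)
    return rescaled


# Two schools with different grading scales end up on the same scale.
sample = [("A", 60), ("A", 80), ("B", 90), ("B", 100)]
print(standardize_by_school(sample))
```

With this approach, a student one standard deviation above the mean of a lenient school receives the same rescaled score as a student one standard deviation above the mean of a strict school.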
The method in this study used regression to predict the GPA of the third semester after enrollment. Two regression equations were used. The first included two predictor variables: high school average and EES. Two predictors were used because RIC was not available for some students: many schools do not maintain a RIC, and students who enrolled with International Baccalaureate scores had no RIC. For students who had a RIC, a second regression was used that included three predictor variables: high school average, EES (or the SAT-I) and RIC. Regression coefficients were calculated, and the ratios of the standardized coefficients determined the weighting order of the admission criteria (i.e., high school average, EES and RIC).

RESULTS
Regression using high school average and entrance exam scores: All grades, whether high school average, EES or RIC, were converted to percentage scores. In the first regression, high school average and EES were regressed on the third-semester GPA. The first regression equation was based on 2947 data points. The standardized coefficients of high school average and EES generated by the regression analysis are shown in Table 1. The standardized betas of the regression were used to obtain a ratio referred to as the Standardized Regression Coefficient Ratio (SRCR). The ratio of 1.55:1 (SRCR: 0.34/0.22 = 1.55, Table 1) for high school average to EES indicates a higher weight for high school average than for EES (or SAT I); that is, the standardized coefficient of high school average is about 55% larger than that of EES.
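The SRCR computation can be illustrated with a short Python sketch. The helper names `pearson`, `standardized_betas` and `srcr` are hypothetical. The closed-form expression for the standardized coefficients applies to a regression with exactly two predictors and is a standard result, not something taken from the study itself. Only the two betas 0.34 and 0.22 come from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_betas(x1, x2, y):
    """Standardized coefficients of a two-predictor linear regression,
    obtained from the three pairwise correlations."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1 - r12 ** 2
    return (r1 - r2 * r12) / denom, (r2 - r1 * r12) / denom


def srcr(beta_a, beta_b):
    """Standardized Regression Coefficient Ratio of criterion a to criterion b."""
    return beta_a / beta_b


# Betas reported in the text for high school average and EES.
print(round(srcr(0.34, 0.22), 2))  # 1.55
```

In practice the two betas would come from `standardized_betas(hsa, ees, gpa)` applied to the percentage scores and the third-semester GPA.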
The regression equation was set up such that an ordinal-weighted scheme was applied to high school average and EES, increasing the weight on one while decreasing the weight on the other, and conversely. A weighted change on each criterion of the prediction equation was then used. The weighted increase/decrease is bounded by the SRCR value, that is, by a maximum weight increase of 55% per unit change in EES. An iterated predicted GPA value was calculated using the regression equation, with the iterations defined by the SRCR indexed by i (i = 1-22). For each iteration, a 5% weight change was used for the changed criterion. The iteration process is expressed in Eq. 1. For each iteration, a correlation r_i was calculated between the predicted GPA (Y_i) and the actual third-semester GPA. The highest correlation appeared for a weighted coefficient of I = 0.1 and I = -0.1, that is, a 10% increase/decrease in EES and its converse. This gave a tolerance range of -10 to 10% weight increase/decrease, which was used to determine the weighted criteria for the actual values of high school average and EES in the composite Eq. 2.
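One possible reading of this iteration is sketched below in Python. It rests on several assumptions, because the form of Eq. 1 is not reproduced here. The sketch scales the coefficient of high school average by (1 + I) and the coefficient of EES by (1 - I). It takes I in steps of 0.05 up to ±0.55, which yields 22 iterations and is consistent with i = 1-22. The function name `sweep_regression_weights`, the coefficient values and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sweep_regression_weights(hsa, ees, gpa, b0, b_hsa, b_ees,
                             max_change=0.55, step=0.05):
    """Return (I, r_I) pairs: the weight change I and the correlation between
    the reweighted predicted GPA and the actual GPA.

    Assumed perturbation: b_hsa is scaled by (1 + I), b_ees by (1 - I).
    """
    n_steps = round(max_change / step)
    results = []
    for k in range(-n_steps, n_steps + 1):
        if k == 0:
            continue  # the unperturbed equation is not counted as an iteration
        change = k * step
        predicted = [b0 + b_hsa * (1 + change) * h + b_ees * (1 - change) * e
                     for h, e in zip(hsa, ees)]
        results.append((round(change, 2), pearson(predicted, gpa)))
    return results


# Hypothetical data and coefficients, for illustration only.
hsa = [70, 80, 90, 85, 60]
ees = [65, 70, 95, 80, 75]
gpa = [2.5, 3.0, 3.8, 3.3, 2.4]
results = sweep_regression_weights(hsa, ees, gpa, b0=0.0, b_hsa=0.02, b_ees=0.015)
best_change, best_r = max(results, key=lambda pair: pair[1])
print(len(results), best_change)
```

The weight change with the highest correlation plays the role of the tolerance level; in the study this was reported as ±0.1.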
Calculating the weights: A composite equation was constructed from high school average and EES. Guided by the SRCR, the weight was increased for high school average and decreased for EES. The composite equation was iterated 10 times, increasing the weight of high school average in 1% increments up to a maximum tolerance of 10% and decreasing the weight of EES in 1% increments down to a maximum tolerance of 10%. Because both criteria were expressed in percentage units, equal weighting would give each criterion 50% of a composite that adds up to 100% (Eq. 3). The composite score formula was therefore weighted such that high school average had a higher weight than EES, with the greater proportion given to high school average (as shown through the SRCR). At each iteration, that is, at each increase in high school average (above 50%) and decrease in EES (below 50%), a correlation was calculated between the composite score and the third-semester GPA. The highest correlation, r = 0.48, p<0.0001, appeared for an increase of 10% to a cumulative 60% weight for high school average and a decrease of 10% to a cumulative 40% weight for EES. The results merely validate the weighted predictor regression of college GPA. The final weighted score structure is shown in Eq. 4.
Regression 2: Using high school average, entrance exam scores and student's rank: In the previous algorithm, two criteria were used to determine a composite score. When there are more than two criteria, such as high school average, EES and RIC, the algorithm goes through a decision-tree heuristic. High school average, EES and RIC were regressed on the third-semester GPA. The analysis was based on 1207 data points. The standardized coefficients of high school average, EES and students' RIC obtained from the regression analysis are presented in Table 2 and applied in Eq. 5.
A methodological approach similar to that for the two-variable criteria was used with the three-variable criteria. However, a step-wise weighting method based on a decision-tree algorithm produced two SRCRs corresponding to two tolerance levels. The highest standardized regression beta was for high school average, followed by EES and lastly by RIC. Thus, two comparisons were made: the first between high school average and EES and the second between EES and RIC. There was no need to calculate a third SRCR between high school average and RIC because a transitive relation can be implied. The weight ratio of high school average to EES was 1.1757:1, which indicates that high school average has an 18% higher weight than EES. The weight ratio of EES to RIC was 1.89:1.
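The pairwise comparison of adjacent criteria can be sketched in Python as follows. The function name `adjacent_srcr` and the example betas are hypothetical (they are not the Table 2 values); only the two ratios 1.1757 and 1.89 come from the text. The sketch also shows why a third ratio is redundant: it is the product of the other two.

```python
def adjacent_srcr(betas):
    """Order criteria by standardized beta (largest first) and return the
    ratio of each criterion's beta to that of the next criterion."""
    ordered = sorted(betas.items(), key=lambda item: item[1], reverse=True)
    return [(name_a, name_b, beta_a / beta_b)
            for (name_a, beta_a), (name_b, beta_b) in zip(ordered, ordered[1:])]


# Hypothetical standardized betas, for illustration only.
example_betas = {"EES": 0.20, "RIC": 0.10, "HSA": 0.30}
for higher, lower, ratio in adjacent_srcr(example_betas):
    print(f"{higher}:{lower} = {ratio:.2f}:1")

# With the two ratios reported in the text, the ratio of high school
# average to RIC follows from the chain without a separate calculation.
implied_hsa_to_ric = 1.1757 * 1.89
print(round(implied_hsa_to_ric, 2))
```

For n criteria this yields n - 1 ratios, one per step of the decision tree.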
The weights were constrained through an increase/decrease dictated by the SRCR. Holding RIC constant, a 5% (i.e., 0.05) step was applied as an increase in the weight for high school average with a simultaneous 5% decrease in EES, and conversely, and a correlation was calculated between the predicted, weighted college GPA and the actual third-semester GPA. The predicted college GPA for the first regression equation (Eq. 6) showed the highest correlation for an increment between -0.1 and 0.1 (i.e., a 10% decrease/increase). A second regression held high school average constant and weighted EES and RIC by a 0.1 increase/decrease (Eq. 8 and 9). At each step, increasing EES by 0.1 and decreasing RIC by 0.1, a correlation was calculated between the predicted college GPA and the actual GPA. The results of the second regression showed that a 10% increase/decrease for EES and RIC, respectively, gave the highest correlation between the predicted college GPA and the actual third-semester college GPA. The composite equation was formulated based on the decision-tree algorithm. For the variable with the highest SRCR, high school average, a tolerance level of 10% provided the highest correlation with the third-semester GPA; hence, high school average received a 10% increase in weight. The other two variables would have a 10% decrease in weight to compensate for the increase in high school average. Because of the two-step increase/decrease in the weighted ratio approach, the variable with the highest standardized regression coefficient, namely high school average, would have the highest weight. It could be assumed that each variable (high school average, EES and RIC) carries one-third of the total score. This would not hold in the weighting scheme, however, because of the greater weight ratio given to high school average when compared with EES and RIC.
In effect, the composite formula would be distributed equally between high school average, on the one hand, and EES and RIC, on the other. Thus, each side would account for 0.5 of the full score. This allocation is based on the SRCR, where high school average produced a 110% cumulative increase over EES and RIC combined. Allocating 50% of the weight to high school average and 50% to EES and RIC combined is therefore a substantive, conservative approach, and a 10% increase in high school average requires a 10% decrease in EES and RIC combined.
We introduced a range of reduced weights for the two combined variables EES and RIC. The increment change for the two combined variables ranged between 0.9 and -0.9, in runs incremented by 0.1, each of which was correlated with third-semester GPA. This can be expressed in Eq. 10:
PCS = (1.1 × HSA)/2 + (EES × (0.9 - I) + RIC × I)/2    (10)
where PCS is the Predicted Composite Score and I is the increment.
In the composite Eq. 10, the weights on EES and RIC (ordered according to the SRCR) were decreased and increased, respectively, in increments of 0.1, with the change on EES running from -0.1 to -0.8 and the weight on RIC running up to 0.8. At each increment, a calculated predicted composite score was correlated with third-semester GPA. The highest correlation appeared for a 0.1 increment (10%), which sets the weight of EES at 0.8/2 and that of RIC at 0.1/2. The best possible weights, that is, valid, appropriate and empirically substantiated weights according to the final output of the three iterations, can be calculated in the following manner:
• Weight for high school average = (1.1/2) × 100 = 55%
• Weight for EES = ((0.9 - 0.1)/2) × 100 = 40%
• Weight for RIC = (0.1/2) × 100 = 5%
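The search over Eq. 10 and the resulting percentage weights can be sketched in Python as follows. The formula inside `predicted_composite_score` is Eq. 10 as given in the text; everything else is an assumption for illustration, including the function names, the grid of increments from 0.1 to 0.8 and the synthetic data, which were constructed so that the best increment is 0.1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def predicted_composite_score(hsa, ees, ric, increment):
    """Eq. 10: PCS = (1.1*HSA)/2 + (EES*(0.9 - I) + RIC*I)/2."""
    return (1.1 * hsa) / 2 + (ees * (0.9 - increment) + ric * increment) / 2


def best_increment(hsa, ees, ric, gpa, increments):
    """Return the increment whose composite score correlates best with GPA."""
    def correlation_for(increment):
        scores = [predicted_composite_score(h, e, r, increment)
                  for h, e, r in zip(hsa, ees, ric)]
        return pearson(scores, gpa)
    return max(increments, key=correlation_for)


def percentage_weights(increment):
    """Weights (in %) implied by Eq. 10 for a given increment."""
    return {"HSA": 1.1 / 2 * 100,
            "EES": (0.9 - increment) / 2 * 100,
            "RIC": increment / 2 * 100}


# Synthetic data: GPA is built from the composite at I = 0.1.
hsa = [70, 80, 90, 85, 60]
ees = [65, 70, 95, 80, 75]
ric = [90, 40, 60, 85, 50]
gpa = [predicted_composite_score(h, e, r, 0.1) / 25
       for h, e, r in zip(hsa, ees, ric)]
grid = [round(k * 0.1, 1) for k in range(1, 9)]  # 0.1, 0.2, ..., 0.8 (assumed)

chosen = best_increment(hsa, ees, ric, gpa, grid)
print(chosen, percentage_weights(chosen))
```

For any increment, the three weights in Eq. 10 sum to 100%, since 1.1/2 + (0.9 - I)/2 + I/2 = 1.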

DISCUSSION
This study illustrates the development of a logical, interpretive and substantive academic admission composite score. The composite score is simply a weighted index of high school average, EES or SAT and RIC. The study validates the weighted composite score by correlating the score with the third-semester college GPA. The algorithm uses regression, which typically is used to validate predictive criteria for college GPA. Once the regression coefficients are calculated, they create the course for weight tolerance levels and provide the directional weight increase/decrease to the criteria used in the composite equation.
It is shown through numerical methods that the correlation of a composite equation with college GPA constitutes evidence of an objective measure and calculation. The method can be seen as a model in which other variables can be used as criteria for admission. The method is very conservative in that it uses the regression equation and establishes the weights based on the regression coefficients to obtain a tolerance level as a best predictor for GPA. The predictor variables use the SRCR as the guide for weighted limits. Homogeneously distributed weights on the composite equations were altered through a decision-tree approach, which alternates the weights for each variable sequentially, with the weighted change applied to a maximum of two variables (criteria) at a time, until all possible combinations are exhausted. The composite score equation is then weighted with respect to the tolerance levels. The highest correlation between predicted college GPA and actual college GPA provides the optimal weighted criteria for a composite equation. This validation overlaps with the predicted regression equation and its weighted criteria. The significance of this method is that it can effectively predict college success and thus help admission officers set the criteria for admission to a college. The composite score can also be used to define cut-off scores based on admission policy. We also intend to show in future work that admission cut-offs can be empirically validated through numerical methods (iterative procedures) and are thus not altogether a subjective procedure. Developing weights in the composite equation constitutes a better approach than a subjective determination process. Subjectivity is probably a necessary approach to creating an index, particularly in choosing a cut-off score for performance in college (i.e., college GPA) (Xiao, 2002).
Results of the regression and numerical analyses suggest that any number (n) of criteria can be used within a composite equation and that such equations have a specific application in the prediction of GPA.
Another important finding in this study, as other studies have shown, is that high school average is the most important variable to predict college performance (Noble and Sawyer, 2000; Hu, 2002; Noble and Sawyer, 2002). In comparing the weight of high school average with EES or RIC, it is clear that high school average is a consistent, valid and reliable factor that is associated positively with college GPA.

CONCLUSION
The results of this study have important implications for admission officers at universities and colleges. In a broader context, the study provides a method to weight criteria for admissions using multiple regression and the algorithmic decision-tree approach. We invite researchers to work out the algorithm in their own institutions. We also encourage others to use additional criteria for creating a composite equation and as predictors of success in college (Rowe et al., 1985). Some of these factors could be informative and could be used in composite equations. Future research might include better outcome measures, such as success in work or satisfaction in college, to improve the predictive power of the admission index.