EMPIRICAL COMPARISON OF TWO METRICS SUITES FOR MAINTAINABILITY PREDICTION IN PACKAGES OF OBJECT-ORIENTED SYSTEMS: A CASE STUDY OF OPEN SOURCE SOFTWARE

Software maintainability is an important external quality attribute that concerns both styles of software development, the proprietary model as well as open source. As much open source software is built predominantly using the OO paradigm, there is a need for empirical validation with respect to certain quality aspects, especially maintainability. A few past studies use code metrics, and a few use design metrics, which are available much earlier in the software development life cycle, to predict maintainability. In addition, there are studies which apply both code and design metrics to evaluate maintainability. The objective of this research is to perform an empirical comparison of two popular OO metrics suites, the Martin suite and the CK suite, on four open source software systems by analysing a few key design metrics such as size, coupling, complexity, inheritance and stability. Two important observations were made in this empirical study. First, between the two OO suites of design metrics, the prediction model developed using the Martin metrics scores better than the model developed using the CK suite. Second, the combination of the Martin and CK suites is helpful in predicting the maintainability of OO software, with a predictive accuracy of 66.7%, better than that of the models constructed using either the Martin metrics or the CK metrics individually.


INTRODUCTION
The IEEE Standard Glossary of Software Engineering Terminology defines maintainability as "The ease with which a software system or component can be modified to correct faults, to improve performance or other attributes, or adapt to a changed environment" (IEEE, 1990). Software maintainability is an important external quality attribute that plays a primary role when the quality of software is evaluated. There have been several proprietary software systems which have evolved using the Object-Oriented paradigm. Plenty of research works exist which have studied the quality attributes of such software.
Recently, many open source software systems have also started evolving the Object-Oriented way and hence it becomes essential to investigate such software also from the quality perspective.
In the early days of software development, the evaluation of quality parameters was carried out during the later stages of the software development life cycle. At this stage, it becomes rather difficult to make changes in the design. Many empirical studies conducted in the recent past indicate that software design metrics help in better prediction of maintainability when compared to measuring it during the later stages of the software development life cycle. Bansiya (2002) built a hierarchical model using the OO design properties and related those properties to high-level quality attributes. Subsequent to this research, several design metric suites such as the Martin metrics (Martin, 2003), the CK metrics (Chidamber and Kemerer, 1994) and the MOOD metrics (Brito and Abreu, 1996) were extracted from the data sets of commercial software projects and subjected to statistical analysis. Some studies were performed to verify which particular suite of metrics would best quantify a specific quality attribute. Subsequently, predictive models gained popularity, and researchers started building predictive models using these design metrics to evaluate the quality of many software systems. As more and more open source software is built, predictive models using data sets of open source software systems have gained significance, focusing particularly on quality parameters like maintainability, fault-proneness and understandability.
This research study introduces a new perspective in predicting maintainability using design metrics by making an empirical comparison between two popular OO metric suites, the Martin and CK suites. Further, it also provides indications on which particular metric suite would be better in predicting the maintainability of OO software. For this purpose, certain package-level design metrics of the Martin metrics suite were extracted using the jdepend tool (Clark 1999) and those of the CK suite using the ckjm tool (Greece 2005) from all the versions of popular open source software applications, namely jfreechart, javageom, freemind and treeview.
The empirical analysis compares the relationships between the package design metrics proposed by the Martin and CK suites across the above four open source software systems. The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 defines both the Martin and the CK suites of metrics. Section 4 describes the open source software taken for the case study. Section 5 highlights the methodology that was used in predicting maintainability. Section 6 gives the results. Section 7 presents the discussion. Section 8 presents the threats to validity. Section 9 gives the conclusions obtained from the empirical study.

RELATED WORK
Oman and Hagemeister (1994) quantified the maintainability of a system with an MI (Maintainability Index), which was primarily a combination of different code metrics. The concept of using both code and design metrics in predicting maintainability was proposed by Misra (2005). In that study, it was found that both kinds of metrics were useful in evaluating the maintainability of software. Later, Zhou and Baowen (2008) empirically investigated the relationships between 15 design metrics and the maintainability of 148 Java open source software systems. Based on this investigation, it was found that size and complexity metrics were strongly related to maintainability. Gupta and Chhabra (2012) empirically studied 18 packages from two open source software systems and found strong correlations between package coupling and the understandability of a package. This study also suggested that coupling metrics could be used to represent other external quality factors. Elish (2010) explored the relationships between five package-level metrics of the Martin suite and the effort required to understand a package. This study examined eighteen packages from two open source systems and found statistically significant correlations between most of the Martin metrics and the understandability of a package. Elish et al. (2011) empirically evaluated three suites of package-level metrics (Martin, MOOD and CK) in predicting the number of pre-release and post-release faults in packages of the Eclipse software. It was found that models based on the Martin suite had more predictive power than those based on the MOOD and CK suites across various releases of Eclipse. The current study empirically compares the relationships of the Martin suite and the CK suite with maintainability.

Martin Suite of Metrics
The metrics proposed by Martin (2003) which are used in this empirical study are defined in this section. These design metrics were extracted for all the 52 versions of jfreechart since its release using the jdepend tool (Fig. 1). Further, the metrics were extracted at the package level, as packages have now become important organizational units for large applications (Niemeyer and Knudsen, 2005).

Distance from the Main Sequence (D)
This is the perpendicular distance of a package from the idealized line A + I = 1, where A is the abstractness and I is the instability of the package. A package that is squarely on the main sequence is optimally balanced with respect to its abstractness and stability. Ideal packages are either completely abstract and stable or completely concrete and unstable. The values of this metric range from 0 to 1, with D = 0 indicating a package coincident with the main sequence and D = 1 indicating a package as far as possible from the main sequence.
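As an illustration, the distance metric can be sketched in a few lines of Python. The sketch assumes Martin's usual definitions (instability I = Ce / (Ca + Ce), abstractness A = abstract classes / total classes) and uses the normalized form D = |A + I - 1|, which matches the stated range of 0 to 1. The function names and the sample package are hypothetical.

```python
def instability(ca: int, ce: int) -> float:
    """I = Ce / (Ca + Ce); taken as 0 when the package has no couplings."""
    total = ca + ce
    return ce / total if total else 0.0


def abstractness(abstract_classes: int, total_classes: int) -> float:
    """A = abstract classes / total classes; 0 for an empty package."""
    return abstract_classes / total_classes if total_classes else 0.0


def distance(a: float, i: float) -> float:
    """Normalized distance from the main sequence A + I = 1."""
    return abs(a + i - 1.0)


# Hypothetical package: 2 abstract classes out of 10, Ca = 6, Ce = 2.
a = abstractness(2, 10)
i = instability(ca=6, ce=2)
d = distance(a, i)
print(f"A={a:.2f} I={i:.2f} D={d:.2f}")
```

A package that is fairly concrete (low A) yet heavily depended upon (low I) ends up well away from the main sequence, which is the situation the metric is meant to flag.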

Chidamber and Kemerer (CK) Suite of Metrics
The CK suite consists of six class-level metrics that are defined in this section as follows.

Weighted Methods Per Class (WMC)
WMC is defined as the sum of the complexities of all the methods defined in a particular class.

Coupling between Object Classes (CBO)
This metric gives the number of classes coupled to a given class.

Response for a Class (RFC)
This metric measures the number of different methods that can be executed when an object of that class receives a message.

Depth of Inheritance Tree (DIT)
This metric provides for each class a measure of the inheritance levels from the object hierarchy.

Number of Children (NOC)
This metric measures the number of immediate descendants of the class.

Lack of Cohesion in Methods (LCOM)
This metric counts the sets of methods in a class that are not related through the sharing of some of the class's data.
Since the CK suite captures the metric values at the class level, they have been converted to the package level by taking the average over all classes in a package.
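The class-to-package conversion described above can be sketched as a simple group-and-average step. The record layout, package names and values below are hypothetical and do not reproduce the actual ckjm output format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical class-level records: (package, class, {metric: value}).
class_metrics = [
    ("org.example.chart", "Plot", {"WMC": 12, "CBO": 4}),
    ("org.example.chart", "Axis", {"WMC": 8, "CBO": 6}),
    ("org.example.util", "Log", {"WMC": 3, "CBO": 1}),
]


def package_averages(rows):
    """Average each class-level metric over all classes of a package."""
    grouped = defaultdict(lambda: defaultdict(list))
    for package, _class_name, metrics in rows:
        for name, value in metrics.items():
            grouped[package][name].append(value)
    return {
        package: {name: mean(values) for name, values in by_metric.items()}
        for package, by_metric in grouped.items()
    }


pkg = package_averages(class_metrics)
print(pkg["org.example.chart"])
```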

CASE STUDY
The software systems that have been taken for the case study are jfreechart, javageom, freemind and treeview, all of which are open source. The jfreechart software is a very popular charting application that enjoys the maximum downloads (4000 downloads per week). All the 52 versions of jfreechart, 15 versions of freemind, 21 versions of javageom and 18 versions of treeview are taken for analysis. All these software systems are popular among the user community. The significance of this selection is that all of these systems were developed in the Java language. In total, a dataset of 106 versions of open source software was taken for statistical analysis.

METHODOLOGY USED IN PREDICTING MAINTAINABILITY
In our case study, the maintainability of a system is quantified with a Maintainability Index (MI) (Oman and Hagemeister, 1994). MI is a combination of different metrics that affect maintainability. It can be defined as follows:

$$MI = 171 - 5.2\,\ln(aveV) - 0.23\,aveV(g') - 16.2\,\ln(aveLOC) + 50\,\sin\left(\sqrt{2.4\,perCM}\right)$$

where aveV is the average Halstead's Volume per module, aveV(g') is the average extended cyclomatic complexity per module, aveLOC is the average count of lines of source code per module and perCM is the average percentage of lines of comments per module. This is a code metric which takes into account several aspects of maintainability such as size, complexity and self-descriptiveness of the source code. The ranges of MI values are given in Table 1. The maintainability index for all the versions of the four open source software systems was measured, and this was taken as the dependent variable for studying the relationships between design metrics and maintainability.
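A minimal sketch of the MI computation follows, assuming the commonly cited four-metric form of the Oman and Hagemeister index, MI = 171 - 5.2 ln(aveV) - 0.23 aveV(g') - 16.2 ln(aveLOC) + 50 sin(sqrt(2.4 perCM)). The coefficients, the function name and the sample inputs are assumptions made for illustration; the tool used to measure MI in the study may scale perCM differently.

```python
import math


def maintainability_index(ave_v: float, ave_vg: float,
                          ave_loc: float, per_cm: float) -> float:
    """Four-metric MI (assumed textbook coefficients, see lead-in).

    ave_v   -- average Halstead volume per module
    ave_vg  -- average extended cyclomatic complexity per module
    ave_loc -- average lines of source code per module
    per_cm  -- average percentage of comment lines per module
    """
    return (171.0
            - 5.2 * math.log(ave_v)
            - 0.23 * ave_vg
            - 16.2 * math.log(ave_loc)
            + 50.0 * math.sin(math.sqrt(2.4 * per_cm)))


# Hypothetical inputs: larger modules lower the index, all else being equal.
small = maintainability_index(ave_v=1000, ave_vg=5, ave_loc=50, per_cm=0)
large = maintainability_index(ave_v=1000, ave_vg=5, ave_loc=100, per_cm=0)
print(round(small, 2), round(large, 2))
```

The size and complexity terms enter with negative signs, so larger or more complex modules lower the index, while the comment term can raise it.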
The different package design metrics (AC, CC, Ca, Ce, I, A, D, WMC, RFC, DIT, NOC, CBO and LCOM) were taken as the independent variables.These metrics have already been defined in Section 3. Metric data was collected from the several versions of the four open source software.Notably, as all these metrics were captured at the package level, the mean value of all packages in a particular version was taken as the independent variables for the study.
Every software system consists of both system packages and user-defined packages. In this study, only the user-defined packages across all the versions have been considered. This provides clear indications of how the user-defined packages have been designed, and of which metrics need attention while designing the next version of the software. The study was conducted in three phases, covering the Martin suite, the CK suite and the combination of both suites. Several statistical tests such as multivariate correlation, multivariate regression and factor analysis were performed on the dataset in all three phases. Further, we tested the OO dataset for multi-collinearity by inspecting the inter-correlations among the metrics and by performing a VIF (Variance Inflation Factor) test. The following subsections define the different statistical tests that were applied in all three phases of our case study.

Multivariate Correlation
The degree of relationship between two or more variables is called correlation. It can also refer to co-variation (variation in one variable accompanying variation in the other variable). The degree of correlation between two variables is called simple or univariate correlation, and the degree of correlation between one variable and several other variables is called multiple or multivariate correlation. Both univariate and multivariate correlation analyses were performed to understand the influence of all the design metrics on maintainability. The following tests were performed to test the levels of correlation.

Test for Multi-Collinearity
Multi-collinearity refers to a high level of dependence or correlation among the design metrics themselves, and this test checks for it. If every variable in the correlation analysis depends on every other variable, multi-collinearity is likely. It can be detected when almost all the inter-correlations between variables have a value greater than 0.9. The existence of multi-collinearity within a dataset hampers correct inference about the correlations between design metrics and, if not detected, leads to biased conclusions.
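The 0.9 rule of thumb can be sketched as a pairwise scan over the metric columns. The Pearson coefficient is used here as an assumption about the correlation measure, and the metric values are hypothetical, not data from the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(metrics, threshold=0.9):
    """Return metric pairs whose absolute inter-correlation exceeds threshold."""
    flagged = []
    for a, b in combinations(metrics, 2):
        r = pearson(metrics[a], metrics[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical per-version mean values of three package metrics.
metrics = {
    "Ca": [3.0, 3.4, 3.9, 4.1, 4.8],
    "Ce": [5.9, 6.9, 7.7, 8.3, 9.5],  # tracks Ca almost linearly
    "D": [0.30, 0.22, 0.35, 0.25, 0.31],
}
flagged = collinear_pairs(metrics)
for a, b, r in flagged:
    print(f"{a} and {b} are highly inter-correlated (r = {r:.3f})")
```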

Variance Inflation Factor (VIF) Test
Multi-collinearity can also be detected by testing the variance inflation factor of all the design metrics. We tested this also and kept the VIF to a minimum by applying another multivariate statistical technique called Factor Analysis.
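One way to obtain the variance inflation factors is from the correlation matrix of the metrics: the VIF of metric j equals the j-th diagonal entry of the inverse correlation matrix, which is the same as 1 / (1 - R_j^2), where R_j^2 comes from regressing metric j on the remaining metrics. The sketch below assumes that route; the correlation values are hypothetical and the helper names are not taken from any statistics package.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(corr, names):
    """VIF of each variable = diagonal of the inverse correlation matrix."""
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Hypothetical correlation matrix of three package metrics.
names = ["Ca", "Ce", "D"]
corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
factors = vif(corr, names)
for name, value in factors.items():
    print(f"VIF({name}) = {value:.3f}")
```

A VIF of 1 means a metric is uncorrelated with the others; the value grows as the metric becomes more predictable from the rest.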

Multivariate Regression
Regression is the determination of a statistical relationship between two or more variables. One variable (independent) is the cause of the behavior of another one (dependent). When there are two or more independent variables, the analysis concerning the relationship is known as multiple correlation and the equation describing such a relationship is called the multiple regression equation. Regression analysis is concerned with the derivation of an appropriate mathematical expression for finding values of a dependent variable on the basis of the independent variable(s). It is thus designed to examine the relationship of a variable Y to a set of other variables $X_1, X_2, \ldots, X_n$. Therefore, multivariate regression analysis was performed to examine the common effectiveness of the metrics. The general form of a multivariate linear regression model can be given by:

$$\hat{y}_i = a_0 + a_1 x_{i1} + \cdots + a_k x_{ik}, \qquad y_i = \hat{y}_i + e_i$$

where $x_{i1}, \ldots, x_{ik}$ are the independent variables, $a_0, \ldots, a_k$ are the parameters to be estimated, $\hat{y}_i$ is the dependent variable to be predicted, $y_i$ is the actual value of the dependent variable and $e_i$ is the error in the prediction of the $i$-th case. We used stepwise regression to build the model.
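The model form above can be illustrated with an ordinary least-squares fit that estimates a_0, ..., a_k from the normal equations and then reports R². This is only a sketch: it fits a fixed set of predictors and does not implement the stepwise selection used in the study. The data are synthetic and noise-free (generated as MI = 100 - 20 D - 2 CBO), so the fit recovers the generating coefficients; all names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [a0, a1, ..., ak] minimising the sum of squared errors e_i."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[p] * d[q] for d in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(d[p] * yi for d, yi in zip(design, y)) for p in range(k)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """y_hat = a0 + a1*x1 + ... + ak*xk for one observation."""
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], row))


def r_squared(coeffs, rows, y):
    """Proportion of the variation in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coeffs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic observations: (D, CBO) per version and the resulting MI.
rows = [(0.1, 4), (0.2, 6), (0.3, 5), (0.4, 8), (0.5, 7)]
mi = [90.0, 84.0, 84.0, 76.0, 76.0]
coeffs = fit_ols(rows, mi)
r2 = r_squared(coeffs, rows, mi)
print([round(c, 3) for c in coeffs], round(r2, 3))
```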

Factor Analysis
This is a multivariate statistical technique that is used if multi-collinearity exists within the data set. If multi-collinearity is left undetected within a data set, biased conclusions can result when making predictions. We performed regression after obtaining the factor scores as a result of factor analysis. Factor scores are a set of values that are generated from the original data set. Regression is then performed with the factor scores as the independent variables and MI as the dependent variable. There are two important parameters of factor analysis. The KMO measure of sampling adequacy compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients. KMO values range between 0 and 1, and values closer to one are preferable.
Bartlett's test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, i.e., that each metric variable is perfectly correlated with itself (a value of one) and uncorrelated with the other metric variables. Variables that are not correlated with the others cannot be part of the same factor, so factor analysis is appropriate only when this hypothesis is rejected. Researchers therefore look for a significance value less than 0.05.
The communalities are yet another result of factor analysis. The communality of a variable is the proportion of its variance accounted for by the common factors. The communality value ranges from 0 to 1. A value of 0 means that the common factors do not explain any of the variance; a value of 1 means that the common factors explain all of the variance. Researchers look for a value close to one.
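For reference, the textbook forms of these three quantities are sketched below. The notation is introduced here and is not taken from the study: $r_{ij}$ is the observed correlation and $p_{ij}$ the partial correlation between variables $i$ and $j$, $R$ is the correlation matrix, $n$ is the number of observations, $p$ is the number of variables and $\lambda_{jm}$ is the loading of variable $j$ on factor $m$. The communality expression assumes orthogonal factors.

```latex
% KMO measure of sampling adequacy
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}}

% Bartlett's test of sphericity (null hypothesis: R is the identity matrix)
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right) \ln\lvert R \rvert,
\qquad df = \frac{p(p-1)}{2}

% Communality of variable j
h_{j}^{2} = \sum_{m} \lambda_{jm}^{2}
```

KMO approaches one when the partial correlations are small relative to the observed correlations, which is why values closer to one are preferred.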
Therefore, we performed all the tests in each phase that were necessary to make strong conclusions on predicting maintainability.

Multivariate Correlation
The inter-correlation values between the design metrics of the Martin suite were not greater than 0.90. Except for the concrete classes, all the other metrics had a significant influence on maintainability.

Multivariate Regression
The regression model yielded a multiple correlation coefficient of 0.787. The value of R² was 0.620 and was significant at the 99% level. The F values were also high.

Factor Analysis
Since the efferent coupling of the Martin suite had a VIF of 6.946 (not in the desired range), we performed factor analysis on the data set and then performed regression using the factor scores obtained from factor analysis. The Martin metrics gave a KMO value of 0.592. The significance value of Bartlett's test of sphericity was less than 0.01.

Multivariate Correlation
For the CK suite, as with the Martin suite, none of the inter-correlation values indicated multi-collinearity, i.e., no correlation value exceeded 0.9. The variance inflation factor was also checked to detect the presence of multi-collinearity within the dataset. The correlation analysis showed that, out of the six metrics, two metrics, namely WMC and CBO, have a significant correlation with MI at the 99% level. The LCOM metric is also significant at the 95% level. WMC contributed positively towards maintainability. CBO showed a negative correlation.

Multivariate Regression
The NOC metric was removed from the regression analysis as it did not contribute significantly to MI. The coefficient of determination R² was found to be 0.471. The R² value was also significant at the 99% level.

Factor Analysis
As with the Martin metrics suite, factor analysis was performed on the CK metrics data set to remove any multi-collinearity. The results of factor analysis and factor-scores regression were as follows:
• The KMO value is just 0.465, which is less than the value obtained with the Martin metrics suite
• The communality of the DIT metric was 0.480, whereas in the Martin metrics suite all the variables had a very high communality value
• Factor analysis formed two factors that explain 82% of the total variance, which is less than the 92% obtained for the Martin metrics suite
• The regression performed on the factor scores obtained through factor analysis yields an R² of 0.242, which is much lower than the R² of 0.463 given by the Martin suite

Multivariate Correlation
The following were the inferences from the analysis. Six metrics out of seven of the Martin suite and three metrics of the CK suite show significant correlation with MI. The DIT metric (Martin suite), NOC metric and RFC metric (CK suite) did not show any impact on MI.

Multivariate Regression
Stepwise regression was performed with the combination of the Martin suite and the CK suite. We found that the distance metric of the Martin suite is the primary contributor influencing MI. The CBO, WMC and RFC metrics of the CK suite are secondary indicators. The abstractness metric of the Martin suite also significantly influences MI.

Factor Analysis
The regression with factor scores gave an R² of 0.511, i.e., the variables explain 51.1% of the variance in MI.

Multivariate Correlation
Since the inter-correlation values between the design metrics of the Martin suite were not greater than 0.9, there is no serious multi-collinearity in the dataset.

Multivariate Regression
As the predicted values were obtained as a linear combination of the distance metric, afferent couplings, concrete classes and efferent couplings, the multiple correlation coefficient of 0.787 indicates that the relationship between maintainability and the four independent variables of the Martin suite is quite strong and positive. The coefficient of determination R² measures the goodness of fit of the estimated Sample Regression Plane (SRP) in terms of the proportion of the variation in the dependent variable explained by the fitted sample regression equation. Thus, the R² value of 0.620 simply means that about 62% of the variation in maintainability is explained by the estimated SRP that uses distance, afferent coupling, concrete classes and efferent coupling as independent variables.
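The two reported figures are linked by a one-line calculation, shown here as a consistency check. The small gap between the computed and the reported R² is what one would expect from rounding of the multiple correlation coefficient.

```latex
R = 0.787
\;\Longrightarrow\;
R^{2} = 0.787^{2} \approx 0.619 \approx 0.620,
\qquad
1 - R^{2} \approx 0.38
```

Roughly 38% of the variation in MI therefore remains unexplained by the four Martin metrics in this model.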

Factor Analysis
The KMO value of 0.592 is good. The significance of Bartlett's test is less than 0.01 (reported as .000), which is very good and indicates that factor analysis can be continued further. It was also noticed that all the Martin design metrics showed a high communality value, which indicates that most of the variance in the dataset has been explained by the factors. This is a positive result.

Multivariate Correlation
WMC had a positive influence on maintainability, i.e., when the weighted methods per class increase, maintainability also increases, which is a surprising result. The literature shows that high WMC results in high complexity, which in turn reduces maintainability, and that a low WMC helps reusability, testing and, more importantly, maintainability. CBO showed a negative correlation, i.e., when CBO decreases the maintainability increases and vice versa. RFC also showed a significant negative influence on maintainability. LCOM shows a positive correlation, i.e., when the levels of method cohesion in a class increase, maintainability increases and vice versa. Past literature reports that higher levels of LCOM within a class are associated with faults or errors.

Multivariate Regression
The R² value of 0.471 means that about 47.1% of the variation in maintainability is explained by the model that uses CBO and RFC as independent variables. The R² value was also significant at the 99% level. The other metrics were removed by the regression model. Though both CBO and RFC are significant at the 99% level, the F values are not very high.

Factor Analysis
The F values obtained for the CK suite were much lower than those of the Martin suite. The R² is also lower for the CK suite than for the Martin suite.

Predicting Maintainability using Martin and CK suite

Multivariate Correlation
There is no multi-collinearity in the OO dataset taken for analysis. Therefore, the conclusions made are valid conclusions.

Multivariate Regression
The following conclusions can be made after performing multivariate regression analysis:
• The distance metric, which captures the balance between abstractness and instability, has a negative influence on MI. Previous literature has shown that when packages have a high distance value, maintainability becomes difficult. When packages stay close to the main sequence, it is good for maintainability (Martin, 2003). Abstractness relates the number of abstract classes to the number of concrete classes in a package. Instability is the ratio of efferent coupling to total coupling (efferent coupling + afferent coupling). This negative influence indicates that coupling has a negative influence on MI
• The CBO metric of the CK suite is the next important predictor, which again indicates that any sort of coupling is detrimental and brings down the values of MI, i.e., CBO also has a negative influence on MI
• The WMC metric has a negative influence, i.e., when the weighted complexity of the methods in a class increases, the MI decreases. It is advised to reduce the complexity of the methods in a class
• The RFC metric and the Abstractness metric have positive influences. This indicates that a higher count of abstract classes relative to concrete classes is a good sign for increasing the MI. It is advised to use more abstract classes in package design

The model generated gives a predictive accuracy of 0.667, i.e., the model is able to explain 66.7% of the variance in MI. The F values are also significant and higher than the F values of the Martin and CK suites individually.

Factor Analysis
Stepwise multiple regression was done with the four factor scores generated after applying factor analysis. The factor scores were taken as the independent variables and the MI was taken as the dependent variable. The comparative study is presented in Table 2.

The conclusion is evident: the Martin metrics score better than the CK metrics (Table 2). When both the Martin and CK suites were used to build a model, there are two important parameters on which this combined model predicts maintainability better than the Martin suite or the CK suite independently. The first is the goodness of fit (R²) after regression, which is 66.7%, better than that of the Martin suite (62.0%) and of the CK suite (47.1%). The second is the R² value with factor-scores regression, which is 51.1%, compared with 46.3% for the Martin suite and 24.2% for the CK suite. Therefore, it is advised to use the combined Martin and CK suite model in predicting the maintainability of open source software. More importantly, the Distance and Abstractness metrics of the Martin suite and the CBO, WMC and RFC metrics of the CK suite significantly influence maintainability, either positively or negatively.
As future work, we would like to investigate other popular object-oriented metric suites and gather evidence on their impact on predicting maintainability. Our immediate focus would be on finding the right blend of metrics that would help in predicting the maintainability of object-oriented open source software in the best possible way.

ACKNOWLEDGMENT
I would like to thank Dr. Chitra Babu, Professor and Head, Department of CSE, for giving effective research directions. I would also like to acknowledge the PG students of my Department who supported me in pursuing this research work.

Funding Information
This project was funded by the Department of Computer Applications, SSN College of Engineering, Kalavakkam.

Ethics
I wish to state that this work is done by me wholly and there are no ethical issues that would arise after this article gets published.

Fig. 1. The extraction process
a) A predictive model for OO software maintainability using Martin metric suite
b) Identification of the most influential metrics from both Martin and CK suites useful for predicting OO software maintainability
c) A predictive model for OO software maintainability using a subset of Martin and CK metric suites