Cohesion Metrics for Ontology Design and Application

Recently, domain-specific ontology development has been driven by research on the Semantic Web. Ontologies have been suggested for use in many application areas targeted by the Semantic Web, such as dynamic web service composition and general web service matching. Fundamental characteristics of these ontologies must be determined in order to make effective use of them: for example, Sirin, Hendler and Parsia have suggested that determining fundamental characteristics of ontologies is important for dynamic web service composition. Our research examines cohesion metrics for ontologies. The cohesion metrics examine the fundamental quality of cohesion as it relates to ontologies.


INTRODUCTION
With the rapid development of the Internet and associated new technologies, ontologies have become a key technology for providing shared knowledge models to semantics-driven applications on the Internet. Defined as "explicit specification of a conceptualization" [1], ontologies have been used in artificial intelligence, natural language processing, information query systems and agent software systems. In today's Internet-based world, ontologies provide intelligent and adaptive solutions for distributed and dynamic information processing. Many knowledge representation languages have been developed to model domain ontologies. Recently, the Web Ontology Language (OWL) is "already being used as an open standard for deploying large scale ontologies on the Web" [2]. As an ontology language, OWL supports knowledge representation, domain vocabulary sharing, advanced search, software agents and knowledge management [3,4]. Given these benefits, domain ontology development and management have become increasingly important in knowledge-driven applications, including commercially successful ones. It has therefore become important to determine fundamental characteristics of ontologies [5]. Since software metrics have been used in object-oriented/object-based software systems to assess software quality, we examine ontology metrics that would be desirable in ontology assessment. Similar to software metrics, ontology metrics are expected to give ontology developers insight that helps them design ontologies, improve ontology quality, and anticipate and reduce future maintenance requirements, and to help ontology users choose the ontologies that best meet their needs. For ontology-based systems, measuring ontologies in the early phases of the software development life cycle allows better management of the later phases and more effective quality assessment and maintenance estimation.
This study proposes a set of ontology cohesion metrics to measure the modular relatedness of OWL ontologies. These metrics are Number of Root Classes (NoR), Number of Leaf Classes (NoL) and Average Depth of Inheritance Tree of all Leaf Nodes (ADIT-LN). The metrics are collected using a standard XML DOM parser that parses the XML-based OWL ontology syntactically, but computes the cohesion metrics conceptually, based on predefined OWL primitives that explicitly define tree-based semantic hierarchies in OWL ontologies. The metrics are theoretically validated using standard metrics validation frameworks, and are then empirically validated by comparing them statistically with assessments performed by a team of human evaluators.

Background
Ontology: An ontology is defined as "a formal, explicit specification of a shared conceptualization" [1]. The ontology structure O, proposed by Maedche [4], can be described by a 5-tuple $O := \{C, R, H, \mathit{rel}, A\}$. C is the set of concepts; R is the set of relations; H is the concept hierarchy; rel is a function relating the concepts non-taxonomically; and A is a set of ontology axioms expressed in an appropriate logical language. As explicitly defined and machine-processable abstract models, ontologies were developed for knowledge sharing, to provide a shared common understanding of domain knowledge. To serve as metadata schemas that help people and machines communicate concisely and consistently, different ontology representation languages have been used to store domain knowledge so that it can be reused, shared and interchanged. Recently, as an effective and expressive ontology representation language, the Web Ontology Language (OWL) has been regarded as "an important step for making data on the Web more machine processable and reusable across applications" [2]. OWL is developed and recommended by the W3C Web Ontology Working Group as a knowledge modeling language to define and instantiate Web ontologies. These ontologies describe classes, properties and their instances, from which logical consequences can be semantically derived even though the facts are not literally present [4,6]. OWL is based on RDF/RDF-S (Resource Description Framework and RDF Schema) and evolved from DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer). As an ontology language with rich representational primitives, OWL supports knowledge representation, domain vocabulary sharing, advanced search, software agents and knowledge management [3,4].

Cohesion Metrics:
Cohesion traditionally refers to the degree to which the elements in a module belong together. More particularly, in object-oriented software, cohesion refers to the degree of relatedness or consistency in functionality of the members of a class; strong cohesion is recognized as a desirable property of object-oriented classes because it measures separation of responsibilities, independence of components and control of complexity [7,8]. Several software cohesion metrics have been proposed [9,10,11]. One of the most widely known object-oriented cohesion metrics, Lack of Cohesion of Methods (LCOM), was proposed by Chidamber and Kemerer [9,10]. The value of LCOM is the number of pairs of methods in a class that have no common instance variables, |P|, minus the number of pairs of methods in the class that have common instance variables, |Q|. If |P| < |Q|, the value of LCOM is set to zero. LCOM is inversely related to cohesion: a higher value of LCOM indicates lower cohesion.
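The LCOM definition above can be illustrated with a short Python sketch. The function name `lcom` and the input format (a mapping from method name to the set of instance variables that method uses) are assumptions made for illustration; they are not taken from Chidamber and Kemerer's papers or from any particular tool.

```python
from itertools import combinations


def lcom(method_vars):
    """Return LCOM = max(|P| - |Q|, 0) for one class.

    method_vars is a hypothetical input format: a dict mapping each
    method name to the set of instance variables that method uses.
    """
    p = 0  # method pairs sharing no instance variable
    q = 0  # method pairs sharing at least one instance variable
    for vars_a, vars_b in combinations(method_vars.values(), 2):
        if vars_a & vars_b:
            q += 1
        else:
            p += 1
    # LCOM is set to zero when |P| < |Q|
    return max(p - q, 0)


# Illustrative class with three methods (made-up data).
example = {
    "open": {"handle"},
    "log": {"buffer"},
    "close": {"handle", "state"},
}
print(lcom(example))
```

In this made-up class, two of the three method pairs share no variable and one pair shares `handle`, so |P| = 2, |Q| = 1 and LCOM = 1.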

Criteria for Analyzing Metrics:
It is desirable to have a formal set of metrics evaluation criteria to assess the usefulness and correctness of software metrics. Kitchenham et al. [12] proposed a general framework for software measurement validation. They identify the concepts that are necessary for measurement:
* Entities: the objects to be studied.
* Attributes: the properties of the entities to be measured.
* Units: how an attribute is measured.
* Scale type: nominal, ordinal, interval, or ratio.
According to their framework, a direct measure must satisfy the following four conditions to be theoretically valid:
* Attribute validity: the entity being evaluated has the attribute.
* Unit validity: the unit used is appropriate for the attribute.
* Instrumental validity: the underlying model is valid and the measurement instrument is calibrated.
* Protocol validity: the protocol used for the measurement is consistent, unambiguous and prevents problems.
Briand et al. [11] proposed a set of criteria for different software metrics, including more specific criteria for cohesion metrics:
* Nonnegativity and normalization: the value is never negative and the values can be compared between different modules.
* Null value: the value is zero if there is no intramodule relationship within a module.
* Monotonicity: the value never decreases when intramodule relationships are added to a module.
* Cohesive module: the value of merged modules is never greater than the maximum cohesion of the original modules.

Ontology Cohesion Metrics
Research Description: We consider ontology cohesion metrics as part of a measure of ontology modularity: ontology cohesion refers to the degree of relatedness of OWL classes, which are semantically/conceptually related by properties. An ontology has a high cohesion value if its entities are strongly related. The underlying idea is that the concepts grouped in an ontology should be conceptually related within a particular domain or sub-domain in order to achieve common goals. In reviewing related research on software cohesion metrics, we noticed that the most cited software cohesion metrics, those of Chidamber and Kemerer, were theoretically grounded in concepts similar to those of objects in ontologies [13,14]. Because cohesion metrics are intended to measure modularity, metrics similar to the software cohesion metrics can be defined to measure the relatedness of elements in ontologies. In this paper, our goal is to define and validate a set of ontology cohesion metrics that measure the cohesiveness of OWL ontologies. The idea is that these ontology cohesion metrics, like software cohesion metrics, can be used to measure separation of responsibilities and independence of components of ontologies. To achieve this goal, we studied ontologies and ontology engineering in general in order to propose a set of cohesion metrics based on general characteristics of ontologies. We then theoretically validated the metrics using the software measurement validation framework of Kitchenham et al. [12] and the more specific validation criteria for cohesion metrics of Briand et al. [11]. An automated data collection tool was then developed to collect samples of our ontology metrics from a set of OWL ontologies. The results were statistically compared with the assessments of a human evaluation team in order to determine the correlation between the two data sets.

Common Formal Notation:
We use the following formal notation to mathematically define the ontologies and the cohesion metrics:
* Let $C_1, C_2, \ldots, C_m$ be the set of m classes explicitly defined in an ontology.
* Let $P_1, P_2, \ldots, P_n$ be the set of n properties that act as relationships between the classes.
* Let $F_{C_1}, F_{C_2}, \ldots, F_{C_m}$ be the fanout of each class $C_i$ in the set.
* Let $O_i$ be an OWL ontology of interest.
* A subtype relationship holds from $C_i$ to $C_j$ if class $C_j$ is a subclass of class $C_i$.
In ontologies, concepts are typically organized into a taxonomy tree in which each node represents a concept and each concept is a specialization of its parent [5]. OWL defines hierarchical constructs that provide a tree-based structure for subtype relationships between entities in OWL ontologies. The following terminology used in this paper describes this tree-based relationship.
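As a rough illustration of how such a hierarchy might be read out of an RDF/XML serialization with a standard DOM parser, the Python sketch below collects named classes and their direct subclass links. It is not the authors' tool: the function names `subclass_edges` and `local_name` are hypothetical, and the sketch assumes that superclasses are given in the simple `rdfs:subClassOf rdf:resource="..."` form, ignoring nested or anonymous class expressions.

```python
from xml.dom import minidom

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def local_name(ref):
    """Reduce '#Dog' or 'http://example.org/onto#Dog' to 'Dog'."""
    return ref.lstrip("#").rsplit("#", 1)[-1]


def subclass_edges(owl_xml):
    """Return (classes, edges); edges are (superclass, subclass) pairs."""
    doc = minidom.parseString(owl_xml)
    classes, edges = set(), set()
    for cls in doc.getElementsByTagNameNS(OWL, "Class"):
        name = cls.getAttributeNS(RDF, "ID") or cls.getAttributeNS(RDF, "about")
        if not name:
            continue  # anonymous class expression: skipped in this sketch
        child = local_name(name)
        classes.add(child)
        for node in cls.childNodes:
            if (node.nodeType == node.ELEMENT_NODE
                    and node.namespaceURI == RDFS
                    and node.localName == "subClassOf"):
                parent = node.getAttributeNS(RDF, "resource")
                if parent:  # only the rdf:resource form is handled
                    classes.add(local_name(parent))
                    edges.add((local_name(parent), child))
    return classes, edges


# A tiny made-up ontology used only to exercise the sketch.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Animal"/>
  <owl:Class rdf:ID="Dog">
    <rdfs:subClassOf rdf:resource="#Animal"/>
  </owl:Class>
</rdf:RDF>"""

print(subclass_edges(SAMPLE))
```

The parsing step is purely syntactic; the resulting edge set is what carries the conceptual hierarchy on which the metrics operate.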

Cohesion metric #3: Average Depth of Inheritance Tree of Leaf Nodes:
Definition: Average Depth of Inheritance Tree of all Leaf Nodes (ADIT-LN) is the sum of the depths of all paths divided by the total number of paths. A depth is the total number of nodes from the root node to the leaf node in a path, with the root node counted as the first level of each path. The total number of paths in an ontology is the number of distinct paths from each root node to each leaf node for which an inheritance path from the root node to the leaf node exists. For example, ADIT-LN of an OWL ontology is described in Fig. 3.
Mathematically, ADIT-LN is formulated as follows: $\mathrm{ADIT\text{-}LN}(O_i) = \frac{1}{n}\sum_{j=1}^{n} D_j$, where $D_j$ is the total number of nodes on the $j$-th path, $1 \le j \le n$, and $n$ is the number of paths in $O_i$. In Fig. 3, the ontology $O_i$ has ADIT-LN($O_i$) = 2.
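The following Python sketch shows one way the three metrics could be computed from a set of (superclass, subclass) pairs. It is an illustration under stated assumptions, not the authors' implementation: the function name `cohesion_metrics` is hypothetical, a root is taken to be a class with no superclass and a leaf a class with no subclass, and the hierarchy is assumed to be acyclic. The example hierarchy is made up and is not the ontology of Fig. 3.

```python
def cohesion_metrics(classes, edges):
    """Return (NoR, NoL, ADIT-LN) for an acyclic class hierarchy.

    classes: iterable of class names.
    edges:   iterable of (superclass, subclass) pairs.
    """
    nodes = set(classes)
    for parent, child in edges:
        nodes.update((parent, child))
    children = {c: [] for c in nodes}
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)

    roots = sorted(c for c in nodes if c not in has_parent)
    leaves = sorted(c for c in nodes if not children[c])

    # Depth of every distinct root-to-leaf path, counted in nodes,
    # so the root itself is level 1.
    depths = []

    def walk(node, depth):
        if not children[node]:
            depths.append(depth)
        for child in children[node]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 1)

    adit_ln = sum(depths) / len(depths) if depths else 0.0
    return len(roots), len(leaves), adit_ln


# Made-up hierarchy: C1 has subclasses C2 and C3, C3 has subclass C4,
# and C5 is unrelated to every other class.
classes = ["C1", "C2", "C3", "C4", "C5"]
edges = [("C1", "C2"), ("C1", "C3"), ("C3", "C4")]
print(cohesion_metrics(classes, edges))
```

Here the paths are C1-C2 (depth 2), C1-C3-C4 (depth 3) and the isolated class C5 (depth 1), so the sketch reports NoR = 2, NoL = 3 and ADIT-LN = 6/3 = 2.0. A class reachable from a root along several routes contributes one path per route, which follows the "distinct paths" wording of the definition.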

Analysis of Ontology Cohesion Metrics
Theoretical Validation
Analysis of NoR (Number of Root Classes):
According to the software measurement validation framework of Kitchenham et al. [12], the cohesion metric NoR is a direct measure, in this case a count of the root nodes (classes) in an ontology. For this measurement:
* The entity is the ontology $O_i$ being analyzed.
* The attribute measured is the number of root classes.
* The unit is the class.
* The data scale is interval.
NoR meets Kitchenham, Pfleeger and Fenton's properties as follows:
* Attribute validity: the entity (the ontology being analyzed) has the attribute (number of root classes), a measure of the total number of root classes explicitly defined in the ontology.
* Unit validity: the attribute is measured by counting the number of root classes.
* Instrumental validity: the instrument is valid as long as the metrics collection tool correctly parses and counts the root classes defined in the ontology $O_i$.
* Protocol validity: counting the root classes according to the formal notation given in this paper is consistent and unambiguous, so the calculation is free from counting errors.
According to Briand, Morasca and Basili's cohesion measurement validation:
* Nonnegativity and normalization: the value of NoR is never negative and the values can be compared between different ontologies.
* Null value: not applicable to the NoR metric. If there is no intramodule relationship within an ontology, the value of NoR is still the number of root nodes (in this situation, the root nodes are also leaf nodes).
* Monotonicity: may not be applicable to the NoR metric. The value of NoR may decrease if a root node becomes a non-root node after an intramodule relationship is added. For example, in Fig. 4, the added relationship (represented by a dashed arrow line) makes the previous root node $C_5$ a subclass of another node $C_3$. Otherwise, the criterion is applicable to the NoR metric.
* Cohesive module: the value of NoR of merged modules is never greater than the maximum NoR of the original modules. The number of root nodes of merged modules will never be greater than the maximum number of root nodes in the original modules, because there is no reason that a non-root node should become a root node after merging modules.

Analysis of NoL (Number of Leaf Classes):
According to Kitchenham, Pfleeger and Fenton's software measurement validation framework, the cohesion metric NoL is a direct measure, in this case a count of the leaf nodes (classes) in an ontology. For this measurement:
* The entity is the ontology $O_i$ being analyzed.
* The attribute measured is the number of leaf classes.
* The unit is the class.
* The data scale is interval.
NoL meets the properties of Kitchenham et al. [12] as follows:
* Attribute validity: the entity (the ontology being analyzed) has the attribute (number of leaf classes), a measure of the total number of leaf classes explicitly defined in the ontology.
* Unit validity: the attribute is measured by counting the number of leaf classes.
* Instrumental validity: the instrument is valid as long as the metrics collection tool correctly parses and counts the leaf classes defined in the ontology $O_i$.
* Protocol validity: counting the leaf classes according to the formal notation given in this paper is consistent and unambiguous, so the calculation is free from counting errors.
According to the cohesion measurement validation of Briand et al. [11]:
* Nonnegativity and normalization: the value of NoL is never negative and the values can be compared between different ontologies.
* Null value: not applicable to the NoL metric. If there is no intramodule relationship within an ontology, the value of NoL is still the number of leaf nodes (in this situation, the leaf nodes are also root nodes).
* Monotonicity: may not be applicable to the NoL metric. The value of NoL may decrease if a leaf node becomes a non-leaf node after an intramodule relationship is added. For example, in Fig. 5 the added relationship (represented by a dashed arrow line) makes the previous leaf node $C_3$ a superclass of another node $C_4$. Otherwise, the criterion is applicable to the NoL metric.
* Cohesive module: the value of NoL of merged modules is never greater than the maximum NoL of the original modules. The number of leaf nodes of merged modules will never be greater than the maximum number of leaf nodes in the original modules, because there is no reason that a non-leaf node should become a leaf node after merging modules.
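The monotonicity caveat for NoR and NoL can be reproduced on a small made-up hierarchy. The Python sketch below is only an illustration: the helper `count_roots_and_leaves` is hypothetical, the class names do not correspond to the ontologies in Fig. 4 or Fig. 5, and roots and leaves are assumed to be classes without a superclass and without a subclass, respectively.

```python
def count_roots_and_leaves(classes, edges):
    """Return (NoR, NoL) given (superclass, subclass) edges."""
    has_parent = {child for _, child in edges}
    has_child = {parent for parent, _ in edges}
    nor = sum(1 for c in classes if c not in has_parent)
    nol = sum(1 for c in classes if c not in has_child)
    return nor, nol


classes = ["C1", "C2", "C3", "C4", "C5"]

# Before: C1 has subclasses C2 and C3; C4 and C5 are isolated.
before = {("C1", "C2"), ("C1", "C3")}

# After: one added relationship makes C5 a subclass of C3, so C5 stops
# being a root and C3 stops being a leaf.
after = before | {("C3", "C5")}

print(count_roots_and_leaves(classes, before))
print(count_roots_and_leaves(classes, after))
```

In this example a single added relationship lowers NoR from 3 to 2 and NoL from 4 to 3, which is the behaviour the monotonicity bullets describe.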

Analysis of ADIT-LN (Average Depth of Inheritance Tree of Leaf Nodes):
According to Kitchenham, Pfleeger and Fenton's software measurement validation framework, the cohesion metric ADIT-LN is a direct measure of the depth of the inheritance tree for all leaf nodes (classes) in an ontology. For this measurement:
* The entity is the ontology $O_i$ being analyzed.
* The attribute measured is the average depth of inheritance tree of all leaf classes.
* The unit is the depth of inheritance.
* The data scale is interval.
ADIT-LN meets the properties of Kitchenham et al. [12] as follows:
* Attribute validity: the entity (the ontology being analyzed) has the attribute (average depth of inheritance tree of all leaf classes), a measure of the average depth of inheritance for all leaf nodes in the ontology of interest.
* Unit validity: the attribute is measured by counting the depth of all leaf classes.
* Instrumental validity: the instrument is valid as long as the metrics collection tool correctly parses the ontology $O_i$ and computes the average depth of all leaf classes defined in it.
* Protocol validity: counting the depth of all leaf nodes in the inheritance tree and the number of all leaf classes according to the formal notation given in this paper is consistent and unambiguous, so the calculation is free from counting errors.
According to the cohesion measurement validation of Briand et al. [11]:
* Nonnegativity and normalization: the value of ADIT-LN is never negative and the values can be compared between different ontologies.
* Null value: not applicable to the ADIT-LN metric. If there is no intramodule relationship in an ontology, the value of ADIT-LN is 1, because each leaf node has a depth of one.
* Monotonicity: may not be applicable to the ADIT-LN metric. The value of ADIT-LN may decrease. For

Empirical Validation
Description of Empirical Study:
To perform our empirical analysis of the ontology cohesion metrics, we collected a set of ontologies developed by Creative Commons [17]. The computation was performed by the Ontology Metrics Parser (OMP), an XML parser that parses the XML-based OWL ontology syntactically, but computes the cohesion metrics conceptually, based on predefined OWL primitives that have explicit semantic meanings. A panel of eighteen evaluators was then assembled to assess the cohesiveness of these ontologies. The evaluators had, on average, 3 to 5 years of experience in software development, and thirteen of them had experience with knowledge-based systems or knowledge representation. The remaining evaluators were given ontology training before serving in an evaluation capacity. Each expert was given the set of ontologies and asked to assess the cohesiveness of each ontology. The experts rated the cohesiveness of each ontology on the following scale:
* 0 = Low
* 0.25 = Moderately
* 0.5 = Average
* 0.75 = High
* 1.0 = Excellent
First, we determined interrater reliability, which measures how well the evaluators agree with one another on a particular evaluation. For example, one evaluator may rate a particular ontology 0.25 while another rates the same ontology 1.0. Interrater reliability thus addresses the consistency with which a rating system is applied and is expressed as a real number in the range [0, 1]. We computed interrater reliability from the assessments of all eighteen experts across all thirty-three ontologies. We used MiniTab's Two-Way Mixed Effect Model and considered the people effect to be random and the measure effect to be fixed. In our case, the interrater reliability is 0.9014, which indicates consistent agreement between the evaluators. Next, we performed a statistical analysis to check the correlation between the averaged evaluation ratings for each of the ontologies and the cohesion metrics.
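For readers who want to see how an interrater reliability figure of this kind can be obtained, the Python sketch below computes an intraclass correlation from a two-way layout of ratings (ontologies as rows, evaluators as columns). The text does not state exactly which coefficient the statistical package reported, so the choice here, the consistency form usually written ICC(3,1) for single ratings and ICC(3,k) for averaged ratings, is an assumption, as are the function name `icc_consistency` and the made-up ratings.

```python
def icc_consistency(ratings):
    """Return (single-rater ICC, average-rating ICC), consistency form.

    ratings[i][j] is the rating given to subject i by rater j.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("subjects do not differ; ICC is undefined")

    single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    average = (ms_rows - ms_err) / ms_rows
    return single, average


# Made-up ratings for three ontologies by two evaluators.
toy = [
    [0.0, 0.25],
    [0.5, 0.5],
    [1.0, 0.75],
]
print(icc_consistency(toy))
```

With real data the table would have thirty-three rows and eighteen columns; the averaged form is the one that corresponds to comparing averaged evaluator ratings against metric values.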
We used Pearson's correlation coefficient $\rho$ with the following hypotheses:
* H0: $\rho = 0$ (there is no correlation between the metrics values and the evaluators' values)
* H1: $\rho \neq 0$ (there is correlation between the metrics values and the evaluators' values)
The Pearson correlation coefficient measures the strength and direction of the linear relationship between a pair of variables. The correlation coefficient takes values between -1 and 1. A larger absolute value of the correlation coefficient means a stronger correlation between the pair of variables, and if the two variables are independent, the correlation coefficient between them is 0. We interpret the magnitude of the coefficient using the following scale proposed by Cohen [18] and Hopkins [19]:
* < .10: trivial
* .10 to .30: minor
* .30 to .50: moderate
* .50 to .70: large
* .70 to .90: very large
* .90 to 1.0: almost perfect
Another quantitative measure of the correlation between a pair of variables is the p-value. P-values are used in hypothesis tests to either reject or fail to reject a null hypothesis. A small p-value is evidence against the null hypothesis. Table 1 shows the Pearson correlation coefficients between the cohesion metrics values and the evaluators' values, together with the p-values for the null hypothesis H0: $\rho = 0$.
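The correlation step can be sketched in a few lines of Python. The functions `pearson_r` and `effect_label` are illustrative helpers rather than part of the study's tooling, the numbers are invented purely to exercise the code (they are not the values behind Table 1), and the treatment of boundary values in the interpretation scale (each band includes its lower bound) is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def effect_label(r):
    """Map |r| onto the interpretation scale quoted in the text."""
    size = abs(r)
    if size < 0.10:
        return "trivial"
    if size < 0.30:
        return "minor"
    if size < 0.50:
        return "moderate"
    if size < 0.70:
        return "large"
    if size < 0.90:
        return "very large"
    return "almost perfect"


# Invented values: one metric and the averaged evaluator rating for
# five hypothetical ontologies.
metric_values = [1, 2, 2, 4, 5]
mean_ratings = [0.9, 0.75, 0.7, 0.4, 0.2]
r = pearson_r(metric_values, mean_ratings)
print(round(r, 3), effect_label(r))
```

A negative coefficient, as in this invented example, would mean that higher metric values go with lower cohesiveness ratings; the sign matters for interpretation even though the scale above is applied to the absolute value. Computing the accompanying p-value requires the t distribution with n - 2 degrees of freedom, which a statistics package would normally supply.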

CONCLUSION AND FUTURE RESEARCH
In this study, we introduced cohesion metrics for measuring ontology cohesiveness. These metrics can help ontology developers and system developers better understand ontology structures, and could potentially help estimate cost and maintenance both for the ontology itself and for the whole life cycle of an ontology-based software system. We have performed theoretical and empirical analyses of ontology cohesion metrics #1, #2 and #3. The results of our theoretical studies indicate that the ontology cohesion metrics are theoretically valid. The results of our empirical studies indicate that a good correlation exists between the evaluators' opinions of ontology cohesiveness and the cohesiveness measured by the cohesion metrics proposed in this study.
In the future, we will continue work on the ontology cohesion metrics and other ontology metrics. In this research, we mainly considered the subclass relationship between classes in ontologies. We may add more metrics to the ontology cohesion metrics set. Future research may also include additional uses for the metrics and how the metrics can be used to affect ontology system development.