Initial Hybrid Method for Analyzing Software Estimation, Benchmarking and Risk Assessment Using Design of Software

Problem statement: Estimation models in software engineering are used to predict important attributes of future entities, such as development effort, software reliability and programmer productivity. Among these models, those estimating software effort have motivated considerable research in recent years. Approach: In this study we review existing work on effort estimation methods and propose a hybrid method for the effort estimation process. As an initial approach to the hybrid technology, we developed a simple approach to software effort estimation (SEE) based on use case models: the "use case points method". This method is not new, but it has not become popular although it is easy to understand and implement. We therefore investigated this promising method, which was inspired by function point analysis. Results: Reliable estimates can be calculated with our method in a short time with the aid of a spreadsheet. Conclusion: We plan to extend its applicability to estimating risk and benchmarking measures.


INTRODUCTION
The planning, monitoring and control of software development projects require that effort and costs be adequately estimated. However, some forty years after the term "Software Engineering" was coined [27], effort estimation remains a challenge for practitioners and researchers alike. There is a large body of literature on software effort estimation models and techniques, which includes discussion of the relationship between software size, as a primary predictor, and effort [1,2,4]. These studies conclude that the models, which are used by different groups and in different domains, have still not gained universal acceptance [19]. As the role of software in society becomes larger and more important, it becomes necessary to develop a package that can estimate effort within a short period. To achieve this goal, the entire software development process should be managed by an effective model [16]. Our proposed model therefore focuses on three basic parameters: (1) software estimation, (2) benchmarking and (3) risk assessment [32]. So far, several models and techniques have been proposed and developed [6,9,12], and most of them include "software size" as an important parameter [23]. Figure 1 shows the application of software engineering principles and standards in medium-sized organizations and in very small enterprises [6,9,12]. The use case model can be used to predict the size of the future software system at an early development stage. To estimate effort in the early phases of software development, the use case point method has been proposed [13,20]. The use case point method is influenced by the function point method and is based on analogous use case points [11,22].
We have been developing a hybrid model to estimate effort in the early phases of software development [20,24]. This study describes how the use case points method is introduced to software projects for estimating effort. It also describes the automatic classification of actors and use cases in the UCP model, which replaces manual classification. The results of this study will serve as the basis for developing a hybrid method for benchmarking and risk assessment [32].
Problem framework: Our understanding of the effort-estimation problem arises from the idea that any software project is the result of a set of business goals that emerge from a desire to exploit a niche in the marketplace with a new software product. Take, for example, the development of an application server that caters to on-demand software. The business goals of having a robust, high-performance, secure server lead to a set of architectural decisions whose goal is to realize specific quality-attribute requirements of the system (e.g., using triple modular redundancy to satisfy the availability requirements, a dynamic load-balancing mechanism to meet the performance requirements and a 256 bit encryption scheme to satisfy the security requirements). Each architecture A that results from a set {Ai} of architectural decisions has a different set of costs C{Ai} (Fig. 2). The choice of a particular set of architectural decisions maps to system qualities that can be described in terms of a particular set of stimulus/response characteristics of the system {Qi}, i.e., Ai -> Qi. (For example, the choice of using concurrent pipelines for servicing requests in this system leads to a predicted worst-case latency of 500 ms, given a specific rate of server requests.) The "value" of any particular stimulus/response characteristic chosen is the revenue that could be earned by the product in the marketplace owing to that characteristic. We believe that the software architect should attempt to maximize the difference between the value generated by the product and its cost.
Related work: Several studies [7,8] and case studies have been reported on use case points and on effort estimation based on the use case model [20]. Smith proposed a method to estimate lines of code from a use case diagram [21,22]. Arnold and Pedross reported that the use case method can be used to estimate the size of the software [26]. They also suggested that the use case point method should be used together with other estimation methods to get the best result.

Limitations of function points:
Function points are a measure of software size expressed in functional terms, and the measured size stays constant irrespective of the programming language and environment used [15,22]. Function point counting requires detailed information about the software, which only becomes available in the software design specification. Function points are also difficult to evaluate for software with a short development time [11,25]. In practice, estimating the software at an earlier phase of the development life cycle reduces risk. To estimate effort in the earlier phases of the development life cycle, the use case point method has been proposed [20].

MATERIALS AND METHODS
Use case model: The first step is to calculate the Use Case Points (UCP) from the use case model [20]. The use case model mainly consists of two kinds of documents, system or subsystem documents and use case documents, which contain the following items: system name, risk factors, system-level use case diagram, architecture diagram, subsystem descriptions, use case name, brief description, context diagram, preconditions, flow of events, post conditions, subordinate use case diagrams, subordinate use cases, activity diagram, view of participating classes, sequence diagrams, user interface, business rules, special requirements and other artifacts [14]. Of these, we focus mainly on two items: the system-level use case diagram and the flow of events. The system-level use case diagram includes one or more use case diagrams showing all the use cases and actors in the system [14]. Figure 3 shows an example of a system-level use case diagram for "ATM systems":
• A session is started when a customer inserts an ATM card into the card reader slot of the machine
• The ATM pulls the card into the machine and reads it
• If the reader cannot read the card due to improper insertion or a damaged stripe, the card is ejected, an error screen is displayed and the session is aborted
• The customer is asked to enter his/her PIN and is then allowed to perform one or more transactions, choosing from a menu of possible types of transaction in each case
Counting use case point: Intuitively, UCP is measured by counting the number of actors and the number of transactions included in the flow of events, each with some weight. A transaction is an event that occurs between an actor and the target system and that is performed either entirely or not at all. In our method, the effort estimate is calculated by applying the following procedures.

Procedure 1: Counting actors weight:
The actors in the use case model are categorized as simple, average or complex. A simple actor represents another system with a defined API. An average actor is either another system that interacts through a protocol such as TCP/IP or a person interacting through a text-based interface. A complex actor is a person interacting through a graphical user interface (GUI). The number of actors of each type in the target software is counted, and each count is multiplied by the weighting factor shown in Table 1. Finally, the actor weight is calculated by adding these values together.
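As an illustration of Procedure 1, the following minimal sketch sums count times weight over the three actor categories. Table 1 is not reproduced in this text, so the weights 1, 2 and 3 are an assumption taken from the commonly published use case points method; the function name and the example counts are hypothetical.

```python
# Assumed weights from the commonly published use case points method;
# Table 1 of this study is not reproduced in the text.
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}


def actor_weight_total(actor_counts):
    """Return the sum of (number of actors * weight) over all categories."""
    return sum(ACTOR_WEIGHTS[category] * count
               for category, count in actor_counts.items())


# Hypothetical project: 2 API-based systems, 1 protocol-based system, 3 GUI users.
example_counts = {"simple": 2, "average": 1, "complex": 3}
total_actor_weight = actor_weight_total(example_counts)
print(total_actor_weight)
```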

Procedure 2: Counting use case weights:
Each use case is categorized as simple, average or complex based on its number of transactions, including those in alternative paths. A simple use case has 3 or fewer transactions, an average use case has 4-7 transactions and a complex use case has more than 7 transactions. The number of use cases of each type in the target software is counted, and each count is multiplied by the weighting factor shown in Table 2. Finally, the use case weight is calculated by adding these values together.
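Procedure 2 can be sketched in a few lines. The transaction thresholds (3 or fewer, 4-7, more than 7) come from the text; the weights 5, 10 and 15 are an assumption taken from the commonly published use case points method, since Table 2 is not reproduced here. Function names and the example data are hypothetical.

```python
# Assumed weights from the commonly published use case points method;
# Table 2 of this study is not reproduced in the text.
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}


def classify_use_case(transactions):
    """Map a transaction count to a complexity category (thresholds from the text)."""
    if transactions <= 3:
        return "simple"
    if transactions <= 7:
        return "average"
    return "complex"


def use_case_weight_total(transaction_counts):
    """Sum the weight of every use case, given its transaction count."""
    return sum(USE_CASE_WEIGHTS[classify_use_case(t)] for t in transaction_counts)


# Hypothetical model with four use cases having 2, 5, 9 and 7 transactions.
total_use_case_weight = use_case_weight_total([2, 5, 9, 7])
print(total_use_case_weight)
```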

Procedure 3: Calculating unadjusted use case points:
The unadjusted use case points (UUCP) are calculated by adding the total weight for actors to the total weight for use cases (Fig. 4).

Procedure 4: Weighting technical and environmental factors:
The UUCP are adjusted based on the values assigned to a number of technical and environmental factors, shown in Tables 3 and 4.

Method:
Each factor is assigned a value between 0 and 5 depending on its assumed influence on the project. A rating of 0 means the factor is irrelevant for this project and 5 means it is essential.

Calculation of TCF:
The technical complexity factor (TCF) is calculated by multiplying the value of each factor (T1-T13) in Table 3 by its weight and adding the products to obtain a sum called the T factor. Finally, the following formula, as defined in the standard use case points method, is applied:

$$\mathrm{TCF} = 0.6 + 0.01 \times T_{\mathrm{factor}}$$

Calculation of environmental factor:
The environmental factor (EF) is calculated in the same way, by multiplying the value of each factor (F1-F8) in Table 4 by its weight and adding the products to obtain a sum called the E factor. Finally, the following formula, as defined in the standard use case points method, is applied:

$$\mathrm{EF} = 1.4 - 0.03 \times E_{\mathrm{factor}}$$

Research method:
Based on the proposed method, we plan to develop a framework [3] as an automated tool named the Hybrid tool. The input is an XMI file. The tool is implemented in Java, and the Xerces2 Java Parser is used to analyze the model file [30].
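The TCF and environmental factor calculations can be illustrated with a short sketch. Everything numeric here is an assumption: the constants 0.6, 0.01, 1.4 and 0.03 and the factor weights are those of the commonly published use case points method (Tables 3 and 4 are not reproduced in this text), the ratings are invented, and the final product UUCP x TCF x EF is the usual definition of adjusted use case points rather than something stated above.

```python
# Assumed factor weights from the commonly published use case points method.
T_WEIGHTS = [2, 1, 1, 1, 1, 0.5, 0.5, 2, 1, 1, 1, 1, 1]   # T1-T13
E_WEIGHTS = [1.5, 0.5, 1, 0.5, 1, 2, -1, -1]              # F1-F8


def weighted_sum(weights, ratings):
    """Multiply each 0-5 rating by its weight and add the products."""
    if len(weights) != len(ratings):
        raise ValueError("one rating is required per factor")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 0 and 5")
    return sum(w * r for w, r in zip(weights, ratings))


def technical_complexity_factor(t_factor):
    return 0.6 + 0.01 * t_factor


def environmental_factor(e_factor):
    return 1.4 - 0.03 * e_factor


# Hypothetical project in which every factor is rated 3.
tcf = technical_complexity_factor(weighted_sum(T_WEIGHTS, [3] * 13))
ef = environmental_factor(weighted_sum(E_WEIGHTS, [3] * 8))
uucp = 53  # illustrative unadjusted use case points
ucp = uucp * tcf * ef
print(round(tcf, 3), round(ef, 3), round(ucp, 2))
```

With neutral ratings both factors stay close to 1, so the adjusted value remains near the unadjusted one; extreme ratings move the result by a few tens of percent in either direction.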

An automated tool for estimating use case point: Overview:
To introduce the use case point method effectively into software development, we decided to create a use case point measurement tool. Several existing tools are based on the use case model, but with all of them the complexity of actors and use cases must be judged manually (Fig. 5). Because this judgment is the most important part of software cost estimation, we decided to automate it. To perform the entire procedure automatically, a set of rules for classifying the weights of actors and use cases must be defined. The use case model must also be written in a machine-readable format, so we assume that it is written in XMI (XML Metadata Interchange) [30]. We chose this file format because most CASE tools for drawing UML diagrams can export them as XMI files [30].
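To make the machine-readable input concrete, the sketch below extracts actor and use case names from a small XMI fragment. It is not the tool described here, which is written in Java with the Xerces2 parser; it uses Python's standard xml.etree.ElementTree instead. The sample document and its XMI 2.1 style layout (packagedElement with an xmi:type attribute) are assumptions, and real exports differ between UML tools and XMI versions.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared in the hypothetical sample below.
XMI_NS = "http://schema.omg.org/spec/XMI/2.1"

# Hypothetical XMI 2.1 style export; real tool output varies.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
         xmlns:uml="http://schema.omg.org/spec/UML/2.1" xmi:version="2.1">
  <uml:Model xmi:id="m1" name="ATM">
    <packagedElement xmi:type="uml:Actor" xmi:id="a1" name="Customer"/>
    <packagedElement xmi:type="uml:Actor" xmi:id="a2" name="Bank Server"/>
    <packagedElement xmi:type="uml:UseCase" xmi:id="u1" name="Withdraw Cash"/>
  </uml:Model>
</xmi:XMI>"""


def extract_names(xmi_text):
    """Return (actor names, use case names) found in the XMI text."""
    root = ET.fromstring(xmi_text)
    type_key = "{%s}type" % XMI_NS  # ElementTree expands the xmi: prefix
    actors, use_cases = [], []
    for element in root.iter("packagedElement"):
        kind = element.get(type_key)
        if kind == "uml:Actor":
            actors.append(element.get("name"))
        elif kind == "uml:UseCase":
            use_cases.append(element.get("name"))
    return actors, use_cases


actors, use_cases = extract_names(SAMPLE_XMI)
print(actors, use_cases)
```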

Rules for weighting actors:
The weight for each actor is determined by the interface between the actor and the target software. However, the interface information is not available in the actor description; only the name of the actor is available. It is therefore essential to define a protocol that determines the complexity of an actor.
Step 1: Classification based on actor's name: At the initial stage of the classification, we determine whether the actor is a person or an external system based on the name of the actor. To do so, we prepare beforehand a list of keywords that can appear in the name of a software system. For example, the keywords "system" and "server" are used in system names.
Keywords for step 1 (KLa): System, server, application, tool
We plan to start the automated tool with a minimal set of keywords. At later stages, new keywords will be added automatically and can be used for later projects.
Step 2: Classification based on keywords included in use case: Here, we classify the actor based on the flow of events to which the actor is relevant. As an initial stage, we plan to develop a set of keywords for each complexity level of actor. We then extract all words included in the flow of events and match them against each keyword in the lists. Finally, the actor's weight is assigned as the complexity of the keyword list that best fits the words in the flow of events:
Keywords for average actor (system) (KLaas): Message, mail, send
Keywords for average actor (person) (KLaap): Command, text, I/P, CUI
Keywords for complex actor (KLca): Press, push, select, show, GUI, window
Keywords for simple actor (KLsa): Request, send, inform
Step 3: Classification based on experience data: If we are unable to determine the actor's weight at step 2, we determine it based on experience data. The experience data includes information about the use case models and use case points developed in past software projects.
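Steps 1 and 2 can be sketched as a simple keyword lookup. The keyword lists are taken from the text; several details are assumptions made only for illustration: the simple and average (system) lists are applied to external systems and the average (person) and complex lists to persons, matching is exact and case-insensitive with no stemming, and a tie or zero matches returns None to signal the step 3 fallback to experience data.

```python
import re

# Keyword lists from the text, lower-cased.
KL_A = {"system", "server", "application", "tool"}
KEYWORDS = {
    # Assumption: these two lists apply to external systems...
    "system": {"simple": {"request", "send", "inform"},
               "average": {"message", "mail", "send"}},
    # ...and these two to persons.
    "person": {"average": {"command", "text", "i/p", "cui"},
               "complex": {"press", "push", "select", "show", "gui", "window"}},
}


def is_external_system(actor_name):
    """Step 1: the actor is a system if its name contains a KLa keyword."""
    return any(word in KL_A for word in re.findall(r"[a-z]+", actor_name.lower()))


def classify_actor(actor_name, flow_of_events):
    """Step 2: pick the complexity whose keyword list matches the most words."""
    words = re.findall(r"[a-z/]+", flow_of_events.lower())
    group = "system" if is_external_system(actor_name) else "person"
    scores = {level: sum(word in keywords for word in words)
              for level, keywords in KEYWORDS[group].items()}
    best = max(scores.values())
    winners = [level for level, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # undecided: fall back on experience data (step 3)
    return winners[0]


print(classify_actor("Customer",
                     "The customer can press a key to select an item shown in the window"))
print(classify_actor("Bank Server", "The ATM will send a message to the bank server"))
```

A production version would need stemming or the morphological analysis used for use cases, since exact matching misses inflected forms such as "selects".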

Rules for weighting use cases:
The complexity of a use case is determined by its number of transactions, so we focus on the flow of events in the use case model. The simplest way to count transactions is to count the number of events. However, there is no standard procedure for writing the flow of events, and several transactions may be described in one event. Because of this limitation, several guidelines for writing events in use case models have been proposed [14]. There are ten guidelines for writing a successful scenario. Among them, we focus on two, one of which is to include a reasonable set of actions. Jacobson suggests that the following four pieces of a compound interaction should be described:
• The primary actor sends request and data to the system
• The system validates the request and the data
• The system alters its internal state
• The system responds to the actor with the result
Based on these guidelines, we propose analyzing the events using morphological analysis and syntactic analysis. Through these analyses, we obtain the morphemes of each statement and the dependency relations between its words. We conduct the morphological analysis for all statements and obtain the subject word and the predicate word of each statement.
Then, we apply the following rules:
Rule U-1: We regard each pair of subject word and predicate word as a candidate transaction.
Rule U-2: Among the candidates, we identify those related to an actor's operation and a system response as transactions.
For each use case, we apply these rules to obtain the number of transactions. Then, based on the number of transactions, we determine the complexity of each use case.
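The two rules can be illustrated with a deliberately crude sketch. It does not perform morphological or syntactic analysis: as a stand-in, it assumes every event statement has the form "The <subject> <predicate> ..." and reads the subject and predicate with a regular expression. Under that assumption, Rule U-1 yields one candidate per matching statement, and Rule U-2 keeps candidates whose subject is a known actor or the word "system". The function name and the sample events are hypothetical.

```python
import re


def count_transactions(events, actor_names):
    """Count event statements whose subject is an actor or the system."""
    subjects = {name.lower() for name in actor_names} | {"system"}
    count = 0
    for statement in events:
        # Rule U-1 (naive stand-in): first word after an optional "the" is the
        # subject, the next word is the predicate.
        match = re.match(r"\s*(?:the\s+)?([a-z]+)\s+([a-z]+)", statement.lower())
        if match is None:
            continue  # no subject/predicate pair, so no candidate
        subject, _predicate = match.groups()
        # Rule U-2: keep only actor operations and system responses.
        if subject in subjects:
            count += 1
    return count


sample_events = [
    "The customer inserts the card.",
    "The system validates the PIN.",
    "The card is ejected.",
    "The system displays the menu.",
]
transactions = count_transactions(sample_events, ["Customer"])
print(transactions)
```

The third statement is skipped because its subject is neither an actor nor the system, which is the kind of filtering Rule U-2 describes.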

RESULTS
To evaluate the usefulness of the automated tool, we applied it to actual use case models developed in software companies. We collected use case models from five software projects in which medium-sized application programs were developed [14,18]. All use case models were developed with the UML design tool "Describe" [35]. In the evaluation, we focused on the results of the automatic complexity classification of actors and use cases. We therefore compared the measurements calculated by our tool with those calculated by a specialist in use case point counting.

DISCUSSION
Here, we discuss the validity and the limitations of our results:

Description of events:
The use case models used in the evaluation were constructed by engineers who have some experience of writing use case models. As a result, the event descriptions of the use cases mostly satisfied the guidelines described in [2,13]. To confirm the applicability of the automated tool, we therefore have to apply it to more use case models, developed in actual projects by many engineers with varying levels of experience. It will also be essential to prepare formal guidelines on how to write use case models so that the automated tool can be used effectively in companies.

Language:
The input use case models to the automated tool must be written in English.

CONCLUSION
This study has proposed an automated Hybrid tool that calculates use case points from use case models in XMI files [30]. We will use the effort estimates produced by this Hybrid tool in the hybrid technology proposed for risk assessment and benchmarking. We will also extend this technique to develop an automated tool for assessing risk and benchmarking.