Transformation Box-Cox for Stabilisation of Diversity in Group Random Design

Corresponding Author: Yenny Sylvia, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia. Email: djunsan2002@yahoo.com

Abstract: Research makes use of a variety of statistical methods. Hypothesis testing is often used to draw conclusions from the research conducted, and one of the most common forms is hypothesis testing through ANOVA (Box et al., 1978). For the result of such a test to be valid, however, certain assumptions must be met, among them the assumption of homogeneity of variance between treatments (Box et al., 1978; Box and Cox, 1964; Gomez and Gomez, 1984; Guerrero, 1993; Mattjik and Made, 2006). In this study the homogeneity of variance assumption was examined on plant research data obtained from a randomized block design. The assumption was tested using the Bartlett test, but the tests carried out gave results that were not fully consistent (Box et al., 1978; Box and Cox, 1964; Gomez and Gomez, 1984; Guerrero, 1993; Mattjik and Made, 2006). The Box-Cox transformation was used to transform data that were not homogeneous (Box et al., 1978; Box and Cox, 1964). A computer program for the variance test and the transformation was written in the C# programming language.


Introduction
In general, research uses statistical methods to analyze the data obtained. A hypothesis is usually formed from the research data and then tested. One method of hypothesis testing is Analysis of Variance (ANOVA). For the ANOVA calculation to be valid, however, several assumptions must be considered: additivity of effects, independence of errors (galat), homogeneity of variances and normally distributed errors. It is therefore better to test the data against these assumptions first, because if the data fail to meet one or more of them, the sensitivity of the F test in the ANOVA is affected. Any deviation from one or more of these assumptions must be corrected before continuing with the ANOVA calculation (Box et al., 1978; Box and Cox, 1964; Gomez and Gomez, 1984; Guerrero, 1993; Mattjik and Made, 2006).

Problem Formulation
The issues examined in this research are formulated as follows: (a) the Bartlett test is used to test the homogeneity of variance of the experimental plant research data, (b) the Box-Cox transformation is used to transform research data whose variances are heterogeneous among the treatments tested in the plant experiment, producing a new data set; the transformed data are then used in the calculation of the F test in ANOVA, (c) to simplify and speed up the calculation, an application program is built to compute the Box-Cox transformation of the research data (Box et al., 1978; Box and Cox, 1964; Gomez and Gomez, 1984; Guerrero, 1993; Mattjik and Made, 2006).

Scope
This research discusses the testing of variance stability in a randomized block design (RAK) and the use of the Box-Cox transformation. The scope of the study is as follows: (a) the research data were taken from the Center for Research and Development of Agricultural Biotechnology and Genetic Resources, (b) the data are the results of plant experiments in a randomized block design, (c) the research tests whether the variances of the experimental data to be analyzed are homogeneous, (d) the statistical methods used are ANOVA for a randomized block design, the Bartlett test, the Box-Cox transformation and the Least Significant Difference test (Box et al., 1978; Box and Cox, 1964; Gomez and Gomez, 1984; Guerrero, 1993; Mattjik and Made, 2006).

Objectives and Benefits
The purposes of this research are (a) to detect whether the variances of the data to be tested are homogeneous, so that the homogeneity test result indicates the next step, namely whether or not the data need to be transformed, (b) to apply statistics in examining existing research data in order to obtain better and more appropriate results, (c) to create a program that helps speed up the calculation of the data analysis. The benefits of this research are (a) for the researcher: (1) information on whether the research data have homogeneous variances, (2) more accurate data analysis results, (3) better informed decisions, (4) an information system that simplifies and accelerates the calculation of the test data, (b) for the authors and other readers: (1) deeper knowledge of the homogeneity of variance test and the Box-Cox transformation, (2) direct application of the statistics that have been studied, (3) direct application of the programming that has been studied.

Research and Development Center for Biotechnology and Genetic Resources (BB-BIOGEN)
The vision of BB-Biogen is to be a leading center (center of excellence) in research and development of agricultural biotechnology and genetic resources that supports sustainable food security and highly competitive agribusiness. BB-Biogen's programs are developed and formulated around key issues that are difficult or impossible to solve by conventional means. The specific missions of BB-Biogen are to (1) increase the quantity of qualified Human Resources (HR) in the field of agricultural biotechnology and genetic resources, (2) manage and utilize agricultural genetic resources to support research in biotechnology and plant breeding, (3) develop a strong research program in the genetic improvement of plant and microbial properties, and in components of agricultural cultivation technology using biotechnology approaches, so that BB-Biogen technologies and products are highly competitive, (4) contribute to national agricultural development through the development and dissemination of appropriate technologies that improve the competitiveness of Indonesian agricultural products in national and global markets (Gomez and Gomez, 1984; Mattjik and Made, 2006).

Design of Experiments Definition
According to Mattjik and Made (2006), an experimental design is a test or series of tests, using both descriptive and inferential statistics, that aims to transform input variables into an output which is the response of the experiment.

Basic Principles of Experimental Design
In an experimental design, statistically analyzed data are considered valid only if they were obtained from an experiment that meets three basic principles (Mattjik and Made, 2006), namely: (1) replication, (2) randomization, (3) environmental control.

Classification of Experimental Designs
Broadly speaking, experimental designs can be classified as follows (Mattjik and Made, 2006): (1) treatment design, (2) environmental design, (3) measurement design.

Randomized Block Design (RAK) for Complete Group
According to Pratisto (2004), the randomized block design is widely used in experiments in the agricultural sciences, industry and other fields. The design is characterized by a number of groups (blocks), each of which receives the same set of treatments. Grouping is expected to reduce the experimental error. The purpose of grouping is therefore to reduce the diversity of the experimental units within each group, or in other words to seek homogeneity within a group. Note that in RAK the grouping is not a new variable in the way that the treatment is. Groups are formed on the basis of specific criteria. The linear model of a randomized block design can in general be written as follows (Mattjik and Made, 2006; Pratisto, 2004):

$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$

Where:
$Y_{ij}$ = observation on the i-th treatment and the j-th group
$\mu$ = general mean
$\tau_i$ = effect of the i-th treatment
$\beta_j$ = effect of the j-th group
$\varepsilon_{ij}$ = random effect of the i-th treatment and the j-th group

The hypotheses that can be tested in a randomized block design concern the effect of treatment and the effect of grouping. They can be written as follows:

Effect of Treatment
$H_0: \tau_1 = \tau_2 = \dots = \tau_t = 0$ (treatment has no effect on the observed response)
$H_1$: there is at least one i for which $\tau_i \neq 0$.

Effect of Grouping
$H_0: \beta_1 = \beta_2 = \dots = \beta_r = 0$ (the group has no effect on the observed response)
$H_1$: there is at least one j for which $\beta_j \neq 0$.
The effect of the treatments tested in the experiment can be assessed using Analysis of Variance (ANOVA) (Box et al., 1978; Mattjik and Made, 2006; Pratisto, 2004).
ANOVA table structure is given as follows (

Testing of Hypothesis
For the treatment effect, Fcount = KTP/KTG (treatment mean square divided by error mean square) follows the F distribution with t-1 numerator degrees of freedom and (t-1)(r-1) denominator degrees of freedom, where t is the number of treatments and r the number of groups. If Fcount is greater than the F table value, the null hypothesis is rejected; otherwise it is accepted. For the group effect, Fcount = KTK/KTG (group mean square divided by error mean square) follows the F distribution with r-1 numerator degrees of freedom and (t-1)(r-1) denominator degrees of freedom. Again, if Fcount is greater than the F table value, the null hypothesis is rejected; otherwise it is accepted (Box et al., 1978; Mattjik and Made, 2006; Pratisto, 2004).
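The two F ratios above can be illustrated with a short Python sketch. It is not the authors' C# program: the function name `rbd_anova`, the dictionary keys and the small data set are hypothetical, and the sums of squares follow the usual textbook decomposition for a randomized complete block design with one observation per treatment and group.

```python
def rbd_anova(data):
    """ANOVA for a randomized block design.

    data[i][j] is the observation for treatment i in group (block) j.
    Returns the mean squares KTP, KTK, KTG and the two F ratios.
    """
    t = len(data)        # number of treatments
    r = len(data[0])     # number of groups (blocks)

    grand_total = sum(sum(row) for row in data)
    correction = grand_total ** 2 / (t * r)   # correction factor

    ss_total = sum(y * y for row in data for y in row) - correction
    ss_treat = sum(sum(row) ** 2 for row in data) / r - correction
    block_totals = [sum(data[i][j] for i in range(t)) for j in range(r)]
    ss_block = sum(b * b for b in block_totals) / t - correction
    ss_error = ss_total - ss_treat - ss_block

    ktp = ss_treat / (t - 1)                  # treatment mean square
    ktk = ss_block / (r - 1)                  # group mean square
    ktg = ss_error / ((t - 1) * (r - 1))      # error mean square
    return {
        "KTP": ktp,
        "KTK": ktk,
        "KTG": ktg,
        "F_treatment": ktp / ktg,
        "F_group": ktk / ktg,
        "df_error": (t - 1) * (r - 1),
    }


# Hypothetical data: 3 treatments (rows) x 3 groups (columns).
example = [[10, 12, 11], [14, 15, 16], [20, 19, 24]]
result = rbd_anova(example)
print(result)
```

Each F ratio is then compared with the F table value at the chosen significance level; the table lookup is left out of the sketch.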

Coefficient of Variation
The coefficient of variation indicates the level of accuracy with which the treatments are compared and is a good index of the state of the experiment. It expresses the experimental error (galat) as a percentage of the mean, so that the greater the coefficient of variation, the lower the reliability of the experiment. For the Least Significant Difference (LSD) test, t α is the value of the t distribution with degrees of freedom equal to (t)(r-1) (Box et al., 1978; Box and Cox, 1964; Gomez and Gomez, 1984; Guerrero, 1993; Mattjik and Made, 2006; Pratisto, 2004).
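The formulas for these two quantities do not survive in the text. As an assumption, the usual textbook forms are given here in the notation of the ANOVA section (KTG is the error mean square, r the number of groups); the symbol KK for the coefficient and the overall mean $\bar{Y}_{..}$ are introduced only for illustration, and the exact notation used by the authors may differ.

```latex
\[
\mathrm{KK} = \frac{\sqrt{\mathrm{KTG}}}{\bar{Y}_{..}} \times 100\%,
\qquad
\mathrm{LSD} = t_{\alpha}\,\sqrt{\frac{2\,\mathrm{KTG}}{r}}
\]
```

Two treatment means are declared different when the absolute difference between them exceeds the LSD value.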

Analysis of Variance (ANOVA) Definition
According to Walpole et al. (1995), analysis of variance is a method for partitioning the total variability of the data into components that measure the various sources of variability. Analysis of variance is used to test several means simultaneously.

Assumptions of Analysis of Variance
The results of an ANOVA calculation are valid only if certain mathematical assumptions about the data are met. The assumptions are (1) additivity of effects, (2) independence of errors, (3) homogeneity of variance, (4) normally distributed errors.

Homogeneity of Variance
Heterogeneity of variance can be classified into two types (Gomez and Gomez, 1984), namely: (1) the variance is functionally related to the mean, (2) there is no functional relationship between the variance and the mean.

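Homogeneity of Variance Test (Bartlett Test)
The Bartlett test can be used to test the homogeneity of variance.
The hypotheses to be tested are: $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$ against $H_1$: at least one variance differs from the others. The Bartlett test statistic is evaluated using the chi-squared distribution approximation with k-1 degrees of freedom.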
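The test statistic can be sketched in Python as follows. This uses the standard form of Bartlett's statistic with base-10 logarithms and the factor 2.3026; it is an illustration and not the authors' C# implementation. The function name `bartlett_statistic` and the sample values are hypothetical, and every group is assumed to have at least two observations and a non-zero variance.

```python
import math


def bartlett_statistic(groups):
    """Bartlett chi-squared statistic for k groups of observations.

    groups is a list of lists, one list per treatment.
    The result is compared with the chi-squared table value
    at k - 1 degrees of freedom.
    """
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]        # degrees of freedom per group

    variances = []
    for g in groups:
        mean = sum(g) / len(g)
        variances.append(sum((y - mean) ** 2 for y in g) / (len(g) - 1))

    df_total = sum(dfs)
    pooled = sum(f * s2 for f, s2 in zip(dfs, variances)) / df_total

    # 2.3026 (ln 10) converts base-10 logarithms to natural logarithms.
    numerator = 2.3026 * (
        df_total * math.log10(pooled)
        - sum(f * math.log10(s2) for f, s2 in zip(dfs, variances))
    )
    correction = 1 + (sum(1 / f for f in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction


# Hypothetical data: the second group is far more variable than the first.
chi2 = bartlett_statistic([[1, 2, 3], [10, 20, 30]])
print(round(chi2, 3))
```

If the statistic exceeds the chi-squared table value, H0 is rejected and the variances are treated as heterogeneous.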
The calculation works with base-10 logarithms, which are multiplied by the factor 2.3026 (ln 10).

Transformation Box-Cox (Box et al., 1978; Box and Cox, 1964)
Data transformation is a common technique for handling data in which the variance is heterogeneous and functionally related to the mean. The transformation converts the original data to a new scale, producing a new data set whose variances are expected to be stable, so that the assumptions needed for a valid test are approximately met. One such transformation is the Box-Cox transformation.
The general form of the Box-Cox transformation is $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $y^{(\lambda)} = \ln y$ for $\lambda = 0$, where $y^{(\lambda)}$ is the Box-Cox transformation of the variable y at a specific value of lambda (λ). The Box-Cox transformation depends heavily on the value of lambda (λ) that is used.
The value of lambda is therefore determined first and then applied as a power to the original data to obtain the new, transformed data.
The value of lambda can be calculated with two approaches: using the variance or using the standard deviation.
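A possible reading of the standard deviation approach is sketched below in Python. It relies on an assumption that the text does not confirm: the well-known power-law relation in which, if the group standard deviation is proportional to the group mean raised to the power b, then λ = 1 - b stabilises the variance. The constants a and b come from an ordinary least-squares line of log standard deviation against log mean. Under the same assumption, the variance approach would regress log variance on log mean and use λ = 1 - b/2. The function names and the data are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transformation of a single positive value y."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def estimate_lambda(groups):
    """Estimate lambda from log10(sd) = a + b * log10(mean) across groups.

    Returns (lam, a, b) with lam = 1 - b (assumed power-law relation).
    Each group needs a positive mean and a non-zero standard deviation.
    """
    xs, ys = [], []
    for g in groups:
        mean = sum(g) / len(g)
        sd = math.sqrt(sum((v - mean) ** 2 for v in g) / (len(g) - 1))
        xs.append(math.log10(mean))
        ys.append(math.log10(sd))

    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)

    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # least-squares slope
    a = (sy - b * sx) / n                           # least-squares intercept
    return 1 - b, a, b


# Hypothetical data in which the standard deviation grows with the mean.
groups = [[9, 10, 11], [90, 100, 110], [900, 1000, 1100]]
lam, a, b = estimate_lambda(groups)
transformed = [[box_cox(v, round(lam, 6)) for v in g] for g in groups]
print(lam, transformed)
```

In this illustration the slope is 1, so λ is 0 and the transformation reduces to the logarithm. The transformed data would then be re-tested with the Bartlett test before running ANOVA.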
With the variance approach (Guerrero, 1993): To find the values of a and b, the following formula is used (Walpole et al., 1995): Approach with the standard deviation (Box et al., 1978): To find the values of a and b, the following formula is used:

Software Engineering (Pressman, 2010)
Software engineering is defined by Fritz Bauer (Pressman, 2010) as the activity of applying sound engineering principles in order to produce software that is economical, reliable and works efficiently. Software engineering has three main elements (Pressman, 2010), namely: (a) process, (b) methods, (c) tools. There are various models in software engineering. A commonly used one is the waterfall model. It is called the waterfall model because each process must wait for the previous process to be completed, and the processes run sequentially. The stages of the waterfall model are shown in Fig. 1.
According to Pressman (2010), the waterfall model has 6 stages, namely (1)

(Shneiderman and Plaisant, 2010) The five criteria that a user-friendly system must meet (Shneiderman and Plaisant, 2010) are (1)

Database
According to Farthansyah (2004), the database is one component of a database system. A database consists of three things: a well-organized collection of data, the relationships between the data, and an objective. There are many ways to organize data and many considerations in forming relationships among the data, but in the end the important thing is the main objective, which should always be kept in mind: speed and ease of interaction with the data being managed or processed.

Microsoft Excel
Microsoft Excel is a powerful spreadsheet program that runs under the Windows operating system. It offers many conveniences, such as working with lists of data, calculating numbers and creating reports, charts and graphs. Microsoft Excel can also be used to process statistical data. Data entered in Excel can also be connected to and read by the application program that was created.

State Transition Diagram
A State Transition Diagram (STD) is used to describe the time-dependent behavior of a system. The main components of the STD are (1) State, (2) Transition, (3) Condition and Action.

Flowchart
A flowchart is a diagram showing the sequence of, and relationships between, processes and their instructions. The diagram is drawn with symbols, each of which represents a particular process. Symbols such as the rectangle, rhombus and oval express operations, while the relationships between processes are shown by connecting lines. In outline, a flowchart consists of three main parts, namely: input, process and output. The symbols serve as a tool for describing the processes in a program.

Analysis of Issues
The ANOVA results obtained from an experimental design are valid only if the stated assumptions are met. One of these assumptions is homogeneity of variance, and it is this assumption that the authors test. The result of the test determines whether or not the data are homogeneous. If they are not homogeneous, a method is needed to homogenize them. One commonly used method is transformation, and the transformation used by the authors is the Box-Cox transformation. The variables used in the calculation are the treatments and the test values of the experiment. From the available data, the program produces the homogeneity test result, the value of lambda (if the data are not homogeneous), which is used to obtain the new data values, the ANOVA and the LSD values.

Data Collection
The data used are secondary data in the form of crop research data obtained from the Center for Research and Development of Agricultural Biotechnology and Genetic Resources (BB-BIOGEN). The treatments and the test values are taken from these research data.

Data Collection Method
The data and information required in this study were collected through (a) field research, in which data and information were gathered directly at the institution that is the object of this study. The field research consisted of (1) observation, that is, direct observation of the field activities related to the problem under study, (2) interviews, in which questions about the data needed for this study were put to the institution.

Data Analysis Techniques
The data are analyzed using a test of homogeneity of variance (the Bartlett test) and the Box-Cox transformation, which yields the value of lambda used to obtain the new data. The steps for processing the data with the Bartlett test and the Box-Cox transformation are: (1) collect the data, (2) visualise the relationship between the variance and the mean of the data as a scatter plot, (3) test the data for homogeneity of variance using the Bartlett test; the Bartlett test statistic is compared with the chi-squared distribution with k-1 degrees of freedom, and if H0 is accepted the tested data are homogeneous, (4) if the data are homogeneous, calculate the ANOVA and the coefficient of variation, (5) if the F test in the ANOVA rejects H0, continue with the Least Significant Difference/BNT (LSD) test, (6) if the data are not homogeneous, apply the Box-Cox transformation: find the value of lambda and use it to transform the old data into new data, (7) once the data are transformed, calculate the ANOVA, the coefficient of variation and the LSD test (if the F test in the ANOVA rejects H0) (Box et al., 1978; Box and Cox, 1964; Gomez and Gomez, 1984; Guerrero, 1993; Mattjik and Made, 2006; Pratisto, 2004).

Screen Design (Pressman, 2010; Shneiderman and Plaisant, 2010)
The main screen of the calculation program has two tabs (Main and Box-Cox) so that users can operate it easily. The calculation functions are placed on the main screen. There are 2 buttons, browse and count. The browse button locates and opens the Excel file chosen by the user. The count button displays the results of the calculation and testing of the data, including the scatter plot, the ANOVA and all conclusions drawn from the calculations (Fig. 2).
The main tab and the main screen are the same display. When a user opens the main screen, the user is on the main tab. Here the user presses the browse button to open the desired Excel data. After the data are shown, and as long as the count button has not been pressed, the user sees only the Excel data. To run the calculation and see its results, the user must press the count button. The results shown are a scatter plot of the data, the chi-squared value, the chi-squared table value, a conclusion stating whether the data are homogeneous or heterogeneous, the coefficient of variation, the Least Significant Difference (LSD) value, a Box-Cox transformation field stating yes or no according to whether a transformation is needed, the value of lambda, the ANOVA and the conclusions. The LSD/BNT value is displayed only if the ANOVA conclusion is rejection, that is F count > F table. The lambda value and the results on the Box-Cox tab are displayed only if the Box-Cox transformation field states yes. The Box-Cox tab is active only if the Box-Cox transformation field on the main tab states yes. This tab displays the transformed data values and their averages, a scatter plot of the transformed data, the chi-squared value of the transformed data (χ²), the chi-squared table value (χ² table), a conclusion stating whether the transformed data are homogeneous or heterogeneous, the coefficient of variation of the transformed data, the Least Significant Difference (LSD) value of the transformed data, and the ANOVA and conclusions for the transformed data (Pressman, 2010; Shneiderman and Plaisant, 2010). The BNT value is displayed only if the ANOVA conclusion is rejection, that is F count > F table.
The steps of the application program algorithm are as follows: (1) Start (2)

(1) Experiment on the effect of lime on corn cobs, measuring the number and weight of sterile corn seeds: number of sterile corn cobs. In Table 2, treatment K0 is the control treatment. Table 2 shows that the ranges of the data fall into 2 groups: lime treatments K0, K1 and K2 have large ranges, while lime treatments K3, K4 and K5 have small ranges. Lime treatment K0 has the largest range and variance, while lime treatment K5 has the smallest range and variance (Fig. 4).
From Table 3, it can be seen that lime treatment K0 has the highest average number of sterile cobs, while treatment K5 has the lowest average (Fig. 2).

Table fragment. Columns: lime treatment/powder of corn; testing I, II, III; Range; Variation; Average. Row K0: 51, 30, 21, 110,

Discussion for this Research
Tables 3 and 4 show that, for the data both before and after transformation, in the experiment on the effect of the manual, oksadiazon, benthonil and 2,4 D amine herbicide treatments on the weight of weeds/grass, the herbicide treatments differ from one another.

Conclusion
From the discussion of the 8 experimental data sets examined using the Bartlett test, the Box-Cox transformation, ANOVA and the LSD test, the following conclusions are drawn: (1) the Bartlett test was in most cases not successful enough in establishing whether the variances of the experimental data were homogeneous or heterogeneous. This is shown by 2 experiments (insecticide treatment against rice pests, and lime treatment on the number of sterile corn cobs) in which visualization clearly shows that the data are not homogeneous, yet the Bartlett test results indicate otherwise, (2) the Box-Cox transformation worked fairly well, because for the experiments whose data had heterogeneous variances, the variances became homogeneous after the data were transformed, (3) the program that was built provides analytical results quickly and accurately. The suggestions arising from this study are (1) for testing homogeneity of variance, it would be better to try other methods, such as the Neyman-Pearson test, to get better results, (2) the application program can be developed further by adding other statistical calculations.

Funding Information
The authors have no support or funding to report.

Author's Contributions
All authors contributed equally to this work.

Ethics
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.