Application of Statistical Procedures for Analysis of Genetic Diversity in Domestic Animal Populations

Problem statement: A wide range of studies assessing genetic diversity in livestock breeds have been conducted using genetic distance. For accurate and unbiased estimation, the sampling method, the criteria for choosing the type of DNA marker, the distance measure and the cluster analysis are all important in any genetic diversity project. Approach: The main objective of this short review is to focus on the application of statistical procedures and methods to the analysis of genetic diversity data in animals. Results: There is no simple strategy that guarantees the best and most effective genetic diversity results; however, attention to a few important factors can produce reliable results for subsequent analysis. Conclusion: There is still a distinct need for developing comprehensive and user-friendly statistical packages that facilitate an integrated analysis of different data sets for generating reliable information about genetic relationships, genome diversity, and favorable allele variation. Equally important, and perhaps more challenging, is the concerted and planned utilization of genome information in animal breeding programs on the basis of knowledge accrued from studies on genetic diversity.


INTRODUCTION
Genetic diversity: Livestock breeding is an important strategy for meeting future requirements and for responding to different environments [12]. Genetic conservation is also a powerful tool for maintaining the long-term genetic relationships of animals. An essential first step in managing farm and wild animal genetic resources is to determine the genetic diversity parameters on which decisions will be based [33]. Diversity can be defined in a number of different ways, each describing how the difference between two individuals is recognized. Genetic diversity is the platform for future genetic response [18]. In agricultural populations, genetic diversity provides not only the capacity to evolve with a changing environment but also the capacity to cope with changing market requirements [5]; genetic diversity is thus seen as insurance against future changes [34]. Phenotypic differences are the result of genetic diversity and environmental differences. More than a third of about 600 documented livestock breeds are at risk of extinction and up to two percent of the breeds go extinct every year, so that one to two breeds are lost per week [31]. The key question is which breeds should be chosen to assure the highest genetic diversity within species for the future [11]. The genetic composition of a population is usually described in terms of allele frequencies, number of alleles and heterozygosity [24]. A wide range of studies assessing genetic diversity in livestock breeds have been conducted using genetic distance [5]. With genetic distances, the genetic difference between populations is assessed from the differences between allele frequencies at several loci [8]. Genetic distance is used to classify populations and to elucidate the evolutionary relationships between populations, such as species, that have been diverging for a long period.

Genetic markers:
The largest part of animal diversity is hidden, because it is genetic diversity. Hidden genetic variation is far more extensive than the variation observed through the phenotype, so much so that it is virtually impossible for two individuals in a population to have the same genotype at all loci. This genetic variation can be detected through molecular technologies [12], and modern technologies used in genetics enable us to measure this type of variability. Molecular genetic markers can be used to examine a group of individuals or populations in order to estimate various diversity measures and genetic distances. In principle, genetic diversity can be measured on the basis of polymorphic characters occurring in different systems (morphological, biochemical, protein), but DNA markers are very powerful tools for the study of genetic diversity [12]. In practice, there is very little information on the population size, reproduction, adaptation and disease resistance potential of most livestock breeds; in this situation genetic information can provide valuable estimates of genetic diversity within and between populations. With regard to genetic diversity studies, molecular markers can be subdivided into two categories [15]. The first category comprises the allelic informative or codominant markers, such as microsatellite markers (SSRs) and Restriction Fragment Length Polymorphism (RFLP). The second category comprises the allelic non-informative or dominant markers, such as Amplified Fragment Length Polymorphism (AFLP) and Random Amplified Polymorphic DNA (RAPD). Genetic loci used for genetic distance estimation should be informative, meaning that they should display sufficient polymorphism for a correct estimation of genetic distances [19]. The number of alleles per locus is important; SSR markers, for example, should have at least 4 different alleles [9].
With regard to genetic diversity studies, microsatellite markers are very attractive tools because of their codominant mode of inheritance, their high degree of reproducibility, their high level of polymorphism and therefore their high discriminative power.

Sampling:
An optimal sampling strategy supports reliable results in all later analyses. Sampling is the part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population [17]. Sampling animal populations is difficult because various factors, including total population size, migration, inbreeding and selection, affect the reliability of the subsequent results. Since sampling is the starting point of any genetic diversity investigation, it is worth understanding the statistical methods that lead to more reliable results. The structure of the population, the mating system, the number of alleles per locus and the frequency of alleles per locus are some of the factors to consider, and the sampling frame must be representative of the whole population. With any form of sampling there is a risk that the sample may not adequately represent the population, but random sampling enables an appropriate sample size to be chosen. There are two types of random variables: categorical and numerical. A categorical random variable yields responses such as present and absent; a numerical response is, for example, height in centimeters [30]. Usually, however, the true genotype frequencies are not known, so estimating the minimum sample size for detecting alleles in a population is very important. When the population shows complete homozygosity, each individual carries a single allele per locus, so a new allele can only be found by sampling another homozygous individual. When, on the other hand, the allele frequencies in the population are known and the population is in Hardy-Weinberg (HW) equilibrium, that is, the alleles are randomly associated in the genotypes, the minimum sample size is one half of that required under complete homozygosity.
Generally, N = 25 sampled animals are taken to be a minimum requirement; with this, 2N = 50 drawings of alleles per locus are performed, which should give a reasonably reliable estimate of allele frequencies [30].
The genetic diversity of a sample can be described by quantifying the allelic richness and allelic evenness of the sample [18].
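The relation between sample size and allele detection described above can be sketched as follows. This is a minimal illustration, not a method taken from the cited sources: it assumes that allele draws are independent, that an individual contributes two independent draws under random association of alleles (HW equilibrium) and only one effective draw under complete homozygosity. The function names `min_draws` and `min_animals` are hypothetical.

```python
import math

def min_draws(p, confidence=0.95):
    """Smallest number n of independent allele draws such that an allele
    of frequency p is observed at least once with the given probability,
    i.e. 1 - (1 - p)**n >= confidence."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

def min_animals(p, confidence=0.95, homozygous=False):
    """Minimum number of animals to sample.
    Assumption: one effective draw per animal under complete homozygosity,
    two independent draws per animal under HW equilibrium."""
    draws = min_draws(p, confidence)
    return draws if homozygous else math.ceil(draws / 2)

# An allele with frequency 0.05, to be detected with 95% probability
print(min_animals(0.05, homozygous=True))   # complete homozygosity
print(min_animals(0.05, homozygous=False))  # HW equilibrium, about half
```

Under these assumptions the HW figure is about half the figure for complete homozygosity, which is the relationship stated in the text.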

Important parameters in genetic diversity analysis:
Quantification of genetic diversity depends on several parameters: the Hardy-Weinberg equilibrium test (HWT), polymorphism, the average number of alleles per locus, the effective number of alleles, the average expected heterozygosity (He) and the Shannon index.

Hardy-Weinberg equilibrium test (HWT):
The Hardy-Weinberg principle states that both gene and genotype frequencies remain constant from one generation to the next [13]. It holds under the following assumptions: a diploid organism, sexual reproduction, random mating, no selection, no mutation and no migration [14]. Deviation from Hardy-Weinberg equilibrium indicates that one or more of these assumptions is violated. The chi-square test is useful for determining whether a locus is in HW equilibrium. The test statistic follows this formula [7]:

$$\chi^2_{cal} = \sum \frac{(O - E)^2}{E}$$

where $O$ and $E$ are the observed and expected genotype counts. If $\chi^2_{cal} \le \chi^2_{tab}$, the $H_0$ hypothesis is accepted, meaning that the locus in the given population is in HW equilibrium; if $\chi^2_{cal} > \chi^2_{tab}$, the $H_0$ hypothesis is rejected [10].
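As an illustration of the chi-square test for HW equilibrium, the following minimal sketch handles a single locus with two alleles. It assumes the usual statistic, the sum of (observed - expected)^2 / expected over the three genotype classes, with expected counts derived from the estimated allele frequencies. The function name `hw_chi_square` and the genotype counts are hypothetical; 3.841 is the tabulated chi-square value for one degree of freedom at the 0.05 level.

```python
def hw_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg proportions at one
    biallelic locus, from observed genotype counts AA, AB and BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # estimated frequency of allele A
    q = 1.0 - p                       # estimated frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom
CHI2_TAB_DF1_ALPHA_05 = 3.841

chi2_cal = hw_chi_square(30, 50, 20)          # hypothetical counts
in_equilibrium = chi2_cal <= CHI2_TAB_DF1_ALPHA_05
print(round(chi2_cal, 4), in_equilibrium)
```

For loci with more than two alleles the same statistic applies, but the number of genotype classes and the degrees of freedom change accordingly.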
Polymorphism: A polymorphic gene is usually defined as one for which the most common allele has a frequency of less than 0.95. Genetic loci used for genetic distance estimation should be informative, meaning that they should display sufficient polymorphism. The number of alleles per locus matters for a correct estimate of genetic distance; SSR markers, for example, should have at least 4 different alleles. This parameter is best applied to codominant markers.

Effective number of alleles:
This measure describes the number of equally frequent alleles that would be expected at a locus in each population:

$$A_e = \frac{1}{\sum_{a=1}^{k} p_a^2}$$

where $p_a$ is the frequency of the $a$-th of $k$ alleles. By taking allele frequencies into account, this descriptor of allelic richness is less sensitive to rare alleles. This parameter also plays a fundamental role in verifying the sampling strategy: if the figure obtained from a second estimate is lower than the first, the sampling strategy may need revising.
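A short sketch of this calculation follows, assuming the standard definition of the effective number of alleles as the reciprocal of the sum of squared allele frequencies. The function name and the example frequencies are hypothetical.

```python
def effective_number_of_alleles(freqs):
    """Effective number of alleles at one locus: 1 / sum(p_a ** 2),
    where freqs holds the frequencies of the k alleles at that locus."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 / sum(p * p for p in freqs)

# Four alleles at equal frequency: the effective number equals k
print(effective_number_of_alleles([0.25, 0.25, 0.25, 0.25]))
# Four alleles, one common and three rare: the effective number is much lower
print(effective_number_of_alleles([0.7, 0.1, 0.1, 0.1]))
```

The second call shows why the measure is less sensitive to rare alleles than a simple allele count: both loci have four alleles, but the rare ones contribute little.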

Average expected heterozygosity (He):
Average expected heterozygosity is the probability that, at a single locus, any two alleles chosen at random from the population are different from each other. It is an indicator of genetic diversity in a population:

$$H_e = 1 - \sum_{a=1}^{k} p_a^2$$

where $p_a$ is the frequency of the $a$-th of $k$ alleles at the locus, and the average is taken over all loci. This parameter ranges from 0 to 1 and is maximized when there are many alleles at equal frequency.

Shannon index [32]: This measure describes gene diversity. When the Shannon index is near 1, we can conclude that heterozygosity in the population is high. The Shannon index can also be compared between loci: if one primer gives a higher Shannon index than the other primers, that primer is more suitable for genetic diversity studies in those breeds or populations.

Fixation indices [31]: The fixation indices Fis, Fst and Fit are used to analyze the partitioning of genetic variation. Fixation indices are measures of standardized variances in allele frequencies that detect departures from Hardy-Weinberg equilibrium caused by biased inbreeding, biased outbreeding, or population subdivision and drift. The subscripts i, s and t refer to the individual, the subpopulation and the total population. The F statistic measures the difference between the mean heterozygosity among the subdivisions of a population and the potential frequency of heterozygotes if all members of the population mixed freely and non-assortatively.
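The two diversity measures discussed here can be computed from allele frequencies as in the sketch below. It assumes the standard forms: expected heterozygosity as one minus the sum of squared allele frequencies, and the Shannon index as the negative sum of p * ln(p). The choice of the natural logarithm is an assumption, since the original formula is not preserved in the text. All function names and example frequencies are hypothetical.

```python
import math

def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_a ** 2)."""
    return 1.0 - sum(p * p for p in freqs)

def shannon_index(freqs):
    """Shannon index at one locus: -sum(p_a * ln(p_a)).
    Alleles with zero frequency contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def average_over_loci(loci, measure):
    """Average a per-locus measure over a list of loci,
    each locus given as a list of allele frequencies."""
    return sum(measure(freqs) for freqs in loci) / len(loci)

# Hypothetical allele frequencies at three loci
loci = [[0.5, 0.5], [0.6, 0.3, 0.1], [1.0]]
print(average_over_loci(loci, expected_heterozygosity))
print(average_over_loci(loci, shannon_index))
```

Both measures are zero for a monomorphic locus and grow as alleles become more numerous and more evenly distributed.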

F-statistics
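Fis: Fis measures the inbreeding of individuals relative to their subpopulation (within individuals within populations). This parameter ranges from -1 to 1, indicating maximal outbreeding and maximal inbreeding, respectively. A positive Fis value indicates inbreeding, as the observed heterozygosity is lower than the expected heterozygosity:

$$F_{is} = 1 - \frac{H_o}{H_e}$$

where $H_o$ and $H_e$ are the observed and expected heterozygosity within the subpopulation.

The Euclidean distance between two individuals $i$ and $j$ is:

$$d_E(i,j) = \sqrt{\sum_{l=1}^{m}\sum_{a=1}^{k_l}\left(p_{ila} - p_{jla}\right)^2}$$

where $p_{ila}$ is the frequency of allele $a$ at locus $l$ for individual $i$, $p_{jla}$ is the frequency of allele $a$ at locus $l$ for individual $j$, $m$ is the number of loci and $k_l$ is the number of alleles at locus $l$. Comparison of $d_E$ between studies is difficult, and the value ranges between zero and $\sqrt{2m}$.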
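The fixation indices can be illustrated with a small calculation. The sketch below assumes the common heterozygosity-based formulation, in which H_I is the mean observed heterozygosity within subpopulations, H_S the mean expected heterozygosity within subpopulations and H_T the expected heterozygosity of the pooled population. This notation and the function name are assumptions introduced here, not taken from the text.

```python
def f_statistics(h_i, h_s, h_t):
    """Fixation indices from three heterozygosity levels.
    h_i: mean observed heterozygosity within subpopulations
    h_s: mean expected heterozygosity within subpopulations
    h_t: expected heterozygosity in the total (pooled) population"""
    f_is = (h_s - h_i) / h_s   # individuals relative to subpopulation
    f_st = (h_t - h_s) / h_t   # subpopulations relative to total
    f_it = (h_t - h_i) / h_t   # individuals relative to total
    return f_is, f_st, f_it

# Hypothetical heterozygosities: fewer heterozygotes observed than expected
f_is, f_st, f_it = f_statistics(h_i=0.30, h_s=0.40, h_t=0.50)
print(f_is, f_st, f_it)

# The three indices are linked by (1 - F_it) = (1 - F_is) * (1 - F_st)
print(abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12)
```

A positive `f_is`, as in this example, corresponds to the case described in the text where observed heterozygosity is lower than expected.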
Various genetic distance measures have been proposed for the analysis of molecular marker data; these distances can be used, for example, in studies based on SSR, RAPD, AFLP and PBR DNA markers. For dominant markers, the total number of bands is conventionally set as the number of analyzed loci. For codominant markers, the number of alleles per locus determined for the total collection is in general higher than two, as opposed to the 1 (presence) and 0 (absence) alleles of dominant markers. Generally, genetic distances for codominant markers are based on allele frequencies.
For dominant markers, similarity coefficients are computed from band counts. In the usual notation, a is the number of bands present in both individuals, b and c are the numbers of bands present in only one of the two individuals, and d is the number of bands absent in both; as an example, assume that a = 3, b = 1, c = 3 and d = 2. The Jaccard coefficient only counts bands present in either individual; double absences are treated as missing data. If false-positive or false-negative data occur, the index estimate tends to be biased. The Nei and Li coefficient counts the percentage of shared bands between two individuals and gives more weight to those bands that are present in both.
In other words, the Nei and Li coefficient puts more weight on shared bands than the Jaccard coefficient does. When the population consists of lines, the Nei and Li and the Jaccard coefficients lead to identical rankings, but in a hybrid population the results appear to differ.
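The band-count example (a = 3, b = 1, c = 3, d = 2) can be worked through in code. The sketch assumes the conventional 2 x 2 notation, which the surviving text does not spell out: a is the number of bands present in both individuals, b and c the bands present in only one of them, and d the bands absent in both. It uses the standard forms of the Jaccard, Nei and Li, and simple matching coefficients; the function names and band profiles are hypothetical.

```python
def band_counts(x, y):
    """Count a, b, c, d from two 0/1 band profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("band profiles must have the same length")
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)  # present in both
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)  # only in x
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)  # only in y
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)  # absent in both
    return a, b, c, d

def jaccard(a, b, c, d=0):
    """Jaccard coefficient: ignores double absences (d is unused)."""
    return a / (a + b + c)

def nei_li(a, b, c, d=0):
    """Nei and Li coefficient: shared bands are weighted twice (d is unused)."""
    return 2 * a / (2 * a + b + c)

def simple_matching(a, b, c, d):
    """Simple matching coefficient: double absences count as matches."""
    return (a + d) / (a + b + c + d)

# Hypothetical band profiles giving a = 3, b = 1, c = 3, d = 2
x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 1, 1, 0, 0]
counts = band_counts(x, y)
print(counts)
print(jaccard(*counts), nei_li(*counts), simple_matching(*counts))
```

With these counts the Nei and Li value exceeds the Jaccard value, which reflects the extra weight given to shared bands.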
Clustering methods: Cluster analysis is the grouping of objects into different categories or classes based on similarities between items, in order to minimize variation within and maximize variation between categories [1]. When choosing a clustering method, one must consider the reproduction system of the population, the level of heterozygosity and the genetic characters to be analyzed [18]. The three main clustering methods are:
• Nearest neighbor (simple matching, SM)
• Furthest neighbor
• Unweighted Pair Group Method using Arithmetic Average (UPGMA)
SM treats a shared absence as corresponding to a homozygous locus; it can be used with dominant markers (RAPD, AFLP) because absence could correspond to homozygous recessives. UPGMA is the most commonly used method for cluster analysis, but it can only be used when the evolutionary rate is nearly the same for all groups included in the study. When studying the genetic diversity of a germplasm collection, the SM method should be preferred over UPGMA clustering, because genetic differences among accessions in germplasm are predominantly determined by selection and breeding rather than by evolutionary forces.
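To make the UPGMA procedure concrete, the following is a minimal pure-Python sketch, not the implementation used by any of the packages named in this review. It repeatedly merges the two closest clusters and sets the distance from the new cluster to every other cluster to the size-weighted mean of the two old distances, which equals the arithmetic average over all member pairs. The function name `upgma`, the breed labels and the distance matrix are hypothetical.

```python
def upgma(labels, dist):
    """UPGMA clustering on a symmetric distance matrix (list of lists).
    Returns the merges in order as (members_a, members_b, distance)."""
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (i, j), d_ij = min(d.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # Size-weighted mean = average over all pairs of members
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involves one of the merged clusters
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        merges.append((clusters[i], clusters[j], d_ij))
        clusters[next_id] = clusters[i] + clusters[j]
        del clusters[i], clusters[j]
        next_id += 1
    return merges

# Hypothetical genetic distances between four breeds
breeds = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 4, 10],
    [2, 0, 6, 10],
    [4, 6, 0, 10],
    [10, 10, 10, 0],
]
for left, right, distance in upgma(breeds, matrix):
    print(left, right, distance)
```

In a dendrogram each merge is usually drawn at half the merge distance, which is where the assumption of a nearly equal evolutionary rate across groups enters.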

Validation of a single cluster:
Resampling is a term used in statistics for bootstrapping and permutation [6]. These procedures can be used in genetic diversity studies to assign confidence to the presence of clusters in a dendrogram.
Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample [4]. The major purpose of bootstrapping is to derive robust estimates of the standard errors and confidence intervals of population parameters.
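The bootstrap idea can be sketched on a simpler case than a dendrogram: a percentile confidence interval and standard error for the mean expected heterozygosity, obtained by resampling loci with replacement. The per-locus values, the function name `bootstrap_mean` and the number of replicates are hypothetical, and the percentile interval is only one of several bootstrap interval types.

```python
import random
import statistics

def bootstrap_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap for the mean of `values`.
    Returns (lower, upper, standard_error)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    n = len(values)
    estimates = sorted(
        statistics.mean(rng.choices(values, k=n))  # resample with replacement
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    standard_error = statistics.stdev(estimates)
    return lower, upper, standard_error

# Hypothetical expected heterozygosity at ten microsatellite loci
he_per_locus = [0.62, 0.71, 0.55, 0.68, 0.74, 0.59, 0.66, 0.70, 0.63, 0.58]
low, high, se = bootstrap_mean(he_per_locus)
print(low, high, se)
```

For cluster validation the same resampling of loci is applied, but each replicate is used to rebuild the distance matrix and the dendrogram, and the share of replicates in which a cluster reappears is reported as its support.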
A permutation test is a type of statistical significance test in which a reference distribution is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.
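A minimal sketch of an exact permutation test follows. It assumes a two-group comparison with the absolute difference in group means as the test statistic, and it enumerates every possible reassignment of the group labels, which is feasible only for small samples. The function name and the data are hypothetical.

```python
from itertools import combinations

def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation p-value for the difference in means.
    Every way of relabelling the pooled observations is enumerated."""
    pooled = list(group_a) + list(group_b)
    n_total, n_a = len(pooled), len(group_a)

    def mean_difference(indices_a):
        chosen = set(indices_a)
        a = [pooled[i] for i in range(n_total) if i in chosen]
        b = [pooled[i] for i in range(n_total) if i not in chosen]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_difference(range(n_a))
    as_extreme = 0
    total = 0
    for indices_a in combinations(range(n_total), n_a):
        total += 1
        # Small tolerance guards against floating-point ties
        if mean_difference(indices_a) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total

# Hypothetical values of a diversity statistic in two breeds
p_value = exact_permutation_test([1, 2, 3], [7, 8, 9])
print(p_value)
```

When the number of possible relabellings is too large to enumerate, a random subset of permutations is commonly used instead, which gives an approximate rather than exact p-value.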
Molecular data analysis software: Many software programs for molecular population genetics studies have been developed for personal computers [16]. Four important programs for population genetic analysis are TFPGA, Arlequin [3], GENEPOP and POPGENE. With these programs one can calculate observed and expected heterozygosity, the percentage of polymorphic loci, the Hardy-Weinberg test, Nei's distance and UPGMA clustering. TFPGA, Arlequin and POPGENE are available for the Windows environment; GENEPOP can be used under the DOS operating system.

CONCLUSION
There is still a distinct need for developing comprehensive and user-friendly statistical packages that facilitate an integrated analysis of different data sets for generating reliable information about genetic relationships, genome diversity, and favorable allele variation. Equally important, and perhaps more challenging, is the concerted and planned utilization of genome information in animal breeding programs on the basis of knowledge accrued from studies on genetic diversity.