Assessment of Genetic Diversity among Elite Breeding Lines of Arabica Coffee (Coffea arabica L.) in Ethiopia using Simple Sequence Repeat Markers

Despite Ethiopia being the center of origin and genetic diversity of arabica coffee, there has been limited application of molecular markers in arabica coffee breeding in the country. In this study, the extent of genetic diversity and relationships of 48 elite breeding lines of arabica coffee from different geographical origins were evaluated using 14 SSR markers. The SSR markers amplified 105 alleles, of which 104 were polymorphic, with an average of 7.4 polymorphic alleles per locus. The average percentage of polymorphism and Polymorphism Information Content (PIC) of the SSR markers were 98.81 and 0.79, respectively. The average genetic similarity coefficient among all possible pairs was 0.37, and the averages within the same geographical origin ranged from 0.39 to 0.46. The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis placed the studied breeding lines into seven clusters at a Jaccard's similarity coefficient of <0.44. The results of this study demonstrated the power of the SSR markers and the presence of a high level of genetic diversity and wide genetic dissimilarity within and among elite breeding lines from the same and different geographical origins. The observed diversity could be utilized in the breeding program to improve productivity and maintain the typical quality profile of origin-based landrace coffees. The SSR markers could be used in future genetic analysis of arabica coffee germplasm to increase the efficiency of the breeding program in Ethiopia. The studied breeding lines could also be conserved and managed as an important source of genetic variability for future coffee improvement programs.


Introduction
The beverage quality, aromatic characteristics and low caffeine content of arabica coffee make it preferred over other coffee species in the global coffee market. In addition, the ever-increasing consumer demand for differentiated and high-quality coffee has contributed to the double-digit growth of the specialty coffee market (Giovannucci et al., 2008). This requires coffee breeding programs elsewhere to focus on developing productive and genetically superior quality coffee varieties with specific flavors for the sustainability of the market (Montagnon et al., 2012). Ethiopia has great potential to supply such differentiated high-quality coffee, due to the presence of immense genetic diversity and different geographical regions that produce arabica coffee with unique flavors recognized in international trade (Gallu, 2006; Labouisse et al., 2008). However, quality was not a major focus of the coffee breeding program until very recently. To exploit the available opportunities of agro-ecological and genetic diversity, a new coffee breeding strategy known as the 'Local coffee landrace variety development program' was designed and adopted by the coffee breeding program of Jimma Agricultural Research Center (JARC) in Ethiopia (Bellachew and Labouisse, 2007). The new strategy aims at developing improved coffee varieties that possess a typical quality profile independently for each geographical region, using coffee germplasm accessions collected from and evaluated in the respective major coffee-producing areas.
In effect, a large number of coffee accessions were collected and planted in the field genebanks of different research sub-centers of JARC established in major coffee-producing geographical regions (Labouisse et al., 2008; Benti, 2017). Eventually, the evaluation of some batches of the germplasm collections, following the step-wise procedures described in the new breeding strategy (Bellachew and Labouisse, 2007), resulted in the selection of 48 elite/promising breeding lines for the Limmu (11), Illubabor (13), Wollega (12) and Bale (12) geographical regions, which are among the primary coffee-producing regions in Ethiopia. These regions are also known to produce arabica coffee with a unique quality profile. Currently, the elite breeding lines are at the final breeding stage (verification trial) in the respective regions to confirm their performance before recommendation for release as new pure-line landrace coffee varieties.
It is well known that a better understanding of the extent of genetic diversity of breeding materials helps in the efficient management and utilization of genetic resources in a breeding program. It has also been recognized that molecular marker-based genetic diversity studies, among others, provide useful information that helps to understand the extent of genetic diversity of available genetic resources and improve selection efficiency to maximize genetic gain (Tesfaye et al., 2014). Among DNA markers, Simple Sequence Repeat (SSR) markers are considered ideal for studying genetic diversity and relationships among closely related crop species such as arabica coffee due to their abundance, hyper-variability, and multi-allelic and co-dominant nature (Powell et al., 1996). Comparative studies using RFLP, RAPD, AFLP and SSR markers have also indicated that SSR markers detect a higher level of polymorphism than other markers (Powell et al., 1996; Dessalegn et al., 2009). SSR markers were also found to be efficient in discriminating within and among selected varieties of arabica, robusta and liberica coffee species (Santos et al., 2016; Pruvot-Woehl et al., 2020; Montagnon et al., 2021). Emanuelli et al. (2013) also confirmed the high discriminating capacity of SSR markers in grapevine varieties and found them as efficient as SNPs in establishing the genetic diversity of the evaluated varieties. In another study, SSR markers were found to be more effective than SNP markers when the objective was mainly the study of genetic diversity (Singh et al., 2013). Moreover, the high level of polymorphism in SSRs makes them a dependable, practical and cost-effective choice (Hodel et al., 2016).
Previous studies on coffee using various molecular markers reported a narrow genetic base and a low level of genetic diversity in ex-situ conserved arabica coffee germplasm accessions, cultivated varieties and elite breeding lines of arabica coffee outside of Ethiopia (Anthony et al., 2002; Maluf et al., 2005; Missio et al., 2011; Geleta et al., 2012; Al-Murish et al., 2013; Scalabrin et al., 2020). On the other hand, a moderate to high level of genetic diversity was reported, using various DNA markers, among arabica coffee germplasm accessions established in ex-situ field genebanks in different countries including Ethiopia (Anthony et al., 2001; Moncada and McCouch, 2004; López-Gartner et al., 2009; Mekbib et al., 2022) and specifically in forest coffee populations and farmers' cultivars in Ethiopia (Aga et al., 2005; Al-Murish et al., 2013; Tesfaye et al., 2014).
Although outstanding achievements have been recorded following the conventional breeding approach, the lack of application of molecular markers has long been recognized as one of the research gaps in the coffee breeding program in Ethiopia (Labouisse et al., 2008). Among the very limited works done so far, Teressa et al. (2010) reported a high level of genetic diversity in 57 coffee accessions obtained from the research plot of the main breeding program using SSR markers. Similarly, SSR markers revealed a high level of genetic diversity among 40 commercial arabica coffee varieties developed by the coffee breeding program in Ethiopia (Benti et al., 2021). However, the extent of genetic diversity and relationships within and among the aforementioned elite breeding lines selected for the respective geographical regions has not been studied using DNA-based molecular markers. As new varieties are continuously developed by any breeding program, assessment of genetic diversity among elite breeding lines and selection of genetically divergent genotypes are crucial for the development and release of improved pure-line varieties and for exploiting hybrid vigor in any crop, including coffee (López-Gartner et al., 2009). Moreover, analysis of genetic diversity among genetic resources obtained from different geographical origins is vital to understand the level of diversity present within the same region and among different regions, which would help to design appropriate breeding and conservation strategies. Therefore, in the present study, SSR markers were used to detect polymorphism and generate information on the level of genetic diversity and relationships within and among 48 elite breeding lines (upcoming new varieties) of arabica coffee originating from different geographical regions.

Genetic Materials and DNA Extraction
A total of 48 elite breeding lines selected from four geographical regions were used in this study. Detailed information is summarized in Table 1. These breeding lines are pure-line selections advanced from different batches of germplasm collections based on the coffee production areas targeted for local landrace variety development. The DNA of each breeding line was extracted from silica-gel dried young leaves of a single plant following a modified version of the CTAB method (Borsch et al., 2003). DNA purification, quality assessment, quantification and dilution were performed as described by Benti et al. (2021).

SSR Markers and PCR Amplification
In this study, 14 SSR primer pairs, labeled with 6FAM, NED, PET, or VIC fluorescent dye at the 5′-end of the forward primers, were used (Table 2). The SSR markers were selected based on their high polymorphism and discrimination capability reported in previous studies (Combes et al., 2000; López-Gartner et al., 2009) and by the Institute for Research and Development (IRD), France (personal communication). The annealing temperature of each primer, PCR amplification, SSR multiplexing and allele calling were as described in detail by Benti et al. (2021).

Genotyping and Data Analysis
The allele peaks were visually inspected and then analyzed using GeneMapper Software 4.0 (Applied Biosystems) based on the GeneScan™ 500 LIZ® Size Standard. Each peak was considered an allele of the appropriate microsatellite locus according to its size (bp) and peak area. Accordingly, every allele of each primer across all breeding lines was scored and used for the analysis. To calculate the basic genetic parameters, scored alleles were coded as present (1) or absent (0) for polymorphic marker loci (Medini et al., 2005). The number of total alleles (Na) and polymorphic alleles (Pa) per locus and the rate of polymorphism (Pr) were calculated as described by Morgante et al. (1994). The Polymorphism Information Content (PIC) values were determined based on allelic frequency using PowerMarker version 3.25 (Liu and Muse, 2005). A binary data matrix was generated from the allelic data. Pair-wise genetic similarity between the breeding lines was estimated using Jaccard's similarity coefficient (Jaccard, 1908). The resulting similarity matrix was used for cluster analysis, and the relationships among breeding lines were displayed as a dendrogram constructed by the UPGMA method (Sneath and Sokal, 1973). The cophenetic correlation coefficient (r), which measures the agreement between the underlying similarity matrix and the relationships among breeding lines displayed in the dendrogram, was computed with 1000 permutations. The significance of the cophenetic correlation was tested by the Mantel correspondence test (Mantel, 1967). All analyses were performed using NTSYS-PC 2.11 software (Rohlf, 2000).

Table 2: List of primer pairs used to amplify the SSR loci of the elite breeding lines of arabica coffee along with their annealing temperature (Ta)

SSR Polymorphism
The genetic parameters (measures of genetic diversity) calculated for the 14 SSR markers are shown in Table 3. A total of 105 alleles, of which 104 were polymorphic, were amplified by the markers across the evaluated breeding lines. The allelic richness of the SSR markers varied from 3 (AJ-250255) to 10 (AJ-250260), with an average of 7.5 total and 7.4 polymorphic alleles per locus. The polymorphism rate (Pr) ranged from 83.3% to 100%, with an average of 98.8%. The PIC values ranged from 0.26 (Sat-180) to 0.94 (AJ-250260), with an average of 0.79 per locus.
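As an illustration of how these indices are obtained, the following minimal Python sketch computes a per-locus polymorphism rate and a PIC value. It is not the PowerMarker or NTSYS-PC procedure used in this study. The PIC function assumes the commonly used Botstein et al. (1980) formula, and the function names and the example allele frequencies are hypothetical. The check at the end assumes that the single non-polymorphic allele sits in a six-allele locus (5/6 = 83.3%) and that the other 13 loci are fully polymorphic, which is consistent with the reported totals but is not stated explicitly in Table 3.

```python
from itertools import combinations


def polymorphism_rate(total_alleles, polymorphic_alleles):
    """Percentage of the alleles scored at one locus that are polymorphic."""
    return 100.0 * polymorphic_alleles / total_alleles


def pic(freqs):
    """PIC from the allele frequencies of one locus.

    Assumed formula (Botstein et al., 1980):
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_sq - cross


# Assumed layout: 13 loci at 100% and one six-allele locus with 5 polymorphic alleles.
per_locus_pr = [100.0] * 13 + [polymorphism_rate(6, 5)]
mean_pr = sum(per_locus_pr) / len(per_locus_pr)
print(round(mean_pr, 2))  # 98.81

# Hypothetical locus with four equally frequent alleles.
print(pic([0.25, 0.25, 0.25, 0.25]))  # 0.703125
```

Under these assumptions the mean of the per-locus rates reproduces the reported average of 98.8%, which differs slightly from the pooled ratio 104/105 because each locus is weighted equally.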

Genetic Similarity
The differences among the evaluated breeding lines at the DNA level were determined by comparing their genetic similarity. The Jaccard's similarity coefficient values among all possible pair-wise combinations ranged from 0.15 to 0.78, with an overall mean of 0.37 (data not presented). In general, 92% of the total pair-wise combinations exhibited genetic similarity values ranging from 0.15 to 0.50. Moreover, the genetic similarity coefficient between elite breeding lines from the same geographical origin was evaluated independently. Accordingly, the similarity coefficient values within elite breeding lines from Limmu, Illubabor, Wollega and Bale origins ranged from 0.20 to 0.78, with average values of 0.42, 0.39, 0.46 and 0.45, respectively (Table 4A-4D). In the same order, 78.2, 89, 68.2 and 79% of the pair-wise combinations among individuals of the same origin were < 0.50.
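The pair-wise similarity computation can be sketched as follows. This is a minimal Python illustration of Jaccard's coefficient on presence (1) / absence (0) allele profiles, not the NTSYS-PC routine used in this study. The three profiles and the names `jaccard` and `pairwise_similarities` are hypothetical and do not correspond to any breeding line in Table 1.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity: shared alleles / alleles present in either line."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    # Convention chosen for this sketch: two empty profiles get similarity 0.
    return shared / either if either else 0.0


def pairwise_similarities(profiles):
    """Return {(name_1, name_2): similarity} for every pair of lines."""
    return {
        (n1, n2): jaccard(profiles[n1], profiles[n2])
        for n1, n2 in combinations(sorted(profiles), 2)
    }


# Hypothetical binary allele profiles (eight alleles scored per line).
profiles = {
    "line_a": [1, 0, 1, 1, 0, 0, 1, 0],
    "line_b": [1, 1, 0, 1, 0, 0, 0, 0],
    "line_c": [0, 1, 0, 0, 1, 1, 0, 0],
}

sims = pairwise_similarities(profiles)
mean_sim = sum(sims.values()) / len(sims)
share_below_half = 100.0 * sum(s < 0.5 for s in sims.values()) / len(sims)

for pair, value in sims.items():
    print(pair, round(value, 2))
print("mean:", round(mean_sim, 2), "| % of pairs < 0.50:", share_below_half)
```

Joint absences (0, 0) are ignored by this coefficient, so two lines are not counted as similar merely because both lack an allele.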

Clustering Patterns
The Jaccard's genetic similarity-based cluster analysis of the breeding lines revealed seven clusters with genetic distance values ranging from 0.56 to 0.73 (Fig. 1). The cophenetic correlation value between the dendrogram and the original similarity matrix was 0.88, with the correspondence being significant at p = 0.05 by the Mantel test. Cluster I consisted of five genotypes from Bale and one from Limmu. Cluster II comprised five genotypes from the Illubabor group. Similarly, Cluster III contained only two genotypes from Illubabor.
Cluster IV contained 18 genotypes of different geographical origins, viz., Limmu (eight), Bale (six) and Illubabor (four). This cluster was divided into three sub-clusters, in which some of the breeding lines from Limmu, Bale and Illubabor were grouped closely together. Cluster V consisted of 11 genotypes, all of which were from Wollega. Cluster VI contained only two genotypes, one each from Bale and Wollega. The last cluster (VII) consisted of four genotypes from Illubabor (two) and Limmu (two).
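The clustering step can be illustrated with a minimal Python sketch of UPGMA (average-linkage agglomerative clustering) applied to Jaccard distances, taken here as 1 minus similarity. This is a simplified stand-in for the NTSYS-PC analysis used in this study. The four lines A to D and their similarity values are hypothetical, and the function names are illustrative only.

```python
from itertools import combinations


def average_distance(cluster_1, cluster_2, dist):
    """Mean of all original pairwise distances between two clusters."""
    total = sum(dist[frozenset((a, b))] for a in cluster_1 for b in cluster_2)
    return total / (len(cluster_1) * len(cluster_2))


def upgma(labels, dist):
    """Repeatedly merge the two clusters with the smallest average distance.

    Returns the merge history as (cluster_1, cluster_2, merge_height) tuples.
    """
    clusters = [(name,) for name in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], dist),
        )
        merges.append((c1, c2, average_distance(c1, c2, dist)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical Jaccard similarities among four lines.
similarity = {
    ("A", "B"): 0.80, ("C", "D"): 0.70, ("A", "C"): 0.30,
    ("A", "D"): 0.20, ("B", "C"): 0.40, ("B", "D"): 0.30,
}
distance = {frozenset(pair): 1.0 - s for pair, s in similarity.items()}

merges = upgma(["A", "B", "C", "D"], distance)
for c1, c2, height in merges:
    print(c1, "+", c2, "at distance", round(height, 2))
```

In this toy example A and B join first, C and D second, and the two pairs merge last at an average distance of 0.70. Cutting such a tree at a chosen distance threshold yields the cluster assignments.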

SSR Polymorphism among Elite Breeding Lines
The SSR markers used in this study were found to be highly polymorphic (Table 3). Except for primer set Sat-180, which showed a moderate PIC value (0.26), the remaining 13 SSR markers had high PIC values (>0.50), indicating the high discriminating power of the markers. Moreover, none of the evaluated breeding lines exhibited the same profile across the 14 loci assessed. The results demonstrate the potential of the markers for use in genotyping, quantification of genetic diversity and differentiation between elite breeding lines by fingerprinting. This agrees with previous results of several authors who reported the successful use of SSR markers in diversity analysis of different genetic categories of Coffea arabica (Teressa et al., 2010; Tiago et al., 2017; Pruvot-Woehl et al., 2020; Benti et al., 2021; Montagnon et al., 2021). The mean values of genetic parameters exhibited by the SSR markers were also high, demonstrating the presence of a high level of genetic diversity among the elite breeding lines. Similar results were reported in commercial varieties of arabica coffee in Ethiopia, with mean values of 7.4 polymorphic alleles per locus, a polymorphism rate of 98.1% and a PIC of 0.80 using the same set of SSR markers (Benti et al., 2021). In agreement with our study, Pruvot-Woehl et al. (2020) and Montagnon et al. (2021) also reported a high level of genetic diversity in arabica coffee comprising different breeding categories, with 11.9 and 7.4 alleles per marker, respectively, using eight SSR markers.
In contrast, Missio et al. (2011) reported a low level of genetic diversity, with a mean of 4.0 and 1.9 alleles per locus and 11.1 and 28.6% polymorphism, in elite breeding lines of arabica coffee developed by the Brazilian coffee breeding program using 17 EST-SSR and 18 gSSR markers, respectively. Values of genetic parameters detected in our study are also higher than those reported among cultivated commercial varieties and ex-situ conserved genetic resources of arabica coffee outside of Ethiopia using SSR markers (Anthony et al., 2001; Teressa et al., 2010; Geleta et al., 2012; Al-Murish et al., 2013). In most of these studies, low levels of genetic diversity, with mean values ranging from 2.0-2.87 alleles per locus, 32-42% polymorphism and 0.22-0.33 PIC, were reported across the evaluated coffee samples. The differences between the results of the present and previous studies could partly be explained by variation in the number and repeat motifs of the SSR primer sets used for the analysis, as well as in the sample size and genetic background of the studied genetic materials. The elite breeding lines analyzed in our study were selected from populations with broad genetic bases assembled from diverse geographical origins, whereas most of the coffee samples analyzed in previous studies were traditional cultivars developed by line selection or by crosses between parental lines derived from Bourbon and Typica, which are known to have a narrow genetic base (Van der Vossen, 1985).

Genetic Similarity among Evaluated Breeding Lines
The genetic similarity coefficient values for the majority (92%) of the pair-wise combinations and the overall average (0.37) were very low, indicating the presence of wide genetic diversity among the studied breeding lines. A similar result was reported by Benti et al. (2021) in commercial coffee varieties in Ethiopia using the same set of SSR markers. Such results are expected, as the coffee breeding program has been relying on a source population with a broad genepool assembled from fairly distant geographical regions (Benti, 2017). The present study also revealed a high level of genetic diversity among individuals of the same geographical origin. This can be seen in Table 4A-D, where 68.2 to 89% of the pair-wise values, as well as the average similarity coefficient values detected within the regions, were less than or equal to 0.50, which corroborates values observed in local landrace varieties developed independently for the Hararge, Sidama/Yirga cheffe and Wollega coffee-producing regions (Benti et al., 2021). However, several authors reported contrasting results and very close genetic relationships, with similarity coefficients ranging from 0.90 to 0.96, in previous studies using SSRs and other marker systems (Dessalegn et al., 2009; Mishra et al., 2012; Al-Murish et al., 2013). The presence of distantly related elite breeding lines within the same geographical region is of particular importance for the development of improved, locally adapted landrace coffee varieties through pure-line selection or hybridization without affecting the typical quality profile of the targeted coffee-producing areas, viz., Limmu, Illubabor, Wollega and Bale.

Clustering Patterns of Elite Breeding Lines
The results of the cluster analysis also agree with the results of the genetic parameters and the estimates of genetic similarity. The cophenetic correlation coefficient (0.88) was high, indicating that the dendrogram is a good representation of the underlying similarity matrix. Accordingly, most of the breeding lines with low pair-wise genetic similarity coefficients (0.20-0.24), for instance, L6 x L7 (clusters VII and IV), L5 x L10 (clusters VII and VI), mm3 x mm5 (clusters II and VII), w3 x w8 (clusters V and VI) and B4 x B12 (clusters VI and VII), belong to different clusters (Table 4A-D; Fig. 1). In contrast, the pairs of lines that exhibited the highest coefficients, namely L2 x L4 (0.68), m4 x m6 (0.65), w7 x w10 (0.68) and the most genetically similar lines B2 x B3 (0.78), grouped very closely in the same cluster, viz., clusters IV, II, V and IV, respectively. Moreover, two clustering patterns were noted on the dendrogram. In the first case, breeding lines from the same geographical origin were distributed into different clusters, except those from Wollega, almost all of which were assigned to a single cluster (Fig. 1).
The distribution of the breeding lines into different clusters regardless of their geographical origin could facilitate the identification of distantly related parental lines for crossing to develop new productive landrace hybrid coffee varieties. This, in turn, would help broaden the genetic base of released varieties and contribute to a sustainable supply of origin-based high-quality coffee from each region to the specialty coffee market. Similar results were reported by López-Gartner et al. (2009), who observed the distribution of 68 arabica coffee accessions collected from the same geographical regions into different clusters using SSR markers. Furthermore, distantly related elite breeding lines from different geographical origins assigned to different clusters could also be crossed to study the effects of genetic and/or geographical variation on the typical quality profile, as well as to develop productive hybrid coffee varieties that can be grown in wider agroecologies and under changing environmental conditions in the current climate change scenario.
On the other hand, there was a clear relationship between geographic origin and the resulting clusters, as observed in clusters I (Bale), II and III (Illubabor) and V (Wollega), each of which is dominated by genetic materials from the same origin. Similar results were reported previously by Aga et al. (2003) and Tesfaye et al. (2014) using ISSR markers, where coffee tree samples comprising forest, semi-forest and farmers' cultivars collected from different parts of Ethiopia clustered based on their geographic origin. Such results were also reported from Indonesia, where 73 to 92% genetic similarity was detected by eight SRAP markers in populations of five arabica coffee varieties collected from different locations (Yunita et al., 2020). This indicates that some genotypes from the same geographic origin are more closely related at the DNA level than those from different origins. This could be due to a large number of shared alleles resulting from greater gene flow within the population of the same region than between different regions. This clustering pattern is most evident in coffee types from Wollega (Cluster V), where proximity among all the breeding lines was observed on the dendrogram. This could also be explained in part by the possibility that the domestication of Wollega coffee was initiated from source populations that differ in origin from the other groups of breeding lines evaluated in this study. In such cases, coffee breeders need to consider morphological variation as a complement to genetic distance when selecting parental lines for heterosis breeding or direct release as new pure-line varieties. The importance of variation in morphology and/or geographical origin among parental lines in exploiting hybrid vigor in arabica coffee was reported by Bellachew et al. (2008).

Conclusion
In this study, we demonstrated the potential of SSR markers in depicting genetic diversity among elite breeding lines of arabica coffee. The 13 markers that showed high PIC values (above 0.50) could be used for future genetic analysis of advanced breeding lines of arabica coffee. The use of SSR markers also verified considerable genetic diversity and distant relatedness within and among the elite breeding lines selected for each geographical origin. The information generated from this study could be used by coffee breeders to design appropriate breeding strategies for regional and/or widely adaptable variety development, considering origin-based quality profiles and the present climate change scenario, respectively. The studied breeding lines could also be conserved and managed as an important source of genetic variability for future coffee improvement programs.
Table 2 footnotes: a Combes et al. (2000); b Institute for Research and Development (IRD), France (personal communication)

Fig. 1: Dendrogram generated after UPGMA based on Jaccard's similarity coefficient of the 48 elite breeding lines using 14 SSR markers.

Table 1: List of and information on Elite Breeding Lines (EBL) of

Table 3: Indices of genetic diversity in elite breeding lines of arabica coffee using 14 SSR markers. Note: Na = number of total alleles, Pa = number of polymorphic alleles, Pr = polymorphism rate, PIC = polymorphism information content