Estimating the Genetic Polymorphisms of Gujarat Population using 17 Y Chromosomal STR loci

Corresponding Author: Aditi Mishra, School of Forensic Science and Risk Management, Rashtriya Raksha University, Gandhinagar, India. E-mail: aditi250192@gmail.com

Abstract: The present study reports genetic variation in unrelated males (n = 118) of the Gujarat population using seventeen Y-STR markers (DYS456, DYS389I, DYS390, DYS389II, DYS458, DYS19, DYS385a/b, DYS393, DYS391, DYS439, DYS635, DYS392, Y GATA H4, DYS437, DYS438 and DYS448). Allele frequencies and forensic parameters were calculated. A total of 96 distinct alleles were observed, with allele frequencies ranging from 0.008 to 0.644. High values of discrimination capacity (0.9830) and haplotype diversity (0.9997) were observed in the studied population. A total of 116 haplotypes were observed, of which 114 were unique. The gene diversity of the 17 Y-STRs ranged from 0.526 (DYS437) to 0.837 (DYS385b). The locus DYS385b showed the highest polymorphic information content (0.809). Genetic distances and their corresponding p-values were also estimated between the studied population and other Indian populations. These relationships were further examined by constructing a phylogenetic tree and a Multidimensional Scaling (MDS) plot. The high haplotype diversity and discrimination capacity indicate strong potential for distinguishing male individuals in the studied population. Hence, the generated population data should be valuable for forensic casework and population genetics.


Introduction
Y-chromosome Short Tandem Repeat (Y-STR) markers are extensively used in forensic DNA analysis for applications such as paternity testing, sexual assault investigation, evolutionary studies, kinship analysis, disaster victim identification, familial searching and genealogical research (Corach et al., 2001; Kayser, 2017; Kumar et al., 2018). Y-STRs are particularly useful in cases where both male and female individuals have contributed to the same trace, because these markers allow the male component to be distinguished from the female component (Roewer, 2019). Because these markers do not undergo recombination, they are inherited from one generation to the next largely unchanged. Hence, they are also known as male-specific markers. Over time, multiple STR kits have been introduced that provide better discrimination power and capture greater genetic variation (Arnaud, 2017). Apart from forensic applications, population studies have also been conducted to assess, on the basis of diversity measures, whether these Y-STR markers are suitable for paternal lineage identification (Chen et al., 2021). Therefore, the present study was carried out to examine genetic diversity in the Gujarat population using seventeen Y-STR markers (DYS456, DYS389I, DYS390, DYS389II, DYS458, DYS19, DYS385a/b, DYS393, DYS391, DYS439, DYS635, DYS392, Y GATA H4, DYS437, DYS438 and DYS448). So far, very few population studies have been conducted and reported on the Gujarat population (Mishra et al., 2019; Dubut et al., 2009; Chaudhari and Dahiya, 2014; Sharma et al., 2009). Hence, more population data are needed in order to characterise the genetic diversity of this population. This study explores the Y-STR haplotypes of 118 unrelated male samples from Gujarat, India, using the AmpFlSTR® Yfiler™ kit (Thermo Fisher Scientific, Foster City, CA, USA) and compares them with previously published Y-STR haplotype data from thirteen populations.
It is expected that the present findings will add to existing knowledge of the population genetics and distribution of Y-STR haplotypes in the Gujarat population.
The name Gujarat is derived from Sanskrit (Gurjar-Rashtra). The state is located in the western part of India and includes the Kathiawar peninsula. It is the fifth-largest Indian state by area and the ninth-largest by population. Gujarat has rich cultural and historical diversity. It also contains sites of the ancient Indus Valley Civilisation, such as Lothal, Dholavira and Gola Dhoro. The total population of Gujarat is 6.27 Crore, which is 4.99% of the total population of the country (according to the 2011 census report). Of the total population, 3.1 Crore are males and 2.8 Crore are females. The state has 33 districts. As Gujarat is highly diverse, it is important to study the genetic variation of its population.

Methods and Materials
Sample
Prior to collection, ethical clearance was obtained from the University Ethical Committee. Peripheral blood samples were collected from 118 unrelated healthy male individuals from Gujarat State. All samples belong to the population of Gujarat; ethnicity and birthplace were documented through a self-designed questionnaire. Written informed consent was obtained prior to the study, as per the Declaration of Helsinki (PP, 1964). Samples were collected in EDTA blood collection tubes and labelled accordingly.

DNA Isolation and DNA Quantitation
Genomic DNA was extracted from the collected blood samples using the phenol extraction method. The extracted DNA was then quantified using the Quantifiler® Duo DNA Quantification Kit (Thermo Fisher Scientific, Foster City, CA, USA) as per the manufacturer's protocol.

PCR Amplification and Genotyping
Genomic DNA was amplified for seventeen Y-STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and Y-GATA-H4) using the AmpFLSTR Yfiler™ PCR kit (Thermo Fisher Scientific, Foster City, CA, USA). Negative and positive controls were included throughout the process. Amplification was performed on a GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA). The amplified DNA products were separated by capillary electrophoresis, and genotyping was carried out on an ABI 3100 Genetic Analyzer (Life Technologies Corporation, Carlsbad, California, USA). The GeneScan500-LIZ internal lane size standard was used for sizing the DNA fragments. All procedures were carried out as per the manufacturer's recommended protocol. GeneMapper ID v3.2 software (Applied Biosystems, Foster City, CA, USA) was used to generate DNA profiles from the capillary electrophoresis data.

Statistical Analysis
The online software STR Analysis for Forensics (STRAF) was used to calculate the allele frequencies and forensic parameters of the studied population (Gouy and Zieger, 2017). The forensic parameters calculated were Gene Diversity (GD), Haplotype Diversity (HD), Power of Discrimination (PD), Polymorphic Information Content (PIC) and Matching Probability (PM). The Discrimination Capacity (DC) was computed as the ratio of the number of different haplotypes observed to the total number of individuals sampled (Eliades and Eliades, 2009). Haplotype analysis was performed using Haplotype Analysis v1.04 software (Eliades and Eliades, 2009). Genetic distances (Rst) and their corresponding p-values between the studied population and other published populations were calculated using Analysis of Molecular Variance (AMOVA) (Roewer et al., 1996), available at the YHRD website (http://www.yhrd.org). The resulting relationships were further examined by constructing a phylogenetic tree and a Multidimensional Scaling (MDS) plot. MEGA 6.0 software was used to construct the Neighbour-Joining (NJ) tree with 1000 bootstrap replicates (Caspermeyer, 2018).
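The two haplotype-level parameters can be illustrated with a minimal Python sketch. It is not the Haplotype Analysis v1.04 or STRAF implementation. It assumes that DC is the number of distinct haplotypes divided by the number of sampled individuals, and that HD follows the commonly used Nei estimator, HD = n/(n - 1) x (1 - sum of squared haplotype frequencies). The function name haplotype_summary and the haplotype labels are hypothetical placeholders. Only the counts (118 individuals, 114 unique haplotypes, two haplotypes each shared by two individuals) come from this study.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Return (n_distinct, n_unique, DC, HD) for a list of hashable haplotypes.

    DC = distinct haplotypes / sampled individuals (assumed definition).
    HD = n/(n-1) * (1 - sum(p_i^2)), Nei's estimator (assumed definition).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    dc = n_distinct / n
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    hd = n / (n - 1) * (1 - sum_p2)
    return n_distinct, n_unique, dc, hd


# Placeholder labels reproducing only the reported haplotype counts:
# 114 singletons plus two haplotypes that each occur twice (118 individuals).
sample = [f"H{i}" for i in range(114)] + ["S1", "S1", "S2", "S2"]

n_distinct, n_unique, dc, hd = haplotype_summary(sample)
print(n_distinct, n_unique, round(dc, 4), round(hd, 4))
```

Under these assumptions the sketch gives DC = 116/118 (about 0.983) and HD of about 0.9997, in line with the values reported for the studied population.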

Y-Chromosome Diversity
The allele frequency distribution of the seventeen Y-STR loci in the Gujarat population is reported in Table 1. A total of 96 different alleles were observed, with allele frequencies ranging from 0.008 to 0.644. Locus DYS385b had the highest number of alleles (n = 9), while locus DYS391 had the lowest (n = 3). Hence, the highest level of polymorphism was observed at the DYS385b locus. The highest gene diversity was found at locus DYS385b (0.837) and the lowest at locus DYS437 (0.526) (Fig. 1). Locus DYS385b showed the highest polymorphic information content (0.809), while locus DYS391 showed the lowest (0.467). For Matching Probability (PM), DYS437 showed the highest value (0.479) and DYS385b the lowest (0.170) in this set of STRs. For Power of Discrimination (PD), DYS437 showed the lowest value (0.3075), while DYS385b showed the highest (0.8143). All the forensic parameters for the seventeen Y-STRs are summarised in Table 1.
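The per-locus parameters can be illustrated with a short Python sketch using textbook definitions for a haploid marker. These definitions are assumptions and may differ in detail from what STRAF implements: PM is the sum of squared allele frequencies, PD = 1 - PM, GD = n/(n - 1) x (1 - PM), and PIC follows the Botstein formula. The function name locus_stats and the allele counts are hypothetical and are not taken from Table 1.

```python
from itertools import combinations


def locus_stats(allele_counts):
    """Per-locus statistics for a haploid marker from a dict of allele counts.

    Assumed definitions (may differ from STRAF):
      PM  = sum(p_i^2)
      PD  = 1 - PM
      GD  = n/(n-1) * (1 - PM)
      PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    n = sum(allele_counts.values())
    freqs = [c / n for c in allele_counts.values()]
    pm = sum(p * p for p in freqs)
    pd = 1 - pm  # by construction PD and PM sum to one in this sketch
    gd = n / (n - 1) * (1 - pm)
    pic = 1 - pm - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {"GD": gd, "PIC": pic, "PM": pm, "PD": pd}


# Hypothetical allele counts for one locus (not data from this study).
toy_locus = {"10": 30, "11": 50, "12": 20}
stats = locus_stats(toy_locus)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A locus with more alleles at more even frequencies yields a lower PM and a higher GD and PIC, which is the pattern described for DYS385b relative to DYS437 and DYS391.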

Haplotypes Analysis
The seventeen Y-STR data were also used for haplotype analysis. Among the 118 samples, 116 distinct haplotypes were observed, of which 114 (96.61% of the samples) were unique and two were each shared by two individuals. The overall haplotype diversity and discrimination capacity of the seventeen Y-STRs were 0.9997 and 0.9830, respectively. Both parameters are high and thus indicate a high degree of genetic diversity in the Gujarat population.

The haplotypes of the studied population were compared with those of thirteen other published populations (Ghosh et al., 2011; Imam et al., 2018; Mohapatra et al., 2019; Kumar et al., 2020; Kumawat et al., 2020). Pairwise genetic distances (RST) and their corresponding p-values were estimated at the same 17 Y-STRs using Analysis of Molecular Variance (AMOVA). The relationships were shown in the MDS plot (Fig. 2) and by the clustering pattern observed in the NJ tree (Fig. 3), both of which showed that the Odisha population was closest to the studied population.

Conclusion
Because Y-chromosome STR markers do not undergo recombination, they are considered highly effective for studying the genetic diversity of male individuals in a population. They also have wide applications in forensic casework, such as sexual assault cases in which mixed male and female samples are found. In order to understand the origin of modern humans, it is important to conduct studies of various populations. Although population studies have been reported from different regions of India, few have been conducted and reported from the Gujarat region. One such genetic study was conducted on two ethnic groups of Asian ancestry (Gujarat and Guangdong-Fujian provinces) from Reunion Island (Indian Ocean), in which 123 male haplotypes were studied based on 10 Y-STR loci (Dubut et al., 2009). Because the sample size was small, it might not be informative enough for inferring population history. Khurana et al. (2014) also investigated the population structure, population history and genetic diversity of 284 unrelated males of the Southern Gujarat population using 48 bi-allelic Y-chromosomal markers.
So far, the genetic structure and population relationships of the Gujarat population remain unclear because previous studies are few and used relatively small sample sizes. Hence, more population studies are needed to explore the genetic diversity of the Gujarat population. The present study explored the genetic polymorphism of the Gujarat population by targeting 17 Y-STR markers. All the forensic parameters and allele frequencies were calculated. The DC value (0.9830) was high in the studied population, demonstrating that these 17 Y-STRs are highly polymorphic and efficient in the Gujarat population and can serve as a useful forensic marker set. The distribution of haplotypes was also examined, and a high value of haplotype diversity is reported here.
The present study also calculated pairwise genetic distances between the studied population and other published populations. Our results demonstrate that Gujarat people show a close affinity with the Odisha, India [Bhotra] population, with which they cluster in the same branch. In conclusion, an attempt has been made to develop a better understanding of the genetic relationship between Gujarat (Western India) and other neighbouring populations. This is the first report that provides population data on Y-STR polymorphism in the Gujarat population, India, using 17 Y-STR markers. Overall, the findings of the present study show that these Y-STR markers are polymorphic and can be highly valuable in forensic as well as anthropological studies of Indian populations. The author encourages other researchers to explore genetic polymorphism in the Gujarat population further.