Comparative Genomics of Domesticated and Wild Sunflower: Complete Chloroplast and Mitochondrial Genomes

Corresponding Author: M.S. Makarenko, Department of Genetics, Southern Federal University, Rostov-on-Don, Russia
Email: mcmakarenko@yandex.ru

Abstract: The complete chloroplast and mitochondrial genomes of domesticated and wild-type sunflower were sequenced. Comparative analysis of the chloroplast genomes revealed 43 variant sites, including 21 polymorphic SSR loci and 22 SNPs. Comparison of the mitochondrial DNA (mtDNA) sequences revealed 14 variant sites: 4 SSRs, 8 SNPs and 2 deletions. Nine SNPs were located in coding regions of chloroplast DNA (cpDNA) and a single SNP was mapped to a mitochondrial gene. Only three SNPs caused amino acid changes: two in cpDNA and one in mtDNA. Although the sunflower mitochondrial genome is twice as long as the chloroplast genome, mtDNA has one third as many variant sites as cpDNA.


Introduction
Whole Genome Sequencing (WGS) is now a common technique in contemporary plant research (Balakrishnan et al., 2015). WGS makes it possible to investigate nucleotide diversity much faster and more accurately than hybridization-based methods (molecular beacons, microarrays etc.), enzyme-based methods (RFLP, many PCR methods, HRM) or Sanger sequencing. Owing to WGS, the number of genomes in GenBank has been increasing rapidly, predominantly small genomes such as bacterial or organelle genomes (Phan and Nguyen, 2013). WGS data are mostly used in phylogenetic analysis. For plant phylogenetic analysis, chloroplast DNA (cpDNA) is typically used. Whole chloroplast genomes of buckwheat (Logacheva et al., 2008), citrus (Carbonell-Caballero et al., 2015), mulberry (Kon and Yang, 2015) etc. have revealed relationships between valuable cultivated plants and their wild ancestors. Mitochondrial DNA (mtDNA) sequences are also significant for understanding plant phylogeny (Knoop et al., 2011), although complete mitochondrial genomes are used for this purpose less often than chloroplast genomes. Relatively few papers provide phylogenetic analysis based on both extranuclear genomes. Moreover, phylogenetic trees based on chloroplast and mitochondrial genomes of the same organisms may have quite different topologies (Bock et al., 2014). In addition, WGS data make it possible to create DNA markers (SSR, CAPS etc.). Complete genome sequences can also be used for developing specific transformation vectors (Chen et al., 2011) or for other genetic engineering applications.
In the present work we studied the polymorphism of the chloroplast and mitochondrial genomes of Helianthus annuus. We sequenced the cpDNA and mtDNA of wild and domesticated sunflower and identified polymorphic loci that can be used as DNA targets for genotyping extranuclear genomes. These data will also be useful for future phylogenetic analysis.

Plant Material
The study was carried out on two forms of Helianthus annuus: a cultivated line (№ 3629) and a wild type (№ 398941). Seed samples were received from the seed bank of the N.I. Vavilov Research Institute of Plant Industry. The cultivated line originates from the Rostov region (Russia); the ancestor of inbred cultivated line 3629 was a high-oil variety from the collection of the Zhdanov Don Experiment Station. Since 1965 the domesticated line has been cultivated at the station of Southern Federal University in strict isolation. The wild-type sunflower originates from California (USA) and has been cultivated in isolation at the Vavilov Research Institute of Plant Industry station in the Krasnodar region (Russia) since 1977. Both types of sunflower are highly inbred lines. Domesticated sunflower has a stem without lateral shoots and a single large inflorescence, whereas wild sunflower has a fruticose habit and a large number of small inflorescences (40-80).

Mitochondrial and Chloroplast DNA Isolation
Before DNA extraction, chloroplast and mitochondrial fractions were isolated from 10-day-old sunflower seedlings according to the method of Triboush et al. (1998) with our modifications. Briefly, 1 g of leaves was homogenized with a mortar and pestle in STE buffer (0.4 M sucrose, 50 mM Tris pH 7.8, 20 mM EDTA-Na2, 0.2% bovine serum albumin, 0.2% β-mercaptoethanol). The organelle fractions were isolated by centrifuging the homogenate at 2,000 g for 15 min, discarding the pellet and centrifuging the supernatant at 14,000 g for 15 min. DNA was extracted from the pellet with the PhytoSorb kit (Syntol, Russia), according to the manufacturer's instructions.

NGS Library Preparation
For library preparation, 40 ng of DNA was sheared using a Covaris S220 system. The NEBNext Ultra DNA Library Prep Kit (New England Biolabs, UK) was used for further manipulations. All library preparation steps were done according to the manual. According to Agilent 2100 Bioanalyzer data, the NGS library fragment length was mainly 450-550 bp. Libraries were quantified using a Qubit fluorimeter (Invitrogen, USA) and qPCR, then diluted to a final concentration of 8 pM.

Sequencing and NGS Data Analysis
Diluted libraries were clustered on a paired-end flow cell using a cBot instrument and sequenced for 100 cycles on a HiSeq 2000 sequencer with the TruSeq SBS Kit v3-HS (Illumina, USA). A total of 2,806,411 100-bp paired reads were generated for domesticated sunflower and 2,058,566 reads for the wild type. Read quality was assessed with FastQC. Adapter-derived and low-quality (Q-score below 30) reads were trimmed with Trimmomatic (Bolger et al., 2014). Sequencing reads were aligned to the reference sequences (NCBI accessions NC_007977.1 and NC_023337.1) with Bowtie2 (Langmead and Salzberg, 2012). Variant calling was performed with samtools/bcftools (Li, 2011) and manually revised using IGV (Thorvaldsdóttir et al., 2013).
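To make the Q-score threshold mentioned above concrete, the following is a minimal Python sketch of how Phred quality scores relate to base-call error probabilities. It is not Trimmomatic's algorithm and not the authors' pipeline; the function names are hypothetical, and the Phred+33 ASCII offset is an assumption about the FASTQ encoding of the reads.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def mean_quality_below(quality_string, threshold=30, offset=33):
    """Return True if the mean Phred score of a read is below the threshold."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores) < threshold


# Hypothetical quality strings: 'I' encodes Q40 and '5' encodes Q20.
print(mean_quality_below("IIIIIIII"))  # False: mean quality is 40
print(mean_quality_below("55555555"))  # True: mean quality is 20
print(error_probability(30))           # about 1 error in 1,000 base calls
```

Under this reading, a threshold of Q30 corresponds to an expected error rate of about 0.1% per base call. Real trimming tools typically apply the threshold per base or per sliding window rather than to the read mean used in this sketch.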

Results
The NGS data obtained allowed us to assemble complete sequences of the domesticated and wild sunflower extranuclear genomes. The overall alignment rate for both genomes was more than 50% of the total read number. The average read coverage was more than 800 for the chloroplast genomes and more than 100 for the mitochondrial genomes. These data were sufficient for reliable variant calling. Comparative analysis of the chloroplast genomes of domesticated and wild sunflower revealed 43 variant sites (Table 1). Among them, 21 (48.8%) were variations in simple sequence repeat length and 22 (51.2%) were SNPs. Most of the polymorphic sites (20 SSRs, 16 SNPs) were located in the large single copy region of the chloroplast genome; the other 7 sites (1 SSR, 6 SNPs) were mapped to the small single copy region. Notably, the inverted repeat regions of the chloroplast genomes had identical sequences.
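The breakdown of chloroplast variant sites by region and type can be checked with a short tally. The sketch below is in Python and uses only the counts reported in the paragraph above; the dictionary layout and function names are illustrative assumptions.

```python
# Counts of chloroplast variant sites by region and type, as reported in the text.
# LSC = large single copy, SSC = small single copy, IR = inverted repeat.
chloroplast_variants = {
    "LSC": {"SSR": 20, "SNP": 16},
    "SSC": {"SSR": 1, "SNP": 6},
    "IR": {"SSR": 0, "SNP": 0},
}


def totals_by_type(variants):
    """Sum variant counts over all regions for each variant type."""
    totals = {}
    for counts in variants.values():
        for variant_type, n in counts.items():
            totals[variant_type] = totals.get(variant_type, 0) + n
    return totals


def percentages(totals):
    """Express each total as a percentage of all variant sites, to one decimal."""
    grand_total = sum(totals.values())
    return {k: round(100 * v / grand_total, 1) for k, v in totals.items()}


totals = totals_by_type(chloroplast_variants)
print(totals)               # {'SSR': 21, 'SNP': 22}
print(percentages(totals))  # {'SSR': 48.8, 'SNP': 51.2}
```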
Nine SNPs were located in intergenic regions (IGR) and 4 SNPs were found in noncoding regions of genes. Of the SNPs in coding regions, 7 were synonymous and only 2 were nonsynonymous (Table 1). In our previous study, comparison of cultivated lines 3629 and HA383 (available in GenBank) revealed 12 polymorphic sites: 8 SSRs and 4 SNPs (Markin et al., 2015).
In mtDNA, the variable mononucleotide repeats were represented by two poly T, one poly C and one poly A. Among the single nucleotide substitutions there were 2 transitions: A/G and C/T. The transversions were as follows: 3 G/T, 2 A/C and 1 C/G. Seven of the eight SNPs were located in IGR and the remaining one was mapped to a coding DNA sequence. This SNP results in an amino acid change at position 232 (Ser232Tyr) of the protein encoded by nad6.
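The split of the substitutions listed above into transitions and transversions follows from the purine/pyrimidine class of each base. A minimal Python sketch of that classification is given below; the function name is hypothetical, and each pair is treated as unordered because the text does not state which allele is the reference.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single nucleotide substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid substitution: {ref}/{alt}")
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The substitution classes listed in the text, expanded to one entry per SNP.
snps = [("A", "G"), ("C", "T")] + [("G", "T")] * 3 + [("A", "C")] * 2 + [("C", "G")]

counts = {"transition": 0, "transversion": 0}
for ref, alt in snps:
    counts[classify_substitution(ref, alt)] += 1

print(counts)  # {'transition': 2, 'transversion': 6}
```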
Comparison of the complete mitochondrial sequences of cultivated lines 3629 and HA412 (available in GenBank) revealed 6 polymorphic sites: 5 SSRs and 1 SNP.

Discussion
The analysis of the extranuclear genomes of domesticated and wild sunflower allows a few assumptions to be made. The first is that mtDNA has fewer polymorphic sites in total than cpDNA. The chloroplast genome has 0.146 SNPs per 1 kb of sequence, whereas in the mitochondrial genome this value is 0.027 (5.4-fold lower). This may be due to the conservatism of mtDNA, because plant mitochondrial genes evolve more slowly than chloroplast genes (Page and Holmes, 1998).
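The per-kilobase SNP densities quoted above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the genome lengths are not given in the text, so the two values used here are assumed approximate lengths of the reference records and should be checked against GenBank before reuse.

```python
def snps_per_kb(snp_count, length_bp):
    """SNP density expressed as SNPs per 1,000 bp of sequence."""
    return snp_count / (length_bp / 1000)


# ASSUMPTION: genome lengths are not stated in the text; these are assumed
# approximate lengths of the two reference sequences.
CP_LENGTH_BP = 151_104  # chloroplast reference, assumed
MT_LENGTH_BP = 300_945  # mitochondrial reference, assumed

cp_density = snps_per_kb(22, CP_LENGTH_BP)  # 22 cpDNA SNPs reported
mt_density = snps_per_kb(8, MT_LENGTH_BP)   # 8 mtDNA SNPs reported

print(round(cp_density, 3))  # 0.146
print(round(mt_density, 3))  # 0.027
print(round(cp_density / mt_density, 1))  # fold difference from unrounded values
```

With these assumed lengths, the unrounded densities give a fold difference close to 5.5. The 5.4-fold figure corresponds to dividing the rounded densities (0.146 / 0.027).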
The sunflower mitochondrial genome contains about 22.8 kb of CDS, and the frequency of SNPs in CDS was 0.04 per 1 kb. In chloroplast CDS (total CDS length 78.5 kb) there were 0.11 SNPs per 1 kb of sequence, so the SNP frequency in coding regions differs 2.75-fold. A second assumption is that sunflower chloroplast CDS evolve approximately 2.5-3 times faster than mitochondrial CDS. We do not have enough data to establish this, but according to published data the ratio of substitution rates in mitochondrial and chloroplast genes of angiosperms is 1:3 (Drouin et al., 2008; Duminil, 2014). Interestingly, the frequency of nonsynonymous SNPs is, conversely, 1.6-fold higher in mtDNA (0.04 per 1 kb) than in cpDNA (0.025 per 1 kb), although this may be an artifact of the small amount of data.
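Because the fold differences above are computed from rounded frequencies, it can be useful to see how much they shift when unrounded values are used. The Python sketch below uses only the CDS lengths and SNP counts given in the text; the variable and function names are illustrative.

```python
def per_kb(count, length_kb):
    """Frequency of sites per 1 kb of coding sequence."""
    return count / length_kb


# Values taken from the text: CDS lengths in kb and SNP counts in CDS.
CP_CDS_KB, MT_CDS_KB = 78.5, 22.8
cp_all, mt_all = per_kb(9, CP_CDS_KB), per_kb(1, MT_CDS_KB)        # all CDS SNPs
cp_nonsyn, mt_nonsyn = per_kb(2, CP_CDS_KB), per_kb(1, MT_CDS_KB)  # nonsynonymous only

# Fold differences computed from unrounded frequencies.
print(round(cp_all / mt_all, 2))        # cpDNA vs mtDNA, all CDS SNPs
print(round(mt_nonsyn / cp_nonsyn, 2))  # mtDNA vs cpDNA, nonsynonymous SNPs

# Fold differences computed from the rounded frequencies quoted in the text.
print(round(0.11 / 0.04, 2))   # 2.75
print(round(0.04 / 0.025, 2))  # 1.6
```

The unrounded ratios come out at roughly 2.6 and 1.7, so the direction of both comparisons does not depend on rounding. With a single mitochondrial CDS SNP, however, either figure should be read as a rough indication only.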
The genome sequence data were obtained from only one inbred domesticated line and one wild line and cannot represent the diversity of all wild genotypes. However, we would expect the same SNPs, especially in mitochondrial DNA, in other wild types of sunflower with different origins. Indeed, the Russian cultivated line 3629 and the American cultivated line HA383 (NCBI accession NC_007977.1) differ at only 4 SNP sites in cpDNA, and the mtDNA comparison of lines 3629 and HA412 (NCBI accession NC_023337.1) revealed only one SNP. The polymorphic sites identified could be useful for the development of molecular markers. In future studies we plan to investigate these polymorphic sites in wild types of sunflower with diverse ancestry.

Conclusion
The comparative analysis of domesticated and wild sunflower chloroplast genomes revealed 43 variant sites, including 21 polymorphic SSR loci and 22 SNPs. Comparison of the mitochondrial DNA (mtDNA) sequences revealed 14 variant sites: 4 SSRs, 8 SNPs and 2 deletions. Nine SNPs were located in coding regions of chloroplast DNA, giving an SNP frequency in CDS of 0.11 per 1 kb. A single SNP was detected in a mitochondrial gene, giving an SNP frequency in CDS of 0.04 per 1 kb. Only three SNPs caused amino acid changes: two cpDNA SNPs and one mtDNA SNP. Although the complete mitochondrial genome sequence is twice as long as the chloroplast genome sequence, mtDNA has one third as many variant sites as cpDNA.