In Silico Analysis of Cellulose Synthase Gene (NcCesA1) in Developing Xylem Tissues of Neolamarckia Cadamba

Corresponding Author: Ho Wei Seng, Forest Genomics and Informatics Laboratory (fGiL), Department of Molecular Biology, Faculty of Resource Science and Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak Email: wsho@frst.unimas.my Abstract: This study reported the isolation and in silico characterization of fulllength cellulose synthase (CesA) cDNA from Neolamarckia cadamba, an important tropical plantation tree species. CesA is a member of processive glycosyltransferases that involved in cellulose biosynthesis of plants. CesA acts as a central catalyst in the generation of plant cell wall biomass or cellulose. It also plays an important role in regulating wood formation. The hypothetical full-length CesA cDNA (NcCesA1; JX134621) was assembled by contig mapping approach using the CesA cDNA sequences from NcdbEST and the amplified 5’-and 3’-RACE PCR sequences. The NcCesA1 cDNA has a length of 3,472 bp with 3,126 bp open reading frame encoding a 1,042 amino acid sequence. The predicted NcCesA1 protein contained N-terminal cysteine rich zinc binding domain, 7 putative Transmembrane Helices (TMH), 4 Umotifs that contain a signature D, D, D, QxxRW motif, an alternating Conserved Region (CR-P) and 2 Hypervariable Regions (HVR). These entire shared domain structures suggest the functional role of NcCesA1 is involved in glycosyltransferation of the secondary cell wall cellulose biosynthesis of N. cadamba. Sequence comparison also revealed the high similarity (90%) among NcCesA1 and PtrCesA2 of Populus tremuloides. This further implies the involvement of NcCesA1 that catalyzes the cellulose biosynthesis of secondary cell wall rather than primary cell wall. This full-length NcCesA1 cDNA can serve as good candidate gene in association genetics study which leads to Gene-Assisted Selection (GAS) in the N. cadamba tree breeding programme. Furthermore, the isolation of new CesA genes from tropical tree genomes is essential for enhancing knowledge of cellulose biosynthesis in trees that has many fundamental and commercial implications.


Introduction
Wood is made up of secondary xylem tissues and has a chemical complex of cellulose, lignin, hemicellulose and extractives. Cellulose, a homopolymer consisting of β-1,4-glucan chains, is the most abundant form of living terrestrial biomass makes up the major cell wall biopolymer in plants (Crawford, 1981;Kumar et al., 2009). Polymerization of cellulose chains is catalysed and synthesised by specific plasma membrane-bound Cellulose Synthase complexes (CelS) (Festucci-Buselli et al., 2007). CelS is postulated to be composed of six hexametric rosette subunits where each of the rosette subunit consists six CesA catalytic subunits. Therefore, a total of 36 β-1, 4-glucan chains are produced by a CelS in most of the higher plants. These chains will then linked by hydrogen bonds to form microfibrils which will then further bundled to form macrofibrils. CelS is proved to be encoded by cellulose synthase (CesA) or CesArelated genes. CesA family is a member of Glycosyltransferases (GTs) superfamily under CAZyme family (Ross et al., 2001).
Cellulose synthases are the only identified components of rosettes. Since woody plants are unique in their cellulose biosynthesis, CesA sequence information is useful to serve as the basis for investigating molecular regulation and mechanism of cellulose biosynthesis in tree species (Lu et al., 2008). The first gene discovered to encode for CesA was in a bacteria species, Acetobacter xylinum (Saxena et al., 1990;Wong et al., 1990). Few years later, cotton (Gossypium hirsutum) was the first higher plant species found to have homologs of the bacterial CelA gene (Pear et al., 1996). CelA refers to the cellulose synthase catalytic subunit which is a specific conserved region common to glygosyltransferases found in both bacterial and plants.
To date, there are considerable amounts of full-length CesA cDNA being published in NCBI but no such information available for Neolamarckia cadamba trees. N. cadamba or locally known as kelampayan belongs to the family of Rubiaceae. It has been selected as one of the fast growing plantation species for planted forest development in Sarawak (Tchin et al., 2012;Lai et al., 2013;Tiong et al., 2014a;2014b;Ho et al., 2014;Phui et al., 2014). The state government of Sarawak has introduced the Forest (Planted Forest) Rules (1997) to encourage the development of commercial planted forests and has set a target of 1.0 million hectares for forest plantations to be established by 2020. It is estimated that 42 million of high quality seedlings are required for the annual planting programme. N. cadamba is a large, deciduous and fast growing tree that gives early economic returns within 8-10 years. Under normal conditions, it attains a height of 17 m and diameter of 25 cm at breast height (dbh) within 9 years. It is a lightweight hardwood with a density of 290-560 kg/m 3 at 15% moisture content (Joker, 2000). It is one of the best sources of raw material for the plywood industry, besides pulp and paper production. N. cadamba can also be used as a shade tree for dipterocarp line planting, whilst its leaves and bark have medical applications (WAC, 2004). N. cadamba also has high potential to be utilized as one of the renewable resource of raw materials for bioenergy production such as cellulosic biofuels in the near future.
Hence, the objectives of this study were: (i) To obtain the full-length CesA cDNA sequences through contig mapping approach by using CesA singletons from the kelampayan tree transcriptome database (NcdbEST) and RACE PCR sequences and (ii) to in silico characterize the CesA gene from N. cadamba. The full-length CesA cDNA discovered can serve as good candidate gene for association genetics study in N. cadamba to detect the potential genetic variants underlying the common and complex adaptive traits.

RNA Isolation and First Strand cDNA Preparation
Total RNA was isolated from the developing xylem tissues of a 4-year old N. cadamba tree using RNeasy® Midi Kit (QIAGEN GmbH, Germany) with modification. 5'-and 3'-RACE ready cDNA were prepared using SMARTer TM RACE cDNA Amplification Kit (Clontech, USA).

Rapid Amplification of 5'-and 3'-cDNA Ends (RACE)
Two Gene-Specific Primers (GSPs) for CesA gene were designed for 5'-RACE PCR (CesA5'GSP, 5'-TGTAACCGTACCAGGCAGGGCTATGTC-3') and 3'-RACE PCR (CesA3'GSP, 5'-GTGCTTCGATGGGCTCTTGGTTCTG-3') using Primer Premier 5 software based on the parameters given in the protocol of SMARTer TM RACE cDNA Amplification Kit (Clontech, USA). 5'-and 3'-RACE PCR were prepared in separate tubes by mixing 34.5 µL of PCR-grade water, 5.0 µL of 10× Advantage 2 PCR Buffer, 1.0 µL of dNTP Mix (10 mM) and 1.0 µL of 50× Advantage 2 Polymerase Mix. The master mix was then added into a 0.2 mL PCR tubes containing 2.5 µL of RACE-Ready cDNA, 5.0 µL of 10× Universal Primer Mix (UPM) and 1.0 µL of GSP to make a final volume of 50.0 µL. The contents were then mixed gently and subjected to thermal cycling. Touchdown PCR amplification was used: 5 Cycles of incubation at 94°C for 30 sec and 72°C for 3 min; 5 cycles of 94°C for 30 sec, 70°C for 30 sec and 72°C for 3 min; 25 cycles of 94°C for 30 sec, 68°C for 30 sec and 72°C for 3 min. The RACE PCR amplicons were purified from agarose gel by using QIAquick ® Gel Extraction Kit (QIAGEN, Germany). Purified PCR product was ligated into pGEM ® -T Easy Vector System (Promega, USA) and transformed into competent cells, Escherichia coli JM 109. The recombinant plasmids were isolated and purified using Wizard ® Plus SV Minipreps DNA Purification System (Promega, USA) according to the manufacture's protocol. After verification, the purified plasmids were sent for sequencing in both forward and reverse direction. The sequencing reactions were performed by using ABI Prism TM Bigdye TM terminator cycle sequencing Ready reaction kit V.3.1 (Applied Biosystems, USA) and analysed on a ABI 3730XL capillary DNA sequencer (Applied Biosystems, USA). The 5'-and 3'-RACE cDNA sequences were manually edited using Chromas Lite version 2.01 programme to remove the vector sequence.

Hypothetical Full-Length CesA cDNA Sequences Assembly
Singletons of CesA gene were selected from the kelampayan tree transcriptome database (NcdbEST) . These singletons were blast again NCBI database to search for sequence homology and binding position on the respective gene. Subsequently, the singletons were grouped according to the alignment score and position on gene. Singletons which have overlapping fragment were then identified and jointed together with 5'-and 3'-RACE cDNA sequences to form a full-length sequence via contig mapping approach for N. cadamba (NcCesA1).

Sequence Analysis of Full-Length CesA Gene
Sequence homology search for NcCesA1 cDNA were performed against GenBank non-redundant nucleotide sequence using the NCBI Basic Local Alignment Search Tool (BLAST) server. ExPASy (Expert Protion Analysis System) translate tool (Gasteiger et al., 2003) was use to change the nucleotide sequence into amino acid sequence and to identify the Open Reading Frame (ORF). Amino acid sequence was used to predict the domains, motifs and hypervariable regions using EBI-ClustalW2 multiple alignment tool and CLC Main Workbench version 5.0 software (CLC bio, Denmark). The transmembrane helices in protein were predicted using TMHMM Server version 2.0 (Krogh et al., 2001). Signal peptide was predicted using SignalP 4.0 Server (Petersen et al., 2011).

Three-Dimensional (3-D) Protein Structure Prediction
There is no homologous protein structure was found in the Protein Data Bank (PDB). Thus, threading protein or ab initio protein modeling using I-TASSER server (Zhang, 2008;Roy et al., 2010) was used for protein structure and function prediction. Predicted 3-D structures of NcCesA1 were viewed using Jmol, an open-source Java viewer for chemical structures in 3-D with features for chemicals, crystals, materials and biomolecules (Jmol, 2012). Vector Alignment Search Tool (VAST) provided by NCBI was used to identify similar 3-D structures in the Molecular Modeling Database (MMDM) and compared with predicted CesA protein structures. Sequence similarity (% Id) for the parts of the protein that have been superimposed and SCORE (the VAST structure-similarity score) that reflects the quality of superimposed elements were recorded (Panchenko and Madej, 2004).

Phylogenetic Analysis of Full-Length CesA Genes
The full-length CesA genes were obtained from NCBI nucleotide database and the GenBank accession numbers were recorded. All of the CesA genes of Arabidopsis thaliana were obtained from The Arabidopsis Information Resource (TAIR) database (http://arabidopsis.org) using the gene models accession. CesA genes of rice (Oryza sativa) were obtained from OryGenesDB database (http://orygenesdb.cirad.fr/) with locus name accession (Droc et al., 2005). Protein sequence of selected genes was aligned using EBI-ClustalW2 multiple alignment tool and neighbourjoining tree was generated using MEGA version 5 software (Tamura et al., 2011).

Assembly of CesA cDNA from Singletons
A total of 13 CesA singletons were obtained from NcdbEST . These singletons showed high maximum identity value scores of more than 80% as shown in Table 1. The best hits chosen were the complete CesA coding sequences from other species, such as E. camaldulensis (HQ864585), B. luminifera (FJ410444 and FJ410444), P. tremula x P. tremuloides (AY573571 and AY573572), P. tremuloides (AY055724), P. tomentesa (HQ585870, HQ585871 and HQ585872) and S. parvifolia (GQ338420). Pairwise and multiple alignment of 13 singletons were carried out to find the suitable singletons for contig-mapping of a partial hypothetical CesA gene. Five singletons (Ncdx015G11, Ncdx017H02, Ncdx029C01, Ncdx092C07 and Ncdx007A11) were selected for contig mapping to produce a partial consensus CesA cDNA sequence, namely CesA1, with a length of 1,387 bp. The BLAST result of CesA1 against NCBI nucleotide database showed that this gene has high similarity with CesA nucleotide sequences of other tree species, such as Shorea, Populus, Betula and Eucalyptus. Table 2 shows that CesA1 covers more than 80% of other CesA sequences with high nucleotide similarity and as much as 83% identity.
Alignment of CesA1 with closely related full-length CesA cDNAs of other species showed that this hypothetical partial sequence was located at the 3'-end with a predicted stop codon (TGA) at nucleotides 1,167 to 1,169 as shown in Fig. 1. The stop codon was confirmed when the translated protein ( Fig. 2) stopped at the predicted nucleotide (nucleotide 1,166). The full-length NcCesA1 is predicted to be approximately 3,100 bp, so about 2,000 nucleotides at the 5'-end yet to be identified.
Singleton Amplification of 5'-and 3'-cDNA Ends Sequences and Full-Length NcCesA1 cDNA Assembly Since the CesA1 cDNA was incomplete, Gene Specific Primers (GSPs) were designed to find the 5'-and 3'-termini. Overlapping forward and reverse primers were designed for 5'-and 3'-RACE PCR amplification. The product size of 5'-ends and 3'-ends was predicted from the alignment with full-length CesA cDNA of other species and was expected to be about 2,400 bp and 1,060 bp, respectively. Figure 3 shows the diagrammatic representative of the location of the Gene Specific Primers (GSPs) with the predicted product size. The 5'-and 3'-RACE products were sent for sequencing in both Forward (F) and Reverse (R) direction to get the maximum nucleotide sequence reading. Full sequences of three 3'-cDNA ends, 3'CesA04, 3'CesA05 and 3'CesA18 were successfully obtained with the lengths of 1,030 bp, 1,063 bp and 1,078 bp, respectively. Alignment of these three 3'-cDNA ends showed 99% similarity in nucleotide sequence and the only variation was at the poly-A tail region. Since the nucleotide differences did not affect the protein translation, 3'CesA18 was chosen for the fulllength contig mapping because the sequence length (1,063 bp) was closest to the expected size (1,060 bp).
For 5'-cDNA ends, the complete sequences could not be generated from the forward and reverse sequencing readings due to the longer expected nucleotide sequences of 5'CesA13 and 5'CesA17, which were 1,900 bp and 2,600 bp, respectively. 5'CesA13 and 5'CesA17 were considered amplified from the same gene because both of their forward sequences showed 99% similarity when aligned together. The reverse sequences of 5'CesA13 and 5'CesA17 were not similar but they overlapped at both ends (Fig. 4). 5'CesA13 was predicted to be the 'shortened' version of 5'CesA17 at the middle part of the sequence region due to PCR errors. According to Pienaar et al. (2006), editing errors might happen during DNA polymerase-catalysed enzymatic copying or DNA fragments were subjected to thermal damage during PCR, especially for long amplification.
All the 5'-RACE sequences and 3'CesA18 were used in contig mapping of the hypothetical full-length NcCesA1 cDNA as shown in Fig. 4. Full-length cDNA of closely related CesA gene from B. luminifera (BlCesA3; FJ410445), which had high sequence coverage (83%) and high sequence similarity (82%) was used as the reference gene in the contig mapping. The 5'-cDNA ends (5'CesA13 and 5'CesA17 forward and reverse sequences) and 3'-cDNA ends (3'CesA18) were successfully mapped to produce a hypothetical full-length NcCesA1 cDNA sequence with a length of 3,784 bp.
The BLAST results of the hypothetical full-length NcCesA1 (GenBank accession number: JX134621) cDNA against NCBI nucleotide database supported this gene as cellulose synthase with the identity and coverage value up to 81 and 91%, respectively as shown in Table  3. The common genetic features in CesA protein were identified in NcCesA1 as shown in the protein alignment (Fig. 5) and the diagrammatic representation of the protein structure is shown in Fig. 6.

3-D Structure Prediction of NcCesA1
The full-length protein sequence of NcCesA1 with 1,042 amino acids was used to predict 3-D secondary protein structure using I-TASSER server (Zhang, 2008;Roy et al., 2010). The predicted protein model of NcCesA1 is shown in Fig. 7. The predicted 3-D structure of ring-finger or zinc binding domain for NcCesA1 is shown in Fig. 8.

Discussion
The hypothetical full-length CesA cDNA (NcCesA1; JX134621) was assembled by contig mapping approach using five CesA singletons (Ncdx015G11, Ncdx017H02, Ncdx029C01, Ncdx092C07 and Ncdx007A11) from NcdbEST and the amplified 5'-and 3'-RACE PCR sequences (Fig. 4). The NcCesA1 showed high sequence similarity with CesA gene of other tree species, such as Betula, Shorea, poplar or aspen (Populus) and Eucalyptus. The full-length CesA cDNA of Betula and Shorea species showed the highest sequence similarity (81%) with NcCesA1, with the coverage scored up to 90%. Alignment of the NcCesA1 sequence with other closely related full-length CesA genes from tree species had predicted the start (ATG) and stop codon (TGA) codons at nucleotides 73-75 and nucleotides 3,196-3,198, respectively. NcCesA1 has a total length of 3,472 bp, which consists of 3,126 bp of coding sequence (cds) with 72 bp of 5'-UTR and 277 bp of 3'-UTR.
The full-length NcCesA1 with a 3,123 bp of ORF was translated into a 1,042 amino acid sequence. The length of the NcCesA1 protein sequence was about the same as those closely-related CesA genes as listed in Table 3. These protein sequences were in the range of 1,032-1,041 amino acids. CesA3 protein of two Eucalyptus species, E. camaldulensis (EcCesA3; HD864585) and E. grandis (EgCesA3; DQ014507) showed the highest protein sequence similarity (90%) to NcCesA1. Other CesA proteins also showed that the amino acid sequences are highly conserved among them and scored in between 85 to 94% similarity with NcCesA1 or among each other.
CesA family genes are characterised by a conserved U-motif D, D, D, QxxRW (x represents any amino acid), which can be divided into two sub-domains: Domain A (D, DxD) and Domain B (D, QxxRW). NcCesA1 protein which contains this motif was predicted to be involved in CesA enzyme substrate binding and catalytic activities (Beeckman et al., 2002;Saxena et al., 2001). Domain A consists of three aspartic acids (D). The first D found in amino acid 374 was widely spaced from the second D (amino acid 542) in NcCesA1. This domain was predicted to be involved in both processive and nonprocessive UDP-glucose substrate binding (Richmond, 2000). Addition of a few or more sugar residues to a growing chain is catalysed by processive enzyme while non-processive enzyme only catalyses one to one addition reaction. Domain B of NcCesA1 was found at amino acid 741 for D residue, QxxRW was found at amino acid 779 to 783 (Fig. 6). This second motif was predicted to be the catalytic site in CesA and possesses only processive catalytic activity (Richmond, 2000).
A highly conserved zinc binding domain (Cx 2 Cx 12 FxACx 2 Cx 2 PxCx 2 CxEx 5 Gx 3 Cx 2 C, where x is any amino acid) was found at amino acid 37 to 82. This N-termini domain was predicted to correspond to the protein interactions in determining cell microfibril structures (Delmer, 1999). This CxxC motif which is only found in CesA proteins can distinguish cellulose synthase family from Cellulose Synthase-Like (CLS) family. Therefore, NcCesA1 was confirmed to be cellulose synthase and not CLS protein.
Two Hypervariable Regions (HVRI and HVRII) were found within the NcCesA1 open reading frame. HVRI encodes 118 amino acids at position 113 to 230 near to the N-terminus. HVRII which is positioned in between two sub-domains have shorter length (70 amino acids) at position 625-695 (Fig. 6). The hypervariable region was reported to be able to classify CesA protein into subclasses (Vergara and Carpita, 2001). HVRII variable in CesA paralogous sequences but are highly conserved between CesA orthologs (Nairn and Haselkorn, 2005;Tan et al., 2014). This suggested that HVRII can be termed as Class-Specific Region (CSR) (Vergara and Carpita, 2001). To date, lots of CesA genes have been isolated and identified, for example twelve distinct CesA genes were identified in Arabidopsis and nine in maize (Holland et al., 2000).
Eight transmembrane helices were predicted throughout the whole protein sequence of NcCesA1 and the output generated by TMHMM 2.0 server (Krogh et al., 2001) Two transmembrance helices were predicted to be formed after the plant-specific conserved region, within the amino acid 256 to 299 (Fig. 6). Another six transmembrane helices were predicted at the carboxyl terminus (near to 3'-end) starting from amino acid 779. The region between sets of transmembrane helices is known as the globular domain or soluble domain (Richmond, 2000). These six transmembrance helices will then form a loop that extends into the cytoplasm. This suggests that CesA is an integral membrane protein. Plant-Conserved Region (CR-P) is the plant-specific region that was suggested to be implicated in the cellulose biosynthesis at "rosette" complexes (Roberts et al., 2002).
The homologous modeling cannot be used to predict 3-D protein structure of NcCesA1 as to date, there is no full-length 3D structure being deposited in Protein Data Bank (PDB). Thus, native NcCesA1 secondary protein structure was predicted using threading and ab initio modeling approach using I-TASSER server (Zhang, 2008;Roy et al., 2010). Protein threading or also known as fold recognition, models proteins that have the same fold as proteins of known structures, but do not match any homologous protein structures in PDB. Ab initio or de novo protein modeling involves the prediction of 3-D protein structure based on physical principles without referring to previous solved structures.
The predicted protein model of NcCesA1 has a confidence level scored at-0.85 (-5< C-score <2). This indicated that the quality of the predicted model was acceptable. A TM-score of more than 0.5 (0.61±0.14) suggested that the topology of predicted NcCesA1 structure was correct. TM-score was proposed to measure the structural similarity between two structures (Zhang and Skolnick, 2004). NcCesA1 model as shown in Fig. 7 has the highest cluster density (0.1665) among other models, with a total of 177 decoys (low temperature replica). The more the number of structural decoys used, the higher the cluster density, which means the quality of prediction improves.
The overall 3-D structure showed that NcCesA1 contains more α-helix structures than β-strands. Residue 19 to 110 showed the highest sequence identity (75%) with ring-finger domain of Arabidopsis IXR3 (AtCesA7) (PDB ID: 1WEO). The predicted 3D structure of this domain for NcCesA1 and AtCessA7 was almost the same as shown in Fig. 8. Ring-finger or zinc binding domain was conserved at the N-terminal among CesA proteins of different species and therefore they showed high similarity in sequence and structure. In NcCesA1, this domain was located between amino acid 20 to 110 as shown in the diagrammatic representative of NcCesA1 protein in Fig. 5. This structure was suggested to be involved in determining the cell microfibril structures during protein interactions (Delmer, 1999).
Six main clades can be distinguished from the unrooted neighbour-joining tree of CesA protein sequences of various species (Fig. 9). Previous studies had shown that CesA from higher plant can be grouped into six distinct clades, such as in Populus (Liang and Joshi, 2004;Djerbi et al., 2005;Nairn and Haselkorn, 2005) and Eucalyptus (Ranik and Myburg, 2006). Vergara and Carpita (2001) had discovered that the appearance of different CesA sub-classes was primarily determined by HVRII and therefore, this region was termed as Class-Specific Region (CSR). They also showed that the topology of phylogenetic tree is mainly determined by HVRII regions, where such topology remained when either HVRII regions or fulllength protein sequences were used in the phylogenetic analysis. All of the six clades contain CesA members associated with either primary or secondary cell wall development. In this study, NcCesA1 was clustered in group III, the clade that is associated with secondary cell wall formation (Fig. 9). Another two clades that are involved in secondary cell wall development are groups V and VI. Clade I, II and IV consisted of CesA proteins associated with the developmental process in primary cell walls. Ranik and Myburg (2006) also grouped six Eucalyptus grandis CesA and other 52 CesAs from dicot, monocot and gymnosperm species into three clades of primary cell wall-associated CesAs and three clades of secondary cell wall-associated CesAs. The similar phylogenetic topology also can be observed from 18 CesA genes of P. Trichocarpa (Djerbi et al., 2005).
The involvement of CesA proteins in clade I, II and IV in primary cellulose synthesis had been discussed and proven by some expression and functional studies (Arioli et al., 1998;Beeckman et al., 2002;Desprez et al., 2002;Ellis et al., 2002;Caño-Delgado et al., 2003). A study carried out by Arioli et al. (1998) showed that clade II CesA might carry out primary cellulose synthesis. This study showed that mutated Arabidopsis AtCesA1 (radial sewlling1, rsw1) gene reduced cellulose deposition and crystallisation in young, expanding cell of seedlings. AtCesA3 or cev1 in clade I which expressed predominantly in root tissues that actively carry out primary cell wall synthesis, was shown to decrease the cellulose content in root tissues when the gene was mutated (Ellis et al., 2002). Fagard et al. (2000) also confirmed the role of AtCesA6 or PROCUSTE1 (PRC1) (clustered in clade IV of Fig. 9) in primary cellulose synthesis of roots and dark-grown hypocotyls in Arabidopsis.
CesAs that associate with cellulose biosynthesis in secondary cell walls (clades III, V and VI) also have been predicted through functional and expression studies (Taylor et al., 2002;Appenzeller et al., 2004;Djerbi et al., 2005;Nairn and Haselkorn, 2005;Ranik and Myburg, 2006). Recently, three CesA genes (ZmCesA10, ZmCesA11 and ZmCesA12) were found to be highly expressed in the stalk, where the tissues are rich in secondary cell walls (Appenzeller et al., 2004), however these three CesA genes were clustered separately in clades V, VI and III, respectively. A study of CesA gene in E. grandis also found three proteins, EgCesA1, EgCesA2 and EgCesA3 grouped in three different clades of secondary cell wall-associated cellulose synthesis, similar with clades VI, V and III, respectively as shown in Fig. 9 (Ranik and Myburg, 2006).

Conclusion
The full-length NcCesA1 (JX134621) gene with the total length of 3,472 bp, encoding a 1,042 amino acid, was successfully isolated and characterized from N. cadamba by using singletons of CesA from the kelampayan tree transcriptome database (NcdbEST) and sequences generated from the 5'-and 3'-RACE to form a hypothetical full-length CesA sequence. To the best of our knowledge, this is the first report on the assembly of full-length CesA gene in N. cadamba through a contig mapping approach. NcCesA1 was successfully characterised by its common genetic features that are conserved among CesA proteins. One of the important features, D,D,D,QxxRW, that was found in NcCesA1 has confirmed that this protein is a cellulose synthase protein and not a cellulose synthase-like protein.
Phylogenetic analysis also revealed that NcCesA1 protein is involved in cellulose biosynthesis of the secondary cell walls. We hope this newly isolated CesA gene could pave the way for identifying molecular mechanisms controlling wood formation in future and will also be candidate for association genetic studies aiming at the production of high value forests (Thumma et al., 2005;Ho et al., 2011;Tchin et al., 2012;Tiong et al., 2014a;Tan et al., 2014). Furthermore, the detailed understanding on the regulation of CesA gene could provide a greater impact on the design of future genetic improvement strategies in the production of better quality wood that is typically present in the secondary walls of xylem in N. cadamba.