A Survey of Compute Intensive Algorithms for Ribo Nucleic Acids Structural Detection

Problem statement: Finding an accurate RNA structural alignment from the primary sequence is a major bioinformatics challenge because it is time consuming and computationally NP-hard. According to our investigation, the majority of current research has focused on achieving faster execution time, lower space complexity and better cache management. Recently, one study introduced cache-efficient Chip Multiprocessor (CMP) algorithms with good speed-up that exploit parallelism to reduce the critical path length. Our contribution in this article is a comprehensive survey of methods for RNA secondary structure prediction with Pseudoknots (PK) and sequence alignment in bioinformatics. The aim is to highlight the challenges and related issues, providing sufficient information to assist researchers new to this field and to serve as a reference guide for bioinformatics professionals. Approach: We reviewed various algorithms that predict the secondary structure of an RNA molecule from its primary sequence, covering structures without pseudoknots on one side and pseudoknotted structures on the other. Furthermore, we compared in two tables the methods developed for RNA structure prediction. Results: The findings of this survey confirm that the Dynamic Programming (DP) method via CMP algorithms can be used to predict RNA secondary structure with simple PK and gives good results. Conclusion: Two observations emerge about methods for predicting RNA structure: firstly, the pseudoknotted RNA structure problem is computationally complex and secondly, common methods do not give sufficiently accurate results for pseudoknotted RNA.


INTRODUCTION
Bioinformatics is the application of computers to biological information: it uses computers to gather, store, analyze, manipulate, interpret and integrate biological and genetic information about macromolecules (Deoxyribo Nucleic Acid (DNA), Ribo Nucleic Acids (RNA) or proteins). One of its major open problems is to predict the three-dimensional (3D) RNA structure from the primary sequence. It is still a great challenge for biologists to understand the functions of RNA, which depend on its 3D structural features. The two main experimental methods for structure determination are Nuclear Magnetic Resonance (NMR) and X-ray crystallography, which are highly accurate methods for determining the folded structure of an RNA molecule [1]. Unfortunately, both NMR and crystallography are time consuming and very expensive experiments. They also require a high level of expertise, which young scientists often lack [2]. Therefore, three categories of computational methods were proposed to predict the structure of RNA: (i) thermodynamic optimal structure or energy minimization models, (ii) comparative sequence methods and (iii) structure inferring methods [3]. However, these computational methods only provide approximate RNA structural models.
Proteins are an important part of nutrition and are needed for the proper functioning of the body. Most of the dry weight of the human body and of the bodies of other animals is made of protein. RNA molecules are an essential ingredient in the synthesis of protein: messenger RNA (mRNA) is transcribed from DNA and plays a central role in living cells. According to the central dogma of biology, mRNA is the intermediate carrier of genetic information between DNA and protein (Eq. 1). A natural process called RNA interference (RNAi) regulates the translation of genetic information into proteins. Scientists and researchers have great interest in using this process to create new medications and drugs based on Non-coding RNA (ncRNA). The main studies that make use of RNAi [4] look for treatments for: (i) the Human Immunodeficiency Virus (HIV), which causes Acquired Immunodeficiency Syndrome (AIDS), and (ii) the genital herpes virus and Human Papilloma Virus (HPV), which affect many hundreds of millions of people worldwide [5]. The RNA secondary structure is determined by the bases that pair up according to the rules in WW: {(A,U), (U,A), (G,C), (C,G), (G,U), (U,G)}, that is, the Watson-Crick base pairs (G ≡ C) and (A = U), joined by three and two hydrogen bonds respectively, and the weaker wobble base pair (G-U). These are called the valid canonical base pairs [6]. The secondary structure of an RNA molecule is thus formed by base pairing between various regions of the RNA, which results in a configuration of double-helical regions (stems) and single-stranded loops; it is the collection of these base pairs. Given an RNA sequence with primary structure {A-G-G-C-C-U-U-C-C-U}, using WW-folding to understand the RNA secondary structure, we can expect six stem loops. Figure 2 explains these six stem loops.
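The pairing rule set WW can be made concrete with a short sketch. The Python below is only an illustration under our own assumptions: the names `CANONICAL_PAIRS`, `can_pair` and `candidate_pairs`, and the `min_sep` parameter, are hypothetical and do not come from any of the surveyed tools. It lists every position pair whose bases are allowed to pair, which is the raw material that any folding method then selects from.

```python
# Canonical pairs exactly as listed in the WW set of the text:
# Watson-Crick pairs plus the G-U wobble pair, in both orientations.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(a, b):
    """Return True if bases a and b form a canonical (WW) pair."""
    return (a, b) in CANONICAL_PAIRS


def candidate_pairs(seq, min_sep=1):
    """List all 0-based positions (i, j), i < j, whose bases may pair.

    min_sep is an assumed minimum separation: a pair needs j - i > min_sep,
    so that the strand does not fold too sharply on itself.
    """
    n = len(seq)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_sep + 1, n)
        if can_pair(seq[i], seq[j])
    ]


if __name__ == "__main__":
    # The example sequence from the text; each listed pair is only a
    # candidate, not a predicted structure.
    for i, j in candidate_pairs("AGGCCUUCCU"):
        print(i, j)
```

A real structure uses only a compatible subset of these candidates, since each base can take part in at most one pair.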
The thermodynamic hypothesis states that the actual secondary structure of an RNA sequence is the one with the Minimum Free Energy (MFE): base pairs increase structural stability, whereas unpaired bases decrease it. The goal is to calculate the free energy of an RNA secondary structure as the total of the energies of all its base pairs, taking into account that the energies of G ≡ C, A = U and G-U pairs differ. This is summarized in Eq. 2 [7], which minimizes the total free energy at fixed temperature and ionic concentration. However, the task and function of an RNA cannot be determined by secondary structure prediction alone, as shown in Fig. 3; the prediction accuracy of the MFE method alone is usually not high, because the energy model is not accurate enough and RNA may not always fold into the MFE structure. Also, the secondary structure of an RNA sequence must contain multiple loops to be stable under MFE. These structural motifs can be divided into two large groups, stem-loops and pseudoknots, as shown in Fig. 3b [8]. A pseudoknot exists if the RNA structure contains two or more stem-loops whose stems cross. In fact, pseudoknots are found in almost all classes of RNA, especially in the genomes of some viruses. As a result, RNA structure prediction needs algorithms suited to this widespread motif: the strategy should predict RNA secondary structure with pseudoknots, be stable under MFE and relate closely to the biological functions of the RNA sequence [4]. The tertiary (3D) structure is the complete form of the folded RNA molecule; it enables the molecule to perform its functional role in the cell and is often the key to its function (Fig. 3c).
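The energy sum described above can be sketched as follows. The notation is assumed for illustration only and is not taken from the source: S denotes a set of base pairs over the sequence x, and e(x_i, x_j) denotes the energy contribution of the pair formed by bases x_i and x_j, with different values for G ≡ C, A = U and G-U pairs.

```latex
E(S) \;=\; \sum_{(i,j)\,\in\,S} e(x_i, x_j),
\qquad
S^{*} \;=\; \arg\min_{S} E(S)
\quad \text{(at fixed temperature and ionic concentration)}
```

Under this simplified model every pair contributes independently; nearest-neighbor energy models, mentioned later in the text, instead score stacked pairs and loops.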
Generally, the three-dimensional form of an RNA sequence is called its 3D functional structure. Its characteristics are important in biology for two reasons: firstly, RNA 3D structures are critical to biological function and secondly, the properties of RNA 3D structures may help identify subsequences of nucleotides that interact with other molecules or complexes.
Consequently, in the last decade, RNA secondary structure prediction with simple pseudoknots based on minimum free energy (RNA-SP based on MFE) has become a biological and medical necessity, because the RNA molecule has two important functions: regulatory processes in the synthesis of proteins and viral replication, which is important in antiviral treatment design [4].

Roadmap:
We have highlighted the fundamental definition of RNA, its chemical structure, its primary, secondary and tertiary (3D) structures and the basic concepts of the RNA secondary structure problem. We then classify RNA methods into two groups: first, methods that consider RNA stem-loops (without pseudoknots) and second, methods that predict RNA secondary structure with pseudoknots. Next we compare the results of the main methods. Finally, we give some concluding remarks and present our future plan.
Problem domain: Predicting RNA secondary structure from the sequence is important for understanding RNA functions (Eq. 3). RNA fold recognition methods attempt to predict the most accurate and most stable RNA folding structure, the one with MFE. In some parts, the RNA 3D structure folds into pseudoknots (Fig. 4).
We define RNA-SP with pseudoknots as follows:
• An RNA sequence is viewed as a string of n characters x = x_1 x_2 … x_n, where x_i ∈ {A, U, G, C} (the four bases) and 1 ≤ i ≤ n, as shown in Fig. 4a.
• A single-stranded RNA secondary structure is a list of base pairs that can be viewed as a set X [9] of admissible base pairs (x_i, x_j) such that: first, 1 ≤ i < j ≤ n; second, j - i > t, where t is a small constant (here j - i ≥ 2); and third, for all base pairs (x_i, x_j) and (x_i', x_j') in X, i = i' if and only if j = j', i.e. ∀ (i, j), (i', j') ∈ X: i = i' ⇔ j = j', as shown in Fig. 4b. Respectively, these conditions mean that the two bases of a pair are located at different positions, that the sequence does not fold too sharply on itself and that each base is paired with at most one other base. Only WW base pairs are allowed: {(A,U), (C,G), (G,U)}.
• X contains a pseudoknot if and only if there exist base pairs (x_i, x_j), (x_i', x_j') ∈ X with i < i' such that i < i' < j < j' (crossing condition), as in Fig. 4c (top). Pseudoknots can be simple or recursive, as in Fig. 4c (bottom) [9].
A given RNA sequence therefore admits an exponential number of possible structures, among which one seeks a structure with the maximum number of base pairs or with the Minimum Overall Free Energy (MFE). These complicated motifs make the general problem of RNA secondary structure prediction with pseudoknots NP-complete: no algorithm is known that handles general energy functions in worst-case polynomial time. In fact, [9,10] proved that finding a pseudoknotted RNA structure with MFE is NP-hard, in particular under the standard nearest-neighbor energy function.
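The three admissibility conditions and the crossing condition above translate directly into code. The following Python sketch is illustrative only: the function names `is_valid_structure` and `has_pseudoknot`, the 0-based indexing and the default `t=1` are our assumptions, not part of any surveyed algorithm.

```python
# Unordered canonical pairs allowed by the definition: A-U, C-G and G-U.
CANONICAL = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def is_valid_structure(seq, pairs, t=1):
    """Check the admissibility conditions for a set of 0-based pairs (i, j).

    Conditions: i < j within the sequence, j - i > t (no sharp fold),
    canonical bases, and each position used by at most one pair.
    """
    used = set()
    for i, j in pairs:
        if not (0 <= i < j < len(seq)):
            return False
        if j - i <= t:
            return False
        if frozenset((seq[i], seq[j])) not in CANONICAL:
            return False
        if i in used or j in used:
            return False
        used.update((i, j))
    return True


def has_pseudoknot(pairs):
    """Return True if two pairs cross, i.e. i < i' < j < j'."""
    ordered = sorted(pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    nested = [(0, 6), (1, 5)]      # on GGAAACC: two nested G-C pairs
    crossing = [(0, 4), (2, 6)]    # on GAGACUCU: two crossing G-C pairs
    print(is_valid_structure("GGAAACC", nested), has_pseudoknot(nested))
    print(is_valid_structure("GAGACUCU", crossing), has_pseudoknot(crossing))
```

The quadratic scan in `has_pseudoknot` is chosen for readability; it only detects crossings and says nothing about whether the pseudoknot is simple or recursive.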
Researchers of pseudoknotted RNAs therefore face three problems. First, RNA secondary structure prediction with pseudoknots is computationally expensive in run-time and memory space, the problem being NP-complete [11], and most established algorithms handle only partial classes of pseudoknots, not all kinds. Second, almost all the main computational methods analyze nested RNA-SP structures only, either neglecting pseudoknots for simplicity or being unaware of them [12]. Lastly, existing RNA prediction programs suffer from low quality and are not very reliable.

MATERIALS AND METHODS
Overview of RNA secondary structure algorithms and methods: Predicting RNA secondary structure has become a very important task in bioinformatics. Many researchers have introduced techniques, methods and algorithms for solving the RNA-SP problem; this research can be divided into two main groups, as follows.
Solving RNA stem-loops group: This group of studies does not consider pseudoknots in solving the RNA-SP problem; for simplicity, pseudoknots are neglected when predicting RNA structure. Many methods and techniques have been implemented for RNA secondary structure prediction in the last three decades. They aim to reduce run-time and space complexity while guaranteeing the MFE structure under free energy evaluation and thermodynamic models, although the structure with the lowest free energy is not always the correct fold of the RNA molecule. In 1978, Waterman and Smith [13,14] and Nussinov et al. [15] proposed a first simplified thermodynamic energy model using Dynamic Programming (DP) algorithms to predict RNA secondary structure. They presented DP algorithms that require O(n^3) run-time steps and O(n^2) space, where n is the length of the RNA sequence.
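To make the O(n^3) time and O(n^2) space bounds tangible, here is a minimal sketch of a base-pair maximization DP in the style associated with Nussinov et al. [15]. It is a textbook-style simplification written for this survey's notation, not the authors' original code: the function names, the `min_loop` parameter and the choice to maximize the pair count (rather than minimize an energy) are our assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Canonical Watson-Crick or G-U wobble pair."""
    return (a, b) in PAIRS


def max_base_pairs(seq, min_loop=1):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    dp[i][j] is the best count for the subsequence seq[i..j]. A pair
    (k, j) requires j - k > min_loop. The table has O(n^2) cells and each
    cell scans O(n) split points, hence O(n^3) time and O(n^2) space.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # short intervals first
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case: base j is unpaired
            for k in range(i, j - min_loop):     # case: base j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

Replacing the `+ 1` with a pair-dependent energy and `max` with `min` gives the simplest energy-minimizing variant; thermodynamic models such as Zuker's use considerably richer loop energies.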
Many researchers have attempted to improve the DP-based algorithms used in RNA secondary structure prediction [16-21]. Among these, Zuker's algorithm [16] is the most popular: it explores all possible pseudoknot-free RNA secondary structures under the thermodynamic energy minimization model and requires O(n^3) run-time and O(n^2) space, where n is the length of the input RNA sequence. The MFOLD [22] and ViennaRNA [23] packages implement Zuker's DP algorithm. Another approach, aimed at large RNAs, was introduced by Eddy [17] using a divide and conquer strategy based on the Myers/Miller algorithm [24]. Eddy's algorithm is a DP solution with O(n^2 log n) space complexity that produces an optimal structural alignment of large RNAs while reducing the memory requirement of Stochastic Context Free Grammar (SCFG) alignments. A major parallel DP algorithm for detecting the pseudoknot-free secondary structure of RNA molecules was introduced by Tan et al. [18]; it was implemented on NUMA cluster systems using the sequential DP algorithm and it needs [...].
DP approaches for RNA prediction suffer from high computational running time when computing an optimal MFE solution in the thermodynamic model. For this reason many heuristic methods were proposed. STRAL was presented by Dalli et al. [19] as a heuristic method, mainly for the alignment of ncRNAs: it is a multiple RNA alignment program that combines structural and sequence information in a 'cheap' DP algorithm. STRAL needs O(k^2 n^2) run-time and O(n^2) memory, where n is the length of the RNA sequence and k is the number of matching bases from two different sequences, because it reduces sequence-structure alignment to a two-dimensional (2D) problem similar to standard multiple sequence alignment.
ncRNAs are RNA molecules that do not code for proteins but play important functional roles in biological processes, including localization, replication, translation, degradation and stabilization of biological macromolecules. Next, Ogurtsov et al. [20] adopted and improved the Sparse Dynamic Programming (SDP) approach to find the optimal Multi-Branch Loop-Free (MLF) structure and to evaluate internal loops. The SDP algorithm is implemented in the Afold tool and has a run-time of O(M log^2 L) and working space of O(M), where M < L^2 is the number of possible nucleotide pairings and L is the length of the RNA molecule. It improves on the earlier study of Lyngsø et al. [25], who used DP algorithms to find the MFE RNA-SP and analyze internal loops and who reduced the time from O(n^4) to O(n^3) (O(L^3) in the notation above).
Recently, a co-folding DP algorithm was developed by Ziv-Ukelson et al. [21] with run-time O(n^4 ζ(n)), where ζ(n) can converge to O(n). It was derived from Sankoff's dynamic programming algorithm [26], which requires O(n^6) time and O(n^4) space. Most recently, Mathuriya et al. [6] presented GTfold, a scalable parallel multicore program for RNA-SP without pseudoknots.
Solving RNA with pseudoknots group: All the algorithms discussed in this part take pseudoknots into account. In the introduction, we argued that pseudoknots in RNA-SP perform essential functions in two settings: (i) as part of the transcription machinery of the cell, for protein synthesis and regulatory processes, and (ii) in antiviral drug design, where RNA activity has important consequences [27]. Many researchers have proposed techniques for RNA-SP with pseudoknots. Pleij et al. [28] gave the first general method for plausible RNA folding with pseudoknots, although pseudoknots had been noted and named earlier [16,29]. Abrahams et al. [30] developed a local search method using computer simulation. Van Batenburg et al. [31] and Gultyaev et al. [32] investigated Genetic Algorithms (GA), while Shapiro and Wu developed a parallel GA for detecting H-type pseudoknots [33]. Lyngsø and Pedersen [10] showed that RNA-SP with pseudoknots is a hard mathematical problem (NP-hard) for which only exponential-time algorithms are known. Several earlier studies introduced Dynamic Programming (DP) algorithms to find the MFE structure for RNA secondary structure prediction with pseudoknots; we list them as follows:
• The first DP algorithm to give an optimal lowest-energy prediction for RNA structure with pseudoknots, called pknotsRE, was introduced by Rivas et al. [34]. It is a complete model for calculating the free energy of a pseudoknotted RNA secondary structure. However, pknotsRE demands high run-time and space complexity, O(n^6) and O(n^4) respectively for an RNA sequence of length n, which makes it infeasible to run on large RNA molecules.
The pknotsRE algorithm has two advantages: it is considered the first to determine the MFE and it handles two large classes of RNA with pseudoknots, the arbitrary planar class and the restricted non-planar class.
• Another method, covering the non-recursive class of RNA with simple pseudoknots, was presented by Lyngsø and Pedersen [11] using a polynomial time and space DP algorithm with O(n^5) time and O(n^3) space complexity. They then proved that predicting pseudoknotted RNA secondary structure in general is NP-hard. At the same time, a polynomial time and space DP algorithm to compute RNA secondary structures with the maximum number of base pairs in the presence of simple pseudoknots was designed by Akutsu [9] [...]. Cache misses measure the quality of cache management: each memory access is either a cache hit or a cache miss, depending on whether the accessed data block is in the cache. Here B is the memory block size and n is the RNA sequence length.
• A partition function DP algorithm for nucleic acids, called NUPACK, was developed by Dirks and Pierce [35]. NUPACK was extended to include the most physically relevant pseudoknots for the standard secondary structure energy model. It computes the partition function and base-pairing probabilities of RNA, with or without pseudoknots, and of single-stranded DNA (ssDNA) molecules and requires O(n^5) run-time and O(n^4) space.
• Several reasons led researchers of pseudoknotted RNA secondary structure to adopt heuristic approaches: (i) most DP methods are impractical because of their high computational cost, requiring from O(n^4) to O(n^6) run-time and from O(n^2) to O(n^4) space and (ii) solutions are needed that are usable in practice. These reasons guided researchers towards heuristics in order to reduce run-time and space complexity.
Many heuristic approaches for predicting pseudoknotted RNA simulate a hypothetical folding process; the main early heuristic algorithms are presented in [30-32]. The most popular heuristic DP algorithm, the Iterated Loop Matching (ILM) algorithm, was produced by Ruan et al. [36]. It builds on the stem zones developed for the Loop Matching (LM) algorithm (Nussinov et al. [15]). ILM can predict pseudoknotted RNA for both aligned and individual sequences and can use thermodynamic models, comparative models or both, with O(n^4) time and O(n^2) space complexity. Under the free energy minimization model, ILM has an average run-time of O(n^3) with unchanged space complexity. Subsequently, the HotKnots heuristic algorithm was presented by Ren et al. [37]; it outperformed ILM.
Recently, a heuristic algorithm for detecting pseudoknotted RNA, called KnotSeeker, was presented by Sperschneider and Datta [27]. It uses a hybrid of sequence matching and Minimum Free Energy (MFE) to detect RNA secondary structure with pseudoknots more accurately, especially for long sequences. The latest heuristic algorithm for pseudoknotted RNA detection was presented by Li [38]; it predicts arbitrary RNA structures including pseudoknots by maximizing stems. It requires O(n^3) time and O(n) space and improves sensitivity and specificity.
• A DP algorithm to predict RNA with simple pseudoknots using standard thermodynamic parameters was proposed by Deogun et al. [39]. It improves on Akutsu's work [9], with worst-case time and space complexities of O(n^4) and O(n^3), respectively.
• Extending the pknotsRE work of Rivas [34], a DP algorithm called pknotsRG-mfe was developed by Reeder and Giegerich [40]. pknotsRG-mfe is an augmented version of pknotsRE that predicts a restricted class of simple nested pseudoknotted RNA structures and provides suboptimal structures; it reduces the run-time and space complexities to O(n^4) and O(n^2), respectively.
• A DP algorithm was developed by Li and Zhu [41] for predicting RNA structures that include nested pseudoknots and a subclass of crossed pseudoknots, with O(n^4) time and O(n^3) space. This algorithm has the same power as the Rivas algorithm [34] for predicting planar pseudoknots and can also predict more complex pseudoknotted RNA than the pknotsRG algorithm of Reeder [40].
• The Pseudoknot Local Motif Model and Dynamic Partner Sequence Stacking (PLMM_DPSS) algorithm was introduced by Huang and Ali [42]. PLMM_DPSS uses a modification of the Needleman and Wunsch DP algorithm for sequence alignment [43].
• A practical parallel DP approach for string problems, the Cache-Oblivious (CO) algorithm, was presented by Chowdhury et al. [44]. It matches the O(n^4) run-time, improves the space complexity to O(n^2) and achieves better cache [...] parallel steps when executed on P processors, where n is the RNA sequence length, M is the cache size and B is the memory block size. Chowdhury et al. [45] also presented a new version of the CO DP algorithm for RNA-SP with pseudoknots, which improves on Akutsu's algorithm [9] in space and cache complexity, to O(n^2) and O(n^4/(B√M)) respectively, while keeping the time complexity at O(n^4); here M is the cache size, B is the memory block size and n is the length of the RNA sequence.
• An improved DP algorithm called Hierarchical Fold (HFold) was developed by Jabbari et al. [46]; it requires O(n^3) running time and O(n^2) space. This approach can predict a wide range of biological MFE pseudoknotted RNA secondary structures and, for predicting MFE nested kissing hairpins, reduces the running time from O(n^6) to O(n^3) compared with the well-known algorithm of Rivas and Eddy [34].
• A cache-efficient DP Chip Multiprocessor (CMP) algorithm was presented by Chowdhury and Ramachandran [47]; it obtains a good amount of parallelism together with a cache-efficient critical path. They applied this algorithm to RNA secondary structure prediction with simple pseudoknots and obtained O(n^4) sequential running time, [...] in cache efficiency and O(n) in amount of parallelism, which means they improved the critical path length of their previous study [44]. Here n is the length of the RNA sequence and B is the memory block size.
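The notions of parallelism and critical path length used in the last bullets can be illustrated on a plain interval DP table. The sketch below is a generic wavefront schedule and is not the CMP or cache-oblivious algorithm of [44,45,47]; the function name `wavefront_schedule` is hypothetical. It assumes a two-index table in which cell (i, j) depends only on cells covering shorter intervals, as in pseudoknot-free folding, so all cells with the same span j - i are independent of each other.

```python
def wavefront_schedule(n):
    """Group the cells (i, j), i < j, of an n x n interval DP table by span.

    Stage d holds every cell with j - i == d. Cells inside one stage do not
    depend on each other, so a stage can be processed in parallel, while
    the stages themselves must run one after another.
    """
    return [[(i, i + d) for i in range(n - d)] for d in range(1, n)]


if __name__ == "__main__":
    n = 6
    stages = wavefront_schedule(n)
    for d, cells in enumerate(stages, start=1):
        print(f"span {d}: {len(cells)} independent cells")
    # The number of sequential stages grows linearly with n, while the
    # total number of cells grows quadratically.
    print("stages:", len(stages), "cells:", sum(len(s) for s in stages))
```

In this simplified setting the number of stages bounds how many steps must run in sequence regardless of the processor count; the surveyed algorithms work on larger tables and additionally arrange the work inside each stage so that it stays cache-friendly.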

RESULTS AND DISCUSSION
Many researchers have proposed methods for RNA secondary structure prediction without pseudoknots, with promising results as summarized in Table 1. Many others have addressed RNA secondary structure prediction with pseudoknots and have also obtained promising results, as summarized in Table 2.
Table 1 (fragment; "-" = not reported, [...] = text missing):
3. [...] Description: an alignment program for non-coding RNA using base pairing that combines [...] information in a 'cheap' DP algorithm.
4. Method: parallel DP algorithm for RNA-SP. Source: Load Balancing Algorithm in Cluster-based RNA secondary structure Prediction, Tan et al. 2005 [18]. Time: O(n^4/P). Space: O(n^3/P). Description: a parallel algorithm for RNA-SP on NUMA cluster systems using the sequential DP algorithm.
5. Method: DP solution for large RNA-SP using a divide and conquer algorithm. Source: A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure, Eddy 2002 [17]. Time: -. Space: O(n^2 log n). Description: a DP solution to the RNA-SP problem for large RNAs using a divide and conquer strategy.
6. Method: evaluation of internal loops using energy rules. Source: Fast evaluation of internal loops in RNA secondary structure prediction, Lyngsø et al., 1999 [25]. Time: O(n^3). Space: -. Description: a method that finds part of the structure prediction of an RNA by using energy rules to evaluate internal loops.
7. Method: Zuker DP algorithm for RNA sequences with minimum energy structure. Source: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information, Zuker et al., 1981 [16]. Time: O(n^3). Space: O(n^2). Description: a DP algorithm for folding a non-pseudoknotted RNA sequence into its minimum energy structure in the thermodynamic model.
Table 2 (fragment; "-" = not reported, [...] = text missing):
9. [...] Source: [...] [36]. Description: [...] and individual sequences.
10. Method: NUPACK DP algorithm. [...]
[...] Source: Dynamic programming algorithm [...]. Time: O(n^4). Space: O(n^3). Description: a DP algorithm for RNA-SP with simple [...].
[...] Description: [...] non-recursive class of RNA with simple PKs.
14. Method: PknotsRE, the first DP algorithm to predict an optimal RNA-PK. Source: A dynamic programming algorithm for RNA structure prediction including PK's, Rivas and Eddy 1999 [34]. Time: O(n^6). Space: O(n^4). Description: a DP algorithm that can handle a large class of arbitrary planar and restricted non-planar PKs.

CONCLUSION
In recent years several bioinformatics challenges have emerged; a main one is the accurate prediction of RNA secondary structure with pseudoknots from the primary sequence. Many methods have successfully addressed this problem from the computational side. In this study, we presented the main general methods that can be used to solve the RNA-SP problem.
This study focused primarily on two aspects of the RNA structural alignment problem: first, the methods that deal with RNA folding and second, the methods that solve RNA secondary structure prediction with pseudoknots. We conclude that the RNA secondary structure problem with simple pseudoknots can be solved by DP algorithms on a parallel CMP computing platform. Developing an efficient parallelization of DP algorithms, combined with an accurate method for predicting RNA secondary structure with pseudoknots, will therefore be the main direction of our future research.