Applications of Nature-Inspired Algorithms in Different Aspects of Semantic Web

Nature has always inspired us: the waggle dance of the honey bee, the school of whales and the swarm of ants each offer an abundance of lessons when observed carefully. Although nature seems simple and systematic, it hides many complexities. Because technology follows the same 'simple-yet-complex' principle, researchers have long tried to apply lessons from nature to the complex algorithms used to solve real-life human problems. Over the past decade, research in this field has increased rapidly, and Nature Inspired Algorithms have now permeated almost all areas of science. Although they have been applied in many areas, the scope of this paper is limited to their application in the domain of the Semantic Web. The main objective of Semantic Web applications is to obtain, manage and utilize the huge amount of information available in structured, semi-structured or unstructured databases in a distributed environment. This emerging domain is advancing towards increasingly intelligent and human-oriented applications. This paper presents a survey of vital nature-inspired techniques that can be used for optimizing various areas of Semantic Web applications, such as the knowledge base, content filtering, information retrieval and the inference mechanism.


Introduction
Nature Inspired Computing is an alliance of loosely coupled subfields that exhibit some kind of social behavior and imitate the natural behavior found in small entities such as honey bees, ants and fish. It is a multidisciplinary field that draws on biology, mathematics, artificial intelligence and machine learning. The idea behind the field is that intelligence exists not only in humans but also in cells, bodies and societies of tiny living beings, and that these simple yet complex behaviors can be applied to complex real-world problems that are otherwise hard to solve. Natural systems have many powerful capabilities, such as self-organization, decentralization, learning while doing (adaptability) and strong communication. When applied to an algorithm, these capabilities can make it more scalable, reliable and efficient while keeping it simple.
Nature Inspired Algorithms, also known as stochastic algorithms, are classified into two techniques: heuristic and meta-heuristic (an advanced version of heuristic algorithms). 'Heuristic' means 'to find' or 'to discover using trial and error', and 'meta' means 'beyond' or 'higher level'. These algorithms perform well where quality solutions must be found in a reasonable amount of time: they can find a good solution in reasonable time, although that solution may or may not be optimal (Yang, 2010). It should be made clear that the literature offers no exact definition of Nature Inspired meta-heuristic algorithms. However, all stochastic algorithms that rely on randomization and local search fall under this category.
Meta-heuristic algorithms can be classified as population-based or trajectory-based. Algorithms that use multiple agents or a set of strings are population-based, whereas algorithms that use a single agent or solution that moves through the design space in a piecewise style in search of a better solution are trajectory-based. Examples of Nature Inspired Algorithms include Artificial Neural Networks, Genetic Algorithms and Swarm Intelligence.
Nature Inspired Algorithms can be implemented in many domains, but the scope of this study is limited to the Semantic Web domain. The Semantic Web is an ever-changing domain rather than a static entity. In 2001 the World Wide Web Consortium (W3C) set out to enhance the current web (Sharma, 2016) by giving information a well-defined meaning, making the system more cooperative and simpler for both humans and machines to understand. The Semantic Web follows a layered architecture in which each layer is assigned a special role and makes full use of the capabilities of the layer below it (Berners-Lee et al., 2001). The bottom-most layer (Fig. 1) is the Unicode and URI layer, which is responsible for the unique identification of physical entities. The XML layer provides the schema definition and integrates the various XML standard documents across the web. RDF is the data modeling language; it expresses relationships between physical objects using URIs in the form of triples, where a triple is a combination of subject, predicate and object. The RDFS and OWL ontology layers are used to define vocabularies, through which additional information can be added to the triples for a clearer description of the objects (Auer et al., 2007). To support the inference mechanism, additional rules can also be added to these ontologies. SPARQL, the RDF query language, is used to answer users' queries.
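As a minimal illustration of the triple model described above, the following Python sketch stores RDF-style statements as (subject, predicate, object) tuples and answers a simple pattern query. The namespace http://example.org/, the sample statements and the helper name `match` are hypothetical and chosen only for illustration; a real application would rely on an RDF store and SPARQL rather than this toy lookup.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace, for illustration only

# A tiny graph: a set of triples whose nodes are identified by URIs.
GRAPH = {
    Triple(EX + "alice", EX + "authorOf", EX + "paper1"),
    Triple(EX + "alice", EX + "memberOf", EX + "labA"),
    Triple(EX + "paper1", EX + "hasTopic", EX + "SemanticWeb"),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


if __name__ == "__main__":
    # "What do we know about alice?" (all triples with alice as subject)
    for triple in match(GRAPH, s=EX + "alice"):
        print(triple)
```

Leaving a position as `None` plays roughly the role that a variable plays in a SPARQL triple pattern.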
The Semantic Web domain is growing at a rapid pace and presents difficult challenges as well as various research opportunities (Höffner et al., 2017). This paper presents the work done by various researchers to obtain reasonable solutions to some of these difficult problems, which include ontology management, information retrieval and knowledge extraction, ontology mapping, Semantic Web reasoning, load balancing strategies and web page allocation methods. The structure of the paper is as follows. Section 2.1 presents a few applications of Nature Inspired Algorithms in the Semantic Web domain. Section 2.2 reviews the application of Artificial Neural Networks (ANN) in the Semantic Web domain. Section 2.3 shows how Genetic Algorithms can be used to find solutions for ontology alignment and knowledge extraction. Section 2.4 presents the working principles of Ant Colony Optimization in the area of Semantic Web reasoning and argues that new reasoning algorithms based on Particle Swarm Optimization are needed. Section 2.5 presents how Neuro-Fuzzy techniques can provide solutions in the Semantic Web. Section 3 summarizes these efforts and shares the implications for further development.

Literature Review
The characteristics of Semantic Web applications and their associated problems are well suited to Nature Inspired Algorithms. Evidence for this claim is provided here by analyzing the key processes required for building Semantic Web applications. Table 1 presents an overview of the key processes along with some conventional algorithms used to carry them out.

Nature Inspired Algorithms and its Applications in Semantic Web
In this section, the characteristics of various Nature Inspired Algorithms are discussed along with their applications in the Semantic Web domain. Nature Inspired Algorithms are inspired by the social behavior of living entities such as ants, honey bees, birds and insects. These algorithms find answers to hard problems in polynomial time but do not guarantee optimal solutions. They are capable of finding solutions to unanswered problems in the Semantic Web domain and of dealing with the abundance of data scattered across the internet, and can thus help build highly scalable applications. The next section describes how the working principles of Artificial Neural Networks are applied in the Semantic Web domain.

Artificial Neural Network (ANN) and the Semantic Web
The information processing capabilities of an artificial neural network are inspired by the basic principles of biological systems such as the brain. An ANN mimics the working model of the brain: a neuron takes weighted inputs, processes them and, if the result is significant, fires an output. Algorithms based on Artificial Neural Networks include the Hopfield algorithm, Kohonen Self-Organizing Maps, the Multilayer Perceptron and backpropagation learning. These algorithms are used for classification, optimization, clustering and decision making. In the Semantic Web they can be used in many ways, a few of which are discussed below.
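The weighted-input behavior described above can be sketched as a single threshold neuron. This is a generic textbook unit rather than any of the specific models named in the text; the function name `neuron_fires`, the weights and the threshold are illustrative assumptions.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs reaches the threshold.

    A single threshold unit: each input is scaled by its weight, the
    results are summed, and the neuron "fires" only if the sum is
    significant enough (at or above the threshold).
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation >= threshold


if __name__ == "__main__":
    weights = [0.5, 0.9, 0.4]  # hypothetical, hand-picked weights
    print(neuron_fires([1, 0, 1], weights, threshold=0.8))
    print(neuron_fires([0, 0, 1], weights, threshold=0.8))
```

In a trained network the weights would be learned from data (for example by backpropagation) instead of being fixed by hand.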

Optimization of Load Balancing Strategies
ANNs have been used to balance the load of heavily trafficked websites by allocating web pages to the closest possible web server. Phoha et al. (2002) proposed a web page allocation algorithm based on ANN. In this algorithm, each server acts as a processing node ready to handle object requests, and a requested web page is allocated to the server that is closest to that object.
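The idea of assigning a request to the "closest" server can be sketched with a simple competitive-learning step. This is not the algorithm of Phoha et al. (2002); it is a minimal sketch that assumes each request and each server is described by a small numeric feature vector, that closeness is Euclidean distance, and that the winning server's vector is nudged toward the request. The names `closest_server` and `allocate` and the learning rate are hypothetical.

```python
import math


def closest_server(request, servers):
    """Return the name of the server whose vector is nearest to the request."""
    return min(servers, key=lambda name: math.dist(request, servers[name]))


def allocate(request, servers, learning_rate=0.1):
    """Assign the request to the nearest server and adapt that server.

    The winner's vector moves a small step toward the request, so servers
    gradually specialize in the kinds of objects they are asked for.
    """
    winner = closest_server(request, servers)
    servers[winner] = [
        w + learning_rate * (r - w) for w, r in zip(servers[winner], request)
    ]
    return winner


if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors for two servers.
    servers = {"server-a": [0.0, 0.0], "server-b": [1.0, 1.0]}
    print(allocate([0.9, 0.8], servers))
    print(servers)
```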

Content Filtering and Classification
ANN algorithms can also be used to classify web pages into categories such as audio and video. They can also be used to filter pornographic web content by blocking the offending sites. One such algorithm was developed by Lee et al. (2002); its classification engine uses the learning capabilities of neural networks to distinguish pornographic from non-pornographic web pages.

Ontology Management
Ontology management consists of several subtasks, such as ontology categorization, classification and the matching of certain parameters with respect to a given task. This work requires human intervention and is very tedious. For ontology matching, ontologies are classified along two dimensions: schema-based and instance-based. The role of the ANN is to cluster the inputs into schema-level and instance-level information; clustering the inputs into classes is at times useful for reducing the computational complexity of subsequent updates across the data. Doan et al. (2004) developed the GLUE system, in which an ANN was used to create semi-automatic mappings among ontologies in order to find the relations between them. In ontology mapping, the major challenges are to achieve semantic interoperability in building web applications (Djeddi and Khadir, 2013) and to find semantic relationships between similar elements of different ontologies. Mao et al. (2010) proposed a novel, universal and robust ontology mapping technique called PRIOR+, which is based on propagation theory, information retrieval (IR) and artificial intelligence (AI).

Information Retrieval and Knowledge Extraction
The web is an ocean of information, and in the Semantic Web domain knowledge must be retrieved in addition to extracting information. Caliusco and Stegmayer (2010) discussed a novel approach that defines a Knowledge Source Discovery (KSD) agent for finding the appropriate node for query answering and uses ANN-based supervised learning for ontology matching and information retrieval. Cerón-Figueroa et al. (2017) introduced a new model for ontology matching in the educational domain, which improved the homogeneity of resources in e-learning.

Security
Security plays a vital role on the web. Attackers today are increasingly interested in stealing data and valuable information through attacks such as SQL injection. These attacks may damage client and server systems, expose valuable information and circumvent the authentication process. Many ANN-based algorithms are used to guard against such vulnerabilities. Coleman et al. (2007) optimized the security level of the web using ANN-based encryption and decryption strategies. Moosa (2010) developed a firewall named ANNbWAF to watch for such attacks. In this approach, a trained ANN is embedded in the firewall application, with normal and malicious data used to train the neurons (Sajja and Akerkar, 2013). Many researchers have also designed ANN-based algorithms for intrusion detection and proper authentication.
The next section highlights how Genetic Algorithms can be used to optimize various functionalities of the Semantic Web.

Genetic Algorithms and Semantic Web
A Genetic Algorithm (GA) can be defined as a heuristic search algorithm based on the concept of natural selection. GA follows the 'survival of the fittest' approach and is used when the search space is large and complex, domain knowledge is scarce and expert knowledge is hard to encode (Coello et al., 2007). In GA a candidate solution is encoded as a chromosome, a string of genes represented using alphabets and symbols. The encoded form is called the genotype and the traits it expresses are called the phenotype. Much like the natural evolution process, the chromosomes form an initial population that is evaluated using a fitness function; in line with the survival of the fittest principle, the poor candidates die and are removed from the population. The stronger candidates repeat the process through operators such as crossover and mutation, and a new population is generated. GAs have several useful characteristics: they can readily shrink the search space, and they can solve many hard problems for which no traditional solution is available. In the Semantic Web, where a large pool of data is spread across heterogeneous sources, query answering is an open challenge, and many researchers have used GA-based algorithms to address it. These algorithms are used to find an optimized query path, which in turn determines the strategy for query execution; an optimized query path generally allows the query to be executed in less time. Alippi et al. (2009) also used Genetic Algorithms for the discovery of multi-relational association rules in the Semantic Web.
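The evolutionary loop described above (fitness evaluation, removal of weak candidates, crossover and mutation) can be sketched as follows. The objective used here, maximizing the number of 1 bits in a binary chromosome, is a standard toy problem chosen only for illustration and is unrelated to query path optimization; the function names, population size, mutation rate and truncation selection scheme are all assumptions of this sketch.

```python
import random


def fitness(chromosome):
    """Toy fitness: the number of 1 genes (higher is better)."""
    return sum(chromosome)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate=0.05):
    """Flip each gene independently with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def evolve(length=20, pop_size=30, generations=40, seed=0):
    """Run a small GA and return (best initial fitness, best final chromosome)."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    initial_best = max(fitness(c) for c in population)
    for _ in range(generations):
        # Survival of the fittest: keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated offspring of the survivors.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return initial_best, max(population, key=fitness)


if __name__ == "__main__":
    start, best = evolve()
    print("best initial fitness:", start)
    print("best final fitness:", fitness(best))
```

Because the surviving half is carried over unchanged, the best fitness in the population can never decrease from one generation to the next.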
Below are some of the areas where GA has been successfully tested and implemented. Hsinchun et al. (1998) used GA to develop a personalized search agent. Their results indicated that GA can prevent search agents from being trapped in local optima and can thus improve the quality of web search. Multimedia content can also be annotated and retrieved efficiently using GA. InfoSpiders, developed by Menczer et al. (2004), is another multi-agent tool for dynamic web search; it uses both GA and ANN. Pant and Menczer (2002) implemented GA to manage the initial population for autonomously surfing the web in a tool named MySpiders. In this tool, every agent works as a client motivated by the linking of certain clues in already crawled pages, where the clues are the already crawled links near a required source. The tool is publicly available as a Java applet. Yohanes et al. (2013) also implemented GA for web crawling to find requested web pages and reported that GA performs better than traditional crawling methods (Sajja and Akerkar, 2013). Dounias et al. (2006) designed a hybrid technique for image processing and analysis using Genetic Algorithms: segmentation is applied first to generate partitions, and fuzzy relations are then extracted for the generated segments (Alippi et al., 2009).
Ontology creation, management, alignment and integration are among the challenging tasks in the Semantic Web. Wang et al. (2006) developed a solution for ontology mapping based on a feature extraction process. Martinez-Gil et al. (2008) proposed a Genetic Algorithm based approach for ontology alignment (GOAL). This approach calculates the optimal ontology alignment function for a given input and maximizes the precision of the alignment. The initial population consists of the input ontologies, and mutation and crossover on these trees can be carried out to evolve a new ontology. Naya et al. (2010) devised a novel approach in which the crossover and mutation operators were applied to the input ontology set to give rise to a new ontology; they also used a genetic algorithm for encoding and aligning the ontologies. Rachlin et al. (1998) presented the A-teams algorithm. The outcome of this research was an agent-based system that automatically generates sequential, parallel and synchronized Semantic Web services.

Agent-Based Automatic Generation of Semantic Web Services
Section 2.4 showcases how Swarm Intelligence based algorithms can affect the working of The Semantic Web.

Swarm Intelligence and Semantic Web
Since the earliest stages of evolution, biological entities have worked on the principle of self-organization, solving complex problems through communication among group members for their survival. They share information and communicate, behave collectively to achieve goals and form colonies that are highly secure. Popular examples include honey bee societies, ant colonies, schools of fish and flocks of birds. Swarm Intelligence (SI) is a discipline based on the principle of social interaction between living entities, which are represented as agents or swarms (Sajja and Akerkar, 2013). SI is therefore defined as the collective behavior of groups of agents that communicate locally with the environment, resulting in global patterns. Popular methods based on the principle of swarm intelligence include ant colony optimization and particle swarm optimization. Researchers have done a lot of work on Semantic Web reasoning using the ant colony optimization method. The Semantic Web works on resources that are distributed and dynamic in nature. The next section describes a few areas where Swarm Intelligence methods have been used.

RDF Graph Traversal and Semantic Web
SI is used for RDF graph traversal. Key properties of swarms are that they are adaptive, robust and scalable. They work on three concepts: absence of central control, locality and simplicity. SI is also used for optimizing reasoning performance. Its role is to reduce the computational cost of traversing the distributed RDF graph in order to calculate the closure with respect to the RDF semantics. To calculate the semantic closure of an RDF graph, a set of rules is applied to the triples repeatedly. These rules are represented by small, partially instantiated live entities called ants, which communicate with each other only locally and indirectly. Whenever the condition of a rule matches a node, the ant fires and locally adds the newly derived triple to the graph. Because ants have some capability to transition across graph boundaries, this method converges toward the closure. Dentler et al. (2009) described the use of ant colony optimization for RDF graph traversal. This index-free methodology follows from the self-organizing principles of swarms: the lightweight entities traverse RDF graphs along certain paths with the objective of instantiating pattern-based inference rules. Wu and Aberer (2003) used SI to model the dynamic interactions between web servers and users for web page ranking. Another system, "Divon," is a swarm that emulates a user-profile-driven approach for Semantic Web information presentation. Wang et al. (2012) implemented ACO for the automatic composition of Semantic Web services. The ACO algorithm is used in many different aspects of the Semantic Web, such as web page classification, content mining and the dynamic organization of web content. Rana (2011) described an ACO-based algorithm for searching resources in unstructured ants-based control. Rana et al. (2012) proposed a query interface for the Semantic Web using the ant colony algorithm.
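The rule-carrying ants described above can be illustrated with a heavily simplified sketch. It is not the method of Dentler et al. (2009): there is no pheromone trail and no distribution across hosts. It only shows how agents that each carry one inference rule, pick a local starting triple at random and add whatever they derive will grow a graph toward its closure. The two rules used are the RDFS entailment patterns commonly labelled rdfs9 and rdfs11; the function names, step count and sample data are assumptions.

```python
import random

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Each rule is (first premise predicate, second premise predicate,
# conclusion predicate). The premises join on a shared node:
# (a, p1, b) and (b, p2, c) together yield (a, p3, c).
RULES = [
    (SUBCLASS, SUBCLASS, SUBCLASS),  # rdfs11: subclass transitivity
    (TYPE, SUBCLASS, TYPE),          # rdfs9: instances inherit superclasses
]


def ant_step(graph, rule, rng):
    """Let one ant try to fire its rule once; return a triple or None."""
    p1, p2, p3 = rule
    starts = sorted(t for t in graph if t[1] == p1)
    if not starts:
        return None
    a, _, b = rng.choice(starts)  # random local starting point
    follow_ups = sorted(t for t in graph if t[0] == b and t[1] == p2)
    if not follow_ups:
        return None  # the rule condition does not match at this node
    _, _, c = rng.choice(follow_ups)
    return (a, p3, c)


def swarm_closure(triples, steps=500, seed=0):
    """Grow the graph by repeatedly releasing ants that carry random rules."""
    rng = random.Random(seed)
    graph = set(triples)
    for _ in range(steps):
        derived = ant_step(graph, rng.choice(RULES), rng)
        if derived is not None:
            graph.add(derived)  # the ant adds its result locally
    return graph


if __name__ == "__main__":
    base = {
        ("ex:Dog", SUBCLASS, "ex:Mammal"),
        ("ex:Mammal", SUBCLASS, "ex:Animal"),
        ("ex:rex", TYPE, "ex:Dog"),
    }
    for triple in sorted(swarm_closure(base) - base):
        print(triple)
```

Every triple an ant adds is a sound consequence of the two rules, so the graph can only move toward the closure; how quickly it gets there depends on the number of steps and on the random choices made.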

Ant Colony Optimization and Semantic Web
Although a lot of research has been carried out in this direction using the Ant Colony Optimization algorithm, there is still considerable scope for researchers to use the Particle Swarm Optimization (PSO) method in different areas of the Semantic Web. One such area could be optimizing reasoning through RDF graph traversal using PSO.
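For readers unfamiliar with PSO, the following is a minimal sketch of the standard velocity and position update on a toy numeric objective. It does not address RDF reasoning, which the text identifies as open work; the objective `sphere`, the coefficient values and the search bounds are assumptions chosen only to show the mechanics.

```python
import random


def sphere(position):
    """Toy objective to minimize: sum of squares, smallest at the origin."""
    return sum(x * x for x in position)


def pso(objective, dim=2, particles=15, iterations=50, seed=0,
        inertia=0.7, cognitive=1.5, social=1.5):
    """Return (best position, best value, best value before any update)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]              # each particle's own best
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]
    initial_val = global_val
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                # Blend momentum, pull toward own best, pull toward swarm best.
                vel[i][d] = (
                    inertia * vel[i][d]
                    + cognitive * rng.random() * (best_pos[i][d] - pos[i][d])
                    + social * rng.random() * (global_pos[d] - pos[i][d])
                )
                pos[i][d] += vel[i][d]
            value = objective(pos[i])
            if value < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], value
                if value < global_val:
                    global_pos, global_val = pos[i][:], value
    return global_pos, global_val, initial_val


if __name__ == "__main__":
    position, value, start = pso(sphere)
    print("initial best value:", start)
    print("final best value:", value)
```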
Section 2.5 discusses the applications of Neuro-Fuzzy algorithm in various areas of the Semantic Web.

Neuro-Fuzzy Algorithm and Semantic Web
To show the working of Nature Inspired Algorithms in the field of the Semantic Web, one can also adopt a hybridization approach. In this approach, Fuzzy Logic (FL) and ANN techniques are integrated to optimize various areas of the Semantic Web.

Web Content Filtering
Web filtering is carried out in the following manner. In the initial phase, the web publisher provides a set of specifications or metadata for the web page itself. This metadata restricts access to the web page to a selected audience (Spivack et al., 2008). A list of blocked URLs, popularly known as a blacklist, is also provided, and each URL is checked against this list before the page is displayed to the user. Content filtering in the Semantic Web is done by matching the website keywords or metadata and then considering the frequency of such items. If a harmful word appears on the page, the content is blocked. In the hybrid approach, both techniques are used: the Neuro-Fuzzy model filters the content in an intelligent manner, while the fuzzy logic deals with vague user information about choices, preferences and interests.
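The filtering steps above can be sketched in a few lines. The sketch covers only the blacklist check, the keyword frequency count and a simple fuzzy membership function; the neural component of a Neuro-Fuzzy model is omitted. The blacklist entry, the keyword set, the membership ramp and the tolerance value are hypothetical placeholders.

```python
import re

BLACKLIST = {"http://blocked.example/"}  # hypothetical blocked URL
HARMFUL_WORDS = {"gambling", "violence"}  # hypothetical keyword list


def harmful_ratio(text):
    """Fraction of the words on the page that are on the harmful list."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(word in HARMFUL_WORDS for word in words) / len(words)


def harm_membership(ratio, low=0.0, high=0.1):
    """Fuzzy degree of 'harmful': 0 at `low`, rising linearly to 1 at `high`."""
    if ratio <= low:
        return 0.0
    if ratio >= high:
        return 1.0
    return (ratio - low) / (high - low)


def should_block(url, text, tolerance=0.5):
    """Block blacklisted URLs, or pages whose harm degree exceeds the tolerance.

    `tolerance` stands in for a user's vague preference: a lower value
    means a stricter filter.
    """
    if url in BLACKLIST:
        return True
    return harm_membership(harmful_ratio(text)) > tolerance


if __name__ == "__main__":
    print(should_block("http://blocked.example/", "an ordinary page"))
    print(should_block("http://ok.example/", "a calm page about gardening"))
    print(should_block("http://ok.example/", "gambling gambling violence now"))
```

Using a graded membership value instead of blocking on the first harmful word lets the same page be blocked for a strict user and shown to a lenient one, which is where fuzzy handling of preferences would enter.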

Discussion, Conclusion and Future Work
The Semantic Web is an extension of the current web in which information is given a specific meaning. It is a domain where researchers are building intelligent websites that are interpretable by both humans and machines. Here, machines can intelligently process information when the necessary semantics are attached to it. The Semantic Web therefore has the capability to share and reuse data across various applications, and by processing this information a huge amount of knowledge can be generated. In this study, various nature-inspired models have been presented that address the emerging Semantic Web problems, as depicted in Table 2.
These algorithms have been successfully implemented by various researchers in different areas, including content filtering, ontology management and Semantic Web reasoning. A lot of research has already been carried out in this direction, but a few areas remain untouched. One such area is the use of the Particle Swarm Optimization algorithm for Semantic Web reasoning, which is a future direction for this research.

Table 2
Tasks: Nature Inspired Algorithms/tools
Querying and information surfing: 1) KSD, an ANN based method for information retrieval. 3) Divon, an SI based emulator for information retrieval and presentation.
Ontology management: 1) GLUE, PRIOR+, ANN based tools for ontology mapping. 2) GOAL, a GA based tool for ontology alignment in the Semantic Web.
Entailments: An ant colony based optimization technique proposed by Dentler et al. (2009) for RDF graph traversal and entailments in the Semantic Web.