Towards Semantic User Query: A Review

Corresponding Author: Hui-Hui Wang, Department of Computing and Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia. Email: hhwang@unimas.my

Abstract: This paper discusses image query mechanisms and user needs in image retrieval. The explosive growth of image data has driven research and development in image retrieval. Image retrieval research has moved from keywords, to low-level features, to semantic features. The drive towards semantic features stems from the problems with keywords, which can be highly subjective and time consuming to assign, while low-level features cannot always describe the high-level concepts in users' minds. The paper also highlights both the issues already addressed and those that remain outstanding.


Introduction
An image can be interpreted and understood differently by different users (Yoon, 2011). It is therefore important to understand the context of the users in the image-seeking process when designing an image retrieval system (Choi, 2010). In some situations, users can imagine what they desire but are unable to express that desire in precise wording (Datta et al., 2008; Zha et al., 2010). In such situations, the user faces a query formulation problem: they cannot formulate and communicate the needed information effectively (Urban and Jose, 2006). Thus, understanding how users express their search needs, which may lead to the provision of appropriate access points to visual materials for retrieval, remains important (Choi and Rasmussen, 2003).
Query mechanisms play an important role in bridging the semantic gap between users and image retrieval (Gudivada and Raghavan, 1995). The user query expresses the user's intention and need to retrieve, from an image database, images that conform to human perception. The image search results and their accuracy are directly affected by the submitted queries (Zha et al., 2010). Research has focused on reducing the semantic gap in image retrieval; however, defining a semantic meaning and representation of the input query that precisely describes the user's intent, as well as achieving broad domain coverage, remain major challenges (Hu et al., 2009; Hua et al., 2013). An image retrieval system should have an effective user interface that allows users to precisely express their need and intent when searching for images that conform to human perception (Hu et al., 1999). To help users find the desired images, image search has been intensively studied (Niblack et al., 1993; Flickner et al., 1995; Lew et al., 2006; Hua et al., 2013).

Evolution of User Query
Researchers have proposed a variety of ways to search images, based on user intent, in large image databases. These approaches can broadly be classified into three levels (Eakins et al., 1996; Eakins, 2000), each representing a different level of abstraction, as illustrated in Fig. 1.
Level 1 comprises retrieval by primitive features, such as colour, texture, shape or the spatial location of image elements; for example, finding pictures of an orange circle. At this level, retrieval uses features directly derivable from the images themselves, without reference to any external knowledge base.
Level 2 comprises retrieval by derived features, involving some degree of logical inference about the identity of the objects depicted in the image; for example, finding pictures of a given object ('bus'). At this level, reference to some external knowledge base is normally required.
Level 3 comprises retrieval by abstract attributes, involving a significant amount of high-level reasoning about the meaning and purpose of the objects or scenes depicted; for example, finding pictures of a 'happy and cheerful girl'.

Fig. 1. Different levels of abstraction
During the 1990s, Content-Based Image Retrieval (CBIR) was introduced, allowing users to query by sample image in addition to simple keywords. Initially, simple keywords were used as image descriptors to index images in early CBIR systems. Text is used to describe the content of the image, which often causes ambiguity and inadequacy in image database search and query processing. This problem is due to the difficulty of specifying exact terms and phrases to describe image content, as the content of an image is much richer than any set of keywords can express. Since textual annotations are based on human language, variations in annotation pose further challenges to image retrieval.
Alternatively, the user queries by providing a sample image, and the CBIR system extracts visual features from the image itself. Visual features including colour, texture, shape and spatial relations are used. An excellent survey of low-level image feature extraction in CBIR systems can be found in (Wang et al., 2006). Although there are many sophisticated algorithms to describe colour, shape and texture features, these algorithms do not match human perception well. The main reason is that low-level image features cannot describe the high-level concepts in users' minds. An example is the task of finding an image of a car in the middle of the road: the only way a machine can attempt this is by automatically extracting low-level features from the images with a good degree of efficiency.
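To make the notion of a low-level colour feature concrete, the sketch below (purely illustrative, not taken from any of the surveyed systems) computes a coarse colour histogram over raw RGB pixels and compares two synthetic "images" by histogram intersection:

```python
from collections import Counter

def colour_histogram(pixels, bins_per_channel=4):
    """Quantise each (r, g, b) pixel (values 0..255) into a coarse bin and
    return a normalised histogram mapping bin -> fraction of pixels."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {b: n / total for b, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical colour distributions."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))

# Two synthetic images as flat pixel lists: mostly orange vs. all blue.
orange = [(250, 120, 10)] * 90 + [(255, 140, 30)] * 10
blue = [(10, 40, 240)] * 100
sim_same = histogram_intersection(colour_histogram(orange), colour_histogram(orange))
sim_diff = histogram_intersection(colour_histogram(orange), colour_histogram(blue))
```

The sketch also illustrates the limitation discussed above: such a descriptor can tell orange images from blue ones, but it says nothing about whether the orange region is a car, a sunset or a traffic cone.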
Thus, the semantic gap issue, i.e., the lack of correlation between the semantic categories a user requires and the low-level features CBIR systems offer, has been explored. The semantic gap has been investigated for years but remains a big challenge, because the visual feature descriptors extracted from an image cannot (as yet) be reliably and automatically translated into high-level semantics (Hu et al., 2009).
In the 2000s, semantic-based image retrieval was introduced, because neither a single visual feature nor a combination of multiple features could fully capture the high-level concepts of images. Since the performance of image retrieval systems based on low-level features is unsatisfactory, the mainstream of research has converged on retrieval by semantic meaning, attempting to model the cognitive concepts of a human in order to map low-level image features to high-level concepts (bridging the semantic gap). In addition, representing image content with semantic terms allows users to access images through text queries, which front-end users find more intuitive, easier and preferable for expressing what is in their mind than using images.
In order to improve the effectiveness and accuracy of CBIR systems, the research direction has shifted from designing sophisticated low-level feature extraction algorithms to reducing the 'semantic gap' between the extracted visual features and the richness of human semantics (Vijay Kumar et al., 2012). This is effectively a move from query level 1 towards level 3.

User Intent
The concept of "intentions" is context-dependent and hence hard to measure explicitly. A substitute for measuring user intentions directly is to study user search queries and browsing behaviour, both for text-based search (Broder, 2002; Jansen et al., 2008; Yin and Shah, 2010; Kumar and Tomkins, 2010) and for image search (Kofler and Lux, 2009; Lux et al., 2010).

Intention Gap
The aim of CBIR systems is to provide maximum support in reducing the semantic gap between the simplicity of extracted visual features and the richness of the user semantics (Smeulders et al., 2000). In CBIR with query-by-example, an intention gap exists between the user's search intention and the query, which often leads to unsatisfactory search results (Zha et al., 2010). The intention gap may arise because the query image contains many regions or components that are not part of the user's interest, or because it fails to capture important properties of the intended objects. The ability to select a region of interest in a query image can potentially solve the former, while query suggestion addresses the latter. User feedback is also widely exploited to infer user search intent (Yuan et al., 2011).

Towards Semantic User Query
Querying by visual example (Cox et al., 2000; Kelly et al., 1995; Tieu and Viola, 2000) is a paradigm used to express perceptual aspects of the visual features of image content (Kelly et al., 1995); it corresponds to query level 1. The queries are based on colour, texture and shape features and can be formulated depending on the feature(s) extracted during the image extraction stage. The user selects a candidate image through the query interface and the system converts the image into visual features for image representation. Finally, a list of similar images is retrieved based on the image similarity criteria. Efforts have been made to extend query by visual example to query by region selection (Chang et al., 1998; Carson et al., 1997) and by sketch (Kato et al., 1992; Daoudi and Matusiak, 2000; Chans et al., 1997; Lai, 2000; Egenhofer, 1996). Users are allowed to select their 'region of interest' in the image or draw their desired content. Query by sketch allows users to sketch the desired image with image editor software, specifying the colour, texture, shape, sizes and spatial locations of the desired objects.
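The query-by-visual-example workflow above amounts to a nearest-neighbour ranking over precomputed feature vectors. The following minimal sketch assumes such vectors already exist; the image ids and 2-D descriptors are invented placeholders for whatever colour, texture or shape features a real system extracts:

```python
import math

def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(query_features, database, k=2):
    """Rank database images by feature-space distance to the example image
    and return the ids of the k nearest ones."""
    return sorted(database,
                  key=lambda img: euclidean(query_features, database[img]))[:k]

# Hypothetical 2-D descriptors (e.g., mean hue and edge density).
database = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [0.9, 0.1]}
results = query_by_example([1.0, 0.0], database)
```

Every design choice here, from the distance metric to the feature space, shapes which images count as "similar", which is precisely where the mismatch with human judgment arises.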
This search method often fails to capture similarity that humans would infer, a phenomenon now commonly referred to as the semantic gap (Smeulders et al., 2000). Furthermore, in many cases locating a suitable example for a search may be a difficult task in itself (Rodden, 1999).
Users usually prefer using keywords to indicate what they want (Khefi et al., 2004; Fu, 2007). Researchers believe textual queries usually provide a more accurate description of users' information needs, as they allow users to express those needs at the semantic level and in high-level abstractions, instead of being limited to preliminary image features. Many popular search engines (e.g., Google, Yahoo) have developed technologies that allow users to search the Internet's images using keywords.
In an image retrieval system with a keyword interface, the retriever must represent the target image with verbal keywords. Translating an image into verbal keywords is difficult for humans, and even when suitable keywords can be found, users have difficulty finding an appropriate image when the database images have not been annotated, or have been annotated with different keywords. For example, the Google search engine offers the possibility of searching for images using surrounding text and file names rather than image semantic meaning. Consequently, database images that are not annotated with related keywords, or that do not match the search keyword, will not be retrieved as similar images, which is why the search sometimes does not lead to satisfactory results.
Keyword-based image search also works well only when all the images are annotated with accurate textual information. There are also times when users can imagine what they desire but are unable to express this desire in precise wording (Niblack et al., 1993; Zha et al., 2010; Gerrig and Zimbardo, 2001). Moreover, keyword queries are usually ambiguous, especially when they are short (one or two words), and thus cannot reflect users' intents precisely (Gerrig and Zimbardo, 2001).

User's Needs
As noted in the Introduction, an image can be interpreted differently by different users (Yoon, 2011; Choi, 2010), and users may be able to imagine what they desire yet be unable to express it in precise wording (Datta et al., 2008; Zha et al., 2010), a difficulty known as the query formulation problem (Urban and Jose, 2006).
Users often have a difficult time articulating their information needs clearly (Kelly and Fu, 2007; Belkin, 2000), and they typically pose very short queries, usually between two and three words in length (Jansen et al., 2000). The short queries are due both to the difficulty users have in identifying and articulating their information needs and to conventional search interfaces encouraging them to query this way (Belkin et al., 2003).
Besides, a mismatch between the query language and the language associated with the images will also cause inaccurate retrieval results (Ménard, 2011). Keyword queries are usually ambiguous, especially when they are short, and are thus limited in reflecting users' intents precisely (Zha et al., 2010).

The Trend
Nowadays, the mainstream of image search techniques converges on retrieval that focuses on extracting and understanding the user's search intent from the search query. Methods towards semantic user query can be categorised into the following classes.

Relevance Feedback
Relevance Feedback (RF) is a key technique for reducing the intention gap in CBIR by exploiting user interactions (Zhou and Huang, 2003). The intention gap limits a CBIR system's understanding of user search intent, so the results from an automatic retrieval system often do not satisfy users' information needs. A practical way to identify user search intent is therefore to include the user in the retrieval process. Retrieved images are flagged as either "relevant" or "irrelevant", or on multiple relevance levels (Yang et al., 2002; Wu et al., 2004; Yuan et al., 2011). The user's feedback is then exploited by a relevance feedback algorithm to refine the search model. Through iterative interactions, relevance feedback attempts to capture the user's search intent and improve the search results. An interactive scheme, attribute feedback, allows the user to provide feedback on semantic features in order to capture search intent (Cai et al., 2012).
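One classical way to exploit such relevant/irrelevant flags is a Rocchio-style update of the query's feature vector. This is a generic sketch of the idea, not the specific algorithm of any system cited above; the weights alpha, beta and gamma are conventional defaults:

```python
def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """One Rocchio-style feedback round: move the query vector towards the
    mean of images flagged relevant and away from the mean of those flagged
    irrelevant. Vectors are plain lists of floats of equal length."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel, irr = mean(relevant), mean(irrelevant)
    return [alpha * q + beta * r - gamma * i for q, r, i in zip(query, rel, irr)]

# Start from a neutral query; one image flagged relevant, one irrelevant.
new_query = rocchio_update([0.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
```

Iterating this update over several feedback rounds is what lets the system gradually approximate the user's intent rather than the literal example image.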

Query Suggestion
One solution for filling the intention gap is query suggestion, which allows interaction between users and search engines: a list of queries derived from the user's input helps users narrow down to precise queries. Most work has focused on learning query suggestion models from user search query logs, i.e., textual query suggestion (Baeza-Yates et al., 2004; Mei et al., 2008; Carpineto and Romano, 2012). However, users usually perceive information in images more quickly than in textual suggestions. Therefore, a combination of visual and textual query suggestion can express the search intent better than text suggestions alone.
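A minimal sketch of log-based textual query suggestion follows, assuming the only available signal is which terms co-occur within a search session; the session data is invented for illustration and real systems use far richer log features:

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(session_logs):
    """Count how often two distinct query terms co-occur in a search session."""
    pairs = Counter()
    for session in session_logs:
        for a, b in combinations(sorted(set(session)), 2):
            pairs[(a, b)] += 1
    return pairs

def suggest(term, pairs, k=3):
    """Return up to k terms that most often co-occur with `term`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [t for t, _ in scores.most_common(k)]

# Invented session logs standing in for a real query log.
logs = [["jaguar", "car"], ["jaguar", "car", "speed"], ["jaguar", "animal"]]
suggestions = suggest("jaguar", build_cooccurrence(logs))
```

Even this toy example shows how suggestions can disambiguate a short query ("jaguar" the car versus the animal), which is exactly the ambiguity problem raised earlier for short keyword queries.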
Query suggestion for CBIR with query-by-example has gained interest only recently (Zha et al., 2010; Bian et al., 2012). The technique first provides a list of representative images and keywords derived from the user's initial textual query. The keyword-image pairs the user selects from this list are used to refine the initial search results, which helps the user formulate a more precise search intent.

Interactive Interface
The intention gap is closely related to the user interface design of image search, which must let users effectively express what is in their mind. Database researchers are concerned with indexing and querying images, whereas image processing experts tend to worry more about extracting the appropriate image descriptors. However, research on providing user interface support for CBIR systems was scarce (Pečenović et al., 2000; Santini and Jain, 2000; Nakazato et al., 2003; Heesch, 2008) until recently. Search engines designed to bridge the intention gap tend to integrate textual (keywords or tags) and visual features (Chen et al., 2013; Zhu et al., 2014; Craggs et al., 2014).
A friendly user interface not only means that the interface should be as easy as possible to use, but also that it can satisfy the user's various searching needs. The easiest way to search for similar images differs across searching scenarios and users' needs; therefore, the corresponding user interfaces should be designed to fit those varied needs (Zhang, 2010). For image database users, the ability to retrieve images based on their semantic content is important (Eakins et al., 2004). Moreover, the user interface created for image searching may need to provide additional functionality to embrace rich descriptions of the image, as opposed to the limited information provided by typically short queries; image attributes might also be employed in the user query. The fundamental idea of emergent image semantics is reflected in image searches: during the search, based on the previous result set, the user guides his/her queries towards the intended group of images. The user can also judge the relevance of the images based on the features that differentiate them (Westman, 2009).
An interactive and multi-modal mobile visual search application has also been presented. A real-world image search prototype has been implemented in which query refinement suggestions are derived from relevance feedback in an image retrieval system (Leiva et al., 2011).

Miscellaneous Techniques
Other techniques may have potential for inferring semantic user queries. A contextual object retrieval model employing probabilistic reasoning has been proposed. The model is estimated using the visual words from both the region of interest and the visual context. These visual words are weighted by search intent scores, estimated using probabilistic techniques, where the score indicates how likely the image region represented by the visual word is to reflect the user's search intent. A significant performance improvement for QBE-based web image retrieval integrating global and local query context has been reported (Yang et al., 2012).
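The intent-score weighting described above can be caricatured as follows. The visual-word ids and weights are invented for illustration; in the actual model the weights are estimated probabilistically rather than given:

```python
def weighted_match_score(query_words, image_words, intent_weights):
    """Score an image by the visual words it shares with the query region,
    each word weighted by its estimated search-intent score in [0, 1]."""
    return sum(intent_weights.get(w, 0.0)
               for w in set(query_words) & set(image_words))

# Invented visual-word ids: word 7 (the object of interest) carries a high
# intent weight, word 3 (background context) a low one; word 12 has none.
score = weighted_match_score({3, 7, 12}, {3, 7, 9}, {7: 0.9, 3: 0.2})
```

The effect is that matches on intent-bearing words dominate the ranking, while incidental background matches contribute little.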
An image knowledge base employing a graph representation that reflects the distribution of both images on the web and users' interests has been presented (Wang et al., 2012). The image knowledge base is constructed automatically via both bottom-up and top-down approaches that match billions of web images onto an ontology of human knowledge. It can be used to rank images that are of high quality and relevant to the user's interest, so that more accurate images are returned for a user query. A similar system that builds a large visual knowledge base automatically has also been reported.
Recently, exploratory search, which combines browsing and searching, has been gaining interest (Lu et al., 2014). The system developed allows users to specify objects of interest by circling visual objects while browsing, and automatically compiles the selected visual entities to represent the user's underlying intent. The visual entity is scalable to textual context and visual attributes, helping users gain additional knowledge and form more accurate queries.
Issues
Success in answering queries at level 3 in Fig. 1 requires some sophistication on the part of the searcher. Complex reasoning, and often subjective judgment, can be required to make the link between image content and the abstract concepts it is required to illustrate. Queries at this level, though perhaps less common than level 2, are frequently encountered in both newspaper and art libraries. Recent research focuses on user queries at level 2, which can only find and derive the semantic features of images rather than semantic concepts. A further type of image query, retrieval by associated metadata such as who created the image, where and when, is set aside here not because such retrieval is unimportant, but because (at least at present) such metadata is exclusively textual and its management is primarily a text retrieval issue. Consider, for example, finding images of a happy and cheerful girl (level 3) instead of simply finding images of a girl (level 2). A user study confirms that users have difficulty selecting search terms and finding images that illustrate abstract concepts (Besseling, 2011).
Even though image retrieval research is moving towards semantic concepts, much of the initial research in semantic image retrieval focuses only on simple semantic retrieval, such as retrieval of objects of a given type. Little attention is paid to retrieval by abstract attributes, which involves a significant amount of high-level reasoning about the meaning and purpose of the objects or scenes depicted. In other words, retrieval by abstract attributes still does not satisfy human perception. Moreover, such retrieval involves human intervention and is time consuming and inconsistent. There is a need to further increase confidence in image understanding and to effectively retrieve similar images that conform to human perception, without human intervention.
There is still room for improvement beyond the challenges associated with mapping low-level features to high-level concepts. Researchers are moving towards intelligent image retrieval that also supports more abstract concepts by understanding image content in terms of high-level concepts, a problem closely related to computer vision and object recognition. The domain should not be specific but broad, so that all the extracted semantic features are applicable to any kind of image collection.

Conclusion
This paper provides a study of image retrieval work moving towards semantic-based image retrieval. Recent works mostly lack semantic feature extraction and consideration of user behaviour. There is therefore a need for an image retrieval system capable of interpreting the user query and automatically extracting semantic features, making retrieval more efficient and accurate and helping to bridge the semantic gap in image retrieval.