In this paper we design, implement and assess a new architecture for a possibilistic mono- and cross-language information retrieval (IR/CLIR) system, which is useful for experimenting with query disambiguation, expansion and translation processes in both IR and CLIR frameworks. We take advantage of possibility theory to overcome the problems of query disambiguation and expansion in an uncertain and imprecise IR/CLIR context, and we investigate the impact of combining possibilistic query disambiguation with expansion on IR/CLIR effectiveness. A co-occurrence graph representation is exploited to quantify the similarity between query terms and their semantically close words (expansion task) or between query terms and their possible meanings (disambiguation task). We extend the possibilistic mono-language query disambiguation approach to a cross-language framework. We conduct a set of experiments using the ROMANSEVAL data collection, the French–English parallel text corpus Europarl and the CLEF-2003 French–English IR/CLIR test collection. The results highlight statistically significant improvements of our possibilistic approaches over several state-of-the-art IR/CLIR systems.
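As a rough illustration of the co-occurrence-graph idea described above (not the paper's actual possibilistic measures), one can build term–term edges from sentence-level co-occurrence and rank a query term's neighbours as expansion candidates. The normalization used here is a simple stand-in assumption, chosen only for the sketch:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_weights(sentences):
    """Build a term co-occurrence graph: the edge weight is the sentence-level
    co-occurrence count normalized by the rarer term's frequency. This is an
    illustrative score, not the paper's possibilistic similarity."""
    pair_counts = defaultdict(int)
    term_counts = defaultdict(int)
    for sent in sentences:
        terms = sorted(set(sent.lower().split()))
        for t in terms:
            term_counts[t] += 1
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    return {(a, b): c / min(term_counts[a], term_counts[b])
            for (a, b), c in pair_counts.items()}

def expansion_candidates(term, weights, k=3):
    """Rank the graph neighbours of `term` as expansion candidates."""
    scored = []
    for (a, b), w in weights.items():
        if a == term:
            scored.append((b, w))
        elif b == term:
            scored.append((a, w))
    return sorted(scored, key=lambda x: -x[1])[:k]
```

For example, with sentences `["bank river water", "bank money loan", "river water fish"]`, the top candidate for `river` is `water`, since the two always co-occur.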
User intent analysis is a continuing research hotspot in the field of query expansion. However, the large amount of irrelevant feedback in search logs negatively impacts the precision of user intent models. By observing the logs, it can be found that tentative clicks are a major source of irrelevant feedback, and that a new kind of feedback information can be extracted from the log to recognize the characteristics of tentative clicks. With this new feedback information, this paper proposes an advanced user intent model and applies it to query expansion. Experimental results show that the model can effectively decrease the negative impact of the irrelevant feedback caused by tentative clicks and increase the precision of query expansion, especially for informational queries.
The rapid growth of Web content in different languages increases the demand for Cross-Lingual Information Retrieval (CLIR). Retrieval accuracy suffers from problems such as ambiguity and query drift. Query Expansion (QE) offers a reliable solution for obtaining documents that suit user queries. In this paper, we propose an architecture for a Hindi–English CLIR system that uses QE to improve the relevancy of retrieved results. In this architecture, we propose a location-based algorithm that adds term(s) at appropriate position(s) in the query to resolve the drift issue in QE. User queries in Hindi are translated into the document language (i.e. English), and the accuracy of translation is improved using Back-Translation. A Google search is performed, and the retrieved documents are ranked using Okapi BM25 in order of decreasing relevancy so as to select the most suitable terms for QE. We use the term selection value (TSV) for QE, and for retrieving the terms we created three test collections: (i) the description and narration fields of the Forum for Information Retrieval Evaluation (FIRE) dataset, (ii) snippets of retrieved documents for each query and (iii) Nearest-Neighborhood (NN) words of each query word among the ranked documents. To evaluate the system, 50 Hindi queries were selected from the FIRE-2012 dataset. We performed two experiments: (i) measuring the impact of the proposed location-based algorithm on the proposed CLIR architecture; and (ii) analyzing QE over the three datasets, i.e. FIRE, NN and Snippets. In the first case, results show that the relevancy of Hindi–English CLIR is improved by performing QE with the location-based algorithm, achieving a 12% improvement over QE without it. In the second case, the location-based algorithm is applied to all three datasets.
The Mean Average Precision (MAP) values of the retrieved documents after QE are 0.5379 (NN), 0.6018 (FIRE) and 0.6406 (Snippets) for the three test collections, whereas the MAP before QE is 0.37102. This shows a significant improvement in retrieved results for all three test collections. Among them, QE is most effective with the Snippets collection, as indicated by improvements of 6.48% and 19.12% over the FIRE and NN test collections, respectively.
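The Okapi BM25 ranking step mentioned in the abstract can be sketched in a minimal form. The toy documents and the exact IDF smoothing variant below are assumptions for illustration; the parameters k1 and b carry their conventional roles:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Minimal Okapi BM25: score each document (a list of terms) against the
    query, using the standard +0.5-smoothed IDF."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    df = Counter()                         # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query_terms:
            if df[q] == 0:
                continue
            idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
            denom = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[q] * (k1 + 1) / denom
        scores.append(s)
    return scores
```

Documents are then sorted by decreasing score, and expansion terms are drawn from the top-ranked ones (the TSV computation itself is a separate step not shown here).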
The performance of information retrieval (IR) systems relies greatly on textual keywords and retrieved documents. Inaccurate and incomplete retrieval results are often induced by query drift and by ignoring the semantic relationships among terms. Query expansion approaches attempt to incorporate expansion terms into the original query, such as words mined from pseudo-relevance feedback (PRF) or relevance-feedback documents, or semantically related words extracted from external corpora. In this paper, a semantic-analysis-based query expansion method for information retrieval using WordNet and Wikipedia as corpora is proposed. We derive semantically related words from human knowledge repositories such as WordNet and Wikipedia and combine them with words filtered by semantic mining from PRF documents. Our approach automatically generates a new semantic-based query from the original IR query. Experimental results on TREC datasets and the Google search engine show that retrieval performance can be significantly improved using the proposed method over previous results.
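The combination of the two expansion sources can be sketched as follows. The `SYNONYMS` table is a hand-made stand-in for a WordNet/Wikipedia lookup, and the frequency cutoff for PRF terms is an assumed simplification; a real system would query those resources and apply the paper's semantic filtering:

```python
from collections import Counter

# Tiny stand-in for a WordNet/Wikipedia lookup; a real system would query
# those knowledge repositories rather than a hand-made table.
SYNONYMS = {"car": ["automobile", "vehicle"], "fast": ["quick", "rapid"]}

def expand_query(query, feedback_docs, n_prf=2):
    """Combine two expansion sources: synonyms from the knowledge table and
    the most frequent unseen terms in the pseudo-relevant documents."""
    q = query.lower().split()
    expanded = list(q)
    for t in q:                                   # knowledge-repository terms
        for syn in SYNONYMS.get(t, []):
            if syn not in expanded:
                expanded.append(syn)
    counts = Counter(t for doc in feedback_docs   # PRF terms
                     for t in doc.lower().split()
                     if t not in expanded)
    for t, _ in counts.most_common(n_prf):
        expanded.append(t)
    return expanded
```

For example, `expand_query("fast car", ["car engine speed", "engine fuel"])` adds synonyms such as `quick` and `automobile` plus the frequent PRF term `engine`.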
To make code search (CS) more effective, a novel query expansion with intents (QEI) is proposed, in which an intent refers to a common subsequent modification of the search results. Intents are extracted from the modification history. Within the intent scope, the CS is sped up based on semantic and structural matches. The precision of the search results is also increased by expanding the query with the intent. Compared with CodeHow and Google CS, QEI outperforms them by 28.5% with a precision score of 0.846 (i.e. 84.6% of the first results are accepted directly by users).
Understanding users' search intent in vertical websites such as IT service crowdsourcing platforms relies heavily on domain knowledge. Meanwhile, searching for services accurately on crowdsourcing platforms is still difficult, because these platforms do not contain enough information to support high-performance search. To solve these problems, we build and leverage a knowledge graph named ITServiceKG to enhance the search performance for crowdsourced IT services. The main ideas are to (1) build an IT service knowledge graph from Wikipedia, Baidupedia, CN-DBpedia, StuQ and data in IT service crowdsourcing platforms, (2) use the properties and relations of entities in the knowledge graph to expand the user query and service information, and (3) apply a listwise approach with relevance and topic features to re-rank the search results. The results of our experiments indicate that our approach outperforms traditional search approaches.
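The second idea, expanding a query with entity properties and relations, can be illustrated with a toy graph fragment. The entities, property names and related terms below are invented for the sketch and are not drawn from ITServiceKG:

```python
# Toy knowledge-graph fragment (hypothetical entities and relations; a real
# system would draw these from sources like Wikipedia or CN-DBpedia).
KG = {
    "mysql": {"type": "database", "related": ["sql", "mariadb"]},
    "spring": {"type": "framework", "related": ["java", "spring-boot"]},
}

def kg_expand(query):
    """Expand query terms with entity types and related entities from the KG."""
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        entity = KG.get(t)
        if entity:
            for extra in [entity["type"], *entity["related"]]:
                if extra not in expanded:
                    expanded.append(extra)
    return expanded
```

A query such as "mysql tuning" is thus enriched with the entity's type and neighbours before retrieval, after which the re-ranking step of idea (3) is applied.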
This paper describes a Web information search tool called WebYacht. The goal of WebYacht is to solve the problem of imprecise search results in current Web search engines. Due to the incomplete information given by users and the diversified information published on the Web, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when, as in most cases, little information is given. To resolve the ambiguity of short user queries, WebYacht adopts a cluster-based browsing model as well as relevance feedback to facilitate Web information search. The idea is to have users give two to three times more feedback in the same amount of time that would be required with conventional feedback mechanisms. With the assistance of the cluster-based representation provided by WebYacht, a lot of browsing labor can be saved. In this paper, we explain the techniques used in the design of WebYacht and compare the performance of its feedback interface designs to that of conventional similarity-ranking search results.
Data integration becomes ever more necessary given the increasing availability of data from distributed and heterogeneous sources. To address such heterogeneity, crisp ontologies have been employed to represent the semantics of integrated data. However, it is useful to incorporate fuzzy logic concepts into these ontologies for a more expressive representation of the vague information relevant to some domains. In this context, this paper presents the DISFOQuE system for data integration based on a fuzzy ontology, which provides a homogeneous view of data sources and also performs query expansion in order to retrieve more comprehensive answers for the user. We executed a real experiment in the domain of watershed analysis, which provided a homogeneous view of the watershed data sources and more effective answers to researchers.
This paper studies the application of a user interest model in personalized search. For a given user, we examine their query history and documents of interest in order to mine user interest data. Following the vector space model, we represent the user profile as a vector formed through classification of a selected group of the user's query words; the user interest vector is then constructed from this profile. Experimental results demonstrate that this approach exhibits good descriptive performance with regard to user interest and lays a reliable foundation for future research on personalized search.
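A minimal vector-space sketch of such a profile, assuming raw term frequencies over the query history (the paper's classification step and exact weighting are not reproduced here), re-ranks documents by cosine similarity to the interest vector:

```python
import math
from collections import Counter

def interest_vector(query_history):
    """Build a term-frequency user-interest vector from past queries."""
    return Counter(t for q in query_history for t in q.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def personalize(docs, profile):
    """Re-rank documents by similarity to the user-interest vector."""
    scored = [(cosine(Counter(d.lower().split()), profile), d) for d in docs]
    return [d for _, d in sorted(scored, key=lambda x: -x[0])]
```

Given a history of Python-related queries, a Python document rises above an unrelated one, which is the behavior a personalized ranker should exhibit.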