The World Wide Web provides an immense source of information. Accessing information of interest presents a challenge to scientists and analysts, particularly if the desired information is structural in nature. Our goal is to design a structural search engine that uses the hyperlink structure of the Web, in addition to textual information, to search for sites of interest. Our structural search engine, called WebSUBDUE, searches not only for particular words or topics but also for a desired hyperlink structure. Enhanced by WordNet text functions, our search engine retrieves sites corresponding to structures formed by graph-based user queries. We hypothesize that this system can form the heart of a structural query engine, and demonstrate the approach on a number of structural web queries.
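To make the idea of a structural query concrete, the following is a minimal sketch that matches a small query graph against a toy web graph, requiring both keywords on pages and hyperlinks between them. It is a brute-force exact matcher written for illustration, not the WebSUBDUE algorithm; the function name `find_structure`, the page identifiers, and the plain substring test (standing in for the WordNet text functions) are all assumptions.

```python
from itertools import permutations

def find_structure(pages, links, query_nodes, query_edges):
    """Return every assignment of query variables to pages that satisfies
    the keyword constraints and the required hyperlinks.

    pages:       dict page_id -> page text
    links:       set of (source_id, destination_id) hyperlinks
    query_nodes: dict variable -> required keyword, or None for "any page"
    query_edges: list of (variable_src, variable_dst) required hyperlinks
    """
    variables = list(query_nodes)
    matches = []
    # Try every ordered choice of distinct pages (fine for toy graphs only).
    for combo in permutations(pages, len(variables)):
        binding = dict(zip(variables, combo))
        # Text constraint: each bound page must contain its keyword.
        text_ok = all(
            kw is None or kw.lower() in pages[binding[var]].lower()
            for var, kw in query_nodes.items()
        )
        if not text_ok:
            continue
        # Structural constraint: every query edge must be a real hyperlink.
        if all((binding[s], binding[d]) in links for s, d in query_edges):
            matches.append(binding)
    return matches

# Hypothetical toy site: a department page linking to faculty, then papers.
pages = {
    "home": "Department of Computer Science",
    "fac": "Faculty list",
    "pub": "Publications on data mining",
    "misc": "Campus map",
}
links = {("home", "fac"), ("fac", "pub"), ("home", "misc")}
query_nodes = {"dept": "department", "people": "faculty", "papers": "publications"}
query_edges = [("dept", "people"), ("people", "papers")]
matches = find_structure(pages, links, query_nodes, query_edges)
```

A keyword-only search for "publications" would return the `pub` page regardless of where it sits; the structural query returns it only as part of the department-to-faculty-to-publications chain.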
We present a new approach to web search engines. The web creates new challenges for information retrieval. The vast improvement in information access is not the only advantage resulting from keyword search. Additionally, much potential exists for analyzing interests and relationships within the structure of the web. When the author of a web page creates a hyperlink, the link explicitly represents a relationship between the source and destination pages, and these relationships together form the hyperlink structure between web pages. Our web search engine searches not only for the keywords in the web pages, but also for the hyperlink structure between them. Comparing the results of structural web search versus keyword-based search indicates an improved ability to access desired information. We also discuss steps toward mining the queries input to the structural web search engine.
Short and ambiguous queries are a major problem for search engines, leading to the retrieval of information irrelevant to the user's input. The growing volume of information on the web also makes it difficult for a search engine to provide the results users need. Web search engines suffer from ambiguity because queries are looked at on a rational level rather than a semantic level. In this paper, to improve search engine performance with respect to users' interests, personalization based on users' clicks and bookmarking is proposed. Modified agglomerative clustering is used to cluster the results. The experimental results show that the proposed work achieves better precision, recall and F-score.
In this paper, we present a new ranking algorithm and an intelligent Web search system using data mining techniques to search and analyze Web documents in a more flexible and effective way. Our method takes advantage of the characteristics of Web documents to extract, find, and rank data in a more meaningful manner. We utilize hyperlink structures with Web document content to intelligently rank the retrieved results. It can solve ranking problems of existing algorithms for multi-frame Web documents and unrelated linked documents. In addition, we use domain specific ontologies to improve our query process and to rank retrieved Web documents with better semantic notion. Furthermore, we use association rule mining to find the patterns of maximal keyword sets, which represent the main characteristics of the retrieved documents. For subsequent queries, these keywords become recommended sets of query terms for users' specific needs. Clustering is used to group retrieved documents into distinct sets that can help users make their decisions easier and faster. Experimental results show that our Web search system is indeed effective and efficient.
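The notion of maximal keyword sets can be illustrated with a small frequent-itemset sketch. The code below finds keyword sets that occur in at least `min_support` documents and keeps only those not contained in a larger frequent set. It is a simple Apriori-style enumeration under assumed inputs (documents already reduced to keyword sets); the function name and the support threshold are illustrative and are not taken from the paper.

```python
from itertools import combinations

def maximal_frequent_keyword_sets(docs, min_support):
    """docs: list of keyword sets; min_support: minimum number of documents.

    Returns the frequent keyword sets that have no frequent proper superset.
    """
    vocabulary = sorted(set().union(*docs))
    frequent = []
    size = 1
    candidates = [frozenset([k]) for k in vocabulary]
    while candidates:
        # Keep candidates contained in at least min_support documents.
        level = [c for c in candidates
                 if sum(1 for d in docs if c <= d) >= min_support]
        frequent.extend(level)
        # Join frequent sets of this size into candidates one keyword larger.
        size += 1
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == size})
    # A set is maximal if no other frequent set strictly contains it.
    return [s for s in frequent if not any(s < t for t in frequent)]

# Hypothetical keyword sets extracted from four retrieved documents.
docs = [
    {"web", "search", "ranking"},
    {"web", "search"},
    {"web", "mining"},
    {"search", "ranking", "web"},
]
maximal = maximal_frequent_keyword_sets(docs, min_support=2)
```

In this toy run the single maximal set is the one that could be offered as a recommended group of query terms for a follow-up search.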
Topic distillation is the process of finding representative pages relevant to a given query. Well-known approaches such as the HITS algorithm have been shown to be useful for topic distillation. Many subsequent researchers have focused on augmenting HITS with further content analysis to alleviate the steady deterioration of distillation quality suffered by HITS. In this paper, we attempt to revisit the behavior of HITS from a different point of view. Namely, a similarity-based analysis model is applied to observe the distillation procedure. By defining a generalized similarity, an algorithm is proposed that can improve the quality of distillation using only hyperlink information. The experimental results reveal that the new algorithm improves distillation quality without utilizing any content information of pages.
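For readers unfamiliar with the baseline, here is a minimal sketch of the standard HITS iteration, which scores pages as hubs and authorities from hyperlinks alone. It shows the classic algorithm only, not the similarity-based variant proposed in this abstract; the graph representation and the fixed iteration count are assumptions made for brevity.

```python
import math

def hits(links, iterations=50):
    """links: dict page -> list of pages it links to.

    Returns (hub, authority) score dicts, each L2-normalized.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}
        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}
    return hub, auth

# Toy graph: pages "a" and "b" both point at "c".
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

In the toy graph, "c" ends up as the only authority and "a" and "b" as equally good hubs, which is the mutual-reinforcement behavior that content-augmented variants and the hyperlink-only variant above both try to improve on.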
We investigate the automatic generation of topic pages as an alternative to the current Web search paradigm. Topic pages explicitly aggregate information across documents, filter redundancy, and promote diversity of topical aspects. We propose a novel framework for building rich topical aspect models and selecting diverse information from the Web. In particular, we use Web search logs to build aspect models with various degrees of specificity, and then employ these aspect models as input to a sentence selection method that identifies relevant and non-redundant sentences from the Web. Automatic and manual evaluations on biographical topics show that topic pages built by our system compare favorably to regular Web search results and to MDS-style summaries of the Web results on all metrics employed.
Web search could be greatly facilitated if we could better relate user intention to the meaning of web content. In this paper, we first survey the various existing methods, focusing on the dilemma that obtaining highly accurate results usually sacrifices response time. We then propose a novel information retrieval framework that combines keyword-based search with search based on syntactical information. In particular, we design a sequential structure called LSC (Language Sequential Component) to encode syntactical information. Given a sentence, LSC provides a bridge between its syntactical representation and its semantic meaning. We also propose a learning algorithm to obtain the LSCs from a training set, a classification algorithm to find the relevant LSCs from a user query to interpret the intentions of the user, and a search framework (called Semantic Search Engine) to incorporate syntactical information into a keyword-based search system. Our experiments show the Semantic Search Engine outperforms the keyword-based approach significantly.
In this paper, we categorize “semantics” into “taxonomical semantics”, “syntactical semantics” and “formal semantics”. We propose a declarative meta-language SCDL-NL as the foundation of a general annotation language in which “taxonomical and syntactical semantic” information of a sentence can be clearly defined. Since pure natural language is too complicated to be used as a general annotation language, the annotation language imposes some restrictions on the English grammar so that it can be easily translated into SCDL-NL to facilitate information retrieval.
Nowadays, a large number of users post complaints about a certain service on the Internet. Because users have various values and views, even if they receive the same service, they may complain in different ways. However, it is quite difficult to respond to various user demands for service in real time, and there are almost no direct solutions when users feel dissatisfied with a certain service. Therefore, in this paper, we propose a web search method based on analyzing complaint data from Fuman Kaitori Center. First, the system generates query keywords according to various user complaints about a certain service by calculating the score of each query. Then suitable web pages containing advice are recommended from the results of the query. This advice could address users' dissatisfaction and respond to their various demands in a comprehensive way. We also verify the usability of the proposed system through a questionnaire survey evaluation.
We propose a method for processing aggregate queries by analyzing indices of Web search engines. An aggregate query is a form of query such as "What are typical topics on Kyoto?" and the answer is a set of pairs of a topic term and its statistical score. Using search engine indices to process aggregate queries has two advantages. One is that we can realize quick and instant processing by using indices and minimizing the access to actual contents. The other is that results are statistically reliable since they are based on a huge amount of Web data. Our method consists of two phases. The probe phase returns a set of terms that satisfy a certain relation to a given keyword. The validation phase determines whether given terms satisfy a certain relation. We describe examples of aggregate queries: finding typical topics, finding siblings/rivals, and finding landmarks.
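The two-phase structure described above can be sketched against a toy in-memory inverted index. In this sketch the probe phase collects terms that co-occur with the keyword, and the validation phase keeps a term only if a large enough fraction of the documents containing it also contain the keyword. The index layout, the co-occurrence ratio used as the score, and the threshold are all assumptions for illustration; the abstract does not specify the relations or statistics the authors use, and a real system would read document counts from a search engine index rather than from a dictionary.

```python
from collections import defaultdict

def build_index(docs):
    """docs: dict doc_id -> text. Returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].add(doc_id)
    return index

def probe(index, keyword):
    """Probe phase: terms that share at least one document with the keyword."""
    hits = index.get(keyword, set())
    return {t for t, ids in index.items() if t != keyword and ids & hits}

def validate(index, keyword, term, threshold):
    """Validation phase: fraction of the term's documents that contain keyword."""
    term_docs = index.get(term, set())
    if not term_docs:
        return False, 0.0
    score = len(term_docs & index.get(keyword, set())) / len(term_docs)
    return score >= threshold, score

def aggregate_query(index, keyword, threshold=0.6):
    """Return (term, score) pairs that pass validation, best score first."""
    results = []
    for term in probe(index, keyword):
        ok, score = validate(index, keyword, term, threshold)
        if ok:
            results.append((term, score))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical four-document collection.
docs = {1: "kyoto temple", 2: "kyoto temple garden", 3: "tokyo tower", 4: "kyoto tower"}
index = build_index(docs)
topics = aggregate_query(index, "kyoto")
```

Here "tower" is probed as a candidate but rejected in validation, because only half of the documents mentioning it also mention "kyoto".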