This review volume introduces the novel intelligent Web theory of computational Web intelligence (CWI), based on computational intelligence (CI) and Web technology (WT). It takes an in-depth look at hybrid Web intelligence (HWI), which combines artificial, biological, and computational intelligence with Web technology and is used to build hybrid intelligent Web systems that serve wired and wireless users more efficiently. The basic principles of CWI and various e-applications of CWI and HWI are discussed. For completeness, six major CWI techniques are described: fuzzy Web intelligence, neural Web intelligence, evolutionary Web intelligence, granular Web intelligence, rough Web intelligence, and probabilistic Web intelligence. With the huge potential for intelligent e-business applications of CWI and HWI, these techniques represent the future of intelligent Web applications.
https://doi.org/10.1142/9789812562432_fmatter
https://doi.org/10.1142/9789812562432_0001
We discuss some methods for constructing recommender systems. An important feature of the methods studied here is that we assume the availability of a description, or representation, of the objects being considered for recommendation. The approaches studied here differ from collaborative filtering in that we use only preference information from the individual for whom we are providing the recommendation and make no use of the preferences of other collaborators. We provide a detailed discussion of the construction of the representation schema used. We consider two sources of information about the user's preferences. The first is direct statements about the type of objects the user likes. The second source of information comes from ratings of objects which the user has experienced.
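To make the two information sources concrete, here is a minimal content-based scoring sketch in Python; the feature schema, object vectors, and ratings are invented for illustration and are not the chapter's data or method.

    import numpy as np

    FEATURES = ["action", "comedy", "drama"]          # object representation schema
    objects = {
        "film_a": np.array([1.0, 0.0, 0.2]),
        "film_b": np.array([0.1, 1.0, 0.0]),
        "film_c": np.array([0.0, 0.2, 1.0]),
        "film_d": np.array([0.0, 0.9, 0.1]),
    }

    # Source 1: a direct statement of taste ("I like action").
    stated_profile = np.array([1.0, 0.0, 0.0])

    # Source 2: ratings (in [-1, 1]) of objects the user has experienced.
    ratings = {"film_b": -0.5, "film_c": 0.8}
    rated_profile = sum(r * objects[o] for o, r in ratings.items())

    profile = stated_profile + rated_profile          # combine the two sources

    def score(x, p):
        """Cosine similarity between an object vector and the user profile."""
        return float(x @ p / (np.linalg.norm(x) * np.linalg.norm(p) + 1e-12))

    unseen = [o for o in objects if o not in ratings]
    for name in sorted(unseen, key=lambda o: -score(objects[o], profile)):
        print(name, round(score(objects[name], profile), 3))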
https://doi.org/10.1142/9789812562432_0002
Retrieving relevant information is a crucial component of case-based reasoning systems for Internet applications such as search engines. The task is to use user-defined queries to retrieve useful information according to certain measures. Even though techniques exist for locating exact matches, finding relevant partial matches can be a problem. The objective of this paper is to develop an intelligent computer system with some deductive capabilities that can conceptually cluster, match, and rank pages based on predefined linguistic formulations and rules defined by experts or based on a set of known homepages. The Conceptual Fuzzy Set (CFS) model is used for intelligent information and knowledge retrieval through conceptual matching of both text and links (here defined as "concepts"). The selected query does not need to match the decision criteria exactly, which gives the system more human-like behavior. The model can also be used for intelligent information and knowledge retrieval through Web-connectivity-based clustering. We also present the integration of our technology into commercial search engines such as Google™, as a framework that can be used to integrate our model into any other commercial search engine.
https://doi.org/10.1142/9789812562432_0003
Most of the elaborate methods in natural language processing are not readily applicable to the automated acquisition of answers to questions asked by the Web user, due to the particular problems of the Web domain, especially the stringent time constraint. A fuzzy sequence method is presented that takes advantage of the power of approximation in detecting sequence similarities in natural language. Linguistic and ontological elements are incorporated on top of the fuzzy sequence foundation for enhanced performance. This approach allows scalable implementation such that the level of complexity can be adjusted to satisfy various constraints. The results show that fuzzy sequence solutions can accomplish a great deal of refinement of search engine results and yield a small set of evidence with which to form answers using sophisticated generation techniques.
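As a rough illustration of approximate sequence matching for answer finding, the sketch below re-ranks candidate passages against a question using Python's difflib as a stand-in similarity; the strings are invented, and the chapter's actual fuzzy sequence method is more elaborate than this.

    from difflib import SequenceMatcher

    def seq_similarity(a: str, b: str) -> float:
        """Token-level similarity in [0, 1], tolerant of gaps and noise."""
        return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

    question = "when was the world wide web invented"
    passages = [
        "the world wide web was invented in 1989 by tim berners-lee",
        "spiders spin webs around the world",
        "the web grew quickly during the 1990s",
    ]
    # Keep only the best-matching evidence for downstream answer generation.
    for p in sorted(passages, key=lambda s: -seq_similarity(question, s)):
        print(round(seq_similarity(question, p), 3), p)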
https://doi.org/10.1142/9789812562432_0004
To accommodate the exponential growth of Web traffic, Content Distribution Networks (CDNs) have been designed and deployed to transparently and dynamically redirect user requests to different cache servers. Server selection is crucial to the functionality and performance of a CDN. An appropriate server can be chosen according to parameters such as user location, measured round trip time (RTT), and advertised server load. However, it is impractical to obtain accurate measurements of these parameters. In addition, the parameters may have conflicting effects on decision making. In this chapter, a fuzzy inference system is designed to select servers based on partial measurements of RTTs and historical information about server load. The performance of the fuzzy inference scheme is evaluated and compared with several existing schemes by simulation. The simulation results show that the fuzzy inference scheme achieves higher network utilization and provides better user-perceived quality of service (QoS). In addition, the fuzzy inference scheme is flexible in dealing with network dynamics, owing to its intrinsic capability of handling multiple, inaccurate, and inconsistent decision inputs.
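The sketch below illustrates the general shape of such a fuzzy server selector with a zero-order Sugeno-style rule base; the membership functions, rules, and server measurements are assumptions for illustration, not the chapter's tuned inference system.

    # Triangular-shoulder membership functions over RTT and load.
    def low(x, lo, hi):                 # 1 at lo, falling to 0 at hi
        return max(0.0, min(1.0, (hi - x) / (hi - lo)))

    def high(x, lo, hi):                # 0 at lo, rising to 1 at hi
        return max(0.0, min(1.0, (x - lo) / (hi - lo)))

    def suitability(rtt_ms, load):
        """Fire three rules (min for AND) and combine by weighted average."""
        rules = [
            (min(low(rtt_ms, 20, 200), low(load, 0.3, 0.9)), 1.0),   # fast & light -> excellent
            (high(load, 0.3, 0.9), 0.1),                             # heavy load   -> poor
            (min(high(rtt_ms, 20, 200), low(load, 0.3, 0.9)), 0.5),  # slow & light -> fair
        ]
        w = sum(fire for fire, _ in rules)
        return sum(fire * out for fire, out in rules) / w if w else 0.0

    servers = {"s1": (35, 0.8), "s2": (120, 0.2), "s3": (60, 0.5)}   # (rtt_ms, load)
    best = max(servers, key=lambda s: suitability(*servers[s]))
    print(best, {s: round(suitability(*v), 3) for s, v in servers.items()})

Note how the lightly loaded but slower server can win over a fast, overloaded one: the rule base trades off the two inconsistent inputs instead of thresholding either alone.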
https://doi.org/10.1142/9789812562432_0005
This chapter first defines a recommendation process, which helps the user to select products from a large number of candidates according to personal preference. Though both conventional databases and fuzzy databases have been used for this task, neither provides a proper solution. A new approach is introduced, which is easy to design and maintain, and provides well-justified results. The central idea of this approach is to interpret user preference as consisting of multiple criteria, each of which is relative to the available data. The procedure introduced here forms a membership function at run time according to the user request and the available data, and then uses it to rank the candidates.
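A minimal sketch of the run-time idea: the membership function for a preference such as "cheap" is formed relative to the prices actually available, and is then used to rank candidates. The request and product data below are invented examples.

    # Build mu_cheap relative to the observed price range, at run time.
    def cheap_relative(prices):
        lo, hi = min(prices), max(prices)
        return lambda p: 1.0 if hi == lo else (hi - p) / (hi - lo)

    candidates = {"a": 420.0, "b": 980.0, "c": 615.0}   # hypothetical products
    mu = cheap_relative(list(candidates.values()))      # "cheap" w.r.t. this data
    for name, price in sorted(candidates.items(), key=lambda kv: -mu(kv[1])):
        print(name, price, round(mu(price), 3))

Because the membership function is derived from the data at hand rather than fixed in advance, the same query "cheap" adapts automatically when the candidate set changes.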
https://doi.org/10.1142/9789812562432_0006
In this chapter, we present our ongoing work on a Web-based online database for fabric sourcing. The objective of the database and its intelligent search engine is to address the urgent need of cloth manufacturers, clothing designers, and retailers to quickly find suitable fabric materials and fabric manufacturers. The intelligent search engine allows users to enter flexible queries to search for fabrics according to fabric mechanical properties, the intended garment appearance, as well as the tailorability of the fabrics. We also apply the fuzzy linear clustering method of Hathaway and Bezdek to the task of fabric drape property prediction, which is useful for selecting suitable fabrics. We show that the fuzzy linear clustering method is quite effective for this task. A hybrid method combining the K-nearest neighbor method and fuzzy linear clustering improves the prediction accuracy further.
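For readers unfamiliar with fuzzy linear clustering, the sketch below implements a compact fuzzy c-regression loop in the Hathaway-Bezdek style, alternating weighted least-squares line fits with fuzzy membership updates; the synthetic data and parameters are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 200)
    # Points drawn from two noisy lines: y = 2x + 0.1 and y = -x + 1.
    y = np.where(rng.random(200) < 0.5, 2 * x + 0.1, -x + 1) + rng.normal(0, 0.05, 200)

    c, m = 2, 2.0
    X = np.column_stack([x, np.ones_like(x)])          # design matrix [x, 1]
    U = rng.dirichlet(np.ones(c), len(x)).T            # c x n fuzzy memberships

    for _ in range(30):
        models = []
        for i in range(c):
            w = U[i] ** (m / 2)                        # sqrt of fuzzy weights u^m
            b, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
            models.append(b)                           # weighted LS line fit
        E = np.array([(y - X @ b) ** 2 for b in models]) + 1e-12   # residuals
        U = E ** (-1 / (m - 1))                        # standard membership update
        U /= U.sum(axis=0)

    print([b.round(2) for b in models])   # typically recovers the two lines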
https://doi.org/10.1142/9789812562432_0007
Noise and incompleteness are two of the major issues with Web-related data. Fuzzy set and rough set theories provide complementary methods for accommodating noisy, incomplete, and approximate information. Non-conventional clustering techniques based on fuzzy set and rough set theories may be more suitable for Web mining applications. This chapter compares the results of clustering obtained using the conventional K-means algorithm, a modified K-means algorithm based on rough set theory, and a fuzzy clustering algorithm. The modifications to the K-means algorithm are based on the properties of rough sets. The resulting clusters are represented as interval sets. The chapter describes results of experiments used to create conventional, fuzzy, and interval set representations of clusters of Web users on three educational websites. The experiments used secondary data consisting of access logs from the World Wide Web. This type of analysis is called Web usage mining, which involves applying data mining techniques to discover usage patterns from Web data. The analysis shows the advantages of fuzzy and interval set representations of clusters over crisp clusters.
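The following sketch shows a rough-set modification of K-means in the style of Lingras and West: unambiguous objects enter a cluster's lower approximation, near-ties enter only the boundary (upper approximations), and centroids mix the two regions. The data, weights, and near-tie threshold are assumptions, not the chapter's settings.

    import numpy as np

    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    k, w_low, w_up, tau = 2, 0.7, 0.3, 1.2             # lower/boundary weights
    centers = data[rng.choice(len(data), k, replace=False)]

    for _ in range(20):
        d = np.linalg.norm(data[:, None] - centers[None], axis=2)   # n x k
        nearest = d.argmin(axis=1)
        close = d <= tau * d.min(axis=1, keepdims=True)             # near-ties
        for i in range(k):
            in_lower = (nearest == i) & (close.sum(axis=1) == 1)    # unambiguous
            in_bnd = close[:, i] & (close.sum(axis=1) > 1)          # boundary
            if in_lower.any() and in_bnd.any():
                centers[i] = w_low * data[in_lower].mean(0) + w_up * data[in_bnd].mean(0)
            elif in_lower.any():
                centers[i] = data[in_lower].mean(0)
            elif in_bnd.any():
                centers[i] = data[in_bnd].mean(0)

    print(centers.round(2))   # each cluster is an interval set: lower + boundary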
https://doi.org/10.1142/9789812562432_0008
Several researchers have suggested that Bayesian networks be used in web search (information retrieval). One advantage of this approach is that Bayesian networks are more general than the probabilistic models previously used in information retrieval. In practice, experimental results demonstrate the effectiveness of the modern Bayesian network approach. On the other hand, since Bayesian networks are defined solely upon the notion of probabilistic conditional independence, these encouraging results do not take advantage of the more general probabilistic independencies recently proposed in the Bayesian network community. In this chapter, we begin by demonstrating that probabilistic inference with context-specific independence (CSI) can be more efficient than a traditional Bayesian network approach. The use of CSI involves the union-product operator for combining partial distributions. We then show that the union-product operator is not sufficient for factorizations based on contextual weak independence (CWI), which is a generalization of CSI. We subsequently introduce a new operator, called weak-join, for CWI factorizations and show that it is both associative and commutative. Finally, we explicitly demonstrate that probabilistic inference using CWIs can be more efficient than when CSIs are used.
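For reference, context-specific independence can be stated as follows; this is the standard formulation due to Boutilier et al., written in our notation, not necessarily the chapter's:

    % X is contextually independent of Y given Z in the context C = c when
    \[
      P(X \mid Y, Z, C{=}c) \;=\; P(X \mid Z, C{=}c)
      \quad\text{whenever } P(Y, Z, C{=}c) > 0 ,
    \]
    % so, in context c, a conditional table over (Y, Z, C) collapses to a
    % smaller table over Z alone; this is the structure that CSI-based
    % inference exploits and that CWI generalizes further.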
https://doi.org/10.1142/9789812562432_0009
The present trend in vehicle fault diagnosis is toward automation. Modern motor vehicles can often be modeled as a complex system made up of many components, making fault diagnosis difficult. Traditionally, effective vehicle fault diagnosis relies heavily on the experience and knowledge of human experts. This chapter presents the development of an expert system whose aim is to provide useful aid to human users in their attempts at vehicle fault diagnosis, even at remote locations via the WWW. The system employs a hybrid data mining process to effectively mine data stored in a vehicle service database, which contains past service records. Through the learning capability of a neural network, the system is able to generalize knowledge stored in the database. Performance evaluation of the system confirms its effectiveness both in terms of speed and accuracy.
https://doi.org/10.1142/9789812562432_0010
The digital networked world is enabling and requiring a new emphasis on personalized document creation. Traditional methods of producing documents have become insufficient, since most are aimed at producing static results and require time-consuming and knowledge-intensive processes. As new untrained users start producing documents for a wide audience, the old publishing tools prove too demanding. The new, more dynamic digital environment demands tools that can find, filter, and lay out content automatically, tailored to personal needs and transformed for the presentation device, enabling novices or entirely automated workflows to easily create such documents. To enable such automated document assembly, we have formalized custom document creation as a constrained optimization problem, and we use a genetic algorithm to assemble and transform compound personalized documents. This enables the new personalized document workflows that are necessary to be productive in the wired world.
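The sketch below shows the flavor of such a formulation: a genome selects one variant per content slot, and fitness rewards reader relevance while penalizing violation of a page-capacity constraint. The slots, sizes, and weights are invented, and the chapter's actual encoding is certainly richer.

    import random

    random.seed(0)
    SLOTS = [  # (variant_height_in_lines, relevance_to_this_reader)
        [(10, 0.9), (6, 0.7), (3, 0.4)],      # story A: long / medium / teaser
        [(8, 0.5), (4, 0.45)],                # story B
        [(12, 0.8), (5, 0.6), (2, 0.2)],      # story C
    ]
    PAGE = 20  # hard capacity constraint, in lines

    def fitness(g):
        h = sum(SLOTS[i][v][0] for i, v in enumerate(g))
        rel = sum(SLOTS[i][v][1] for i, v in enumerate(g))
        return rel - 10.0 * max(0, h - PAGE)          # penalize overflow

    pop = [[random.randrange(len(s)) for s in SLOTS] for _ in range(30)]
    for _ in range(60):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]                            # elitist selection
        children = []
        while len(children) < 20:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(SLOTS))
            child = a[:cut] + b[cut:]                 # one-point crossover
            if random.random() < 0.3:                 # mutation
                i = random.randrange(len(SLOTS))
                child[i] = random.randrange(len(SLOTS[i]))
            children.append(child)
        pop = parents + children

    best = max(pop, key=fitness)
    print(best, fitness(best))   # variant choices filling the page without overflow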
https://doi.org/10.1142/9789812562432_0011
Nowadays the Internet has become a universal repository of human knowledge and information, turning the Web into a new publishing medium accessible to everybody. As an immediate consequence, any Web user can look for information and publish personal data with little effort and almost no cost. This universe without frontiers has attracted millions of people, causing a revolution in the way people use computers and perform their daily tasks. For instance, e-commerce and home banking have become very popular and generate meaningful revenues. Despite this success, however, the Web suffers from problems of its own that make Internet-based retrieval systems more complex, and finding proper information on the Web becomes a troublesome activity. Satisfying a user's information need sometimes requires meticulous surfing of the WWW hyperspace in search of information of interest. However, since the hyperspace is vast and largely unknown, such navigation is usually inefficient, because the definition and structure of information is frequently of low quality. Among the different problems concerning Web searching, one falls in the query phase. Often the searching process returns a huge list of answers that are irrelevant, unavailable, or outdated. The tedium of querying, due to queries that are too weak to capture the user's intent, has stimulated designers to enrich human-system interaction with new searching metaphors and new matching criteria: for instance, a kind of comparison among "similar" pages, as offered by Google, Yahoo and others. The idea is very good, since similarity gives an easy and intuitive mechanism for expressing a complex relation. We believe that this approach could become more effective if the user could rely on greater flexibility in expressing similarity dependencies than the currently available possibilities allow. In this chapter, we introduce a novel method for considering and processing user-driven proximity/similarity during Web navigation. We define an extension of the Fuzzy C-Means algorithm, namely P-FCM (Proximity Fuzzy C-Means), incorporating user judgments in terms of measures of similarity or dissimilarity among clustered data. We present the theoretical framework of this extension and then observe, through a suite of Web-based experiments, how significant the impact of the user's feedback is on P-FCM's functioning. These observations suggest that the P-FCM approach offers a relatively simple way of improving Web page classification according to the user's interaction with the search engine.
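As background for P-FCM, the sketch below implements the standard Fuzzy C-Means core that it extends; the proximity-feedback term contributed by the chapter is deliberately omitted, and the synthetic data and parameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 0.5, (60, 2)), rng.normal(3, 0.5, (60, 2))])
    c, m = 2, 2.0                                               # clusters, fuzzifier
    U = rng.dirichlet(np.ones(c), len(X)).T                     # c x n memberships

    for _ in range(40):
        V = (U ** m) @ X / (U ** m).sum(axis=1, keepdims=True)  # centroid update
        D = np.linalg.norm(X[:, None] - V[None], axis=2).T ** 2 + 1e-12
        U = D ** (-1 / (m - 1))                                 # membership update
        U /= U.sum(axis=0)

    print(V.round(2))   # P-FCM would add a user-supplied proximity term here

The P-FCM extension steers these membership updates with user-declared (dis)similarity between specific pairs of pages, which is where the feedback loop with the search engine enters.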
https://doi.org/10.1142/9789812562432_0012
Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of users with the Web. Web usage mining has become very critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis, and so on. This chapter presents a divide-and-conquer approach (i-Miner) to optimize the concurrent architecture of a fuzzy clustering algorithm (to discover Web data clusters) and a fuzzy inference system to analyze Web server traffic trends. A hybrid evolutionary fuzzy clustering algorithm is used to optimally segregate similar visitors. The clustered data is then used to analyze trend patterns using a Takagi-Sugeno fuzzy inference system learned through a combination of evolutionary algorithms and neural network learning. The proposed approach is compared with other clustering and function approximation techniques. The results are graphically illustrated and their practical significance is discussed in detail. Empirical results clearly show that the proposed Web usage mining framework is efficient.
https://doi.org/10.1142/9789812562432_0013
Content-based technology has emerged from the development of multimedia signal processing and the widespread use of Web applications. In this chapter, we discuss the issues involved in content-based audio classification and retrieval, including spoken document retrieval and music information retrieval. Further, along this direction, we conclude that the emerging audio ontology can be applied to the fast-growing Internet, digital libraries, and other multimedia systems.
https://doi.org/10.1142/9789812562432_0014
There is an increasing number of non-English resources available on the Web. However, most information retrieval techniques have been developed for English and other Western languages. As the second largest Internet language, Chinese provides a good setting for studying how search engine techniques developed for English can be generalized for use in other languages to facilitate Internet searching and browsing in a multilingual world. We present our research in developing a Chinese Web portal in the medical domain: CMedPort. Our portal integrates focused collection building, metasearch engines, and cross-regional search, as well as post-retrieval analysis techniques such as summarization and categorization. Pilot user studies were conducted to compare the effectiveness and efficiency of CMedPort with those of three major Chinese search engines. Preliminary results from these user studies indicate that CMedPort achieves similar precision to, but higher recall and higher efficiency than, the regional search engines.
https://doi.org/10.1142/9789812562432_0015
Existing popular algorithms for user preference retrieval, such as Rocchio's similarity-based relevance feedback algorithm and its variants [Rocchio (1971); Ide (1971a)], the Perceptron algorithm [Rosenblatt (1958)] and the Gradient Descent Procedure [Wong et al. (1988)], are based on linear additions of documents judged by the user. In contrast to the linear additive query updating techniques adopted in those algorithms, in this chapter two new algorithms are designed that use multiplicative query expansion strategies to adaptively improve the query vector. It is shown that one algorithm has a substantially better mistake bound than the Rocchio and Perceptron algorithms in learning a user preference relation determined by a linear classifier with a small number of non-zero coefficients over the real-valued vector space [0,1]^n. It is also shown that the other algorithm boosts the usefulness of an index term exponentially, whereas the gradient descent procedure does so linearly. Applications of the two algorithms to Web search are also presented.
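The contrast between linear additive and multiplicative updating can be seen in a few lines. The Winnow-style exponential rule below illustrates the boosting idea only; it is not the chapter's exact algorithm, and the vectors and learning rates are invented.

    import numpy as np

    rng = np.random.default_rng(3)
    q_add = np.full(8, 0.125)          # query vectors over 8 index terms
    q_mul = np.full(8, 0.125)
    doc = rng.random(8)
    doc[2] = 1.0                       # a relevant document; term 2 matters most

    beta, eta = 0.5, 1.0
    for _ in range(5):                 # five rounds of positive feedback
        q_add = q_add + beta * doc             # Rocchio-style additive growth
        q_mul = q_mul * np.exp(eta * doc)      # multiplicative (exponential) growth
        q_mul /= q_mul.sum()                   # keep the weights normalized

    print(q_add.round(2))
    print(q_mul.round(3))              # weight of term 2 dominates much faster

After a few rounds the multiplicative query concentrates its weight on the few genuinely useful terms, which is the intuition behind the improved mistake bound for sparse linear classifiers.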
https://doi.org/10.1142/9789812562432_0016
Many biological results are published only in plain-text articles, and these articles or their abstracts are collected in online biomedical literature databases such as PubMed and BioMed Central. To expedite the progress of functional bioinformatics, it is important to efficiently process large amounts of biomedical articles, extract the results into a structured format, and store them in a database so that they can be retrieved and analyzed by biologists and medical researchers. Automated discovery and extraction of these biological relationships from the biomedical literature has become essential because of the enormous amount of biomedical literature published each year. In this paper we present a scalable learning method to automatically extract biological relationships from the biomedical literature. Our method integrates information extraction and data mining, automatically learns patterns based on a few user seed tuples, and then extracts new tuples from the biomedical literature based on the discovered patterns. A novel framework, Biological Relationship EXtract (BRExtract), is implemented and tested on PubMed to find protein-protein interactions, and the experimental results indicate that our approach is very effective in extracting biological relationships from large collections of online biomedical literature.
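A toy bootstrap in the spirit of seed-tuple pattern learning (as in DIPRE/Snowball) is sketched below; the sentences, seed pair, and single pattern shape are invented and far simpler than BRExtract itself.

    import re

    corpus = [
        "Our assay shows that RAD51 interacts with BRCA2 in vivo.",
        "We found that TP53 interacts with MDM2 under stress.",
        "CDK2 phosphorylates RB1 during the cell cycle.",
    ]
    seeds = {("RAD51", "BRCA2")}

    # Step 1: turn each seed occurrence into a context pattern.
    patterns = set()
    for a, b in seeds:
        for s in corpus:
            m = re.search(re.escape(a) + r"\s+(\w+(?:\s\w+)?)\s+" + re.escape(b), s)
            if m:
                patterns.add(m.group(1))          # e.g. "interacts with"

    # Step 2: apply the learned patterns to extract new tuples.
    found = set()
    for p in patterns:
        for s in corpus:
            for m in re.finditer(r"([A-Z][A-Z0-9]+)\s+" + re.escape(p) +
                                 r"\s+([A-Z][A-Z0-9]+)", s):
                found.add(m.groups())
    print(patterns, found)   # picks up (TP53, MDM2) via the learned pattern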
https://doi.org/10.1142/9789812562432_0017
Owing to the rapid development of technology, especially in the field of Internet systems, there is an increasing demand both for intelligent, mobile and autonomous systems and for the usage and conveyance of multimedia information through cyberspace. In this paper, we propose an innovative, intelligent multi-agent based model, namely iJADE (intelligent Java Agent Development Environment), to provide an intelligent agent-based platform in the e-commerce environment. From the implementation point of view, we introduce an intelligent multimedia processing system known as “iMASS” (intelligent Multi-resolution Agent-based Surveillance System), based on the integration of the following modules: a) an automatic coarse-to-fine figure-ground scene segmentation module using the CNOW (Composite Neuro-Oscillatory Wavelet-based) model; b) an automatic human face detection and extraction module using an Active Contour Model (ACM) with facial ‘landmark’ vectors; c) invariant human face identification based on the Elastic Graph Dynamic Link Model (EGDLM). In order to conform to current (and future) multimedia system standards, the whole iMASS is implemented using the MPEG-7 system framework, with comprehensive Description Schemes (DS), feature Descriptors (D) and a model framework.
https://doi.org/10.1142/9789812562432_0018
Like any self-regulating environment, the Internet is prone to abuse by individuals who use it for illegal or immoral purposes. The ability to discreetly intercept and analyze Internet traffic has tremendous potential, both to protect unsuspecting users from harmful Web materials and for the fight against organized crime and terrorism. Using the intelligent filtering and monitoring of pornographic Web pages as a case study, we have investigated strategies for monitoring Internet traffic. Based on our analysis, we have developed a Web content filtering system that decouples the monitoring process from the Web content analysis process to achieve low latency and high accuracy. In particular, an Online Monitoring Agent performs the monitoring and filtering process, whereas an Offline Classification Agent that employs machine intelligence for the classification of Web pages performs Web content analysis. We have implemented and tested the system and found that it consistently outperformed other commercially available Web filtering systems.
https://doi.org/10.1142/9789812562432_0019
Most existing search systems are modeled using crisp logic and queries. In this chapter, we introduce fuzzy querying and ranking as a flexible tool allowing approximation, where the selected objects do not need to exactly match the decision criteria, resembling natural human behavior. The model consists of five major modules: the Fuzzy Search Engine, Application Templates, the User Interface, the Database, and Evolutionary Computing. The system is designed in a generic form to accommodate diverse applications and to be delivered as stand-alone software to academia and businesses.
https://doi.org/10.1142/9789812562432_0020
Finding relevant pages in response to a user query is a challenging task. Automated search engines that rely on keyword matching usually return too many low quality matches. Link analysis methods can substantially improve the search quality when they are combined with content analysis. This chapter surveys the mainstream work in this area.
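As a taste of the surveyed material, here is a minimal PageRank power-iteration sketch, one classic link-analysis method (HITS is the other common one); the four-page link graph is an invented example.

    import numpy as np

    links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}       # page -> pages it links to
    n, d = 4, 0.85                                    # pages, damping factor

    M = np.zeros((n, n))
    for src, outs in links.items():
        for dst in outs:
            M[dst, src] = 1.0 / len(outs)             # column-stochastic transitions

    r = np.full(n, 1.0 / n)
    for _ in range(100):
        r = (1 - d) / n + d * M @ r                   # power iteration
    print(r.round(3))                                 # page 2 collects the most rank

Combining such link-derived scores with content-based relevance scores is precisely the hybrid that the surveyed methods exploit.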
https://doi.org/10.1142/9789812562432_0021
“Agents” of various species have been an active field of research and development for a decade now. Mobile agent technology, which allows programs to move autonomously through a network, has emerged as a new approach to distributed computing. It supports a unified and scalable framework for various kinds of applications such as electronic commerce, parallel computing, and information retrieval, as well as for advanced technologies such as Web Services and Grid Computing operating in widely distributed heterogeneous open networks, including the Internet. This chapter introduces some basic concepts related to mobile agents, provides an overview of mobile agent technology and its applications in Internet computing, and surveys some current mobile agent systems.
https://doi.org/10.1142/9789812562432_0022
Recent advances in computer graphics technology have introduced virtual reality technology to a wider audience. The relatively new trend of intelligent agents was quickly applied to virtual reality systems, producing a new research area, Intelligent Virtual Agents, which is a meeting point for Artificial Intelligence and Virtual Reality. Nowadays, Intelligent Virtual Agent applications have started appearing on the Web, aiming to take advantage of the potential of the new medium. In this paper we attempt to give a concise presentation of this fascinating new research area, providing a brief historical overview and discussing essential concepts related to IVAs, Web 3D, and their combination.
https://doi.org/10.1142/9789812562432_0023
In this chapter, we discuss some key problems in network security and present some possible ways to solve them, with an emphasis on data mining based methods. The problem of network security and intrusion detection is discussed first. Then, data mining techniques are briefly introduced. The problems, possibilities, and methods of data mining solutions for intrusion detection are further analyzed. Several new techniques, such as data reduction, incremental mining, uncertain data mining, and initiative data mining, are suggested to address the problems of intrusion detection systems.
https://doi.org/10.1142/9789812562432_0024
Since its proposal, Web Intelligence (WI) has quickly grown into a research direction of wide concern in the computer science community. WI aims at developing and exploring the next generation of the World Wide Web to satisfy increasing human needs. In WI research, resource optimization is a fundamental and key problem, which involves two main requirements: the right infrastructure of distributed resources and the corresponding algorithmic support to mobilize resources. Owing to the advantages of grids, in particular peer-to-peer grids, such as decentralization and robustness, the next generation of the Web will mainly rely on grid-like resources. Based on a peer-to-peer infrastructure of grid-like resources, this chapter explores the corresponding algorithmic support to mobilize resources. Specifically, we present a model to characterize the process of task handling among peer-to-peer grid nodes. Our model consists of functional differential equations. Through case studies, (1) we show that our model is effective in characterizing the process of task handling with balanced loads among nodes; and (2) we examine the effects of time delay, service time, and other factors on the global performance of a peer-to-peer network of grid nodes, and report some interesting observations. Based on our model, we further simulate a complete process of task handling on a grid node network.
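As an illustration of this modeling style, a delay (functional) differential equation for the load queued at node i might take the following generic form; the symbols and the coupling term are our assumptions for exposition, not the chapter's exact model:

    \[
      \frac{dq_i(t)}{dt}
        \;=\; \lambda_i(t) \;-\; \mu_i \, q_i(t)
        \;+\; \sum_{j \neq i} w_{ij}\,\bigl[\, q_j(t - \tau_{ij}) - q_i(t) \,\bigr],
    \]
    % q_i: load queued at node i; lambda_i: task arrival rate; mu_i: service
    % rate; tau_ij: transfer delay between nodes. The weighted difference term
    % diffuses work toward less loaded neighbors, and the delays tau_ij are
    % what make this a functional rather than ordinary differential equation.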
https://doi.org/10.1142/9789812562432_0025
International terrorists increasingly use the Internet for covert communications, collecting information on their topics of interest, and spreading the word about their activities around the world. One way to detect terrorist activities on the Internet is by monitoring the content accessed by Web users. This study presents an innovative, data-mining-based methodology for Web content monitoring. The normal behavior of a group of similar users is learned by applying unsupervised clustering algorithms to the textual content of publicly available Web pages they usually view. The induced model of normal behavior is used in real time to reveal anomalous content accessed at a specific computer. To speed up the detection process, dimensionality reduction is applied to the content data. We evaluate the proposed methodology by ROC analysis.
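The detection idea can be sketched in a few lines: cluster the content of "normal" pages, then flag accessed pages whose distance to the nearest centroid is large. The toy pages below are invented, and the dimensionality-reduction step the study applies is omitted here for brevity.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    normal_pages = [
        "football scores league results match report",
        "stock market shares earnings quarterly report",
        "movie review actor film festival premiere",
    ] * 10
    vec = TfidfVectorizer()
    X = vec.fit_transform(normal_pages)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    def anomaly_score(page: str) -> float:
        """Distance from the page to its nearest learned centroid."""
        v = vec.transform([page]).toarray()
        return float(np.min(np.linalg.norm(km.cluster_centers_ - v, axis=1)))

    print(anomaly_score("football match report"))             # low: normal
    print(anomaly_score("bomb making instructions manual"))   # high: anomalous

Sweeping the decision threshold over such scores yields the true-positive/false-positive trade-off that the ROC analysis evaluates.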
https://doi.org/10.1142/9789812562432_bmatter