  • articleNo Access

    EDGeS: THE COMMON BOUNDARY BETWEEN SERVICE AND DESKTOP GRIDS

    Service grids and desktop grids are both promoted by their supporting communities as strong solutions to the problem of limited compute power and as a means of balancing loads across networked systems. Little work, however, has been undertaken to blend these two technologies. In this paper we introduce a new EU project that is building technological bridges to facilitate service and desktop grid interoperability. We provide a taxonomy of, and background on, service grids such as EGEE and desktop grids or volunteer computing platforms such as BOINC and XtremWeb. We then describe our approach for identifying translation technologies between service and desktop grids. The individual themes discuss the actual bridging technologies employed and the distributed data issues surrounding deployment.

  • articleNo Access

    A SURVEY OF PRIVACY-PRESERVING COLLABORATIVE FILTERING SCHEMES

    With the growing need to preserve confidential data while providing recommendations, privacy-preserving collaborative filtering has been receiving increasing attention. To make data owners feel more comfortable while providing predictions, various schemes have been proposed to estimate recommendations without deeply jeopardizing privacy. Such methods eliminate or reduce data owners' privacy, financial, and legal concerns by employing different privacy-preserving techniques. Although a considerable number of studies focus on privacy-preserving collaborative filtering schemes, there is no comprehensive survey investigating them from different directions. In this survey, we mainly focus on studying various privacy-preserving recommendation methods according to the data partitioning cases and the techniques used for preserving confidentiality. We also review privacy in general and examine it in collaborative filtering scenarios. We discuss the proposed schemes in terms of their limitations and practical implementation challenges. Moreover, we give an overview of the evaluation of such schemes. We finally provide a comprehensive guideline for study in this area and propose future research directions.

  • articleNo Access

    Consensus-Driven Cluster Analysis: Top-Down and Bottom-Up Based Split-and-Merge Classifiers

    Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Typically, the goal is to search for the so-called median (or consensus) partition, i.e. the partition that is most similar, on average, to all the input partitions. In this paper we address the problem of combining multiple fuzzy clusterings without access to the underlying features of the data, relying instead on inter-cluster similarity. We are concerned with top-down and bottom-up consensus-driven fuzzy clustering that splits and merges the worst clusters. The objective is to reconcile a structure developed for patterns in some dataset with the structural findings already available for other related ones. The proposed classifiers consider dispersion and dissimilarity between the partitions as well as the corresponding fuzzy proximity matrices. Several illustrative numerical examples, using both synthetic data and data from available machine learning repositories, are also included. The experimental component of the study shows the efficiency of the proposed classifiers in terms of quality and runtime.
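
    The notion of a median partition can be illustrated with a minimal sketch. It is not the split-and-merge method of the paper: it assumes crisp (non-fuzzy) label vectors, uses the Rand index as the similarity measure, and restricts the search to the input partitions themselves. The function names are hypothetical.

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def median_partition(partitions):
    """Return the input partition with the highest mean Rand index
    to all inputs (a restricted form of the median partition)."""
    def mean_similarity(p):
        return sum(rand_index(p, q) for q in partitions) / len(partitions)
    return max(partitions, key=mean_similarity)

# Three labelings of four items; the first two differ only in label names.
partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
consensus = median_partition(partitions)
```

    Because the Rand index compares pairs rather than label values, the first two labelings count as identical, so either of them is the most similar on average to the whole set.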

  • articleNo Access

    Differentially Private Outlier Detection in a Collaborative Environment

    Outlier detection is one of the most important data analytics tasks and is used in numerous applications and domains. The goal of outlier detection is to find abnormal entities that are significantly different from the remaining data. Often, the underlying data is distributed across different organizations. If outlier detection is done locally, the results obtained are not as accurate as when outlier detection is done collaboratively over the combined data. However, the data cannot be easily integrated into a single database due to privacy and legal concerns. In this paper, we address precisely this problem. We first define privacy in the context of collaborative outlier detection. We then develop a novel method to find outliers from both horizontally partitioned and vertically partitioned categorical data in a privacy-preserving manner. Our method is based on a scalable outlier detection technique that uses attribute value frequencies. We provide an end-to-end privacy guarantee by using the differential privacy model and secure multiparty computation techniques. Experiments on real data show that our proposed technique is both effective and efficient.
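
    A rough sketch of frequency-based outlier scoring over horizontally partitioned categorical data follows. It rests on several assumptions that are not taken from the paper: the score of a record is the mean global frequency of its attribute values (low scores suggest outliers), parties' counts are summed in the clear where the paper would use secure multiparty computation, and Laplace noise of scale m/epsilon (m attributes per record) is added only to observed counts. All function names are hypothetical.

```python
import random
from collections import Counter

def local_counts(records):
    """Count (attribute index, value) pairs in one party's records."""
    return Counter((j, v) for row in records for j, v in enumerate(row))

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def global_counts(parties, epsilon=None, rng=None):
    """Sum the parties' counts; optionally perturb them with Laplace noise.
    Plain summation stands in for a secure-sum protocol (assumption)."""
    total = Counter()
    for records in parties:
        total.update(local_counts(records))
    if epsilon is None:
        return dict(total)
    rng = rng or random.Random()
    m = len(parties[0][0])  # one record touches m counts
    return {k: c + laplace(m / epsilon, rng) for k, c in total.items()}

def avf_scores(records, counts):
    """Mean frequency of each record's attribute values."""
    return [sum(counts.get((j, v), 0) for j, v in enumerate(row)) / len(row)
            for row in records]

party_a = [("red", "s"), ("red", "s"), ("red", "m")]
party_b = [("red", "s"), ("blue", "xl")]
exact = global_counts([party_a, party_b])            # no noise
noisy = global_counts([party_a, party_b], epsilon=1.0,
                      rng=random.Random(0))          # perturbed counts
scores = avf_scores(party_a + party_b, exact)
```

    With the exact counts, the record ("blue", "xl") receives the lowest score because both of its values occur only once in the combined data.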

  • articleNo Access

    PRIVACY-PRESERVING RANDOM PROJECTION-BASED RECOMMENDATIONS BASED ON DISTRIBUTED DATA

    Providing recommendations based on distributed data has received an increasing amount of attention because it offers several advantages. Online vendors who face problems caused by a limited amount of available data want to offer predictions based on distributed data collaboratively because they can surmount problems such as cold start, limited coverage, and unsatisfactory accuracy through partnerships. It is relatively easy to produce referrals based on distributed data when privacy is not a concern. However, concerns regarding the protection of private data, financial fears due to revealing valuable assets, and legal regulations imposed by various organizations prevent companies from forming collaborations. In this study, we propose to use random projection to protect online vendors' privacy while still providing accurate predictions from distributed data without sacrificing online performance. We utilize random projection to eliminate the aforementioned issues so vendors can work in partnerships. We suggest privacy-preserving schemes to offer recommendations based on vertically or horizontally partitioned data among multiple companies. The recommended methods are analyzed in terms of confidentiality. We also analyze the superfluous loads caused by privacy concerns. Finally, we perform real data-based trials to evaluate the accuracy of the proposed schemes. The results of our analyses show that our methods preserve privacy, cause insignificant overheads, and offer accurate predictions.
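
    The general idea behind random projection can be sketched as follows. This is a generic Gaussian projection, not the specific scheme proposed in the paper: a vector of length d is multiplied by a random k x d matrix whose entries are drawn from N(0, 1/k), so that inner products are preserved in expectation while the original entries are not shared directly. The function names and the dimensions are illustrative assumptions.

```python
import random

def random_matrix(k, d, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries."""
    scale = (1.0 / k) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Compute R @ x for a list-of-rows matrix R and a vector x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

rng = random.Random(0)
d, k = 1000, 400                      # original and projected dimensions
R = random_matrix(k, d, rng)          # shared by the collaborating parties
x = [rng.uniform(1, 5) for _ in range(d)]   # e.g. a vector of ratings
y = [rng.uniform(1, 5) for _ in range(d)]

px, py = project(R, x), project(R, y)
approx, exact = dot(px, py), dot(x, y)  # approx estimates exact
```

    The estimate becomes tighter as k grows, which is the trade-off between accuracy and the amount of information carried by the projected data.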

  • articleNo Access

    COLLABORATIVE CLUSTERING USING PROTOTYPE-BASED TECHNIQUES

    The aim of collaborative clustering is to reveal the common structure of data distributed across different sites. In this paper, we present a formalism of topological collaborative clustering using prototype-based clustering techniques; in particular, we formulate our approach using Kohonen's Self-Organizing Maps. Maps representing different sites can collaborate without recourse to the original data, preserving their privacy. We present two different approaches to collaborative clustering: horizontal and vertical. In both approaches, the strength of collaboration (confidence exchange) between each pair of datasets is determined by a parameter, which we call the coefficient of collaboration, that is estimated iteratively during the collaboration phase using gradient-based optimization. The proposed approaches have been validated on several datasets, and experimental results have shown very promising performance.

  • articleNo Access

    Data I/O management approach for the post-hoc visualization of big simulation data results

    Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing some operational log data files. We verified a tendency to generate large numbers of distributed files as simulation outputs, and in most cases the number of files has been proportional to the number of computational nodes used, that is, with each computational node producing one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that required for large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. In this work, we focused on the xDMlib data management library and its flexible data I/O mechanism in order to enable flexible data loading of big computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate the lightweight metadata needed to build the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach by using a 32-node visualization cluster and the K computer. Despite the inevitable performance penalty of longer data loading times when using a smaller number of processes, the approach has the benefit of avoiding any data replication via copy, conversion, or extraction. In addition, users will be able to freely select any number of nodes, without regard to the number of distributed files, for post-hoc visualization and analysis purposes.
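
    The idea of a data assignment mapping that decouples the number of files from the number of loading processes can be sketched as below. This is not the xDMlib API; it is a hypothetical block assignment in which each process receives a contiguous, near-equal range of file indices, with `assign_files` as an illustrative name.

```python
def assign_files(num_files, num_procs):
    """Map each process rank to a contiguous block of file indices.
    Block sizes differ by at most one; ranks may receive no files
    when there are more processes than files."""
    mapping = {}
    for rank in range(num_procs):
        start = rank * num_files // num_procs
        stop = (rank + 1) * num_files // num_procs
        mapping[rank] = list(range(start, stop))
    return mapping

# Ten simulation output files loaded by four visualization processes.
mapping = assign_files(10, 4)
# → {0: [0, 1], 1: [2, 3, 4], 2: [5, 6], 3: [7, 8, 9]}
```

    In a real setting the mapping would be derived from the pre-computed metadata (for example, the spatial extent stored in each file) rather than from file indices alone.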