This paper describes how to carry out the matrix-vector multiplications Ax and Aᵀx on parallel computers, where A is a sparse matrix arising from the discretization of partial differential equations. Two partitionings of the sparse matrix suitable for parallel computers are discussed; both are derived by interpreting the sparse matrix as a graph. One technique partitions the graph of the matrix by finding edge separators; the other, by finding vertex separators. We claim that in either case computing Aᵀx is no more complex than computing Ax. Results from an implementation of the matrix-vector multiplications on the Intel iPSC/860 are presented which substantiate the claim, along with results from an efficient implementation on the Cray Y-MP/1 for comparison.
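The claim about Aᵀx is easy to see in serial code: stored in compressed sparse row (CSR) form, both products traverse the same array of nonzeros exactly once. The following Python sketch is our own illustration of that observation, not the paper's parallel iPSC/860 implementation.

```python
# Minimal CSR sketch: y = A x and z = A^T x each touch every nonzero once.
def spmv(n_rows, n_cols, indptr, indices, data, x):
    """y = A x for a CSR matrix A."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def spmv_transpose(n_rows, n_cols, indptr, indices, data, x):
    """z = A^T x from the same CSR arrays: each nonzero a_ij
    contributes a_ij * x_i to z_j, so no transposed copy is built."""
    z = [0.0] * n_cols
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            z[indices[k]] += data[k] * x[i]
    return z

# 2 x 3 example: A = [[1, 0, 2], [0, 3, 0]]
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(spmv(2, 3, indptr, indices, data, [1.0, 1.0, 1.0]))       # [3.0, 3.0]
print(spmv_transpose(2, 3, indptr, indices, data, [1.0, 1.0]))  # [1.0, 3.0, 2.0]
```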
Clustering is an effective technique for analyzing and extracting useful information from large biological networks. Popular clustering solutions often require user input for several algorithm options that can seem quite arbitrary without experimentation. These algorithms can provide good results in a reasonable time, but they are not beyond improvement. We present a local-search-based clustering algorithm, free of such required input, that can be used to improve the quality of a set of given clusters taken from any existing algorithm or produced via any arbitrary assignment. We implement this local search using a modern GPU-based approach for efficient runtime. The proposed algorithm shows promising results for improving the quality of clusters; even with already high-quality input clusters, we achieve cluster rating improvements of up to 33%.
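To make the idea concrete, here is a minimal sequential sketch of a local search of the kind described: each vertex is greedily moved to the neighboring cluster that holds most of its edges, until no improving move remains. The move rule and data layout are our assumptions for illustration; a GPU version such as the paper's would parallelize such moves across vertices rather than visiting them one at a time.

```python
# Greedy local search over a clustering of a graph (illustrative sketch).
from collections import Counter

def local_search(adj, labels, max_iters=100):
    """adj: {vertex: set of neighbors}; labels: {vertex: cluster id}."""
    for _ in range(max_iters):
        moved = False
        for v, nbrs in adj.items():
            counts = Counter(labels[u] for u in nbrs)   # edges per cluster
            if not counts:
                continue
            best, best_cnt = counts.most_common(1)[0]
            if best != labels[v] and best_cnt > counts.get(labels[v], 0):
                labels[v] = best                        # improving move
                moved = True
        if not moved:                                   # local optimum
            break
    return labels

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(local_search(adj, {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}))
```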
This paper shows a simple algorithm for solving the single function coarsest partition problem on the CRCW PRAM model of parallel computation using O(n) processors in O(log n) time with O(n^(1+ε)) space.
The issue of scalability is key to the success of massively parallel processing. Due to their distributed nature, message-passing multicomputers are appropriate for achieving scalable performance. However, the message-passing model lacks programmability, owing to the difficulties programmers encounter in partitioning and scheduling the computation over the processors and in establishing efficient inter-processor communication in the user code. This paper therefore presents a compile-time scheduling heuristic, called BLS, that maps programs onto the processors of a message-passing multicomputer. In contrast to other proposed methods, BLS takes a more global approach in an attempt to balance the tradeoff between exploiting parallelism and reducing communication overhead. To evaluate the effectiveness of BLS, simulation studies of scheduling SISAL programs are presented.
Many parallel compilation systems represent programs internally as Directed Acyclic Graphs (DAGs). However, the storage of these DAGs becomes prohibitive when the program being compiled is large. In this paper we describe a compile-time scheduling methodology for hierarchical DAG programs represented in the IFX intermediate form. The method we present is itself hierarchical, reducing the storage that would otherwise be required by a single flat DAG representation. We describe the scheduling model and demonstrate the method using the Optimizing Sisal Compiler and two scientific applications.
We consider the k-function coarsest partition problem for a set S, where |S| = n, with k functions from S to S. We present a lower bound of Ω(log n − k log k) time for linear-work algorithms solving this problem on the CRCW PRAM model of computation.
In this paper, we study the global dynamics of the 5D structural leukemia model with 14 parameters developed by Clapp et al. [2015]. This model describes the interaction between leukemic cell populations and the immune system. Our analysis is based on the localization method of compact invariant sets, which we develop by introducing the notion of a partitioning of the parameter space and the notion of a localization set corresponding to this partitioning as its parameters change. Further, we obtain ultimate upper and lower bounds for all variables of a state vector without imposing additional restrictions. Local asymptotic stability conditions with respect to the leukemia-free equilibrium point (EP) are given. We deduce formulas describing inner EPs expressed in terms of positive roots of one 7D equation; based on this equation, it is shown that the number of inner EPs cannot exceed three, and one case of a global bifurcation of EPs is detected. Next, we prove the existence of an attracting set. Further, in two theorems, conditions for the global eradication/extinction of leukemia are described. The impact of using parametrically variable localization sets for a qualitative analysis of the ultimate behavior of leukemic cell populations is discussed.
Class imbalance is a fundamental problem in data mining and knowledge discovery, encountered in a wide array of application domains. Random undersampling has been widely used to alleviate the harmful effects of imbalance; however, this technique often leads to a substantial amount of information loss. Repetitive undersampling techniques, which generate an ensemble of models, each trained on a different undersampled subset of the training data, have been proposed to alleviate this difficulty. This work reviews three repetitive undersampling methods currently used to handle imbalance and presents a detailed and comprehensive empirical study using four different learners, four performance metrics, and 15 datasets from various application domains. To our knowledge, this work is the most thorough study of repetitive undersampling techniques to date.
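As a sketch of what this family of techniques does, the following hypothetical Python example trains an ensemble in which every model keeps all minority examples but draws a fresh random majority subset, so that far less information is permanently discarded than in one-shot undersampling. The base learner and soft-vote rule are our choices for illustration, not those of the study.

```python
# Repetitive undersampling: an ensemble of balanced, resampled models.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def repetitive_undersampling(X, y, n_models=10, seed=0):
    rng = np.random.default_rng(seed)
    minority = np.where(y == 1)[0]
    majority = np.where(y == 0)[0]
    models = []
    for _ in range(n_models):
        # all minority examples + a fresh equal-size majority subset
        sub = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sub])
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def predict(models, X):
    # soft vote: average the per-model positive-class probabilities
    p = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (p >= 0.5).astype(int)
```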
Partitioning a multi-dimensional data set (array) into rectangular regions subject to constraints (error measures) is an important problem arising in parallel computing, databases, VLSI design, and other applications. In this paper, we consider the two most common types of partitioning used in practice, Arbitrary partitioning and (p × p) partitioning, and study their relationships under three widely used error metrics: Max-Sum, Sum-SVar, and Sum-SLift.
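For orientation: a (p × p) partitioning cuts the array with p − 1 full horizontal and p − 1 full vertical lines, while Arbitrary partitioning places axis-aligned rectangles freely, and Max-Sum scores a partitioning by its heaviest block. A small sketch under these assumptions (cut-point conventions are ours):

```python
# Max-Sum error of a (p x p) partitioning given row/column cut points.
import numpy as np

def max_sum(A, row_cuts, col_cuts):
    """Largest block sum over the grid induced by the cut points."""
    return max(block.sum()
               for rows in np.split(A, row_cuts, axis=0)
               for block in np.split(rows, col_cuts, axis=1))

A = np.arange(16).reshape(4, 4)
print(max_sum(A, [2], [2]))   # 2 x 2 partitioning; heaviest block sums to 50
```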
We study in this paper the partitioning of the constraints of a temporal planning problem by subgoals, their sequential evaluation before parallelizing the actions, and the resolution of inconsistent global constraints across subgoals. Using an ℓ1-penalty formulation and the theory of extended saddle points, we propose a global-search strategy that looks for local minima in the original-variable space of the ℓ1-penalty function and for local maxima in the penalty space. Our approach improves over a previous scheme that partitions constraints along the temporal horizon. The previous scheme leads to many global constraints relating states in adjacent stages, which means that an incorrect assignment of states in an earlier stage of the horizon may violate a global constraint in a later stage. To resolve the violated global constraint in this case, state changes need to propagate sequentially through multiple stages, often leading to a search that gets stuck at an infeasible point for an extended period of time. In this paper, we propose to partition all the constraints by subgoals and to add new global constraints to ensure that the state assignments of a subgoal are consistent with those of other subgoals. Such an approach allows the information on incorrect state assignments in one subgoal to propagate quickly to other subgoals. Using MIPS as the basic planner in a partitioned implementation, we demonstrate significant improvements in time and quality when solving some PDDL2.1 benchmark problems.
In this paper, we study strategies in incremental planning for ordering and grouping subproblems partitioned by the subgoals of a planning problem. To generate a rich set of partial orders for ordering subproblems, we propose an algorithm based on a relaxed plan that ignores the delete lists. The new algorithm considers both the initial and the goal states and can effectively order subgoals in a way that greatly reduces the number of invalidations during incremental planning. We also consider trade-offs between the granularity of the subgoal sets and the complexity of solving the overall planning problem, and propose an efficient strategy for dynamically adjusting the grain size of the partitioning in order to minimize the total complexity. We further evaluate a redundant-ordering scheme that uses two different subgoal orders to improve solution quality without greatly sacrificing run-time efficiency. Experimental results using Metric-FF, YAHSP, and LPG-TD-speed as the embedded planners in incremental planning show that our strategies generalize across planners, improving their time and solution quality on various benchmarks. Finally, we compare the performance of the three planners, their incremental versions, and SGPlan4.1.
We focus on the problem of constructing decision functions to aid in the valuation of alternatives in uncertain decision making. We discuss different types of scales available for representing the payoffs associated with the alternatives and consider the case in which our basic scale is ordinal. We augment this ordinal scale with an additional notion: one special element on the scale, which we call the denoted element. We name such a scale a Denoted Ordinal Scale (DOS). A DOS allows a binary partitioning of the basic ordinal scale, which can be associated with differing semantics and used for various purposes. Here we focus on a binary partitioning whose semantics classifies payoffs as acceptable or not. This allows us to express information such as "A is preferred to B, but both are acceptable." We show how a DOS with this semantics can be used to construct sophisticated decision functions.
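A tiny sketch, with an invented five-grade scale, of how the denoted element induces the acceptable/unacceptable partitioning described above:

```python
# A Denoted Ordinal Scale: an ordered scale plus one denoted element that
# splits it into acceptable and unacceptable grades (illustrative example).
scale = ["terrible", "poor", "fair", "good", "excellent"]  # ordinal order
denoted = "fair"                                           # denoted element

def acceptable(grade):
    return scale.index(grade) >= scale.index(denoted)

# "good" is preferred to "fair", yet both are acceptable
print(acceptable("good"), acceptable("fair"), acceptable("poor"))
```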
Parallel and distributed simulation techniques have been investigated in a number of studies to decrease the execution times of PCS network simulations. In this paper, we consider distributed simulation of PCS models using a two-stage PCS simulation testbed, which makes use of a conservative scheme at Stage 1 and of Time Warp at Stage 2, and we focus upon the load balancing issue. We investigate several load balancing schemes for TDMA systems. Extensive simulation experiments were conducted on a cluster of workstations using a real suburban area serviced by an FCA-based PCS network. Our results clearly indicate that a careful load balancing scheme is important to the success of the PCS simulation model.
We propose a hierarchical heuristic approach for solving the Traveling Salesman Problem (TSP) in the unit square. The points are partitioned with a random dyadic tiling, and clusters are formed by the points located in the same tile. Each cluster is represented by its geometric barycenter, and a "coarse" TSP solution is calculated for these barycenters. Midpoints are placed at the middle of each edge of the coarse solution. Near-optimal (or optimal) tours are computed for each cluster, and the tours are concatenated using the midpoints, yielding a solution for the original TSP. The method is tested on random TSPs (independent, identically distributed points in the unit square) with up to 10,000 points, as well as on a popular benchmark problem (att532, the coordinates of 532 American cities). Our solutions are 8–13% longer than the optimal ones. We also present an optimization algorithm for the partitioning that further reduces the solution errors (by several percent using 1000 iteration steps). The numerical experiments demonstrate the viability of the approach.
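The following Python sketch mimics the pipeline in a simplified form: a fixed k × k grid stands in for the random dyadic tiling, greedy nearest-neighbor tours stand in for the near-optimal solvers, and per-cluster tours are simply chained in coarse-tour order rather than spliced at edge midpoints. All names and parameters are ours.

```python
# Hierarchical TSP heuristic (simplified illustration of the approach).
import math, random
from collections import defaultdict

def nn_tour(pts):
    """Greedy nearest-neighbor tour over a list of (x, y) points."""
    tour, rest = [pts[0]], set(range(1, len(pts)))
    while rest:
        j = min(rest, key=lambda i: math.dist(tour[-1], pts[i]))
        tour.append(pts[j])
        rest.remove(j)
    return tour

def hierarchical_tsp(points, k=8):
    tiles = defaultdict(list)
    for p in points:                      # cluster points by grid tile
        tiles[(min(int(p[0] * k), k - 1), min(int(p[1] * k), k - 1))].append(p)
    cells = list(tiles.values())
    bary = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in cells]
    coarse = nn_tour(bary)                # "coarse" tour over barycenters
    order = sorted(range(len(cells)), key=lambda i: coarse.index(bary[i]))
    tour = []
    for i in order:                       # chain per-tile tours in coarse order
        tour.extend(nn_tour(cells[i]))
    return tour

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(1000)]
print(len(hierarchical_tsp(pts)))         # 1000: every point visited once
```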
A metabolic network model provides a computational framework to study the metabolism of a cell at the system level. Due to the large size and complexity of these networks, rational decomposition into subsystems is a strategy for gaining better insight into metabolic functions. Additionally, decomposing metabolic networks paves the way for computational methods that would otherwise be very slow when run on the original genome-scale network. In the present study, we propose FCDECOMP, a decomposition method based on flux coupling relations (FCRs) between pairs of reaction fluxes. The approach uses a genetic algorithm (GA) to obtain subsystems that can be analyzed in isolation, i.e., without considering the reactions of the original network in the analysis. We therefore suggest that our method is useful for discovering biologically meaningful modules in metabolic networks. As a case study, we show that when the method is applied to the metabolic networks of barley seeds and yeast, the resulting modules are in good agreement with the biological compartments of these networks.
The shuffled differential evolution (SDE) is a recently proposed variant of differential evolution (DE). Although the SDE employs several mechanisms to compensate for the limited variety of search moves in the original DE, these mechanisms all act through the same operator. To diversify the search moves of the SDE, this work proposes employing a secondary mutation operator alongside the first mutation operator. The new operator can generate offspring different from those produced by the first one. Experiments demonstrate a better performance for the proposed algorithm than for the SDE. In the comparative experiments, performance comparisons with several modern DE variants and other evolutionary algorithms reported in the literature confirm a better, or at least competitive, performance for our proposed algorithm. A real-world optimization problem, Spread Spectrum Radar Polyphase Code Design, is also solved to show the wide applicability of the proposed DSDE.
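To illustrate the kind of pairing the abstract argues for, here is a minimal sketch contrasting the classic DE/rand/1 mutation with a secondary DE/current-to-best/1 mutation, which produces offspring the first operator cannot. The specific operator pair is our assumption for illustration, not necessarily the exact pair used in the DSDE.

```python
# Two DE mutation operators that generate different kinds of offspring.
import numpy as np

def mutate_rand1(pop, i, F, rng):
    """Classic DE/rand/1: v = x_r1 + F (x_r2 - x_r3)."""
    r1, r2, r3 = rng.choice([j for j in range(len(pop)) if j != i],
                            size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def mutate_current_to_best(pop, i, best, F, rng):
    """Secondary move pulling the candidate toward the population best."""
    r1, r2 = rng.choice([j for j in range(len(pop)) if j != i],
                        size=2, replace=False)
    return pop[i] + F * (pop[best] - pop[i]) + F * (pop[r1] - pop[r2])

rng = np.random.default_rng(0)
pop = rng.random((20, 5))                 # 20 candidates in 5 dimensions
fitness = (pop ** 2).sum(axis=1)          # sphere function, to be minimized
best = int(fitness.argmin())
v1 = mutate_rand1(pop, 0, F=0.5, rng=rng)
v2 = mutate_current_to_best(pop, 0, best, F=0.5, rng=rng)
print(v1, v2, sep="\n")
```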
This chapter introduces the concept of association rules, a form of local-pattern discovery in an unsupervised learning setting. Association rules are used to uncover relationships between seemingly unrelated data items. The terminology, notation, and processes used with association rules are discussed, and a brief overview of three basic rule-inference algorithms is given.
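As a concrete taste of the terminology, the sketch below computes the two measures on which rule-inference algorithms rest, support and confidence, for a made-up market-basket data set.

```python
# Support and confidence of an association rule A -> B (toy example).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) estimated from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# {diapers} -> {beer}: support 0.6, confidence 0.75
print(support({"diapers", "beer"}), confidence({"diapers"}, {"beer"}))
```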