We consider the problem of finding Hamiltonian cycles, linear arrays, and rings in a faulty supercube, if they exist. Proving the existence of Hamiltonian cycles in hypercubes is easy because hypercubes are symmetric graphs. Since the supercube is asymmetric, the proof is not trivial. We show that every supercube S_N, where N is the number of nodes in the supercube, contains a Hamiltonian cycle. This implies that for any r with 3≤r≤N, the supercube contains a cycle of r nodes. We also propose embedding algorithms showing that a ring with any number of nodes can be embedded in a faulty supercube with load 1, congestion 1, and dilation 4, such that O(n² − (⌊log₂ m⌋)²) faults can be tolerated, where n is the dimension of the supercube and m is the number of nodes of the ring.
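The easy hypercube case mentioned above has a classical constructive witness: the reflected binary Gray code visits all 2ⁿ vertices of Q_n, changing one bit per step and wrapping around. A minimal sketch of that standard construction (not the paper's supercube algorithm; the function name is illustrative):

```python
def gray_cycle(n):
    """Hamiltonian cycle of the hypercube Q_n: the reflected binary
    Gray code lists all 2**n vertices so that consecutive vertices
    (including last -> first) differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n)]

cycle = gray_cycle(3)
print(cycle)  # [0, 1, 3, 2, 6, 7, 5, 4]
for a, b in zip(cycle, cycle[1:] + cycle[:1]):
    assert bin(a ^ b).count("1") == 1  # each hop crosses one hypercube edge
```

No such closed-form labeling exists for the asymmetric supercube, which is why the existence proof there is nontrivial.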
The class of hypercubes is one of the most important and popular topologies for interconnection networks of multicomputer systems. This class includes the binary hypercube and the generalized hypercube. Based on the observation that these two graphs can be constructed using a graph-theoretic operation known as the product of graphs, we propose a new method for generating large symmetric graphs for networks of multicomputer systems. The method is essentially algebraic in nature, making use of the product of a class of graphs known as quasi-group graphs; we call the resulting graphs PQG graphs. Because these graphs are constructed by an algebraic operation, the analysis of their performance is simplified. Many well-known topologies can in fact be expressed as PQG graphs, which makes the method a very general one. We also investigate the problem of routing in a PQG graph, and propose a hardware implementation of the routing algorithm to reduce the delay in the routing of messages. We then apply our results to the Petersen graph and show that the product of such graphs has vastly superior topological properties to hypercubes of the same degree.
This paper evaluates the performance of hypercube machines on a computational fluid dynamics problem. Our evaluation focuses on a prototype of a class of widely used fluid dynamics codes, FLO52, written by Antony Jameson, which solves the two-dimensional steady Euler equations describing flow around an airfoil. We give a description of FLO52, its hypercube mapping, and the code modifications made to increase machine utilization. Results from two hypercube computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are presented and compared. In addition, we develop a mathematical model of the execution time as a function of several machine and algorithm parameters. The model accurately predicts the actual run times obtained, and we use it to make predictions about future hypercubes.
Multigrid is a fast and popular iterative method for solving linear partial differential equations. However, because the solution of very small problems is inherent in the multigrid iteration, it is difficult to implement efficiently on a massively parallel computer. In this paper, we present a model for the efficiency of multigrid on a parallel computer that depends only on the efficiency of the smoother at each level. This model can be used to verify that it is indeed difficult to obtain extremely high efficiencies (>95%), but that it is relatively easy to obtain moderately high efficiencies (70% to 85%). The model also indicates that moderately high efficiencies can still be achieved as the number of dimensions in the partial differential equation increases. We also present an implementation of the multigrid V-cycle that has achieved 84% efficiency on the 1,024-processor NCUBE/ten.
Through simulation, we studied the performance of three adaptive “wormhole” routing strategies, and compared them with static routing. Since adaptive routing is susceptible to deadlock, an abort-and-retry strategy was used to prevent it from arising. The impact of packetization of long messages and buffering at message destinations were also studied. Results are presented and analyzed for a variety of hardware configurations and traffic conditions. The combination of adaptive routing, abort-and-retry, and buffering at the destination is shown to achieve excellent performance for modest cost.
The multiplication of a vector by a matrix is the kernel operation in many algorithms used in scientific computation. A fast and efficient parallel algorithm for this calculation is therefore desirable. This paper describes a parallel matrix-vector multiplication algorithm which is particularly well suited to dense matrices or matrices with an irregular sparsity pattern. Such matrices can arise from discretizing partial differential equations on irregular grids or from problems exhibiting nearly random connectivity between data structures. The communication cost of the algorithm is independent of the matrix sparsity pattern and is shown to scale as O(n/√p) for an n×n matrix on p processors. The algorithm’s performance is demonstrated by using it within the well-known NAS conjugate gradient benchmark. This resulted in the fastest run times achieved to date on both the 1024-node nCUBE 2 and the 128-node Intel iPSC/860. Additional improvements to the algorithm which are possible when integrating it with the conjugate gradient algorithm are also discussed.
Communication penalty for parallel computation is related to message startup time and speed of data transmission between the host and processing elements (PEs). We propose two algorithms in this paper to show that the first factor can be alleviated by reducing the number of messages and the second by making the host-PE communication concurrent with computation on the PE array.
The algorithms perform 2ⁿ consecutive sums of 2ⁿ numbers each on a hypercube of degree n. The first algorithm leaves one sum on each processor; it takes n steps to complete the sums and reduces the number of messages generated by a PE from 2ⁿ to n. The second algorithm sends all the sums back to the host as they are generated one by one; it takes 2ⁿ + n − 1 steps to complete the sums in a pipeline, so that one sum is completed every step after the initial (n − 1) steps.
We apply our second algorithm to the front-to-back composition for ray casting. For a large number of rays, the efficiency and speedup of our algorithm are close to the theoretically optimal values.
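The n-step summation on a degree-n hypercube follows the standard recursive-doubling pattern: at step d each PE exchanges its partial sum with its neighbor across dimension d and adds. The sketch below simulates only that basic pattern (as an all-reduce), not the paper's pipelined host-PE algorithms; the function name is illustrative:

```python
def hypercube_sum(values):
    """Simulate an all-reduce sum on a degree-n hypercube with 2**n PEs.
    In step d every PE pairs with its neighbor across dimension d; both
    exchange and add partial sums, so n steps yield the total on all PEs."""
    n = len(values).bit_length() - 1
    assert len(values) == 2 ** n, "need a power-of-two number of PEs"
    v = list(values)
    for d in range(n):                      # one communication step per dimension
        bit = 1 << d
        v = [v[p] + v[p ^ bit] for p in range(len(v))]
    return v                                # every PE holds the total

print(hypercube_sum([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

Pipelining 2ⁿ independent such sums, one dimension-step apart, gives the 2ⁿ + n − 1 step count quoted above.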
In this paper we introduce a class of trees, called generalized compressed trees. Generalized compressed trees can be derived from complete binary trees by performing certain ‘contraction’ operations. A generalized compressed tree CT of height h has approximately 25% fewer nodes than a complete binary tree T of height h. We show that these trees have smaller (up to a 74% reduction) 2-dimensional and 3-dimensional VLSI layouts than the complete binary trees. We also show that algorithms initially designed for T can be simulated by CT with at most a constant slow-down. In particular, algorithms having non-pipelined computation structure and originally designed for T can be simulated by CT with no slow-down.
In numerical computations, the method of divided differences is a very important technique for polynomial approximation. Consider a pipelined divided-difference computation for approximating an nth-degree polynomial. This paper first presents a method to transform the computational structure of divided differences into a pyramid tree. Based on a graph embedding technique, and without any extra communication delay, the pipelined divided-difference computation can be performed in a (2k + 1)-dimensional fault-free hypercube for n + 1 = 2ᵏ + t, k > 0, and 0 < t < 2ᵏ; it can further be performed in a (2k + 2)-dimensional faulty hypercube so as to tolerate arbitrary (k − 1) faulty nodes/links. To the best of our knowledge, this is the first time such mapping methods have been proposed in the literature.
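For reference, a sequential sketch of the divided-difference table whose dependency structure forms the pyramid discussed above: each entry depends on two entries of the level below. This shows only the numerical method, not the hypercube mapping:

```python
def divided_differences(xs, ys):
    """Newton divided-difference coefficients f[x0], f[x0,x1], ...,
    f[x0..xn], computed level by level; entry i of each level depends
    on two entries of the previous level, giving the pyramid structure."""
    coef = list(ys)
    n = len(xs)
    for level in range(1, n):
        for i in range(n - 1, level - 1, -1):   # update bottom-up, in place
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef

# y = x^2 + x + 1 sampled at 0, 1, 2:
print(divided_differences([0, 1, 2], [1, 3, 7]))  # [1, 2.0, 1.0]
```

The returned coefficients give the Newton form 1 + 2x + 1·x(x−1), which expands back to x² + x + 1.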
We analyse the fault-tolerant parameters and topological properties of a hierarchical network of hypercubes. We take a close look at the Extended Hypercube (EH) and the Hyperweave (HW) architectures and also compare them with other popular architectures. These two architectures have low diameter and constant degree of connectivity making it possible to expand these networks without affecting the existing configuration. A scheme for incrementally expanding this network is also presented. We also look at the performance of the ASCEND/DESCEND class of algorithms on these architectures.
The Incrementally Extensible Hypercube (IEH) is a novel interconnection network derived from the hypercube. Unlike the hypercube, the IEH graph is incrementally extensible; that is, it can be constructed for any number of nodes. In addition, it has optimal fault tolerance, its diameter is logarithmic in the number of nodes, and the difference between the maximum and the minimum degree of a node in the graph is bounded (i.e., the graph is almost regular). In this paper, we show that almost every IEH graph has a Hamiltonian cycle; the exceptions are the graphs with N = 2ⁿ − 1 nodes, which have only a Hamiltonian path, not a Hamiltonian cycle. These results enable us to obtain good embeddings of rings and linear arrays into the IEH graph.
A key issue in performing tree-structured parallel computations is to distribute the process components of a parallel program over the processors of a parallel computer at run time such that both the maximum load and the dilation are minimized. The main contribution of this paper is the application of recurrence relations in studying the performance of a dynamic tree embedding algorithm in hypercubes. We develop recurrence relations that characterize the expected load in randomized tree embeddings where a tree grows by letting its nodes take short random walks. Using these recurrence relations, we can calculate the expected load on each processor; therefore, for constant-dilation embeddings, we can evaluate expected loads numerically and analytically. The applicability of recurrence relations is due to the recursive structure of trees and the fact that the embeddings of the subtrees of a process node are independent of each other. Our methodology does not depend on the hypercube topology, and hence can be applied to studying dynamic tree growing in other networks.
In this paper we show the power of sampling techniques in designing efficient distributed algorithms. In particular, we apply sampling techniques to the design of selection algorithms on the hypercube and de Bruijn networks, and show that the message complexity of selecting an item from a set (file) is relatively insensitive to the cardinality of the set. Given a file with n keys, our algorithm performs a selection on a p-node de Bruijn network or hypercube using only O(p log log n) messages and incurring a delay of O(τ log p log log n), with high probability. Our selection scheme outperforms existing approaches in terms of both message complexity and communication delay. Because the message complexity and communication delay of our algorithms depend so weakly on the file size, our distributed selection schemes are very attractive in applications involving very large database systems. Using our selection algorithms, we also show that both a quicksort-based sorting scheme and an enumeration sorting scheme can be developed for sorting large distributed files on the hypercube and de Bruijn networks. Both of our sorting algorithms outperform existing distributed sorting schemes in terms of both message complexity and communication delay.
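The core sampling idea, choosing a pivot from a small random sample so that each pass discards roughly half of the keys, can be sketched on a single machine. This is only an illustration of the principle, not the paper's distributed message-passing algorithm; the function name and sample size are illustrative:

```python
import random

def sample_select(items, k, sample_size=32):
    """Return the k-th smallest element (0-based) of items. The pivot is
    the median of a small random sample, so the expected number of passes
    over the data stays logarithmic -- the essence of sample-based selection."""
    items = list(items)
    while len(items) > sample_size:
        sample = random.sample(items, sample_size)
        pivot = sorted(sample)[sample_size // 2]
        lo = [x for x in items if x < pivot]
        eq = [x for x in items if x == pivot]
        if k < len(lo):
            items = lo                                  # answer below pivot
        elif k < len(lo) + len(eq):
            return pivot                                # answer is the pivot
        else:
            k -= len(lo) + len(eq)
            items = [x for x in items if x > pivot]     # answer above pivot
    return sorted(items)[k]

print(sample_select(range(1000), 500))  # 500
```

In the distributed setting the same idea keeps the number of rounds, and hence messages, nearly independent of the file size n.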
We study hierarchical configuration of distributed systems for achieving optimized system performance. A distributed system consists of a collection of local processes distributed over a network of processors and working cooperatively to fulfill various tasks. A hierarchical approach groups and organizes the distributed processes into a logical hierarchy of multiple levels, so as to coordinate the local computation/control activities and improve overall system performance. It has been proposed as an effective way to solve various problems in distributed computing, such as distributed monitoring, resource scheduling, and network routing. The optimization problem considered in this paper is that of finding an optimal hierarchical partition of the processors, so that the total traffic flow over the network is minimized. The problem in its general form is known to be NP-hard; we therefore focus on distributed computing jobs which require collecting and processing information from all processors. By limiting the hierarchy to two levels, we establish the analytically optimal hierarchical configurations for two popular interconnection networks: the mesh and the hypercube. Based on these analytical results, partitioning algorithms are proposed to achieve minimal communication cost (network traffic flow). We also present and discuss heuristic algorithms for multiple-level hierarchical partitions.
Fault tolerance is an important issue in interconnection networks, and the traditional edge connectivity is an important measure of the robustness of an interconnection network. The component edge connectivity generalizes the traditional edge connectivity: the g-component edge connectivity cλ_g(G) of a non-complete graph G is the minimum number of edges whose deletion results in a graph with at least g components. Let g be an integer with decomposition g = ∑_{i=0}^{s} 2^{t_i}, where t_0 = ⌊log₂ g⌋ and t_i = ⌊log₂(g − ∑_{r=0}^{i−1} 2^{t_r})⌋ for i ≥ 1. In this note, we determine the g-component edge connectivity of the hypercube Q_n: cλ_g(Q_n) = ng − (∑_{i=0}^{s} t_i·2^{t_i−1} + ∑_{i=0}^{s} i·2^{t_i}) for g ≤ 2^{⌊n/2⌋} + 1 and n ≥ 7. Moreover, we classify the corresponding optimal solutions.
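The decomposition of g and the stated formula are easy to evaluate mechanically; the sketch below simply implements the expressions from the abstract (the greedy exponents t_i are just the binary expansion of g in decreasing order; function names are my own):

```python
def decompose(g):
    """Greedy decomposition g = 2**t0 + ... + 2**ts with t0 = floor(log2 g),
    each later ti taken from the remainder; the ti are the exponents of g's
    binary expansion, in decreasing order."""
    ts = []
    while g > 0:
        t = g.bit_length() - 1    # floor(log2 of the remainder)
        ts.append(t)
        g -= 1 << t
    return ts

def component_edge_connectivity(n, g):
    """Evaluate ng - (sum_i t_i*2^(t_i-1) + sum_i i*2^(t_i)),
    the value the abstract gives for c-lambda_g(Q_n)."""
    ts = decompose(g)
    correction = sum(t * 2 ** t // 2 for t in ts)          # t_i * 2^(t_i - 1)
    correction += sum(i * 2 ** t for i, t in enumerate(ts))
    return n * g - correction

print(decompose(5))                       # [2, 0]
print(component_edge_connectivity(7, 5))  # 30
```

For g = 5 = 2² + 2⁰ the two correction sums are 2·2¹ + 0 = 4 and 0·2² + 1·2⁰ = 1, so cλ₅(Q₇) = 35 − 5 = 30.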
Hypercubes and Fibonacci cubes are classical models for interconnection networks with interesting graph theoretic properties. We consider k-Fibonacci cubes, which we obtain as subgraphs of Fibonacci cubes by eliminating certain edges during the fundamental recursion phase of their construction. These graphs have the same number of vertices as Fibonacci cubes, but their edge sets are determined by a parameter k. We obtain properties of k-Fibonacci cubes including the number of edges, the average degree of a vertex, the degree sequence and the number of hypercubes they contain.
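Since k-Fibonacci cubes share their vertex set with Fibonacci cubes, that set is easy to exhibit: the n-bit binary strings with no two consecutive 1s, generated by the fundamental recursion. A small sketch of the vertex set only (the k-dependent edge-elimination rule is not reproduced here):

```python
def fibonacci_strings(n):
    """Vertex set of the Fibonacci cube Gamma_n: all n-bit strings with
    no two consecutive 1s, built by the fundamental recursion
    Gamma_n = 0*Gamma_{n-1} + 10*Gamma_{n-2}."""
    if n == 0:
        return [""]
    if n == 1:
        return ["0", "1"]
    return (["0" + s for s in fibonacci_strings(n - 1)] +
            ["10" + s for s in fibonacci_strings(n - 2)])

vs = fibonacci_strings(5)
print(len(vs))                        # 13, the Fibonacci number F(7)
assert all("11" not in s for s in vs)
```

The counts follow the Fibonacci recurrence |Γ_n| = |Γ_{n−1}| + |Γ_{n−2}|, which is exactly the vertex count shared by the k-Fibonacci cubes.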
In this paper, we study the fault-tolerant capability of hypercubes with respect to the Hamiltonian property, based on the concept of forbidden faulty sets. We show that, under the assumption that each vertex is incident with at least three fault-free edges, an n-dimensional hypercube contains a fault-free Hamiltonian cycle even if there are up to 4n − 13 edge faults. Moreover, we give an example showing that this result is optimal with respect to the number of edge faults tolerated.
Large-scale multiprocessor or multicomputer systems, taking interconnection networks as underlying topologies, have been widely used in the big-data era. Fault tolerance is becoming an essential attribute of multiprocessor systems as the number of processors grows. A connected graph G is called strong Menger (edge) connected if, for any two distinct vertices u and v, there are min{d_G(u), d_G(v)} vertex (edge)-disjoint paths between them. The exchanged hypercube EH(s,t), a variant of the hypercube Q_n, retains many of the desirable fault-tolerance properties of the hypercube. In this paper, we show that Q_n − Q_k (1 ≤ k ≤ n−1) and EH(s,t) − Q_k (2 ≤ k ≤ min{s,t}) are strong Menger (edge) connected, respectively. Moreover, as a by-product, for the dual cube D_n = EH(n−1, n−1), a popular generalization of the hypercube, D_n − Q_k is also shown to be strong Menger (edge) connected, where 1 ≤ k ≤ n−1 and n ≥ 3.
We introduce alternate Lucas cubes, a new family of graphs designed as an alternative to the well-known Lucas cubes. These interconnection networks are subgraphs of Fibonacci cubes and have a useful fundamental decomposition similar to the one for Fibonacci cubes. The vertices of alternate Lucas cubes are constructed from binary strings that are encodings of the Lucas representations of integers. Like ordinary hypercubes, Fibonacci cubes, and Lucas cubes, alternate Lucas cubes have several interesting structural and enumerative properties, and in this paper we study some of them. Specifically, we give the fundamental decomposition that yields the recursive structure; determine the number of edges, the number of vertices by weight, and the degree distribution; and study the induced hypercubes, q-cube polynomials, and maximal hypercube polynomials. We also obtain the irregularity polynomials of this family of graphs, determine the conditions for Hamiltonicity, and calculate metric properties such as the radius, diameter, and center.
This paper analyses how the symmetry of a processor network influences the existence of a solution for the network orientation problem. The orientation of hypercubes and tori is the problem of assigning labels to each link of each processor, in such a way that a sense of direction is given to the network. In this paper the problem of network orientation for these two topologies is studied under the assumption that the network contains a single leader, under the assumption that the processors possess unique identities, and under the assumption that the network is anonymous. The distinction between these three models is considered fundamental in distributed computing.
It is shown that orientations can be computed by deterministic algorithms only when either a leader or unique identities are available. Orientations can be computed for anonymous networks by randomized algorithms, but only when the number of processors is known. When the number of processors is not known, even randomized algorithms cannot compute orientations for anonymous processor networks.
Lower bounds on the message complexity of orientation and algorithms achieving these bounds are given.