Loading [MathJax]/jax/output/CommonHTML/jax.js
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

SEARCH GUIDE  Download Search Tip PDF File

  Bestsellers

  • articleNo Access

    THE RING-BANYAN NETWORK: A FAULT TOLERANT MULTISTAGE INTERCONNECTION NETWORK FOR MULTIPROCESSOR SYSTEMS

    In this paper, we propose a new fault tolerant multistage interconnection network (MIN) and a new adaptive self-routing scheme for the network. It can provide more multiple paths than the related previous networks between an input/output pair of a network by adding extra links between switching elements in the same stage and modifying the self-routing scheme of the banyan network. Our routing scheme is as simple as that of the banyan network, which is based on the topological relationships among the switching elements (SE’s) that render a packet to the same destination with the regular self-routing, which are discovered in this paper. We present an algebraic proof to show the correctness of this scheme, and an analytic reliability analysis to provide quantitative comparisons with other networks, which shows that the new network is more cost effective than the banyan network and other augmented MIN’s in terms of the reliability.

  • articleNo Access

    FAULT TOLERANT SYSTOLIC EVALUATION OF POLYNOMIALS AND EXPONENTIALS OF POLYNOMIALS FOR EQUISPACED ARGUMENTS USING TIME REDUNDANCY

    Many applications which require high speed evaluation of polynomials and exponentials of polynomials can now be implemented in the hardware very efficiently because of the advances in VLSI technology. Several fast algorithms have been proposed in the recent past for the efficient evaluation of polynomials and exponentials of polynomials for equispaced arguments on uniprocessor systems. In this paper, we consider the problem of organizing this evaluation on VLSI chips in the form of systolic arrays. We present linear fault tolerant systolic arrays which can evaluate the polynomials and exponentials of polynomials of any degree for a large number of equispaced points. These organizations have the main advantage that the interconnections between the processing elements are very regular and simple, and hence are very appropriate for VLSI implementation.

  • articleNo Access

    DESIGNING FAULT TOLERANT ALGORITHMS FOR RECONFIGURABLE MESHES

    This paper proposes a procedure to design fault tolerant algorithms for the R-Mesh and some of its restrictive variations. This procedure first identifies a healthy sub-mesh from a faulty model using the bypass and removal fault model. Then it uses scalable algorithms to simulate the larger faulty model on the resulting healthy sub-mesh. The algorithms for the bypass model tolerates n faults in an n×n R-Mesh (LR-Mesh) and runs in O(T log n) (O(T)) time, where T is the execution time on the original mesh without faults. For the removal model, we design fault tolerant algorithms for some interesting variations of the R-Mesh, specifically, the NXR-Mesh and the NXLR-Mesh. We propose the first scaling simulations for these models and present a simulation of the R-Mesh on the NXR-Mesh. The results of this paper enable us to consider certain reconfigurable models in a more practical environment than previously allowed.

  • articleNo Access

    MOBILE AGENT BASED CHECKPOINTING WITH CONCURRENT INITIATIONS

    Traditional message passing based checkpointing and rollback recovery algorithms perform well for tightly coupled systems. In wide area distributed systems these algorithms may suffer from large overhead due to message passing delay and network traffic. Mobile agents offer an attractive option for designing checkpointing schemes for wide area distributed systems. Network topology is assumed to be arbitrary. Processes are mobile agent enabled. When a process wants to take a checkpoint, it just creates one mobile agent. Concurrent initiations by multiple processes are allowed. Synchronization and creation of a consistent global state (CGS) for checkpointing is managed by the mobile agent(s). In the worst case, for k concurrent initiations among n processes, checkpointing algorithm requires a total of O(kn) hops by all the mobile agents. A mobile agent carries O(n/k) (on the average) size data.

  • articleNo Access

    A MINIMUM-PROCESS COORDINATED CHECKPOINTING PROTOCOL FOR MOBILE COMPUTING SYSTEMS

    Checkpoint is a designated place in a program at which normal process is interrupted specifically to preserve the status information necessary to allow resumption of processing at a later time. A checkpoint algorithm for mobile distributed systems needs to handle many new issues like: mobility, low bandwidth of wireless channels, lack of stable storage on mobile nodes, disconnections, limited battery power and high failure rate of mobile nodes. These issues make traditional checkpointing techniques unsuitable for such environments. Minimum-process coordinated checkpointing is an attractive approach to introduce fault tolerance in mobile distributed systems transparently. This approach is domino-free, requires at most two checkpoints of a process on stable storage, and forces only a minimum number of processes to checkpoint. But, it requires extra synchronization messages, blocking of the underlying computation or taking some useless checkpoints. In this paper, we design a minimum-process checkpointing algorithm for mobile distributed systems, where no useless checkpoint is taken. We reduce the blocking of processes by allowing the processes to do their normal computations, send messages and receive selective messages during their blocking period.

  • articleNo Access

    A DISTRIBUTED APPROXIMATION ALGORITHM FOR FAULT-TOLERANT METRIC FACILITY LOCATION

    In this paper, we propose an approximation algorithm for the Fault-Tolerant Metric Facility Location problem which can be implemented in a distributed and asynchronous manner within O(n) rounds of communication, where n is the number of vertices in the network. Given a constant size set formula which represents distinct levels of fault-tolerant requirements of all cities, as well as the two-part (facility and connection) cost of the optimal solution, i.e. F* + C*, the cost of our solution is no more than formula for the general case, and less than F* + 2C* for the special case where all cities have a uniform connectivity requirement. Extensive numerical experiments showed that the quality of our solutions is comparable (within 4% error) to the optimal solution in practice.

  • articleNo Access

    CONDITIONAL FAULT DIAGNOSABILITY OF DUAL-CUBES

    The growing size of the multiprocessor system increases its vulnerability to component failures. It is crucial to locate and replace the faulty processors to maintain a system's high reliability. The fault diagnosis is the process of identifying faulty processors in a system through testing. This paper shows that the largest connected component of the survival graph contains almost all of the remaining vertices in the dual-cube DCn when the number of faulty vertices is up to twice or three times of the traditional connectivity. Based on this fault resiliency, this paper determines that the conditional diagnosability of DCn (n ≥ 3) under the comparison model is 3n − 2, which is about three times of the traditional diagnosability.

  • articleNo Access

    Conditional Fault Tolerance of Hypermesh Optical Interconnection Networks

    Fault tolerance is especially important for interconnection networks, vastly influencing the performance of the parallel processing systems underlying the corresponding networks. This paper studies the fault tolerance of radix-kn-dimensional hypermesh optical interconnection networks, determines the connectivity of partial hypermesh, and derives the conditional connectivity of hypermesh provided that each adjacent set cannot be faulty simultaneously. Under this condition, the hypermesh networks can tolerate up to 2n(k-1)-k-1 fault processors without being disrupted, implying that when the number of dimension n (respectively, radix-k) is a fixed value in the hypermesh network, the larger the value of radix-k (respectively, dimension n) is, the higher the reliability and availability of the network becomes.

  • articleNo Access

    Link Failure Tolerance in the Arrangement Graphs

    The arrangement graph An,k is one of the attractive underlying topologies for distributed systems. Let fm(n, k) be the minimum number of faulty links that make every sub-arrangement graph An-m,k-m faulty in An,k under link failure model. In this paper, we proved that formula, formula, and formula for 2 ≤ m ≤ k − 2.

  • articleOpen Access

    Fault-Tolerant Panconnectivity of Augmented Cubes AQn

    The augmented cube AQn is a variation of the hypercube Qn. This paper considers the fault-tolerant Panconnectivity of AQn. Assume that FV(AQn)E(AQn) and n4. We prove that for any two fault-free vertices u and v with distance d in AQn, there exists a fault-free path Puv of each length from max{d+2,4} to 2nfv1 in AQnF if |F|2n4, where fv is the number of faulty vertices in AQn. Moreover, the bound is sharp.

  • articleNo Access

    (n2)-Fault-Tolerant Edge-Pancyclicity of Crossed Cubes CQn

    As one of the most fundamental networks for parallel and distributed computation, cycle is suitable for developing simple algorithms with low communication cost. A graph G is called k-fault-tolerant edge-pancyclic if after deleting any faulty set F of k vertices and/or edges from G, every correct edge in the resulting graph lies in a cycle of every length from g to |V(GF)|, inclusively, where g is the girth of G, the length of a shortest cycle in G. The n-dimensional crossed cube CQn is an important variant of the hypercube Qn, which possesses some properties superior to the hypercube. This paper investigates the fault-tolerant edge-pancyclicity of CQn, and shows that if CQn(n5) contains at most n2 faulty vertices and/or edges then, for any fault-free edge uv and every length from 6 to |V(CQnF)| except =7, there is a fault-free cycle of length containing the edge uv. The result is optimal in some senses.

  • articleNo Access

    Subnetwork Preclusion of (n,k)-Star Networks

    The (n,k)-star graph Sn,k, which is introduced to address scaling issues of the star graph, is recognized as an attractive interconnection network topology for building multiprocessor systems because of its desirable properties. Let Fm(Sn,k) be the minimum number of faulty vertices that make every subgraph isomorphic to Snm,km faulty in Sn,k under vertex-failure model, where 0mk1. In this paper, we prove that F0(Sn,k)=1, F1(Sn,k)=n, Fk1(Sn,k)=n!/(nk+1)!, and n!/(nm)!Fm(Sn,k)(k2m1)n!/(nm)!2(k3m1)n!/(nm+1)! for 1kn2 and 2mk2.

  • articleNo Access

    ROUTING ALGORITHMS FOR DOUBLE LOOP NETWORKS

    We give a new routing algorithm for double loop networks with n nodes which requires O(log n) time for preprocessing and constant processing time at each node on the route. A simple modification of the algorithm works for the case of a single fault (either node or link). The routing is always through a shortest path and the only information needed by a node to process is the address of the destination.

  • articleNo Access

    Fault Tolerance on Star Graphs

    The capability of fault tolerance is one of the advantages of multiprocessor systems. In this paper, we prove that the fault tolerance of an n-star graph is 2n-5 with restriction to the forbidden faulty set. And we propose an algorithm for examining the connectivity of an n-star graph when there exist at most 2n - 4 faults. The algorithm requires O(n2log n) time. Besides, we improve the fault-tolerant routing algorithm proposed by Bagherzadeh et al. by calculating the cycle structure of a permutation and the avoidance of routing message to a node without any nonfaulty neighbor. This calculation needs only constant time. And then, we propose an efficient fault-tolerant broadcasting algorithm. When there is no fault, our broadcasting algorithm remains optimal. The penalty is O(n) if there exists only one fault, and the penalty is O(n2) if there exist at most n - 2 faults.

  • articleNo Access

    Design and Analysis of the Symmetric Banyan Network (SBN): A Min with High Performance and High Fault Tolerance

    A new MIN called the Symmetric Banyan Network (SBM) is presented in this paper. In the SBN, 4 × 4 switching elements are used and they are connected symmetrically between the upper and lower parts of the network. There are 2N paths for every source-destination pair. The SBN is basically single-fault tolerant, but can tolerate up to three faults, with more elegant routing, except in the first and last stages which are still single-fault tolerant. And full accessibility is preserved even in some instances when half of the network is in fault. The routing of the SBN is self-adaptive in the presence of a fault. The throughput analysis of the SBN is done using computer simulations and shows that the SBN performs better than the Itoh's network, the ASEN (Augmented Shuffle Exchange Network) and the crossbar network. We analyze the cost/performance of the SBN against the MINs with multiple banyan networks such as MBSF (Multi Banyan Switching Fabric) and the PBSF (Piled Banyan Switching Fabric) and the analysis shows that the SBN is also attractive in terms of the cost.

  • articleNo Access

    THE CUBE-OF-RINGS INTERCONNECTION NETWORK

    We present a family of interconnection networks named the Cube-Of-Rings (COR) networks along with their basic graph-theoretic properties. Aspects of group graph theory are used to show the COR networks are symmetric and optimally fault tolerant. We present a closed-form expression of the diameter and optimal one-to-one routing algorithm for any member of the COR family. We also discuss the suitability of the COR networks as the interconnection network of scalable parallel computers.

  • articleNo Access

    SHUFFLE-RING: A NEW CONSTANT-DEGREE NETWORK

    The hypercube as a parallel interconnection network has been of academic and engineering concern for tens of year due to its many merits. However, its increasing node degree is an obvious weakness. Some networks such as the cube-connected cycles (CCC) and the de Bruijn network have been proposed to overcome the increasing degree of the hypercube. In this paper, we present a new cost-effective network which outperforms the cube network. It can overcome the increasing degree of cube networks while keeping the advantages of the cube network such as logarithmic diameter, easy routing, optimal fault tolerance, and suitability for the ASCEND/DESCEND class of parallel problems. Furthermore, the proposed network achieves the logarithmic diameter with a very small constant node degree, 3 or 4.

  • articleNo Access

    NANOCOMPONENTS: A DESIGNER'S POINT OF VIEW

    This paper gives a system designer's point of view regarding the emergence of the many new devices that are being considered as possible replacement of the Metal Oxyde Semiconductor Field effect transistor. The first part is a tentative to define criteria for the systemability of nanodevices. The second part is a short overview of the evolution of information processing systems. The third part is a more specific study of the demultiplexing and fault tolerance problems.

  • articleNo Access

    STUDYING PROBABILISTIC FAULTS IN EVOLVED NON-UNIFORM CELLULAR AUTOMATA

    We study the effects of random faults on the behavior of one-dimensional, non-uniform cellular automata (CA), where the local update rule need not be identical for all grid sites. The CA systems examined were obtained via an approach known as cellular programming, which involves the evolution of non-uniform CAs to perform non-trivial computational tasks. Using the "system replicas" methodology, involving a comparison between a perfect, non-perturbed version of the CA and a faulty one, we find that our evolved systems exhibit graceful degradation in performance, able to tolerate a certain level of faults. We then "zoom" into the fault-tolerant zone, where "good" computational behavior is exhibited, introducing measures to fine-tune our understanding of the faulty CAs' operation. We study the error level as a function of time and space, as well as the recuperation time needed to recover from faults. Our investigation reveals an intricate interplay between temporal and spatial factors, with the presence of different rules in the grid giving rise to complex dynamics. Studies along this line may have applications to future computing systems that will contain thousands or even millions of computing elements, rendering crucial the issue of resilience.

  • articleNo Access

    MPI-FT: PORTABLE FAULT TOLERANCE SCHEME FOR MPI

    In this paper, we propose the design and development of a fault tolerant and recovery scheme for the Message Passing Interface (MPI). The proposed scheme consists of a detection mechanism for detecting process failures, and a recovery mechanism. Two different cases are considered, both assuming the existence of a monitoring process, the Observer which triggers the recovery procedure in case of failure. In the first case, each process keeps a buffer with its own message traffic to be used in case of failure, while the implementor uses periodical tests for notification of failure by the Observer. The recovery function simulates all the communication of the processes with the dead one by re-sending to the replacement process all the messages destined for the dead one. In the second case, the Observer receives and stores all message traffic, and sends to the replacement all the buffered messages destined for the dead process. Solutions are provided to the dead communicator problem caused by the death of a process. A description of the prototype developed is provided along with the results of the experiments performed for efficiency and performance.