Please login to be able to save your searches and receive alerts for new content matching your search criteria.
Many crucial dependable and secure services including atomic commitment, consensus and group membership, and middleware services (such as replica, communication and transaction services) use fault detectors. Through the use of fault detectors, the overlying service can be exempted from failure treatment and synchronization requirements. Fault detection is essential for proving that the services carried out are correct.
In this paper, we first identify the necessary conditions to detect faults in a message passing system where multiple disjoint paths exist between each pair of endpoints. We then present the first fault detection protocol capable of detecting message meta-data modification in the presence of various message interferences in addition to other faults including omission faults, message replay and spurious messages using disjoint paths, where paths with faults are not known a priori. In addition, it authenticates message origins allowing Sybil attacks to be detected, identifies faulty paths, and classifies faults in the presence of multiple messages sent by various system processes. We establish the completeness and soundness properties of the proposed algorithm, i.e., it detects each considered fault and each detected fault is an actual fault, respectively. We also show that our algorithm does not yield a significant packet size and delay overheads. The algorithm shows the viability of the use of disjoint paths in fault detection.
We take advantage of the hierarchical structure of the star graph network to obtain an efficient method for constructing node-disjoint paths between arbitrary pairs of nodes in the network. A distributed fault-tolerant routing algorithm for the star network based on this construction method is then presented and evaluated. The proposed algorithm adapts the routing decisions in response to node failures. Node failure and repair conditions may arise dynamically (at any time) provided that the total number of faulty nodes at any given time is less than the node-connectivity n - 1 of the n-star. When a message is blocked due to faulty components, the source of the message is warned and requested to switch to a different node-disjoint path. The methods used to identify the paths, to propagate failure information back to source nodes, and to switch from a routing path to another incur little communication and computation overhead. We show that if the node failures occur 'reasonably' apart in time, then all messages will be routed on paths of length δ + ε where δ is the minimum distance between the source and the destination and ε is 0, 2, or 4. In the unlikely case where more failures occur in a 'short period', the algorithm still delivers all messages but via possibly longer paths.
The node-to-set parallel routing problem for a k-connected network Γ is as follows: given a node s and k other nodes {t1, t2, … , tk} in Γ, find k node-disjoint paths connecting s and ti, for 1 ≤ i ≤ k. From the viewpoint of applications in synthesizing fast and resilient collective communication operations, it is desirable to make the parallel paths as short as possible. Building such paths is a nontrivial problem for a general network. Optical transpose interconnection system (OTIS, also known as swapped) networks, a class of hierarchical structures built of n identical n-node factor networks, are known to be maximally fault-tolerant for any connected factor network, implying that they have maximal connectivity. We propose a general algorithm for the node-to-set parallel routing problem in OTIS/swapped networks that yields paths of length no greater than D + 4 in O(Δ2 + Δf(n)) time, where D and Δ represent the diameter and degree of the OTIS network and O(f(n)) is the time complexity of a shortest-path routing algorithm for the n-node factor network. Our node-to-set routing algorithm is shown to have optimal time complexity for certain OTIS networks of practical interest, including OTIS-Mesh and OTIS-Hypercube.