No fault found (NFF) is a major threat in highly dependable integrated systems built in high-end process nodes, e.g., in avionics. One category of NFFs is the intermittent resistive fault (IRF), often originating from bad (e.g., via- or TSV-based) interconnections. This paper shows the impact of these faults on the behavior of a digital CMOS circuit via simulation. Because such a defect may occur as rarely as once a month, while its duration can be as short as 50 ns, evoking and detecting these faults is a major scientific challenge. Two methods to detect the short pulses induced by IRFs are proposed. To ease the maintenance of avionics and reduce the current high debugging costs, an on-chip data logging system with time stamps and stored environmental conditions is introduced. Finally, a hardware implementation of an IRF generator is presented.
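The timing numbers in the abstract already suggest why conventional periodic health monitoring misses IRFs. As a rough illustration (not one of the paper's proposed detection methods), the following Python sketch estimates the chance that a 50 ns pulse occurring about once a month overlaps a sampling instant of an assumed 1 kHz monitor; all values other than the pulse width and occurrence interval are invented.

```python
# Back-of-the-envelope sketch (not the paper's detection circuits): estimate how
# unlikely it is to catch a 50 ns intermittent resistive fault by periodic sampling.
# The 1 kHz sampling rate is an illustrative assumption.

pulse_duration_s = 50e-9            # IRF pulse width (from the abstract)
mean_interval_s  = 30 * 24 * 3600   # roughly one occurrence per month (from the abstract)
sample_period_s  = 1e-3             # assumed 1 kHz health-monitor sampling

# Probability that a single randomly-phased pulse overlaps a sample instant:
p_catch_per_pulse = min(1.0, pulse_duration_s / sample_period_s)

# Expected pulses per year and expected detections per year:
pulses_per_year  = (365 * 24 * 3600) / mean_interval_s
expected_catches = pulses_per_year * p_catch_per_pulse

print(f"P(catch one pulse)     = {p_catch_per_pulse:.2e}")
print(f"expected detections/yr = {expected_catches:.2e}")
# With these numbers the sampler almost never sees the pulse, which is why
# dedicated on-chip pulse detectors and time-stamped logging are needed.
```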
Cloud computing is attracting attention as a network service for sharing computing resources, i.e., networks, servers, storage, applications, and services. We focus on cloud computing environments built on open-source software such as OpenStack and Eucalyptus because of their unified data management and low cost. In this paper, we propose a new approach to software dependability assessment based on stochastic differential equation modeling, in order to account for the numbers of components, cloud applications, and users. We also analyze actual data to show numerical examples of software dependability assessment that reflect these characteristics of cloud computing. Moreover, we discuss the determination of the optimum software maintenance times that minimize the total expected software cost.
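The abstract does not reproduce the concrete SDE model or cost structure, so the following Python sketch only illustrates the general approach: a fault-detection process with multiplicative noise is simulated by Euler-Maruyama, and a simple assumed cost function (fixed maintenance cost, cost per residual fault, and a per-unit-time operating cost) is minimized over the maintenance time. Every parameter and the cost form are assumptions, not the paper's.

```python
# Illustrative sketch only: a generic exponential-type fault-detection process S(t)
# with multiplicative noise, simulated by Euler-Maruyama, plus an assumed
# maintenance-cost trade-off. All parameters below are invented.
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 100.0, 0.05, 0.2      # total faults, detection rate, noise level (assumed)
c1, c2, c3 = 1.0, 50.0, 0.5         # residual-fault cost, fixed maintenance cost, cost per unit time (assumed)
dt, horizon, n_paths = 0.1, 200.0, 2000

t = np.arange(0.0, horizon, dt)
detected = np.zeros((n_paths, len(t)))
for k in range(1, len(t)):
    s = detected[:, k - 1]
    drift = b * (a - s) * dt                                   # mean detection of remaining faults
    noise = sigma * (a - s) * np.sqrt(dt) * rng.standard_normal(n_paths)
    detected[:, k] = np.clip(s + drift + noise, 0.0, a)

# Expected cost if maintenance is performed at time t: fixed cost, plus cost of
# faults still undetected, plus testing/operation cost accrued until then.
expected_cost = c2 + c1 * (a - detected.mean(axis=0)) + c3 * t
t_opt = t[np.argmin(expected_cost)]
print(f"optimum maintenance time under the assumed model ~ {t_opt:.1f}")
```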
Failure Mode and Effects Analysis (FMEA) documents single failures of a system by identifying the failure modes and the causes and effects of each potential failure mode on system service, and by defining appropriate detection procedures and corrective actions. When extended with a criticality analysis (CA) procedure for classifying failure modes, it is known as Failure Mode, Effects and Criticality Analysis (FMECA). This paper presents a literature review of FME(C)A, covering the following aspects: description and review of the basic principles of FME(C)A, types, enhancements of the method, automation and available computer codes, combination with other techniques, and specific applications. We conclude with a discussion of various issues raised as a result of the review.
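One common criticality scheme discussed in the FME(C)A literature is the Risk Priority Number, RPN = severity × occurrence × detection. The short Python sketch below shows a minimal FMECA-style worksheet ranked by RPN; the items, failure modes, and ratings are invented purely for illustration.

```python
# Minimal FMECA-style worksheet sketch: rank failure modes by Risk Priority Number
# (RPN = severity * occurrence * detection). All entries are illustrative.
from dataclasses import dataclass

@dataclass
class FailureMode:
    item: str
    mode: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (remote)     .. 10 (frequent)
    detection: int   # 1 (certain)    .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

worksheet = [
    FailureMode("pump",   "bearing seizure",   8, 3, 4),
    FailureMode("sensor", "drift out of spec", 5, 6, 7),
    FailureMode("valve",  "stuck open",        9, 2, 3),
]

# Highest-RPN modes come first and get corrective actions assigned first.
for fm in sorted(worksheet, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.item:7s} {fm.mode:18s} RPN={fm.rpn}")
```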
For fault-tolerant computer systems (FTCS) supporting critical applications, it is of key importance to be able to answer the question of whether they indeed fulfill the quality-of-service requirements of their users. In particular, answers related to the combined performance and dependability of the FTCS are important. To facilitate these so-called performability studies, we present DYQNTOOL+, a performability evaluation tool based on the dynamic queuing network concept that allows for combined modeling of system performance and dependability. Unlike other performability evaluation tools, DYQNTOOL+ combines two different modeling paradigms, i.e., queuing networks and stochastic Petri nets, for the performance and the dependability aspects of the system under study, respectively. The mutual relations between these two model parts, such as workload-induced failures and performance decreases due to failures, are explicitly modeled as well. This combination of modeling paradigms allows the modeling to be done in greater detail, thereby often revealing system behavior that cannot be revealed otherwise. We present the dynamic queuing network modeling approach and its implementation in DYQNTOOL+, and illustrate its usage with a number of examples.
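The coupling DYQNTOOL+ captures, where workload drives failures and failures degrade performance, can be illustrated with a toy model. The Python sketch below is not DYQNTOOL+ and uses a crude time-stepped simulation rather than queuing networks or Petri nets; it simulates a single queue whose server fails at a rate that grows with the queue length, and all rates are assumed.

```python
# Toy illustration of workload-dependability coupling: a single queue whose server
# fails at a rate that grows with the current workload, so performance and
# dependability influence each other. All rates are assumed; time is advanced
# with a simple fixed-step discretization.
import random

random.seed(1)
lam, mu = 0.6, 1.0                        # arrival and service rates (assumed)
fail_base, fail_per_job = 0.001, 0.0005   # workload-induced failure rate (assumed)
repair = 0.05                             # repair rate (assumed)
dt, horizon = 0.01, 2000.0

queue, up = 0, True
queue_area, down_time, t = 0.0, 0.0, 0.0
while t < horizon:
    if random.random() < lam * dt:
        queue += 1                                    # arrival
    if up:
        if queue > 0 and random.random() < mu * dt:
            queue -= 1                                # service completion
        if random.random() < (fail_base + fail_per_job * queue) * dt:
            up = False                                # workload-induced failure
    else:
        down_time += dt
        if random.random() < repair * dt:
            up = True                                 # repair
    queue_area += queue * dt
    t += dt

print(f"mean number in queue ~ {queue_area / t:.2f}")
print(f"unavailability       ~ {down_time / t:.3f}")
```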
A methodology for reliability and safety analysis of a certain class of repairable fault-tolerant systems is presented. The analysis leads to a closed-form approximation of the probability of an absorbing state in a state transition diagram. This expression can provide insight into the relationship between various system parameters and system reliability and/or safety. The approximation technique is based on the combination of results from the analysis of several failure mechanisms, each studied by itself, into an expression for the approximate reliability of a system. The resulting approximation error can be analyzed in order to evaluate whether or not the approximation is useful in a given situation.
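As a hedged illustration of the idea, not the paper's model, the following Python sketch takes a tiny repairable system with an uncovered-failure (unsafe) absorbing state, computes the exact absorption probability from the embedded Markov chain, compares it with a simple closed-form approximation valid when repairs are much faster than failures, and reports the approximation error. The rates and coverage value are assumptions.

```python
# Hedged numerical sketch (not the paper's specific model): a tiny repairable
# fault-tolerant system with an "unsafe" absorbing state. Compare the exact
# absorption probability from the embedded Markov chain with a closed-form
# approximation, and report the approximation error.
# States: G = all good, D = degraded (under repair); absorbing: U = unsafe, S = safe.
lam, mu, c = 1e-4, 1e-1, 0.999    # failure rate, repair rate, coverage (assumed)

# Embedded-chain absorption equations:
#   u_G = (1 - c) + c * u_D          (uncovered failure goes straight to U)
#   u_D = mu / (lam + mu) * u_G      (repair returns to G; a 2nd failure leads to S)
u_G_exact = (1 - c) / (1 - c * mu / (lam + mu))

# Closed-form approximation for mu >> lam: competition between the uncovered-failure
# mechanism (1 - c) and the exhaustion-of-redundancy mechanism (c * lam / mu).
u_G_approx = (1 - c) / ((1 - c) + c * lam / mu)

rel_err = abs(u_G_approx - u_G_exact) / u_G_exact
print(f"exact  P(unsafe) = {u_G_exact:.4e}")
print(f"approx P(unsafe) = {u_G_approx:.4e}  (relative error {rel_err:.2%})")
```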
Wireless networked sensors are envisioned to play a vital role in the emerging pervasive service platforms that will instrument a wide range of next-generation civil and military applications. Recently, we proposed ANSWER: AutoNomouS Wireless sEnsor netwoRk, a service platform whose mission is to provide dependable information services to in-situ mobile users while satisfying their quality-of-service (QoS) requirements. Alongside the stationary tiny sensors, the network employs more powerful mobile devices referred to as aggregation and forwarding nodes (AFNs). ANSWER exploits AFN mobility to support QoS requirements. However, as an AFN moves closer to an event (for example, a hazardous spill), it may be put at risk (for example, of potential damage). In this paper, we present a quantitative analysis of the interplay and balance between QoS support and asset safety. We propose a new scheme, called SAFER (SAFEty-aware Relocation), which relocates the AFN in order to boost network performance without unnecessarily compromising AFN safety. To guide the relocation process, SAFER uses historical data on detected events and employs an evolutionary neural network to assess the risk involved and predict high-quality new position(s). Experimental results demonstrate the effectiveness of SAFER.
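The QoS-versus-safety trade-off can be made concrete with a toy utility function: a QoS term that improves as the AFN approaches the event and a risk term that grows near it. The Python sketch below is not SAFER and does not use its evolutionary neural network; it simply searches for a relocation point with a (1+1) evolution strategy under invented functions and constants.

```python
# Toy sketch of the QoS-vs-safety trade-off (not the SAFER algorithm): an AFN
# position is scored by a QoS term that improves near the event and a risk term
# that grows near it; a simple (1+1) evolution strategy searches for a good
# relocation point. All functions and constants are illustrative assumptions.
import math, random

random.seed(7)
event = (0.0, 0.0)                 # detected hazardous event location (assumed)

def utility(pos, w_risk=2.0):
    d = math.dist(pos, event)
    qos  = 1.0 / (1.0 + d)         # closer -> better sensing / lower latency
    risk = math.exp(-d)            # closer -> higher chance of damage
    return qos - w_risk * risk     # net value of relocating the AFN here

pos = (5.0, 5.0)                   # current AFN position (assumed)
best = utility(pos)
for _ in range(2000):              # (1+1)-ES: mutate, keep the candidate if better
    cand = (pos[0] + random.gauss(0, 0.5), pos[1] + random.gauss(0, 0.5))
    if utility(cand) > best:
        pos, best = cand, utility(cand)

print(f"suggested relocation ~ ({pos[0]:.2f}, {pos[1]:.2f}), utility {best:.3f}")
```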
A large number of functionalities have been integrated into a single fully autonomous humanoid robot, HERMES. To evaluate the dependability of this extremely complex machine, and its ability to interact with strangers, HERMES was exhibited in a museum, far away from its home laboratory, for more than six months. During this period the robot and its skills were regularly demonstrated to the public by non-expert presenters up to 12 hours per day. Also, HERMES interacted with the visitors, chatted with them in English, French and German, answered questions and performed services as requested by them. Only three major failures occurred during the 6-month period, all of them caused by failures of commercially available modules that could easily be replaced. The key to this success was the dependability that had been originally designed into HERMES. During the design process certain design principles were followed in both hardware and software. These principles are introduced, and some long- and short-term experiments carried out with the real robot in real environments are presented. In fact, by demonstrating HERMES in the museum, at trade fairs and in TV studios — besides our institute environment — we have learned valuable lessons, especially regarding the interaction of a complex robotic assistant with unknown humans. Although we did not quantitatively evaluate the robot's performance or acceptance by the non-expert users, several qualitative results are given in this paper, and many videos highlighting these results can be downloaded from the HERMES homepage.
In this study, we propose a formal approach for analyzing robustness with regard to human-made faults in procedural manuals. In many cases, fault detection and recovery tasks are embedded in procedural manuals, which allows human-made faults to be detected. This investigation is important for preserving trust in workflows in safety-critical domains. We define the formal semantics of human-made faults according to Discrete Action Classification, which classifies the faulty actions of domain experts; the formal definition of the faults is based on a transition system. Using these semantics, we inject faults into a fault-free model of a procedural manual. We then verify, by model checking, that the model embedded with faulty tasks satisfies the property representing the goal of the procedural manual. The model-checking tool is given the negation of the formula as the goal, and if the tool reports counterexamples, we say that the procedural manual is not robust. We apply our framework to the procedural manual that a nurse executes when taking blood samples. In this case, we conclude that the procedural manual is not robust because the tool reports counterexamples.
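As a hedged illustration of the fault-injection-then-check workflow (not the paper's Discrete Action Classification semantics or its model checker), the Python sketch below models a blood-sampling manual as a tiny transition system, injects a "skipped task" fault, and exhaustively searches for terminating runs that violate an assumed goal property (the sample is delivered only after the patient's identity was verified); such runs play the role of counterexamples.

```python
# Toy sketch of fault injection into a procedural-manual model followed by an
# exhaustive check. States, actions, and the goal property are invented.
from collections import deque

# Fault-free manual: verify patient id -> draw sample -> label tube -> deliver.
manual = {
    "start":    [("verify_id", "verified")],
    "verified": [("draw_sample", "drawn")],
    "drawn":    [("label_tube", "labelled")],
    "labelled": [("deliver", "done")],
    "done":     [],
}

def goal(actions):
    # Assumed goal of the manual: delivery happens only after identity verification.
    return "deliver" in actions and "verify_id" in actions

def inject_skip(model, skipped):
    """Fault injection: the skipped task may be bypassed, jumping past its effect."""
    faulty = {s: list(ts) for s, ts in model.items()}
    for s, ts in model.items():
        for act, nxt in ts:
            if act == skipped:
                faulty[s] = faulty[s] + list(model[nxt])   # allow bypassing the task
    return faulty

def counterexamples(model):
    """Exhaustively enumerate terminating runs that violate the goal."""
    bad, frontier = [], deque([("start", [])])
    while frontier:
        state, acts = frontier.popleft()
        succs = model.get(state, [])
        if not succs:                       # terminating run
            if not goal(acts):
                bad.append(acts)
            continue
        for act, nxt in succs:
            frontier.append((nxt, acts + [act]))
    return bad

faulty = inject_skip(manual, "verify_id")
traces = counterexamples(faulty)
for trace in traces:
    print("counterexample:", " -> ".join(trace))
print("not robust" if traces else "robust")
```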