Reliability Evaluation of Distributed Computer Systems Subject to Imperfect Coverage and Dependent Common-Cause Failures
Liudong Xing and Akhilesh Shrestha
DOI: 10.3844/jcssp.2006.473.479
Journal of Computer Science
Volume 2, Issue 6
Imperfect coverage (IPC) occurs when a malicious component failure causes extensive damage due to inadequate fault detection, fault location, or fault recovery. Common-cause failures (CCF) are multiple dependent component failures within a system that result from a shared root cause. Both IPC and CCF can occur in distributed computer systems (DCS), where they can contribute significantly to overall system unreliability and complicate reliability analysis. In this study, we propose an efficient approach to the reliability analysis of DCS subject to both IPC and CCF. The proposed methodology decouples the effects of IPC and CCF from the combinatorics of the solution, so that the computationally efficient method based on binary decision diagrams (BDD) can be applied to the reliability analysis of DCS. We present a concrete analysis of an example DCS to illustrate the application and advantages of our approach. Because it accounts for IPC and CCF, our approach can evaluate a wider class of DCS than existing approaches. Owing to the nature of the BDD and the separation of IPC and CCF from the solution combinatorics, our approach is computationally efficient and easy to implement, and it can therefore be applied to accurate reliability analysis of large-scale DCS subject to IPC and CCF. DCS without IPC or CCF can be treated as special cases of our approach.
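To make the idea of decoupling IPC and CCF from the combinatorics more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' exact algorithm. It assumes: (1) coverage is separated in the style of the simple and efficient algorithm (SEA), where any uncovered failure fails the system and the remaining combinatorial model uses conditional (covered) failure probabilities; (2) common-cause events (CCE) are supplied as a mutually exclusive, exhaustive list of (probability, failed components) pairs and are combined by the total probability theorem; (3) components failed by a common cause are simply treated as failed and excluded from the coverage accounting. Exhaustive state enumeration stands in for the BDD, which is only practical for a handful of components. All function names, component names, and numbers are hypothetical.

```python
from itertools import product
from math import prod


def comb_unreliability(q, system_fails):
    """P(system fails) for independent component failure probabilities q.
    Exhaustive enumeration is a stand-in for a BDD (exponential in len(q))."""
    names = list(q)
    total = 0.0
    for states in product((False, True), repeat=len(names)):
        failed = {n for n, f in zip(names, states) if f}
        if system_fails(failed):
            total += prod(q[n] if f else 1.0 - q[n] for n, f in zip(names, states))
    return total


def unreliability_with_ipc(q, c, system_fails, forced_failed=frozenset()):
    """SEA-style separation of imperfect coverage (assumption of this sketch):
    U = 1 - P_u + P_u * Q(conditional covered-failure probabilities)."""
    p_u = 1.0  # probability that no uncovered failure occurs
    q_cond = {}
    for n in q:
        if n in forced_failed:
            q_cond[n] = 1.0  # failed by the common cause; no coverage term
            continue
        u = q[n] * (1.0 - c[n])  # uncovered failure probability
        p_u *= 1.0 - u
        q_cond[n] = q[n] * c[n] / (1.0 - u)  # covered failure, given no uncovered one
    return 1.0 - p_u + p_u * comb_unreliability(q_cond, system_fails)


def unreliability_with_ipc_ccf(q, c, system_fails, cces):
    """Total probability over mutually exclusive, exhaustive common-cause
    events, each given as (probability, set of components it fails)."""
    return sum(p * unreliability_with_ipc(q, c, system_fails, frozenset(failed))
               for p, failed in cces)


# Hypothetical system: component A in series with the redundant pair (B, C).
def fails(failed):
    return "A" in failed or {"B", "C"} <= failed


q = {"A": 0.01, "B": 0.05, "C": 0.05}   # hypothetical failure probabilities
c = {"A": 0.99, "B": 0.95, "C": 0.95}   # hypothetical coverage factors
p_cc = 0.001                            # hypothetical P(shared cause fails B and C)
cces = [(1.0 - p_cc, set()), (p_cc, {"B", "C"})]

print(unreliability_with_ipc_ccf(q, c, fails, cces))
```

With all coverage factors set to 1 and a single "no common cause" event, the sketch reduces to the plain combinatorial model, mirroring the statement that DCS without IPC or CCF are special cases. Keeping coverage and common causes outside `comb_unreliability` is what would let a BDD-based solver be swapped in for the enumeration without changing the other two functions.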
© 2006 Liudong Xing and Akhilesh Shrestha. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.