Asymptotic Confidence Limits for a Repairable System with Standbys Subject to Switching Failures

This paper studies system performance measures and asymptotic confidence limits for the mean time to failure, steady-state availability, and steady-state failure frequency of a repairable system with two primary units, two standby units, and one repair facility, when switching to standbys may fail.


INTRODUCTION
Traditionally, most research on the reliability (availability) of a repairable system with cold or warm standbys assumes that the switchover from a standby to an operational unit is perfect. However, this might be unrealistic. Although a cold standby has the advantage of a zero failure rate, it also has drawbacks, such as a higher probability of switching failure and a longer warm-up time. In other words, a standby unit with a lower failure rate might have a higher probability of switching failure. In this article, we study the reliability and availability characteristics of a system with two primary units when switching failures may occur for cold or warm standbys. We not only investigate the impact of switching failures on the reliability and availability characteristics of the system but also present the behavior of asymptotic confidence limits for the system performance measures.
Repairable systems are usually studied with reference to the evaluation of their performance measures in terms of reliability and availability. Lewis [7] first introduced the concept of standby switching failures in the reliability of standby systems. Chung [3] analyzed the reliability of a system with k active and s cold standby units, multiple repair facilities, and multiple critical and non-critical errors when the switching mechanism is subject to failure. He derived the reliability function in terms of the LST of the system state probabilities, which is very complicated and is generally unsuitable for computational purposes. Two-unit redundant systems have been studied extensively under different assumptions, and a detailed bibliography is found in Srinivasan and Subramanian [15]. However, most of these analyses assume that the switchover from a standby to an operational unit is perfect (see Goel and Shrivastava [4], Shi and Li [12], Gururajan and Srinivasan [5], Shi and Liu [13], Rajamanickam and Chandrasekar [10], Sridharan and Mohanavadivu [14], and others). Confidence limits for the availability and reliability of two-unit redundant systems were investigated by Abu-Salih et al. [1], Jie [6], and Masters et al. [9]. Recently, Yadavalli et al. [16] examined asymptotic confidence limits for the steady-state availability of a two-unit parallel system with the introduction of preparation time for the service station. Chandrasekhar et al. [2] derived a consistent asymptotically normal estimator and an asymptotic confidence interval for the steady-state availability of a two-unit cold standby system in which the failure rate of the unit while online is constant and the repair time distribution is a two-stage Erlangian. This paper extends their statistical inference for system availability to encompass other useful performance measures in more realistic systems.
The main objective of this paper is to study asymptotic confidence limits for the mean time to failure (MTTF), steady-state availability, and failure frequency of a two-unit repairable system with standbys subject to switching failures. Problem formulation and assumptions are given in Section 2. System reliability and availability are developed in Sections 3 and 4, estimation and confidence limits are developed in Sections 5 and 6, and results are numerically illustrated in Section 7. Section 8 provides an example, and the final section concludes.

PROBLEM FORMULATION AND NOTATION
In this paper, we consider a system which consists of two identical primary units operating simultaneously in parallel, two standby units (which may be hot, warm, or cold), and a reliable service station.
The assumptions of the model are described as follows. Suppose that primary and standby failures occur independently of the states of other units and follow exponential distributions with parameters λ and α, respectively. In particular, a cold standby has α = 0 and a hot standby has α = λ. When a primary unit fails, it is immediately replaced by a standby if one is available. It is assumed that the switchover time is instantaneous. However, the switch to a primary unit is imperfect; the switching failure probability q depends on the state of the standby unit and decreases as α increases. In particular, a hot standby has q = 0. If a standby unit fails to switch to a primary unit, the next available standby unit attempts to switch. This process continues until switching is successful or all the standby units fail. When a standby unit switches over successfully, its failure characteristics become those of a primary unit. If a primary or a standby unit fails, it is immediately sent to the service station, where service is performed on a first-come, first-served (FCFS) basis. It is assumed that the service station can serve only one failed unit at a time and that service is independent of the number of unit failures. In addition, the time to repair a failed unit is exponentially distributed with parameter µ. Once a unit is repaired, it instantly resumes standby status.
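The switching rule described above can be illustrated with a minimal Python sketch. It assumes that successive switching attempts are independent with the same failure probability q and that a standby which fails to switch counts as a failed unit; the function name `switch_outcomes` is hypothetical and is used only for illustration.

```python
def switch_outcomes(j, q):
    """Outcome probabilities when a primary unit fails with j standbys available.

    Assumption (not stated explicitly in the text): attempts are independent
    and each fails with the same probability q.

    Returns (success, failed) where success[k] is the probability that the
    switchover succeeds and k standbys remain, and failed is the probability
    that every standby fails to switch, leaving the primary position empty.
    """
    success = {}
    for attempts in range(1, j + 1):
        # attempts - 1 failed switches followed by one successful switch;
        # every attempt uses up one standby unit
        success[j - attempts] = q ** (attempts - 1) * (1 - q)
    failed = q ** j
    return success, failed


success, failed = switch_outcomes(2, 0.1)
print("success with standbys left:", success)
print("all switches fail:", failed)
```

With two standbys the three outcomes have probabilities 1 − q, q(1 − q), and q², which sum to one.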
In this research, system reliability and availability characteristics are studied under the assumption that the system fails if the number of primary units is less than two; that is, when three units have failed. Therefore, the system fails if and only if i < 2, where i denotes the number of primary units in the system. Such a model has potential applications in both industrial and military systems. For example, an airplane with four engines may be able to fly with only two engines functioning. However, if fewer than two engines function, the plane will fail to fly (see Li and Chen [8]).
Before further developing the model, we first present the notation used in later sections.

RELIABILITY ANALYSIS OF THE SYSTEM
At time t = 0, the system commences operation with no failed units (two primary units and two standby units) and an idle service station. That is, the initial conditions (1) for this system are P(2, 2; 0) = 1 and P(i, j; 0) = 0 for every other state (i, j).

The reliability and availability characteristics with switching failure probabilities under exponential failure times and exponential service times can be developed through a birth and death process. Let P(i, j; t) denote the probability that there are i primary units and j standby units in the system at time t, where i = 1, 2, j = 0, 1, 2, and t ≥ 0. The state probabilities are governed by the differential equations (2)-(5). Taking Laplace transforms of both sides of (2)-(5) and using the initial condition (1) reduces them to a system of linear equations, which can be solved for the transforms of the state probabilities, including that of P(1, 0; t).

Since (1, 0) is the failed state, P(1, 0; t) is the probability that the system fails at or before time t. Thus the reliability of the system is 1 − P(1, 0; t). Let T be the time to failure of the system. From the equations above we obtain the Laplace transform T(s) of the failure density, expression (14). Instead of inverting (14) to get the distribution of T, we will be content with obtaining the mean time to failure (MTTF) from the derivative of T(s) with respect to s at s = 0; that is, the MTTF equals the negative of this derivative evaluated at s = 0.
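As a numerical cross-check on the transform approach, the MTTF can also be obtained as the mean time to absorption of the underlying Markov chain. The sketch below is not the closed-form expression of this paper; the transition rates are our reading of the model assumptions (each of the two primaries fails at rate λ, each standby at rate α, repair at rate µ, and independent switching attempts that each fail with probability q), and the function names `solve` and `mttf` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # partial pivoting: bring the largest entry of the column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mttf(lam, alpha, mu, q):
    """Mean time to reach the failed state (1, 0) starting from (2, 2).

    Transient states are ordered (2, 2), (2, 1), (2, 0).
    All rates below are assumptions derived from the verbal model description.
    """
    r22_21 = 2 * lam * (1 - q) + 2 * alpha   # first switch succeeds, or a standby fails
    r22_20 = 2 * lam * q * (1 - q)           # first switch fails, second succeeds
    r22_10 = 2 * lam * q * q                 # both switches fail: system failure
    r21_20 = 2 * lam * (1 - q) + alpha       # switch succeeds, or the standby fails
    r21_10 = 2 * lam * q                     # the only switch fails: system failure
    r20_10 = 2 * lam                         # no standby left: system failure
    out22 = r22_21 + r22_20 + r22_10
    out21 = mu + r21_20 + r21_10
    out20 = mu + r20_10
    # For each transient state s: out_s * m_s - sum_s' rate(s -> s') * m_s' = 1
    a = [
        [out22, -r22_21, -r22_20],
        [-mu, out21, -r21_20],
        [0.0, -mu, out20],
    ]
    return solve(a, [1.0, 1.0, 1.0])[0]


print(mttf(lam=1.0, alpha=0.5, mu=2.0, q=0.1))
```

Two limiting cases give a quick sanity check under these assumptions: with q = 0, α = 0, and no repair, three successive primary failures are needed, so the MTTF is 3/(2λ); with q = 1 the first primary failure is fatal, so the MTTF is 1/(2λ).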

AVAILABILITY ANALYSIS OF THE SYSTEM
In this section we investigate the steady-state availability and failure frequency of the system. A set of differential equations for the availability case can be established in a manner similar to that used for the reliability analysis in Section 3. The first two equations are the same as (2) and (3). However, equations (4) and (5) are rewritten, and one further equation is needed for state (0, 0).

In steady state, let P(i, j) denote the limit of P(i, j; t) as t → ∞; the differential equations then reduce to balance equations. Since both states (0, 0) and (1, 0) are system down states, the steady-state availability of the system is given by A(∞) = 1 − P(1, 0) − P(0, 0). From (29), the steady-state unavailability is 1 − A(∞), and the downtime in minutes per year is 8760 × 60 × [1 − A(∞)]. Using the results of Shi and Liu [13], the failure frequency of the system in steady state, F(∞), is expressed in terms of the steady-state probabilities.

[...] (15), we finally obtain an estimator of the MTTF. Furthermore, let P̂(0, 0) be an estimator of P(0, 0), the probability that all units in the system, including the standbys, have failed; it is obtained from (28). Using the results (24)-(27) of the previous sections, we easily obtain the estimators of the remaining steady-state probabilities. From (29)-(30), we can then obtain an estimator of A(∞) and an estimator of F(∞), the latter given in (34).
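The steady-state quantities of this section can be checked numerically by solving the balance equations directly. The following sketch is only illustrative and rests on several assumptions that are not taken from the equations of this paper: the up-state rates follow our reading of the model description, a unit repaired in state (1, 0) is assumed to become the second primary without a switching failure, the single remaining primary in (1, 0) is assumed to keep failing at rate λ, and the failure frequency is computed as the probability flow from up states to down states rather than from the expression of Shi and Liu [13]. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


UP = [(2, 2), (2, 1), (2, 0)]
DOWN = [(1, 0), (0, 0)]


def transition_rates(lam, alpha, mu, q):
    """Assumed transition rates between states (primaries, standbys)."""
    return {
        ((2, 2), (2, 1)): 2 * lam * (1 - q) + 2 * alpha,
        ((2, 2), (2, 0)): 2 * lam * q * (1 - q),
        ((2, 2), (1, 0)): 2 * lam * q * q,
        ((2, 1), (2, 2)): mu,
        ((2, 1), (2, 0)): 2 * lam * (1 - q) + alpha,
        ((2, 1), (1, 0)): 2 * lam * q,
        ((2, 0), (2, 1)): mu,
        ((2, 0), (1, 0)): 2 * lam,
        ((1, 0), (2, 0)): mu,    # assumption: repaired unit fills the empty primary slot
        ((1, 0), (0, 0)): lam,   # assumption: remaining primary still fails at rate lam
        ((0, 0), (1, 0)): mu,    # assumption: repaired unit becomes a primary
    }


def performance(lam, alpha, mu, q):
    """Return (availability, failure frequency, downtime in minutes per year)."""
    states = UP + DOWN
    idx = {s: k for k, s in enumerate(states)}
    rates = transition_rates(lam, alpha, mu, q)
    n = len(states)
    a = [[0.0] * n for _ in range(n)]
    for (src, dst), r in rates.items():
        a[idx[dst]][idx[src]] += r   # probability flow into dst
        a[idx[src]][idx[src]] -= r   # probability flow out of src
    a[-1] = [1.0] * n                # replace one balance equation by normalisation
    pi = dict(zip(states, solve(a, [0.0] * (n - 1) + [1.0])))
    availability = sum(pi[s] for s in UP)
    frequency = sum(pi[src] * r for (src, dst), r in rates.items()
                    if src in UP and dst in DOWN)
    downtime = 8760 * 60 * (1 - availability)
    return availability, frequency, downtime


print(performance(lam=1.0, alpha=0.5, mu=2.0, q=0.1))
```

With q = 0 and α = 0 the chain reduces to a simple birth and death process, which gives an independent check: for λ = 1 and µ = 2 the availability is 2/3 under these assumptions.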

CONFIDENCE LIMITS FOR MTTF, AVAILABILITY, AND FAILURE FREQUENCY
From the discussion in the previous sections, we know that the estimators of MTTF, A(∞), and F(∞) are real-valued, differentiable functions of the sample means X̄, Ȳ, and Z̄. By the multivariate central limit theorem (see Rao [11]), the vector (X̄, Ȳ, Z̄), suitably centered and scaled by √n, is asymptotically trivariate normal. Using the result by Rao [11] again, each estimator is therefore asymptotically normal, which yields the asymptotic confidence limits.
[...] [17] [...] (ii) A(∞) decreases as either α or q increases, and (iii) F(∞) increases as either α or q increases. Fig. 3 shows that (i) MTTF decreases as λ increases or µ decreases, (ii) A(∞) increases as λ decreases or µ increases, and (iii) F(∞) increases as λ increases or µ decreases.
Next, we compute asymptotic confidence limits for MTTF, A(∞), and F(∞) for various values of the system parameters. Several cases are analyzed to study the effects of the parameters on the behavior of the estimators of the system performance measures; the results are shown in Fig. 4 and Table 1, respectively. [...] as the sample size gets larger. It should also be noted that the confidence intervals narrow as the sample size grows.
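The construction of an asymptotic confidence interval for a differentiable function of the sample means can be sketched as follows. This is a generic delta-method illustration, not the variance expressions derived in this paper: it assumes three independent exponential samples of equal size n (so that each variance equals the squared mean), approximates the partial derivatives numerically, and uses a hypothetical stand-in measure in the demonstration. The names `asymptotic_ci` and `stand_in` are ours.

```python
import math
import random
from statistics import NormalDist, fmean


def asymptotic_ci(g, samples, level=0.95, h=1e-6):
    """Delta-method confidence interval for g(mean_1, mean_2, mean_3).

    samples: list of equally sized lists of observations, assumed to be
             independent and exponentially distributed (assumption).
    Returns (lower, estimate, upper).
    """
    n = len(samples[0])
    means = [fmean(s) for s in samples]
    estimate = g(*means)
    variance = 0.0
    for k, m in enumerate(means):
        step = h * m
        hi = means[:]
        lo = means[:]
        hi[k] = m + step
        lo[k] = m - step
        partial = (g(*hi) - g(*lo)) / (2 * step)   # central difference
        variance += partial ** 2 * m ** 2          # exponential: variance = mean^2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(variance / n)
    return estimate - half_width, estimate, estimate + half_width


def stand_in(x_bar, y_bar, z_bar):
    """Hypothetical measure used only for the demonstration below."""
    return x_bar / (x_bar + z_bar)


rng = random.Random(1)
data = [[rng.expovariate(1.0 / mean) for _ in range(30)] for mean in (1.0, 2.0, 0.5)]
print(asymptotic_ci(stand_in, data))
```

In practice the function g would be replaced by the expression for the MTTF, A(∞), or F(∞) written in terms of the mean failure and repair times; the half-width shrinks in proportion to 1/√n, which matches the narrowing of the intervals noted above.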

ROBUSTNESS OF CONSISTENT ASYMPTOTIC NORMAL (CAN) ESTIMATOR FOR
To assess the quality of the normal approximation underlying the estimators proposed above, a simulation study is carried out to check how accurate this approximation is. Let (X_i, Y_i, Z_i), i = 1, 2, ..., n, be a sequence of independent and identically distributed 3-dimensional random vectors. From each single simulation (replication) Ω with sample size n, we can obtain the estimates of the system performance measures. The histograms of these estimates are shown in Figs. 5-7, respectively, based on N = 1,000 replications. As expected, the spread of the distributions decreases with increasing sample size n, by the law of large numbers (see De Groott [18]).
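A replication study of this kind can be sketched in a few lines of Python. The sketch assumes independent exponential samples and, to stay self-contained, uses the estimator 1/X̄ of the primary failure rate as a stand-in for the performance measures studied here; the function name `replicate` and the parameter values are hypothetical.

```python
import random
from statistics import fmean, pstdev


def replicate(measure, means, n, replications, seed=0):
    """Simulate `replications` samples of size n and return the estimates.

    means: true means of the three exponential variables (assumed values).
    measure: function of the three sample means.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample_means = [
            fmean(rng.expovariate(1.0 / m) for _ in range(n)) for m in means
        ]
        estimates.append(measure(*sample_means))
    return estimates


def failure_rate(x_bar, y_bar, z_bar):
    """Stand-in measure: estimator of the primary failure rate, 1 / mean."""
    return 1.0 / x_bar


for n in (30, 100, 300):
    est = replicate(failure_rate, means=(1.0, 2.0, 0.5), n=n, replications=1000)
    print(n, round(fmean(est), 4), round(pstdev(est), 4))
```

The printed standard deviations give a numerical counterpart to the histograms: the spread of the estimates shrinks as n grows.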

AN EXAMPLE
Consider a duplex system consisting of two processors connected in parallel. Besides the two primary units, there are two standby units, so that when a unit breaks down a standby unit is immediately substituted, which improves the reliability of the system. Units in the operating or standby state are subject to breakdowns, which occur according to a Poisson process. When a unit breaks down, it is repaired by an operator. The system reliability characteristics are as defined in the previous sections. Over a sufficiently long period t_0 (in order to obtain enough information), the managers collect three sets of thirty observations concerning failures: time to failure of primary units (X_i), time to failure of standby units (

CONCLUSIONS
In this paper, we study a two-unit system with standbys and switching failures. We derive explicit expressions for system performance measures such as the MTTF, steady-state availability, and failure frequency, and illustrate them numerically.
The results indicate that the performance of this system differs from that of a system without switching failures. Confidence interval formulas for the MTTF, steady-state availability, and failure frequency are developed for this redundant repairable system with standbys and switching failures. We also provide numerical simulations to examine the statistical behavior of the estimators as the switching failure probability q varies, which give further insight into the system performance measures.