Bayesian Methods for Ranking the Severity of Apnea among Patients

Problem statement: Studies on apnea patients are usually based on data obtained from sleep studies. Such data are scarce because sleep studies are costly to conduct. Bayesian methods are particularly suitable for analyzing limited data, since they allow information to be updated by combining the current data with prior belief. Approach: In this study we demonstrated the use of Bayesian methods to rank the severity of apnea for 14 patients, based on the posterior mean of the rate of occurrence of apnea. Results: A comparison of three prior distributions for the underlying rate of occurrence of apnea (improper, gamma and log-normal) indicated that the ranking of patients by severity of apnea was the same regardless of the choice of prior. Conclusion: The model fit was found to be slightly better under the gamma prior.


INTRODUCTION
Apnea is a matter of concern because it can cause daytime sleepiness, which in some cases may contribute to automobile accidents. According to Young et al. (1993), the prevalence of apnea is 2 and 4% for women and men respectively, based on a study in Wisconsin, United States. Apnea is a sleep disorder that disrupts the quality of sleep. People who suffer from apnea sleep poorly because of a blockage in the throat during sleep. The blockage causes a complete or partial cessation of airflow, producing a pause in breathing of at least 10 sec. A partial cessation of airflow is known as hypopnea. Apnea and hypopnea events can recur repeatedly during sleep (Riley et al., 1990).
Patients suspected of suffering from a sleep disorder are usually referred to a sleep clinic or sleep laboratory (Quan et al., 1997). They are asked to undergo a polysomnography test, or sleep study, under the supervision of medical practitioners. During a polysomnography test, signals from Electroencephalography (EEG), Electrocardiography (ECG) and Electromyography (EMG) are recorded to identify the sleep stages of the patient and the events of apnea and hypopnea. According to Quan et al. (1997), this test is costly because it requires well-trained technicians to operate expensive equipment and medical doctors to diagnose whether or not a particular patient suffers from apnea.
Since data on apnea patients are scarce due to the high cost of polysomnography, Bayesian methods are better suited than classical techniques to analyzing such limited data. For example, Shi et al. (2005) used hypothesis testing to compare the effectiveness of two types of automatic Continuous Positive Airway Pressure (auto-CPAP) for the treatment of apnea patients. The results, based on 21 and 22 subjects in the two groups respectively, indicated no significant difference between the two treatments. With a Bayesian technique, one could provide a quantitative estimate of the effectiveness of each treatment, rather than an inconclusive finding of no difference between the treatments. Flemons and McNicholas (1997) suggested predicting the probability that a patient has apnea using Bayes' theorem, combining a pretest probability of apnea estimated by an experienced clinician with the results of diagnostic tests. In this study we use Bayesian methods to rank the severity of apnea among 14 patients, based on the records of a polysomnography database available from the www.physiobank.net website.

MATERIALS AND METHODS
The apnea data are taken from a polysomnography database that originated with Rigney et al. (1994). This database is a collaboration between the Massachusetts Institute of Technology and Beth Israel Hospital (MIT-BIH). It consists of polysomnography records for 14 males aged between 32 and 56 years.
Bayesian model of occurrence of apnea: Suppose the sleep record is divided into epochs, each of length $\Delta t$. In this study each epoch is 30 sec long. For the $i$th individual, we assume that the probability of observing an occurrence of apnea in each epoch is $\lambda_i \Delta t$, and that the probability of more than one apnea event occurring during the period $\Delta t$ is negligible. For the $i$th individual, we are interested in the occurrence of apnea in each epoch and in the total number of apnea events during the period $t_i$.
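Let $Y_i$ denote the total number of apnea events during the sleep duration $t_i$ for the $i$th individual. Since the occurrence of apnea is assumed to be a Poisson process, $Y_i$ follows a Poisson distribution with parameter $\lambda_i t_i$, which can be written as:

$$P(Y_i = y_i \mid \lambda_i, t_i) = \frac{(\lambda_i t_i)^{y_i} e^{-\lambda_i t_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$$

Using the basic mechanism of the Bayesian approach, which is:

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$

we have:

$$f(\lambda_i \mid y_i, t_i) \propto P(Y_i = y_i \mid \lambda_i, t_i)\, g(\lambda_i)$$

where:
$f(\lambda_i \mid y_i, t_i)$ = The posterior density function for $\lambda_i$ given $y_i$ and $t_i$
$g(\lambda_i)$ = The prior distribution of $\lambda_i$

The underlying rate of apnea for the $i$th patient can be estimated using the posterior density function $f(\lambda_i \mid y_i, t_i)$.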
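As an illustration of the Poisson model above, the following minimal sketch evaluates the log-likelihood of a single record at a few candidate rates. The record (90 events in 240 epochs) and the candidate rates are hypothetical and are not taken from the MIT-BIH database; the rate is assumed to be expressed per 30 sec epoch.

```python
import math

def poisson_log_pmf(y, rate, t):
    """Log of P(Y = y) when Y ~ Poisson(rate * t)."""
    mean = rate * t
    return y * math.log(mean) - mean - math.lgamma(y + 1)

# Hypothetical record: 90 apnea events over 240 epochs of 30 sec each.
y_obs, t_obs = 90, 240
candidate_rates = [0.25, 0.375, 0.5]   # events per epoch (illustrative values)

# Log-likelihood of the record at each candidate rate.
log_liks = {r: poisson_log_pmf(y_obs, r, t_obs) for r in candidate_rates}

# The likelihood peaks at the crude rate y / t, which is 0.375 here.
best_rate = max(log_liks, key=log_liks.get)
```

Among the candidates, the likelihood is largest at the crude rate $y_i / t_i$, which is the estimate the prior distribution then modifies.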

Poisson-gamma model:
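It is reasonable to believe that $\lambda_i$ varies between individuals according to some distribution. For mathematical convenience we may assume that $\lambda_1, \lambda_2, \ldots, \lambda_n$ represent a random sample from a gamma distribution with probability density function (pdf) given by:

$$g(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda}, \quad \lambda > 0,\ \alpha > 0,\ \beta > 0$$

where $\lambda$ is the true underlying rate of occurrence of apnea among patients in the population. Thus, the posterior distribution of $\lambda_i$ given $y_i$ and $t_i$ can be written as:

$$f(\lambda_i \mid y_i, t_i) \propto \lambda_i^{y_i + \alpha - 1} e^{-(t_i + \beta) \lambda_i}$$

It can be shown that $f(\lambda_i \mid y_i, t_i)$ is gamma with parameters $y_i + \alpha$ and $t_i + \beta$, and that the marginal distribution of $y_i$ is negative binomial, given by:

$$P(Y_i = y_i) = \frac{\Gamma(y_i + \alpha)}{\Gamma(\alpha)\, y_i!} \left(\frac{\beta}{t_i + \beta}\right)^{\alpha} \left(\frac{t_i}{t_i + \beta}\right)^{y_i}$$

The prior parameters $\alpha$ and $\beta$ can be estimated by the empirical Bayes method, fitting the negative binomial model to the data from all subjects using the method of moments.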
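The conjugate update can be sketched as follows. The prior values are the estimates reported in the Results section, read here as $\alpha = 6.286$ and $\beta = 16.324$; the patient record is hypothetical, and the rate is assumed to be expressed per 30 sec epoch.

```python
import math

ALPHA, BETA = 6.286, 16.324   # prior estimates as read from the Results section

def gamma_posterior(y, t, alpha=ALPHA, beta=BETA):
    """Return (mean, sd) of the Gamma(y + alpha, t + beta) posterior of the rate."""
    shape, rate = y + alpha, t + beta
    return shape / rate, math.sqrt(shape) / rate

def neg_binomial_log_pmf(y, t, alpha=ALPHA, beta=BETA):
    """Log of the marginal (negative binomial) probability of y events in time t."""
    return (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(beta / (t + beta))
            + y * math.log(t / (t + beta)))

# Hypothetical patient: 180 events in 240 epochs (crude rate 0.75 per epoch).
post_mean, post_sd = gamma_posterior(180, 240)
crude_rate = 180 / 240
prior_mean = ALPHA / BETA
```

The posterior mean lies between the prior mean $\alpha/\beta$ and the crude rate $y_i/t_i$, and moves toward the crude rate as $t_i$ grows.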

Poisson-lognormal:
To compare results across different choices of prior distribution, we also find the empirical Bayes estimates of the prior parameters assuming that the underlying rate of occurrence of apnea follows a log-normal distribution. The log-normal distribution is:

$$g(\lambda) = \frac{1}{\lambda \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln \lambda - \mu)^2}{2\sigma^2}\right], \quad \lambda > 0$$

with mean and variance of $\exp\left(\mu + \frac{\sigma^2}{2}\right)$ and $\exp\left[2\mu + \sigma^2\right]\left[\exp(\sigma^2) - 1\right]$ respectively. The choice of a log-normal prior is reasonable since it is sensible to restrict the rate $\lambda$ to be positive. The posterior distribution can be written as:

$$f(\lambda_i \mid y_i, t_i) \propto \lambda_i^{y_i - 1} e^{-\lambda_i t_i} \exp\left[-\frac{(\ln \lambda_i - \mu)^2}{2\sigma^2}\right]$$

and the marginal distribution of $y_i$ is:

$$P(Y_i = y_i) = \int_0^{\infty} \frac{(\lambda t_i)^{y_i} e^{-\lambda t_i}}{y_i!} \cdot \frac{1}{\lambda \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln \lambda - \mu)^2}{2\sigma^2}\right] d\lambda$$

This integral has no closed-form solution; hence Eq. 7 and 8 are not in closed form. However, the posterior mean can be obtained using WinBUGS 1.4 (Spiegelhalter et al., 2003). Following Clayton and Kaldor (1987), we applied the EM algorithm to estimate the hyperparameters $\mu$ and $\sigma^2$. Based on these values of the hyperparameters, we apply a Markov Chain Monte Carlo (MCMC) approach using WinBUGS 1.4 to obtain the posterior expected values, the posterior standard deviations and the Deviance Information Criterion (DIC) value.
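Because the log-normal posterior has no closed form, its mean must be computed numerically. The sketch below uses simple grid integration in place of the MCMC run in WinBUGS, so it illustrates the quantity being estimated rather than the authors' procedure. The hyperparameter values follow the Results section, with 0.162 assumed to be $\sigma^2$; the patient record is hypothetical.

```python
import math

def lognormal_posterior_mean(y, t, mu, sigma2, lo=-8.0, hi=4.0, n=24001):
    """Posterior mean of the rate under a Poisson likelihood and log-normal prior.

    Integrates on a uniform grid in u = log(rate); the Jacobian of this change
    of variable cancels the 1/rate factor of the log-normal density.
    """
    step = (hi - lo) / (n - 1)
    grid = [lo + k * step for k in range(n)]
    # Unnormalized log posterior density in u.
    log_w = [y * u - t * math.exp(u) - (u - mu) ** 2 / (2.0 * sigma2)
             for u in grid]
    top = max(log_w)                         # subtract the maximum for stability
    w = [math.exp(v - top) for v in log_w]
    # Ratio of two Riemann sums; the common grid step cancels.
    return sum(math.exp(u) * wi for u, wi in zip(grid, w)) / sum(w)

MU, SIGMA2 = -1.051, 0.162   # assumption: 0.162 is the variance, not sigma

# Hypothetical patient: 180 events in 240 epochs (crude rate 0.75 per epoch).
post_mean = lognormal_posterior_mean(180, 240, MU, SIGMA2)
```

With no data (`y = 0`, `t = 0`) the function returns the prior mean $\exp(\mu + \sigma^2/2)$, which gives a quick check on the integration grid.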
Improper prior: To compare against the results based on the gamma and log-normal priors, we assume a third prior, an improper distribution $g(\lambda_i)$ that expresses vague prior knowledge about $\lambda_i$. The posterior density function for $\lambda_i$ given $y_i$ and $t_i$ is then proportional to the product of the Poisson likelihood and this prior, and its summaries can again be found using WinBUGS.

RESULTS
The prior parameters found by fitting the negative binomial model to the data are $\hat{\alpha} = 6.286$ and $\hat{\beta} = 16.324$. By implementing the EM algorithm, the estimated prior parameters $\mu$ and $\sigma^2$ are -1.051 and 0.162 respectively. Using these prior parameters in Bayes' theorem, we obtain the posterior distribution, and hence the posterior mean and posterior standard deviation, of the rate of occurrence of apnea for each subject. The MCMC algorithm can be implemented in WinBUGS to generate samples from the posterior distributions under the three priors (Eq. 4, 8 and 11). The posterior mean and posterior standard deviation are computed from these samples in order to assess the results under the three types of prior distribution. In this study 10,000 iterations are carried out to estimate the model parameters, with the first 1000 iterations used as burn-in.

The posterior means and posterior standard deviations for all the patients are presented in Table 1. To select the best-fitting model, Spiegelhalter et al. (2002) proposed the use of the DIC. The DIC values found under the three prior assumptions are also given in Table 1. From Table 1, it is clear that the ranking of severity of apnea for all subjects is the same for all choices of prior distribution. When the rankings are compared, subject 7 is identified as having the most severe case of apnea, experiencing three episodes of apnea every 2 min. However, when the estimated means for each patient are compared across the priors, the results are found to be slightly different, although the rankings are the same.
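The ranking step can be sketched as follows for the gamma prior. The records are hypothetical and do not reproduce Table 1. The draws come directly from the closed-form gamma posterior rather than from WinBUGS, and the rate is assumed to be expressed per 30 sec epoch.

```python
import random

ALPHA, BETA = 6.286, 16.324   # gamma prior estimates as read from the Results

# Hypothetical (events, epochs) per patient; not the MIT-BIH records.
records = {"A": (60, 240), "B": (180, 240), "C": (95, 250), "D": (20, 200)}

rng = random.Random(0)
summary = {}
for pid, (y, t) in records.items():
    shape, rate = y + ALPHA, t + BETA
    # random.gammavariate takes a shape and a SCALE, so the scale is 1 / rate.
    draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(10_000)]
    # Mimic discarding the first 1000 iterations; independent draws like these
    # do not strictly need a burn-in, unlike an MCMC chain.
    kept = draws[1_000:]
    summary[pid] = sum(kept) / len(kept)

# Most severe first: rank patients by posterior mean rate.
ranking = sorted(summary, key=summary.get, reverse=True)

# A rate per 30 sec epoch corresponds to 4 times as many episodes per 2 min.
episodes_per_2min = {pid: 4 * m for pid, m in summary.items()}
```

On this scale a posterior mean of 0.75 per epoch corresponds to three episodes every 2 min, the figure quoted for subject 7.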

DISCUSSION
The Bayesian approach is flexible, which makes it a suitable tool for analyzing apnea data, which are often found to be scarce. This flexibility comes from the ability to update prior belief with the current information contained in the scarce data. The scarcity of the data is due to the high cost of conducting the sleep study and, in some cases, to problems in recruiting patients to participate in the study.
To decide on the prior belief, empirical Bayes methods are used to estimate the parameters of the gamma and log-normal distributions that describe the underlying rate of occurrence of apnea among the patients. Empirical Bayes methods use the ensemble of patients to estimate the distribution of this rate, which implies "borrowing" information from the other patients to decide on the rate for a particular patient. A comparison of the Poisson-gamma and Poisson-lognormal models, both of which use an empirical Bayes prior, shows that on average the posterior standard deviations for the Poisson-gamma model are slightly lower than those for the Poisson-lognormal model.
This reflects the greater precision of the estimated rates under the Poisson-gamma model. The result is supported by the goodness-of-fit comparison based on the DIC, in which the DIC for the Poisson-gamma model is slightly lower than that for the Poisson-lognormal model. There is a substantial improvement in the precision of the model when either the gamma or the log-normal prior is used instead of the improper prior distribution. The greater values of the posterior standard deviation under the improper prior indicate lower precision. This is not surprising, as the improper prior represents a lack of knowledge about the underlying distribution of the rate of occurrence of apnea among the patients.
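The DIC of Spiegelhalter et al. (2002) is the posterior mean deviance plus the effective number of parameters, $\mathrm{DIC} = \bar{D} + p_D$ with $p_D = \bar{D} - D(\bar{\lambda})$. The sketch below computes it for the Poisson-gamma model from simulated posterior draws. The records are hypothetical, the draws come from the closed-form gamma posterior rather than from WinBUGS, and the resulting value is not comparable to Table 1.

```python
import math
import random

ALPHA, BETA = 6.286, 16.324   # gamma prior estimates as read from the Results
# Hypothetical (events, epochs) pairs; not the MIT-BIH records.
records = [(60, 240), (180, 240), (95, 250), (20, 200)]

def deviance(rates):
    """-2 times the Poisson log-likelihood of all records at the given rates."""
    total = 0.0
    for (y, t), lam in zip(records, rates):
        total += y * math.log(lam * t) - lam * t - math.lgamma(y + 1)
    return -2.0 * total

rng = random.Random(1)
n_draws = 5000
# Independent draws from each Gamma(y + alpha, t + beta) posterior
# (random.gammavariate takes a shape and a scale).
draws = [[rng.gammavariate(y + ALPHA, 1.0 / (t + BETA)) for (y, t) in records]
         for _ in range(n_draws)]

d_bar = sum(deviance(d) for d in draws) / n_draws          # posterior mean deviance
post_means = [sum(col) / n_draws for col in zip(*draws)]   # posterior mean rates
p_d = d_bar - deviance(post_means)                         # effective number of parameters
dic = d_bar + p_d
```

A lower DIC indicates a better trade-off between fit and complexity, which is the sense in which the Poisson-gamma model is preferred above.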

CONCLUSION
If we are willing to accept, in the absence of random sampling, that the sample is representative of apnea patients in the population, then the median rate found in the sample can be taken to describe a patient suffering from a moderate level of apnea. It is probably reasonable to use this rate, three episodes of apnea every 4 min, as a threshold for deciding whether or not a particular patient has a serious apnea problem. However, if more data become available, the threshold may change and would probably become more suitable for practical purposes.
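The threshold can be expressed on the per-epoch scale with a short calculation. This sketch assumes that the rate is measured per 30 sec epoch; the function names are illustrative only.

```python
EPOCH_SEC = 30

# Three episodes every 4 min, expressed per 30 sec epoch: 3 / 8 = 0.375.
THRESHOLD_PER_EPOCH = 3 / (4 * 60 / EPOCH_SEC)

def is_serious(rate_per_epoch, threshold=THRESHOLD_PER_EPOCH):
    """True when a patient's estimated rate exceeds the threshold."""
    return rate_per_epoch > threshold

def per_hour(rate_per_epoch):
    """Convert a rate per epoch to episodes per hour."""
    return rate_per_epoch * 3600 / EPOCH_SEC
```

Under this assumption the threshold corresponds to 45 episodes per hour, and a patient with a rate of 0.75 per epoch (three episodes every 2 min) would be classified as serious.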