Estimation of the Case Fatality Ratio of Middle East Respiratory Syndrome based on Patients’ Comorbidities with Application to the MERS-CoV Epidemics in the Republic of Korea, 2015

Corresponding Author: Changhyuck Oh, Department of Statistics, Yeungnam University, Gyeongsan, Gyeongbuk 38541, Republic of Korea. Email: aristoh99@hotmail.com

Abstract: In the early stage of the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) outbreak in Korea in 2015, the Centers for Disease Control and Prevention of Korea and several studies reported estimated Case Fatality Ratios (CFRs) that differed widely. Here, we propose an estimation method based on the commonly quoted naive estimator of CFR that utilizes the number of in-hospital patients with comorbidities, as well as the cumulative numbers of confirmed and dead patients up to a certain date. We compared the proposed estimator with two naive estimators and an integral method estimator by simulation experiments under the individual-based susceptible-exposed-infected-recovered model and by analysis of data from the 2015 epidemic of MERS-CoV in Korea. The proposed estimator performed better than the other estimators in both the simulations and the analysis of the Korean MERS data.


Introduction
During the early stages of a pandemic of a novel influenza, an estimate of the Case Fatality Ratio (CFR) can be used to decide how to distribute resources to monitor, prevent and treat the disease; it therefore plays an important role in the response. Accordingly, when a novel virus emerges and spreads, estimation of the CFR is essential. However, because the CFR can only be determined after the epidemic has ended, it is unknown during the course of the epidemic and must be estimated. In this study, we considered the CFR among lab-confirmed cases. The first confirmed case of the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) in Korea was reported on May 20, 2015. Before the virus entered Korea, the CFR in other affected regions, such as the Middle East, was known to be about 40%. Secondary infections then occurred through some hospitals, and the Centers for Disease Control and Prevention of Korea (CDCPK) announced a daily estimate of the CFR using the naive method, which is employed by the World Health Organization (WHO). The CFRs reported in the early stage of the epidemic were very low compared to the known CFR of MERS-CoV in Middle Eastern countries. On June 19, 2015, about one month after the first confirmed patient, the CDCPK reported a CFR of 14.5%.
During the epidemic in Korea, several articles were published that estimated the CFR of the epidemic. Cowling et al. (2015) estimated the CFR as 21% using an integral method proposed by Garske et al. (2009), based on epidemic data of 24 deaths and 30 recovered persons among 166 confirmed cases up to June 19, 2015. Mizumoto et al. (2015) estimated the CFR in real time using an integral equation method and obtained estimates of 40 to 80% in the early stages of the epidemic. On the other hand, when data collected up to July 15 were used (after which there were no more new confirmed patients), Majumder et al. (2015) estimated the CFR to be 22% using the improved naive estimator, while the CDCPK reported it as 17.7% based on the naive estimator. These wide-ranging estimates in the early stages of the outbreak led to difficulties in developing preventative policies and treatments, as well as to uncertainty regarding which estimation method is more reliable.
Many studies have investigated methods for estimating the CFR of infectious diseases. The naive estimator is defined as the ratio of cumulative deaths to cumulative cases. Ma and Driessche (2008) represented the naive estimator of CFR using parameters of a compartment model with susceptible, infectious and recovered individuals. The naive estimator is widely used. For example, Atkins et al. (2014) estimated the CFR of Ebola in West Africa from 2013 to 2014 by the naive estimator. However, the naive estimator tends to underestimate the actual value because it assumes that all patients in the hospital will recover and be discharged. Conversely, the improved naive estimator, which is given as the ratio of the number of dead patients to the number of patients who have either died or recovered, leaves in-hospital patients out of the calculation, resulting in overestimation in the early stages of an epidemic. Therefore, attempts have been made to improve the two naive estimators by using auxiliary patient information beyond the number of confirmed cases and the number of dead patients.
When estimating the CFR of SARS in Hong Kong and worldwide in 2003, Donnelly et al. (2003) and Ghani et al. (2005) used additional information regarding the time from onset to death or discharge from the hospital. Garske et al. (2009) investigated an integral method of CFR estimation for the novel flu (A/H1N1) that occurred worldwide in 2009, resulting in 100,000 known patients and 429 deaths. Their proposed estimator is the ratio of deaths to a modified number of confirmed patients, obtained from an estimated cumulative distribution of the time from onset to death, so that it reduces the underestimation of the naive estimator.
Among these, two naive estimators and an integral method estimator using the cumulative distribution of delay times from onset to death were employed for estimation of CFR during the 2015 Korean MERS-CoV epidemic. However, the integral method estimators depend on estimates of the times from onset to death, resulting in the estimated CFR differing among studies.
Here, we propose a novel estimator of CFR that utilizes information regarding comorbidities of patients in the hospital on the day of estimation and compare it to the two naive estimators and an integral method estimator. Based on simulation experiments and analysis of data from the 2015 Korean MERS-CoV epidemic, we demonstrate that the proposed estimator performs better than the naive estimators and the integral method estimator used during that epidemic.

Materials and Methods
Let s be the days from when the first MERS-CoV patient was confirmed in a country and C(s), D(s) and R(s) be the cumulative numbers of confirmed patients, dead patients and recovered patients, respectively, up to day s. When an epidemic is completed on day T, the case fatality ratio π of a nation or a region is defined as the ratio of the cumulative number of deaths to that of confirmed patients up to T, that is, π = D(T)/C(T). Here, we do not count patients who have not come to the hospital for various reasons and therefore not been diagnosed as confirmed.
There are two types of naive estimators, the naive estimator and the improved naive estimator, which are defined as follows:

\[ \hat{\pi}_1(s) = \frac{D(s)}{C(s)}, \qquad \hat{\pi}_2(s) = \frac{D(s)}{D(s) + R(s)} \tag{1} \]

for s = 1,2,…,T. These estimators are also referred to as simple crude estimators by Ghani et al. (2005) and Garske et al. (2009). Note that on day s = 1,…,T, the estimator $\hat{\pi}_1$ assumes that every patient in the hospital recovers, while $\hat{\pi}_2$ assumes that all patients in the hospital are free from infection, that is, it leaves them out of the count. When the epidemic is completed, the two estimators become the same, resulting in the true CFR. At each observation time during the epidemic process, the confidence intervals of the two estimators are given using a normal approximation as in Ghani et al. (2005).
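The two naive estimators and their normal-approximation intervals can be sketched as follows. This is a minimal illustration, assuming the improved naive estimator divides deaths by the number of cases with a known outcome (dead or recovered) and assuming a standard Wald interval; the function names are hypothetical. The counts are those quoted in the Introduction for June 19, 2015.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def naive_cfr(deaths, confirmed):
    """Naive estimator: cumulative deaths over cumulative confirmed cases."""
    estimate = deaths / confirmed
    return estimate, wald_interval(estimate, confirmed)


def improved_naive_cfr(deaths, recovered):
    """Improved naive estimator: deaths over cases with a known outcome."""
    resolved = deaths + recovered
    estimate = deaths / resolved
    return estimate, wald_interval(estimate, resolved)


# Counts up to June 19, 2015: 24 deaths, 30 recovered, 166 confirmed.
p1, ci1 = naive_cfr(deaths=24, confirmed=166)
p2, ci2 = improved_naive_cfr(deaths=24, recovered=30)
print(f"naive:          {p1:.3f}  CI = ({ci1[0]:.3f}, {ci1[1]:.3f})")
print(f"improved naive: {p2:.3f}  CI = ({ci2[0]:.3f}, {ci2[1]:.3f})")
```

With these counts the naive estimate is about 14.5% and the improved naive estimate about 44.4%, which illustrates how far apart the two can be while many patients are still in the hospital.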
The estimator suggested by Garske et al. (2009) reduces the number of confirmed patients by employing an estimated distribution function of the time from onset to death, which gives a larger value than the naive estimator. The estimator developed by Garske et al. (2009) is as follows:

\[ \hat{\pi}_3(s) = \frac{D(s)}{\sum_{u=1}^{s} c(u)\,F(s-u)} \]

where $c(u)$ is the number of confirmed patients whose symptoms begin on day u and $F(\cdot)$ is an estimated cumulative distribution of delay times from symptom onset to death. Therefore, $F(s-u)$ is the probability that a patient who develops symptoms on day u will die by day s. However, if the day of symptom onset for a patient is not known, it must be estimated. If the onset day is over-estimated (placed too late), the denominator is reduced and the Garske estimator is itself over-estimated. The confidence interval for $\hat{\pi}_3$ on each day can be obtained by using the bootstrap method with the estimated distribution function $F(\cdot)$ using data describing the delay times. Cowling et al. (2015) used the estimator of Garske et al. (2009), while Mizumoto et al. (2015) used an integral equation method, of which Garske's method is a special case. Many studies have been conducted using Garske's method (Cowling et al., 2015; Mizumoto et al., 2015; Mishra et al., 2010; Presanis et al., 2009; Balcan et al., 2009; Khandaker et al., 2011; Nishiura, 2010; Yu et al., 2013; Nguyen-Van-Tam et al., 2010; Echevarría-Zuno et al., 2009; Merler et al., 2011; Yang et al., 2009).
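A minimal sketch of an integral-type estimator of this kind is given below. It assumes the form "cumulative deaths divided by the sum over onset days u of c(u) F(s − u)"; the function name, the toy onset counts and the linear delay distribution are all hypothetical and serve only to show how shrinking the denominator raises the estimate relative to the naive one.

```python
def garske_cfr(deaths_to_date, onsets_by_day, delay_cdf, s):
    """Deaths up to day s divided by the delay-adjusted number of cases.

    onsets_by_day[u - 1] holds c(u), the number of confirmed patients whose
    symptoms began on day u; delay_cdf(t) plays the role of F(t).
    """
    last_day = min(s, len(onsets_by_day))
    adjusted_cases = sum(
        onsets_by_day[u - 1] * delay_cdf(s - u) for u in range(1, last_day + 1)
    )
    if adjusted_cases == 0:
        return float("nan")  # no case has had time to reach an outcome yet
    return deaths_to_date / adjusted_cases


def toy_delay_cdf(t):
    """Hypothetical onset-to-death distribution: uniform over 0 to 4 days."""
    return min(max(t, 0) / 4.0, 1.0)


onsets = [4, 6, 10]  # hypothetical c(1), c(2), c(3)
early = garske_cfr(deaths_to_date=1, onsets_by_day=onsets, delay_cdf=toy_delay_cdf, s=3)
late = garske_cfr(deaths_to_date=4, onsets_by_day=onsets, delay_cdf=toy_delay_cdf, s=10)
print(f"day 3:  adjusted = {early:.3f}, naive = {1 / sum(onsets):.3f}")
print(f"day 10: adjusted = {late:.3f}, naive = {4 / sum(onsets):.3f}")
```

Once every delay has elapsed (day 10 in the toy data), F equals 1 for all cases and the adjusted estimate coincides with the naive one.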
When information regarding the comorbidities of confirmed patients is available, we propose an estimator that adds this information to the naive estimator in Equation 1. On day s, let the numbers of in-hospital patients with and without comorbidities be $H_s(s)$ and $H_u(s)$, respectively, and let $H(s) = H_s(s) + H_u(s)$. The proposed estimator is:

\[ \hat{\pi}_4(s) = \frac{D(s) + \alpha H_u(s) + \beta H_s(s)}{C(s)} \]

The estimator $\hat{\pi}_4$ assumes that 100α% of the in-hospital patients without comorbidities and 100β% of the in-hospital patients with comorbidities on day s will die, where α and β are given constants obtained from previous epidemics of MERS, if available. Note that the naive estimator assumes α = β = 0 and the improved naive estimator assumes H(s) = 0 in addition to α = β = 0, i.e., it does not take in-hospital patients into account. The confidence interval of $\pi$ can be obtained using a normal approximation (see appendix for proof).
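The comorbidity-based estimator can be illustrated with the short sketch below. It assumes the reading used in the Discussion, in which α is the assumed fatality fraction for in-hospital patients without comorbidities and β the fraction for those with comorbidities. The interval is not taken from the paper: it is a plug-in normal approximation derived here under the multinomial assumption stated in the appendix, and the function name is hypothetical. The counts are those reported for August 10, 2015.

```python
import math


def comorbidity_cfr(deaths, recovered, hosp_with, hosp_without, alpha, beta, z=1.96):
    """Estimate (D + alpha * H_u + beta * H_s) / C with a plug-in normal interval.

    Assumption: (D, R, H_s, H_u) is multinomial given C, so the variance of a
    weighted sum of the cell proportions is (sum w^2 p - (sum w p)^2) / C.
    """
    confirmed = deaths + recovered + hosp_with + hosp_without
    counts = (deaths, recovered, hosp_with, hosp_without)
    weights = (1.0, 0.0, beta, alpha)  # contribution of each cell to the numerator
    proportions = [count / confirmed for count in counts]
    estimate = sum(w * p for w, p in zip(weights, proportions))
    second_moment = sum(w * w * p for w, p in zip(weights, proportions))
    variance = max(second_moment - estimate ** 2, 0.0) / confirmed
    half_width = z * math.sqrt(variance)
    return estimate, (max(0.0, estimate - half_width), min(1.0, estimate + half_width))


# August 10, 2015: 36 deaths, 141 discharged, 2 in hospital with and 7 without comorbidities.
est, ci = comorbidity_cfr(36, 141, hosp_with=2, hosp_without=7, alpha=0.0, beta=0.6)
naive_est, naive_ci = comorbidity_cfr(36, 141, hosp_with=2, hosp_without=7, alpha=0.0, beta=0.0)
print(f"proposed: {est:.3f}  CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"naive:    {naive_est:.3f}  CI = ({naive_ci[0]:.3f}, {naive_ci[1]:.3f})")
```

Setting α = β = 0 reduces both the estimate and the interval to those of the naive estimator, which is a quick consistency check on the weighting.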

Results
To compare the proposed estimator with the two naive estimators and the integral method estimator, we conducted simulation experiments and analyzed the 2015 Korean MERS-CoV epidemic data.

Simulation
An individual-based Susceptible-Exposed-Infected-Removed (SEIR) model was used to simulate epidemic outbreaks. The EpiFast algorithm of Bisset et al. (2009) for the individual-based SEIR model was implemented in C++. We assumed the population was homogeneously mixed. The number of susceptible individuals in the group was set to 500. The numbers of initial infectious patients and initial latent patients were set to 1 and 0, respectively. The case fatality ratio of the entire group was set to 0.2, 0.3 and 0.4 and the ratio of patients with comorbidities in the population was set to 0.10, 0.20 and 0.30. For each given CFR and ratio of patients with comorbidities in the population, the CFR of patients with comorbidities was set to 0.9, 0.8, 0.7, 0.6 and 0.5 and the CFR of patients without comorbidities was set such that the CFR of the whole group was attained. Removed individuals were classified as dead or recovered. The days from exposure to onset were set to 3, 4 and 5 with probabilities of 0.25, 0.5 and 0.25, respectively. Simulations were repeated until we had 1,000 observed epidemic processes with more than 30 infected patients at the end of the process. Because we do not know how high or low the CFR of patients with or without comorbidities is, we set α = 0 and β = 0.6 for simplicity.
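The CFR of patients without comorbidities implied by each parameter combination follows from writing the overall CFR as a weighted average of the two group CFRs. The sketch below works through that arithmetic for the grid described above; the function name is hypothetical, and the flag on negative values is only a check added here, since the text does not say how such combinations were handled.

```python
def cfr_without_comorbidity(overall_cfr, comorbid_share, comorbid_cfr):
    """Solve overall = share * cfr_with + (1 - share) * cfr_without for cfr_without."""
    return (overall_cfr - comorbid_share * comorbid_cfr) / (1.0 - comorbid_share)


overall_values = (0.2, 0.3, 0.4)
share_values = (0.10, 0.20, 0.30)
comorbid_cfr_values = (0.9, 0.8, 0.7, 0.6, 0.5)

print("overall  share  cfr_with  cfr_without")
for overall in overall_values:
    for share in share_values:
        for cfr_with in comorbid_cfr_values:
            cfr_without = cfr_without_comorbidity(overall, share, cfr_with)
            # A negative value means this combination cannot reach the target overall CFR.
            flag = "" if cfr_without >= 0 else "  (not attainable)"
            print(f"{overall:7.1f}  {share:5.2f}  {cfr_with:8.1f}  {cfr_without:11.3f}{flag}")
```

For example, an overall CFR of 0.2 with a 10% comorbidity share and a comorbid CFR of 0.9 implies a CFR of about 0.122 for patients without comorbidities.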
To compare the performance of the estimators, we propose the following measure. Assume we generate an epidemic process R times independently. For each simulated epidemic process, we first calculated the CFR $\pi_j$, j = 1,…,R, after the epidemic was completed. Let $\hat{\pi}_k^{(j)}(i)$ be the estimate of $\pi_j$ on day i = 1,…,$N_j$ given by estimator $\hat{\pi}_k$, k = 1,…,4, where $N_j$ is the number of days for which the epidemic continued in the j-th simulation. The square root of the mean square error of the daily estimates for estimator $\hat{\pi}_k$ on the j-th repetition is defined by:

\[ d_j^{(k)} = \sqrt{\frac{1}{N_j - i_0 + 1} \sum_{i=i_0}^{N_j} \left( \hat{\pi}_k^{(j)}(i) - \pi_j \right)^2} \]

where $i_0$ is the day on which the number of cumulative deaths is greater than or equal to a given number. In our simulation, $i_0$ was set to 5. This was because, in each epidemic process, if the number of cumulative deaths is too small, the estimate of the final CFR becomes unstable. The mean $d^{(k)}$ and standard deviation $m^{(k)}$ of $d_j^{(k)}$ over the R repetitions are given in Table 1.
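The comparison measure for a single simulated epidemic can be sketched as below. This is an illustration under two assumptions that are not confirmed by the text: the squared errors are averaged over the days from the starting day to the last day of the epidemic, and the value 5 is read as a threshold on cumulative deaths that determines the starting day. The function name and the toy series are hypothetical.

```python
import math


def daily_rmse(daily_estimates, cumulative_deaths, final_cfr, min_deaths=5):
    """Root mean square error of daily CFR estimates against the final CFR.

    Days before cumulative deaths first reach min_deaths are skipped, because
    estimates based on very few deaths are unstable.
    """
    start = next(
        (day for day, deaths in enumerate(cumulative_deaths) if deaths >= min_deaths),
        None,
    )
    if start is None:
        return float("nan")  # the death threshold was never reached
    squared_errors = [(estimate - final_cfr) ** 2 for estimate in daily_estimates[start:]]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical four-day epidemic whose final CFR is 0.2.
estimates = [0.0, 0.1, 0.3, 0.2]
deaths = [0, 5, 6, 6]
rmse = daily_rmse(estimates, deaths, final_cfr=0.2)
print(f"RMSE from the first day with at least 5 deaths: {rmse:.4f}")
```

Averaging this quantity over all repetitions, and taking its standard deviation, gives one summary value per estimator that can be compared across estimators.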
The values of $d^{(4)}$ were the smallest among $d^{(k)}$, k = 1,2,3,4, for all cases, which means that the proposed estimator $\hat{\pi}_4$ performed better than the others.
As expected, the values of $d^{(2)}$ were the largest among $d^{(k)}$, k = 1,2,3,4, for all cases. The values of $d^{(3)}$ were larger than or the same as $d^{(1)}$, which was somewhat unexpected because the Garske-type estimator is known to be an improvement on the simple estimator. One explanation for this is that the Kaplan-Meier type estimate of the cumulative distribution function utilized in the present study affected the result. Other types of estimates of the cumulative distribution function might change the result, as reported by Cowling et al. (2015) and Mizumoto et al. (2015). The standard deviations $m^{(1)}$, $m^{(3)}$ and $m^{(4)}$ were comparable with each other, while $m^{(2)}$ was larger than the others.

Korea 2015 MERS-CoV Epidemic
We used publicly available data on confirmed MERS cases in the Republic of Korea from May 20 to December 23, 2015 (KICH, 2015), as well as some newspapers published in Korea. The data set gives each patient's comorbidity status and the dates of confirmed infection and of recovery, death or discharge. The first confirmed case of MERS was recorded on May 20, 2015 and 185 more were confirmed through August 10, 2015. Among these, 36 died (19.4%), 141 were discharged (75.3%) and nine were still in the hospital (5.4%). Among the nine in-hospital patients, seven were without comorbidities and two were with comorbidities. The number of confirmed patients (186) did not change and the number of deaths (36) did not change between July 11 and October 25, 2015, at which time the number of deaths increased to 37. The first death was recorded on June 1, 2015; therefore, CFR estimates were obtained after that day. In the reported data, patients number 25, 36 and 64 were discharged before or on the confirmed day. For these three cases, we used the confirmed day minus one day as a modification. Figure 1 is a plot showing the cumulative number of patients or infectious patients, as well as the cumulative number of patients discharged from hospitals or recovered and the cumulative number of deaths by comorbidity condition. Table 2 shows the number of deaths and discharges classified by comorbidities. After the epidemic was completed, among 186 confirmed patients, 38 died, giving a CFR of 20.4%. Additionally, 154 patients had no comorbidities, giving a death ratio of 7.14%. The number of patients with comorbidities was 32, giving a death ratio of 84.4%.
Only 56 of the 186 records had information regarding the time from onset to confirmation. From these 56 records, we calculated the median as 4 days and the mean as 5 days. Therefore, we used the median of 4 days for death records that did not specify the time from onset to confirmation (Fig. 2). Figure 3 shows the estimates generated using $\hat{\pi}_1$, $\hat{\pi}_2$, $\hat{\pi}_3$ and $\hat{\pi}_4$.
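The median-based fill-in of missing onset dates can be sketched as follows. The records below are hypothetical and are not taken from the Korean data set; the function name and the dictionary layout are likewise assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import median


def fill_onset_dates(records):
    """Fill missing onset dates with (confirmation date - median observed delay).

    Each record is a dict with a 'confirmed' date and an 'onset' date or None.
    """
    delays = [
        (record["confirmed"] - record["onset"]).days
        for record in records
        if record["onset"] is not None
    ]
    typical_delay = int(round(median(delays)))
    filled = []
    for record in records:
        onset = record["onset"] or record["confirmed"] - timedelta(days=typical_delay)
        filled.append({**record, "onset": onset})
    return filled, typical_delay


# Hypothetical records: three with a known onset date and one without.
records = [
    {"confirmed": date(2015, 6, 4), "onset": date(2015, 6, 1)},
    {"confirmed": date(2015, 6, 7), "onset": date(2015, 6, 3)},
    {"confirmed": date(2015, 6, 10), "onset": date(2015, 6, 4)},
    {"confirmed": date(2015, 6, 12), "onset": None},
]
filled, typical_delay = fill_onset_dates(records)
print(f"median delay: {typical_delay} days")
print(f"imputed onset for the last record: {filled[-1]['onset']}")
```

Using the median rather than the mean makes the imputed delay less sensitive to a few unusually long onset-to-confirmation times.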

Fig. 3. Daily estimates of MERS-CoV in Korea, 2015
Daily estimates of the CFR using each estimator became stable after July 11, 2015, after which there were no changes in the numbers of confirmed patients and dead patients. The estimates generated using $\hat{\pi}_1$ were underestimated in the early stage of the epidemic and became stable about 40 days after the first confirmed patient. The estimates generated by $\hat{\pi}_2$ were greatly overestimated in the early stage of the epidemic. Estimator $\hat{\pi}_3$, proposed by Garske, showed large fluctuations compared to the proposed estimator in the early stage. Estimator $\hat{\pi}_4$ showed a very stable pattern in the early stages of the epidemic relative to the other estimators and good overall performance.

Discussion
In this study, we proposed a new estimator for the CFR of MERS-CoV that considers the comorbidities of MERS-CoV patients. The estimator uses the same denominator as the naive estimator, but a different numerator that combines the number of dead patients with the number of patients with comorbidities. The developed estimator reduces the underestimation problem of the naive estimator and the large fluctuation of the integral method in the early stage of the 2015 Korean MERS-CoV epidemic. Since doctors check the comorbidities of patients upon diagnosis, it is assumed that this information will be available under normal conditions. The mean square errors of the daily estimates were calculated as a measure for comparing the estimators in the simulation experiments. To the best of our knowledge, this is the first study to employ such a method of comparison.
The data describing Korea MERS-CoV only included 56 cases for which the onset times were recorded; therefore, onset times for other cases must be estimated for Garske's estimator. This estimated time influences Garske's CFR estimate. For Garske's estimation, we used the Kaplan-Meier type estimator for the cumulative distribution function for censored data. However, Cowling et al. (2015) and Mizumoto et al. (2015) used a parametric estimation method for the cumulative distribution function. In a later study, comparison of estimators depending on the type used for the cumulative distribution function might be of interest.
For patients with comorbidities, it is not clear whether some symptoms were from MERS-CoV or other diseases and therefore it is not possible to know the day of onset from MERS-CoV.
The proposed estimator generated using information describing comorbidities of patients showed that it improved underestimation of the naive estimator and that it was more stable than Garske's estimator in the early stage of infection. Although we proposed this method of estimation of CFR for Korean MERS-CoV, it can be evaluated for application to other infectious diseases using the proper CFR of patients with comorbidities. Moreover, even though we used auxiliary variables describing comorbidities, other information such as severity of condition might be applied to improve existing estimators.
In the simulation, we used α = 0 and β = 0.6 for the proposed estimator, which assumes a CFR for patients without comorbidities of 0 and that of patients with comorbidities of 0.6. Even though these values are quite rough estimates for α and β, we obtained satisfactory results. However, we may use more accurate estimates for α and β if such information is available from previous pandemics.
In this study, we investigated various methods of estimating the CFR of the MERS-CoV among confirmed or reported cases. We did not count patients who did not come to the hospital for confirmation. In a future study, we can evaluate methods to adjust the proposed estimator for situations in which the CFR for all MERS-CoV patients, reported or not, is of interest.

Conclusion
The proposed estimator for the CFR of MERS in Korea in 2015, which uses information on patients' comorbidity conditions, performed better than the existing widely used estimators, namely the two naive estimators and the integral method estimator.
Appendix. Confidence Interval of $\hat{\pi}_4$
Binomial confidence intervals for the underlying probability of death can be calculated from either exact methods or a normal approximation, as appropriate.
We first assume that given $C(s)$, $(R(s), D(s), H_s(s), H_u(s))$ has a multinomial distribution, that is: