First-Degree Polynomial Gradient Approach to Reveal the Severity of COVID-19 Pandemic in Affected Countries

Corresponding Author: Dedy Rahman Wijaya, School of Applied Science, Telkom University, Bandung, Indonesia. Email: dedyrw@tass.telkomuniversity.ac.id

Abstract: COVID-19 is a disease caused by a new type of coronavirus (2019-nCoV) that originated in Wuhan, China. On 11 March 2020, WHO declared COVID-19 a pandemic. Currently, it has spread to 175 countries or regions around the world. Confirmed, recovered and death cases are reported from day to day. These data change rapidly, which indicates an uncertain situation that might affect many socio-economic activities. However, until now, there has been no approach to categorize these countries according to the latest situation. The typical measure, the Case Fatality Rate (CFR), is the proportion of deaths among the total number of confirmed cases of a certain disease. It is used for diseases with discrete, limited-time courses, such as outbreaks of acute infections. The major drawback of CFR is that it can only be considered a final result when all cases have been resolved (either died or recovered). To address this gap, we propose the first-degree polynomial or linear gradient approach to categorize the COVID-19 severity status of areas or countries based on the rate of confirmed, recovered and death cases. The status categorization is necessary information for all parties to be aware of the situation. It can be taken into consideration when determining policies related to the COVID-19 pandemic, such as travel warnings, self-isolation, work from home, lock-down, etc.


Introduction
At present, more than 100 countries have reported cases of COVID-19. These countries report confirmed, recovered and death cases every day. Chronologically, the first cases occurred in December 2019 in Wuhan (Hubei, China) and were reported as pneumonia of unknown cause. According to the authorities, some patients were traders at the Huanan Seafood market. The World Health Organization (WHO) began monitoring the situation and asked for more information about the laboratory tests carried out and the various diagnoses considered (World Health Organization, 2019). On 27 December 2019, a doctor from the Hubei Provincial Hospital of Integrated Chinese and Western Medicine told the government that the illness could be caused by a new variant of coronavirus. From 31 December 2019 through 3 January 2020, a total of 44 case-patients with pneumonia of unknown etiology were reported to WHO by the national authorities in China. During this reporting period, the causal agent was not identified (World Health Organization, 2020a). On 7 January 2020, the Chinese authorities identified a new type of coronavirus, namely Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Following WHO's best practices for naming new human infectious diseases, which were developed in consultation and collaboration with the World Organisation for Animal Health (OIE) and the Food and Agriculture Organization of the United Nations (FAO), WHO named the disease COVID-19, short for "coronavirus disease 2019" (World Health Organization, 2020b). Furthermore, on 11 and 12 January 2020, WHO received more detailed information from the Chinese National Health Commission indicating that the outbreak was associated with exposures in one seafood market in Wuhan City. As of 20 January 2020, COVID-19 had been reported in four countries: China (278 cases), Thailand (2 cases), Japan (1 case) and the Republic of Korea (1 case).
Moreover, six deaths had also been reported from Wuhan City (World Health Organization, 2020a). At the time of writing, 175 countries or regions have reported COVID-19 cases, as shown in Fig. 1. This figure is generated from a dataset in the repository of the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) (Dong et al., 2020). The online version of this map can also be accessed from our dashboard system (ais-rg.com) (Wijaya, 2020a). COVID-19 cases are spreading rapidly around the world, and several areas show a high number of deaths together with a lower number of recovered patients. Furthermore, on 11 March 2020, WHO declared it a pandemic (World Health Organization, 2020c). At the time of writing, there is no vaccine for COVID-19, which makes visiting countries with high numbers of confirmed cases and deaths very risky. Against this background, the motivation of this study can be formulated as follows:

1. COVID-19 has spread to almost all countries in the world. Confirmed, recovered and death cases are reported from day to day. These data change rapidly, which indicates an uncertain situation that might affect many socio-economic activities. However, until now, there has been no approach to categorize these countries according to the latest situation.
2. The typical measure, the Case Fatality Rate (CFR), is the proportion of deaths among the total number of confirmed cases of a certain disease. It is used for diseases with discrete, limited-time courses, such as outbreaks of acute infections. The major drawback of CFR is that it can only be considered a final result when all cases have been resolved (either died or recovered). For instance, a preliminary CFR calculated during an event like the COVID-19 pandemic, which has a high daily increase and a long resolution time, can be substantially different from the final CFR. Moreover, the main difficulty in estimating CFR is ensuring the validity of the numerator and the denominator (Harrington, 2016). CFR also cannot explicitly categorize the status of an area or country.

Based on these motivations, we propose the first-degree polynomial or Linear Gradient Approach (LGA) to categorize the COVID-19 severity status of areas or countries based on the rate of confirmed, recovered and death cases. The status categorization is necessary information for all parties to be aware of the situation. It can be taken into consideration when determining policies related to the COVID-19 pandemic, such as travel warnings, self-isolation, work from home, lock-down, etc.
The rest of this manuscript is organized as follows. Section 2 reviews related studies on COVID-19 and the linear gradient. Section 3 describes the dataset and the proposed method. Section 4 discusses the results of our experiment. Finally, Section 5 concludes this study.

Related Works
COVID-19 is caused by a new type of coronavirus (2019-nCoV) that originated in Wuhan, China. On December 12, 2019, the Wuhan Municipal Health Commission reported 27 cases of viral pneumonia, seven of them critically ill. Most patients had a history of recent wildlife exposure at the Huanan Seafood Wholesale Market in Wuhan, China, where poultry, snakes, bats and other livestock are sold. Similar to other coronaviruses, the virus is zoonotic, which means it can be transmitted between animals and humans (Cheng and Shan, 2020). Moreover, this virus is thought to spread primarily from person to person: between people who are in close contact with each other (within about 6 feet) and through respiratory droplets produced when an infected person coughs or sneezes. These droplets can land in the mouths or noses of people who are nearby or may be inhaled into the lungs (Centers for Disease Control and Prevention, 2020). Symptoms of infection range from ordinary coughing, fever, shortness of breath and difficulty breathing to more severe outcomes such as severe acute respiratory syndrome, pneumonia, kidney failure and death (World Health Organization, n.d.). The incubation period of this virus is about five days (Lauer et al., 2020). Besides, this virus remains active for up to 3 h in the open air, 48 h on stainless steel, 72 h on plastic, 4 h on copper and 24 h on cardboard (Van Doremalen et al., 2020). At the time of writing, 175 countries and regions have reported COVID-19 cases, with 857,487 confirmed cases in total (Engineering, 2020).
There are several studies and approaches related to cross-country COVID-19 epidemiology. Reviewing them clarifies the position and importance of our research. A Bayesian approach was used to forecast daily COVID-19 infections. That research concluded that there is a lot of uncertainty about future infection rates (Liu et al., 2020). A CFR comparison was also performed between countries investigated between 12 and 23 March 2020 (Khafaie and Rahim, 2020). That study found that the severity of COVID-19 is associated with comorbidities and age, especially in countries with the largest outbreaks. Another study compared CFR by age and region in the most affected countries. It also explains the limitation of CFR as a crude indicator that does not consider historical changes in the data (Natale et al., 2020). Moreover, one study analyzes two components of CFR, namely the age-specific CFR and the age structure of diagnosed infection cases. It shows that the differences between low and high CFRs are influenced by the age structure of confirmed cases (Dudel et al., 2020). Furthermore, a time-to-death approach has been proposed to determine the severity of COVID-19 in the world (Verma et al., 2020). Determining the severity can help to implement more appropriate strategies to deal with the COVID-19 pandemic.
According to the literature above, COVID-19 spreads easily and widely. It has affected not only health but also socio-economic aspects. Hence, it is necessary to identify the rate of confirmed, recovered and death cases. The rate of these cases can be measured by using the gradient value. Gradient values are commonly used in machine learning algorithms. In neural networks, Gradient Descent (GD) optimization is typically used. Essentially, we can describe GD optimization as a hiker (the weight coefficient) who wants to climb down a mountain (the cost function) into a valley (the cost minimum), where each step is determined by the steepness of the slope (the gradient) and the leg length of the hiker (the learning rate). In deep learning, stochastic GD is commonly used as the learning algorithm (Hinton et al., 2012; Hua et al., 2015; Sarno et al., 2020). Furthermore, the gradient value can be obtained by curve fitting. Several studies have reported the use of curve fitting, for example for gas concentration analysis of gas sensors (Wijaya et al., 2016), detection of leukocytes (Ray, 2010), parameter evaluation in the lightning impulse test technique (Pattanadech and Yutthagowith, 2015) and hyperspectral data compression (Beitollahi and Hosseini, 2016). In our study, curve fitting is used to obtain the gradient value of confirmed, recovered and death cases.
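To make the hiker analogy concrete, the following minimal Python sketch fits a straight line by batch gradient descent on the mean squared error and compares the result with the closed-form least-squares fit from `numpy.polyfit`. It is an illustration only and not part of the study's pipeline; the function name `fit_line_gd`, the learning rate, the number of steps and the toy data are all assumptions chosen for this example.

```python
import numpy as np

def fit_line_gd(x, y, learning_rate=0.01, steps=5000):
    """Fit y ~ a*x + b by batch gradient descent on the mean squared error.

    a, b are the hiker's position; the gradient is the steepness of the
    slope; learning_rate is the hiker's leg length.
    """
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residual = a * x + b - y                  # prediction error per point
        grad_a = 2.0 / n * np.dot(residual, x)    # d(MSE)/da
        grad_b = 2.0 / n * residual.sum()         # d(MSE)/db
        a -= learning_rate * grad_a               # step downhill
        b -= learning_rate * grad_b
    return a, b

# Hypothetical toy data lying exactly on the line y = 3x + 2.
x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0

a_gd, b_gd = fit_line_gd(x, y)        # iterative estimate
a_ls, b_ls = np.polyfit(x, y, 1)      # closed-form least-squares estimate
print(a_gd, b_gd, a_ls, b_ls)
```

Both routes minimize the same squared error, so for a straight-line model they should agree closely; curve fitting simply obtains the slope directly, without an iterative search.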

Proposed Method
In this study, the time series dataset is crawled from the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) (Engineering, 2020). The dataset covers 22 January 2020 to 28 March 2020 and records the number of confirmed, recovered and death cases over 66 days for 176 countries or regions around the world. We also provide a reformatted and cleaned country-level time series dataset that is periodically synchronized with the JHU CSSE repository (Wijaya, 2020b). The steps of LGA, our proposed method, are shown in Fig. 2. Each step is detailed as follows:

1. The first step is determining the rate of confirmed, recovered and death cases in each country or region. These rates are calculated as the gradient of a first-degree polynomial obtained by curve fitting, which can be expressed by:

$$f(x) = ax + b$$

where $f(x)$, $a$, $x$ and $b$ denote the fitted number of cases (confirmed/recovered/deaths), the slope/gradient, the day and the intercept, respectively. The solution minimizes the squared error ($E$):

$$E = \sum_{j=0}^{k} \left( f(x_j) - y_j \right)^2$$

where $y$ and $k$ indicate the actual number of cases and the number of data points, respectively.

2. The second step is extracting the slope values from the lowest-error fit for each case. There are three linear models, one for each case:

$$f_c(x) = a_c x + b_c, \quad f_r(x) = a_r x + b_r, \quad f_d(x) = a_d x + b_d$$

where $a_c$, $a_r$ and $a_d$ denote the gradient of confirmed, recovered and death cases, respectively.

3. The third step is comparing these gradient values as the rule to determine the severity status. The severity status ($s$) is divided into three classes, safe, cautious and dangerous, encoded as 0, 1 and 2, respectively. The rule can be expressed as follows:

$$s = \begin{cases} 0, & a_c \le a_r \\ 1, & a_c > a_r \wedge a_d \le a_r \\ 2, & a_c > a_r \wedge a_d > a_r \end{cases}$$

The proposed method is implemented in a Python 3.6 environment. The NumPy library is used to perform curve fitting and R-squared calculation, and matplotlib is used for the graphical representation. A live demo of our data analysis is available at https://ais-rg.com/.
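The three steps can be sketched in Python with NumPy roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `fit_gradient` and `severity_status`, the toy series and the treatment of equal gradients (ties fall into the milder class) are assumptions. The comparison rule is the one applied in the discussion of the results: safe when the confirmed gradient does not exceed the recovered gradient, dangerous when both the confirmed and the death gradients exceed the recovered gradient, and cautious otherwise.

```python
import numpy as np

SAFE, CAUTIOUS, DANGEROUS = 0, 1, 2  # status encoding used in the paper

def fit_gradient(cases):
    """Least-squares first-degree fit of cumulative cases against the day index.

    Returns (slope, intercept, r_squared). Needs at least two data points.
    """
    y = np.asarray(cases, dtype=float)
    x = np.arange(len(y), dtype=float)            # day 0, 1, 2, ...
    slope, intercept = np.polyfit(x, y, 1)        # minimizes the squared error
    residual_ss = np.sum((y - (slope * x + intercept)) ** 2)
    total_ss = np.sum((y - y.mean()) ** 2)
    # R-squared is undefined for a constant series (total_ss == 0).
    r_squared = 1.0 - residual_ss / total_ss if total_ss > 0 else float("nan")
    return slope, intercept, r_squared

def severity_status(confirmed, recovered, deaths):
    """Compare the three gradients; tie handling here is an assumption."""
    a_c = fit_gradient(confirmed)[0]
    a_r = fit_gradient(recovered)[0]
    a_d = fit_gradient(deaths)[0]
    if a_c <= a_r:
        return SAFE
    if a_d <= a_r:
        return CAUTIOUS
    return DANGEROUS

# Hypothetical toy series (cumulative counts per day), for illustration only.
status_a = severity_status([0, 10, 20, 30, 40], [0, 2, 4, 6, 8], [0, 1, 2, 3, 4])
status_b = severity_status([0, 10, 20, 30, 40], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8])
status_c = severity_status([100, 100, 101, 101, 102], [0, 20, 40, 60, 80], [0, 1, 1, 2, 2])
print(status_a, status_b, status_c)
```

In the toy data, the first region recovers faster than it loses patients, the second loses patients faster than it recovers them, and in the third recoveries outpace new confirmations.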

Results and Discussion
First, we compare the stability of CFR with that of the gradient/slope values across day-to-day observations. Figure 3 shows CFR and the confirmed, recovered and death slopes over 66 days, from 22 January 2020 until 28 March 2020, on the x-axis. The y-axis shows the normalized values of CFR and of the confirmed, recovered and death slopes. These quantities have no specific unit because they are obtained from mathematical or statistical calculations. To make a fair comparison, min-max normalization was used to bring them to a uniform scale between 0 and 10. According to this figure, the CFR value fluctuates from day to day. It started at 4 and fluctuated sharply over 8-9 days. Then, this value gradually rises until the last day of observation. The value fluctuates because CFR only considers the latest numbers of death and confirmed cases, which makes it unstable during the observation. On the other hand, the gradient value considers all data points from day 0 to day k of the observation by minimizing the sum of squared errors. This prevents the value from changing drastically when new data is included. This mechanism keeps the gradients of confirmed, recovered and death cases stable throughout the observation, as shown in Fig. 3. The main limitation of the gradient value is that it is only available when k > 1. Table 1 lists the ten countries or regions affected by COVID-19 with the highest CFR values, ordered by CFR and complemented with their statuses according to the gradient values. In this table, eight countries are identified as "dangerous" because $a_c > a_r \wedge a_d > a_r$. Unfortunately, all of them have a zero recovered gradient, which means that they have not yet recorded recoveries among confirmed patients. The other two countries are identified as "cautious".
Italy and Bangladesh are categorized as "cautious" countries because they have a higher recovered rate ($a_r$) than death rate ($a_d$). Furthermore, examples of each status are also provided. In Fig. 4 to 6, first-degree polynomial fittings are performed to obtain the gradient values for each case and area. In addition, the R-squared value is calculated to assess the performance of the polynomial fitting. R-squared compares the fit of the polynomial model with that of a horizontal straight line, which represents the null hypothesis. The R-squared value is positive when the model follows the trend of the data. In contrast, it becomes negative if the model does not follow the trend. In the experiment, all R-squared values are positive, which means that all of the models follow the data trend and can be used to determine the severity status of COVID-19 cases. Figure 4 shows the curves of total confirmed, recovered and death cases in the world. The global status of the world is "cautious" because $a_c > a_r \wedge a_d \le a_r$. According to Fig. 4a, the number of confirmed cases in the world is still rising, but the death rate is lower than the recovered rate, as shown in Fig. 4c and Fig. 4b, respectively. Moreover, based on Fig. 5, Indonesia is categorized as a "dangerous" country because it has a higher death rate than recovered rate. This suggests that the survival prospects of COVID-19 patients in Indonesia are poor. This situation is also confirmed by (Medistiara, 2020; Putri, 2020). Another example is China, as shown in Fig. 6. China is the first country to have confirmed a COVID-19 case. However, the trend of confirmed cases has been stagnant since the 40th day. Moreover, it is categorized as a "safe" country under the condition $a_c \le a_r$, with a CFR of only 0.0061. Overall, the severity status of all countries or regions is summarized in Fig. 7. This result implies that the majority of countries are still unsafe.
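The stability comparison can be reproduced in outline with the sketch below, which computes a day-by-day CFR and a day-by-day slope (refitted on all points up to each day) and rescales both to the 0 to 10 range by min-max normalization. It is a hedged illustration only: the helper names `minmax_scale`, `running_cfr` and `running_slope` and the toy counts are assumptions, not the dataset or code used in the study.

```python
import numpy as np

def minmax_scale(values, low=0.0, high=10.0):
    """Min-max normalization to [low, high]; NaN entries are ignored for the range."""
    v = np.asarray(values, dtype=float)
    v_min, v_max = np.nanmin(v), np.nanmax(v)
    if v_max == v_min:
        return np.where(np.isnan(v), np.nan, low)
    return low + (v - v_min) * (high - low) / (v_max - v_min)

def running_cfr(deaths, confirmed):
    """CFR per day from the latest cumulative counts only (0 where no cases yet)."""
    d = np.asarray(deaths, dtype=float)
    c = np.asarray(confirmed, dtype=float)
    return np.divide(d, c, out=np.zeros_like(d), where=c > 0)

def running_slope(cases):
    """Slope of a first-degree fit over days 0..k for each k; NaN for the first day."""
    y = np.asarray(cases, dtype=float)
    slopes = np.full(len(y), np.nan)
    for k in range(1, len(y)):                    # a line needs at least two points
        x = np.arange(k + 1, dtype=float)
        slopes[k] = np.polyfit(x, y[: k + 1], 1)[0]
    return slopes

# Hypothetical cumulative counts, for illustration only.
confirmed = [1, 2, 4, 8, 16, 30, 50]
deaths = [0, 0, 1, 1, 2, 3, 6]

cfr_scaled = minmax_scale(running_cfr(deaths, confirmed))
death_slope_scaled = minmax_scale(running_slope(deaths))
print(cfr_scaled)
print(death_slope_scaled)
```

Because each slope is refitted on every point observed so far, a single new day shifts it less than it shifts a ratio of the two latest counts, which is the behaviour described for Fig. 3.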

Conclusion
In this study, the first-degree polynomial gradient approach was proposed to categorize the COVID-19 severity status of areas or countries. The experimental results show that the gradient values are more stable than CFR. Using all data points makes the gradient more robust than CFR, which only considers the latest numbers of death and confirmed cases. When new data is added, the gradient value does not change drastically, which is its main difference from the CFR value. Hence, a rule was formulated to categorize the severity status of countries or regions. This rule is based on the rates of confirmed, recovered and death cases as expressed by their gradient values. According to this rule, 131 countries or regions are categorized as cautious, 43 as dangerous and only 2 as safe. This result indicates that our proposed method can complement CFR as an indicator of country severity status. This status can be taken into consideration for several policies related to the COVID-19 pandemic, such as travel warnings, self-isolation, work from home, lock-down, etc.