Lung Sound Classification Using Empirical Mode Decomposition and the Hjorth Descriptor

Lung sound is produced by the respiration process in the human respiratory tract and carries information about the health of the respiratory organs. Lung sounds are non-stationary and complex signals. One method often used to analyze such non-stationary signals is Empirical Mode Decomposition (EMD). EMD is typically used to view the Instantaneous Frequency (IF) of the lung sound in order to differentiate the types of lung sounds. Feature extraction directly on the Intrinsic Mode Functions (IMFs) produced by EMD is rarely performed in lung sound analysis. In this research, EMD was used to obtain the IMFs of lung sounds, which were then analyzed using the Hjorth descriptors. As a classifier, we used a Multilayer Perceptron (MLP) with three-fold cross validation (3-fold CV). The tests showed that the activity parameter of the first 10 IMFs yielded 98.8% accuracy on the five classes of data tested. The results demonstrate the effectiveness of Hjorth descriptor measurements on IMFs for feature extraction in lung sound classification.


Introduction
Many researchers have developed techniques to diagnose pulmonary disorders automatically from lung sounds. Lung sounds heard through the stethoscope carry valuable information for the diagnosis of lung disease. Sounds recorded with an electronic stethoscope are processed using digital signal processing techniques to extract the information they contain (Charleston-Villalobos et al., 2007). Because lung sounds are nonlinear and non-stationary signals (Ahlstrom et al., 2006), the signal processing techniques used should take this nature into account.
Like other biomedical signals, lung sounds are non-stationary (İçer and Gengeç, 2014; Kandaswamy et al., 2004) and complex (Costa et al., 2002). Their complexity is evidenced by their fractal properties (Gnitecki and Moussavi, 2005), which arise from the self-similar structure of the lungs (Kitaoka et al., 1999). As biological signals, lung sounds are also thought to have a multiscale nature, so multiscale analysis of lung sounds can produce higher accuracy (Costa et al., 2002; Rizal et al., 2015b).
One method for analyzing non-stationary signals is Empirical Mode Decomposition (EMD) (Huang et al., 1998). It is one stage of the Hilbert-Huang Transform (HHT), which is used to obtain the Time-Frequency Domain (TFD) representation of a signal (Reyes et al., 2014). EMD decomposes a signal into several intrinsic mode functions (IMFs). The final result of HHT is the Hilbert-Huang spectrum, which is the Time-Frequency Representation (TFR) of the signal. In contrast to other TFD methods, EMD is data driven; it does not require any other information about the signal (Rilling et al., 2003). EMD is widely used in various fields, such as the analysis of weather data (Srikanthan et al., 2011), the non-destructive testing of composite materials (Kazys et al., 2004) and the analysis of transient pulses (Krasnitsky, 2009). Some applications in biomedical signal analysis are presented by Fonseca-Pinto (2011).
In biomedical signal analysis, EMD has been used to analyze seizures in EEG signals (Oweis and Abdulhay, 2011), to screen for obstructive sleep apnea (Caseiro et al., 2010) and to identify biomedical noise (Karagiannis and Constantinou, 2009). In a biomedical signal, the aim is typically to observe the frequency content of the IMFs, and EMD is used as part of the Hilbert-Huang Transform (HHT) to view the Instantaneous Frequency (IF) of the signal. For lung sounds, HHT is used to view the IF in certain IMFs (Chen et al., 2014).
One signal complexity measurement technique is the Hjorth descriptor, a method originally used to analyze Electroencephalogram (EEG) signals (Hjorth, 1970). The Hjorth descriptor measures the variance of a signal at a particular order of signal variation. It consists of activity, mobility and complexity, and has been used in the analysis of EEG signals (Hjorth, 1973), electromyogram signals (Mouzé-Amady and Horwat, 1996) and electrocardiogram (ECG) signals (Rizal and Hadiyoso, 2015; Tomak and Kayikcioglu, 2016). The Hjorth descriptor is attractive for biomedical signal processing because its computation is simple and its parameters are calculated directly from the time series, with no signal transformation required.
Research that combines the non-stationary and signal complexity properties of lung sounds has not been conducted so far. In this research, EMD and the Hjorth descriptors were used together to extract the characteristics of lung sounds. We extracted the IMFs of the lung sounds using EMD; the Hjorth descriptors of IMF1 to IMF10 were then used as features and classified using MLP. This method produced 30 features, which were later reduced to 10. In previous studies, the use of Hjorth descriptors for lung sound feature extraction showed promising accuracy (Rizal et al., 2015a; 2015b). The test results show that the proposed method provides higher accuracy than the previous research (Rizal et al., 2015b).
The rest of this paper is organized as follows. Related work in lung sound analysis is presented in the next section. The proposed method and the lung sound data used in this study are explained in the Material and Method section. The Result and Discussion section presents the test results for the proposed system and compares them with related research. The conclusion and future work are presented in the Conclusion section.

Related Work
Various techniques have been proposed for computerized lung sound analysis. The digital signal processing methods used can be categorized by the signal domain (Rizal et al., 2015c), the classification method (Palaniappan et al., 2013) or the type of lung sound analyzed (Shaharum et al., 2012). To achieve a good result, the techniques used must be appropriate to the nature and characteristics of lung sounds. Two features of lung sounds that stand out are that they are non-stationary and have multiscale properties.
EMD is a method for the analysis of non-stationary signals (Huang et al., 1998) that is often used for the analysis of lung sounds. It is used to decompose a lung sound into several IMFs in order to distinguish lung sounds from each other. Chen et al. (2014) used HHT in the identification of Velcro crackles. Meanwhile, Reyes et al. (2008) analyzed discontinuous adventitious sounds using HHT, and Lozano et al. (2016) analyzed continuous adventitious sounds using HHT. Overall, these studies used IMF1 to IMF3 to view the Hilbert-Huang Spectra (HHS) of the lung sounds.
Another use of EMD is to eliminate undesirable signal components. In Hadjileontiadis (2007), EMD and fractal dimension were used for lung sound denoising, with the number of IMFs used depending on an energy criterion. Another study involving EMD for lung sound analysis is that of İçer and Gengeç (2014), in which eight IMFs were used to display the IF of crackle lung sounds. The IF mean was used as the feature and classified with SVM. The obtained accuracy ranged from 90 to 100%.
The Hjorth descriptor has been applied to lung sound analysis in several papers (Rizal et al., 2015a; 2015b). In Rizal et al. (2015a), the Hjorth descriptor was calculated on the entire signal and produced an accuracy of 77% for five classes of data. With the same data, the Hjorth descriptors were calculated using a multiscale scheme (Rizal et al., 2015b), and the resulting accuracy reached 95.06%, showing that the multiscale scheme improved accuracy significantly. Charleston-Villalobos et al. (2013) used Multiscale Entropy (MSE) to analyze lung sounds in patients with pulmonary alveolitis; their results indicated that MSE was more consistent than spectral methods. The combination of EMD and the Hjorth descriptors is therefore expected to produce better features for the classification of lung sounds.

Lung Sound Data
Lung sound data were collected from various sources on the internet (The Auscultation Assistant, 2015; Arnall, 2015; The Rale Repository, 2015) and from the CD accompanying the book by Wilkins et al. (1996). The data set comprised 81 recordings: 18 normal bronchial, 13 asthma, 15 crackle, 15 stridor and 20 pleural rub sounds. The same data were also used in previous studies (Rizal et al., 2015a; 2015b).
Normal bronchial represents a normal lung sound, while asthma produces the wheezing sound, which is a Continuous Adventitious Sound (CAS) (Bohadana et al., 2014). Crackle is one of the Discontinuous Adventitious Sounds (DAS) and occurs in chronic bronchitis (coarse crackle) or pulmonary fibrosis (fine crackle) (Bohadana et al., 2014). Stridor is a high-pitched musical sound that indicates the presence of an obstruction in the upper respiratory tract (Bohadana et al., 2014). Pleural rub occurs in cases of pleural inflammation or tumors (Bohadana et al., 2014). All recordings are in WAV format with an 8000 Hz sampling frequency and 16-bit resolution, and each is one respiratory cycle long.

Empirical Mode Decomposition
Empirical Mode Decomposition (EMD) is a signal analysis method for stationary and non-stationary signals that decomposes the signal into a number of Intrinsic Mode Functions (IMFs) (Huang et al., 1998). The IMFs are usually then used to obtain the Instantaneous Frequency (IF) of the signal (Huang et al., 1998). In this research, we used only the IMFs of the signal. Given a signal x(t), the EMD algorithm is as follows (Chen et al., 2014):
1. Determine the local maxima of x(t) and generate the upper envelope by interpolation. In the same way, generate the lower envelope from the local minima.
2. Calculate m1(t), the average of the upper and lower envelopes. The difference between the signal x(t) and m1(t) is h1(t) = x(t) - m1(t).
3. If h1(t) is not an IMF, repeat steps (1) and (2) on h1(t) and compute h11(t) = h1(t) - m11(t).
4. After the k-th iteration, h1k(t) = h1(k-1)(t) - m1k(t). If m1k(t) is close to zero, h1k(t) is an IMF and is named c1(t).
5. Calculate the first residue, res1(t) = x(t) - c1(t). The residue is the input for the next IMF calculation. This process continues until the residue is monotonic.
Thus, the signal x(t) can be expressed as in Equation 1:
$$x(t) = \sum_{i=1}^{k} c_i(t) + res(t) \qquad (1)$$
where c1(t), c2(t), …, ck(t) are the IMFs and res(t) is the residue.
In this study, we used IMF1 to IMF10 for the feature extraction process. We chose IMF1 to IMF10 because most of the lung sound data used had up to 13 IMFs. From IMF11 to IMF13, the signal became relatively monotonic, so these IMFs could not distinguish between classes of data.
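The sifting procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: the envelopes use linear interpolation via `np.interp` (cubic splines are customary in EMD), the stopping rule for sifting is a simple energy ratio with a hypothetical tolerance `tol`, and the function names are illustrative only.

```python
import numpy as np


def local_extrema(x):
    """Return index arrays of interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    # Assumption: linear envelopes; np.interp holds the end values flat
    # outside the outermost extrema.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return (upper + lower) / 2.0


def sift(x, max_iter=50, tol=1e-3):
    """Steps 1-4: subtract the envelope mean until it is close to zero."""
    h = x.copy()
    for _ in range(max_iter):
        m = envelope_mean(h)
        if m is None:
            break
        h = h - m
        # Assumption: stop when the mean carries little energy relative to h.
        if np.sum(m ** 2) <= tol * np.sum(h ** 2):
            break
    return h


def emd(x, max_imfs=10):
    """Step 5: extract IMFs until the residue has too few extrema."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs and envelope_mean(residue) is not None:
        c = sift(residue)
        imfs.append(c)
        residue = residue - c
    return np.array(imfs), residue


# Synthetic two-tone test signal at the 8000 Hz rate used for the recordings.
t = np.arange(8000) / 8000.0
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)
imfs, residue = emd(x)
# Equation 1: the IMFs plus the residue add back up to the input signal.
reconstruction_error = np.abs(imfs.sum(axis=0) + residue - x).max()
```

Because each IMF is subtracted from the running residue, the reconstruction in Equation 1 holds regardless of the interpolation scheme chosen for the envelopes.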

The Hjorth Descriptor
The Hjorth descriptor is a set of parameters for assessing the characteristics of an EEG signal in the time domain (Hjorth, 1970). It consists of three parameters: activity, mobility and complexity. Let $\sigma_0^2$ be the variance of the input signal x(n), $\sigma_1^2$ the variance of x1(n) = x(n) - x(n-1), and $\sigma_2^2$ the variance of x2(n) = x1(n) - x1(n-1). The Hjorth descriptor is then expressed as in Equations 2-4:
$$\text{Activity} = \sigma_0^2 \qquad (2)$$
$$\text{Mobility} = \frac{\sigma_1}{\sigma_0} \qquad (3)$$
$$\text{Complexity of order } n = \frac{\sigma_{n+1}/\sigma_n}{\sigma_n/\sigma_{n-1}} \qquad (4)$$
where $\sigma_n^2$ is the variance of the n-th order signal variation.
In this research, we used complexity of order 1, hereinafter referred to simply as complexity. Thus, Equation 4 can be written as Equation 5:
$$\text{Complexity} = \frac{\sigma_2/\sigma_1}{\sigma_1/\sigma_0} \qquad (5)$$
Calculating the Hjorth descriptor on ten IMFs generates 30 features.
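As an illustration of how the three parameters can be computed from a time series, the sketch below assumes the standard Hjorth (1970) definitions: activity is the signal variance, mobility is the square root of the variance ratio between the first difference and the signal, and complexity is the mobility of the first difference divided by the mobility of the signal. The function names and the feature ordering are assumptions made for this example.

```python
import numpy as np


def hjorth(x):
    """Return (activity, mobility, complexity) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)    # x1(n) = x(n) - x(n-1)
    ddx = np.diff(dx)  # x2(n) = x1(n) - x1(n-1)
    var0, var1, var2 = np.var(x), np.var(dx), np.var(ddx)
    activity = var0
    mobility = np.sqrt(var1 / var0)
    complexity = np.sqrt(var2 / var1) / mobility
    return activity, mobility, complexity


def hjorth_features(imfs):
    """Stack descriptors of each IMF: all activities, then mobilities, then complexities."""
    return np.array([hjorth(c) for c in imfs]).T.ravel()


# A pure 100 Hz tone sampled at 8000 Hz: variance 0.5, complexity close to 1.
fs = 8000
n = np.arange(fs)
tone = np.sin(2 * np.pi * 100 * n / fs)
activity, mobility, complexity = hjorth(tone)

# Ten signals give 3 x 10 = 30 features; the first 10 entries are the activities.
features = hjorth_features([tone] * 10)
```

With this ordering, reducing the 30 features to the 10 activity features amounts to taking `features[:10]`.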

Classification and Validation
In the classification process, a Multilayer Perceptron (MLP) was used as the classifier. The MLP is among the simplest and most popular artificial neural networks (ANNs) for classification. It consists of an input layer, a hidden layer and an output layer. The number of nodes in the input layer is equal to the number of features used as MLP input, while the number of nodes in the output layer is equal to the number of classes of data to be classified. The number of hidden-layer nodes was varied to obtain the highest accuracy. MLP is an ANN with supervised learning, so the way the data are split into training data and testing data is important.
In the validation process, we used the N-fold cross validation (N-fold CV) method. In N-fold CV, the data are divided into N data sets. One data set is used as the testing data and the remaining N-1 data sets are used as the training data. The process is repeated N times so that each data set becomes the test data once. The total accuracy is the average of the accuracies over the N runs (Andersen and Martinez, 1999).
In this experiment, we used N = 3. With 13 to 20 lung sounds per class, each data set contained at least four data from every class.
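The fold sizes implied by N = 3 can be checked with a short sketch. It assumes that the folds are balanced per class (stratified), which the minimum of four data per class suggests but the text does not state explicitly; the class sizes are those listed in the Lung Sound Data section, and `stratified_folds` is a hypothetical helper written for this example.

```python
from collections import Counter


def stratified_folds(labels, n_folds=3):
    """Deal the samples of each class round-robin into n_folds folds."""
    folds = [[] for _ in range(n_folds)]
    seen = Counter()
    for idx, label in enumerate(labels):
        folds[seen[label] % n_folds].append(idx)
        seen[label] += 1
    return folds


# Class sizes as listed in the Lung Sound Data section (81 recordings in total).
class_sizes = {"bronchial": 18, "asthma": 13, "crackle": 15,
               "stridor": 15, "pleural rub": 20}
labels = [name for name, size in class_sizes.items() for _ in range(size)]
folds = stratified_folds(labels, n_folds=3)

# Each fold serves once as test data; the other two folds form the training data.
for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    per_class = Counter(labels[i] for i in test_idx)
    print(k, len(train_idx), len(test_idx), dict(per_class))

# Smallest number of recordings of any class in any fold.
min_per_class = min(min(Counter(labels[i] for i in fold).values())
                    for fold in folds)
```

The smallest class (asthma, 13 recordings) is split 5/4/4 across the three folds, which gives the minimum of four data per class.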
The performance parameters used in this research were sensitivity (Se), specificity (Sp) and accuracy (Acc). All three parameters are expressed in Equations 6-8:
$$Se = \frac{TP}{TP + FN} \qquad (6)$$
$$Sp = \frac{TN}{TN + FP} \qquad (7)$$
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \qquad (8)$$
where TP is the number of data from class A that were correctly classified as members of class A; FN is the number of data from class A that were misclassified as not being members of class A; TN is the number of data from other classes (not class A) that were correctly classified as not being members of class A; and FP is the number of data from other classes (not class A) that were misclassified as members of class A.
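As a small worked example of these three measures, the sketch below uses the standard definitions Se = TP/(TP+FN), Sp = TN/(TN+FP) and Acc = (TP+TN)/(TP+TN+FP+FN). The counts are those implied for the normal bronchial class by the data description and the single classification error reported in this paper: all 18 bronchial recordings recognized correctly, and one of the 63 other recordings recognized as bronchial. The function name is illustrative.

```python
def class_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for one class."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return se, sp, acc


# Normal bronchial class: 18 of 18 correct, 1 of 63 others misclassified as bronchial.
se, sp, acc = class_metrics(tp=18, fn=0, tn=62, fp=1)
print(f"{100 * se:.2f} {100 * sp:.2f} {100 * acc:.2f}")  # prints "100.00 98.41 98.77"
```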

Result and Discussion
Figure 1 shows an example of EMD applied to a normal bronchial sound. From Fig. 1, it can be seen that from IMF1 to IMF4, the maximum amplitude of the IMFs ranged from 0.5 to 1. Meanwhile, IMF5 to IMF9 had maximum amplitudes ranging from 0.02 to 0.1. This result indicates that the main components of the lung sound signal lie in IMF1 to IMF4. IMF1 to IMF4 had higher signal variation than IMF5 to IMF10, so IMF1 to IMF4 would produce the dominant features for classification.
In the classification process, we first used all 30 features (10 activity, 10 mobility and 10 complexity values) for each data point; we then reduced these to 10 features and observed the effect of the number of hidden neurons in the MLP. The accuracy of the system for different numbers of hidden neurons is shown in Fig. 2.
The highest accuracy, 98.8%, was achieved using the activity alone with hidden neuron numbers of 0, 20-45 and 55. Using all of the Hjorth descriptor parameters together, the highest accuracy was 96.3%. The mobility alone gave a highest accuracy of 85.19%, while the complexity alone gave a highest accuracy of 87.65%. All of these results were obtained using the first 10 IMFs.
To investigate the effect of the number of IMFs on the accuracy, we tested several numbers of IMFs with an N-25-5 MLP. The results obtained are shown in Fig. 3. Using IMF1-IMF10 still produced the highest accuracy compared with a reduced number of IMFs. The maximum accuracies for IMF1-IMF5 and IMF1-IMF4 were 93.83 and 95.06%, respectively. Theoretically, the Hjorth descriptors of IMF6-IMF10 have relatively small values due to the lower signal variation, as shown in Fig. 1. For IMF1-IMF10, IMF1-IMF5 and IMF1-IMF4 alike, the activity produced the highest accuracy, as shown in Fig. 3. The activity is the signal variance of the IMF, while the mobility and the complexity are ratios of the signal variance for consecutive derivative orders; as such, the activity value must be higher than the mobility and complexity values. Therefore, the activity is the most dominant feature for feature extraction from the IMFs.
Chen et al. (2014) utilized the first IMF to estimate crackles in the Velcro case. The number of IMFs to use in lung sound analysis depends on the purpose of the research. In this research, the purpose was to obtain IMFs for feature extraction that would produce the highest accuracy, so using 10 IMFs was a reasonable choice.
Table 1 shows the confusion matrix of the classification results for the activity of IMF1-IMF10 with the 10-25-5 MLP, which produced the highest accuracy. An error occurred for one data: one pleural rub was recognized as normal bronchial. As a result, the sensitivity for pleural rub was 93.33%, while the specificity for normal bronchial was 98.41%. These values are quite good for the classification of lung sounds. Figure 4 shows that the mean activity differs between normal bronchial and pleural rub, but some data have overlapping values, as can be seen from the standard deviation of the activity of IMF2. IMF6-IMF10 may also contribute to the classification error.
A summary of Se, Sp and Acc for all classes of data is given in Table 2.
In previous research, we used the multiscale Hjorth descriptor for lung sound feature extraction (Rizal et al., 2015b). In that research, the coarse-grained procedure used as the multiscale scheme is expressed in Equation 9 (Costa et al., 2002):
$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le N/\tau \qquad (9)$$
where $x_i$ is the input signal, N is the signal length, $\tau$ is the scale and $y_j^{(\tau)}$ is the output signal. The best result achieved was 95.06%, using five scales and the complexity as the feature. The multiscale Hjorth descriptor needed only five features to reach its highest accuracy, but the EMD Hjorth descriptor produced a higher accuracy. The difference between the two approaches is that the coarse-grained procedure filters the input signal and then downsamples it to generate a scaled signal, from which the original input signal cannot be reconstructed. In EMD, by contrast, the IMFs are decomposed components of the input signal, so the original signal x(t) can be reconstructed by adding the IMFs and the residue as in Equation 1.
The advantage of EMD is that the IMFs are obtained directly from the data, without using any kernels (Reyes et al., 2014). As a consequence, the IMFs are dependent upon the data, and a shift in the data can change the IMFs obtained (Rilling et al., 2003). Several previous studies used the IMFs from EMD to generate the IF (Charleston-Villalobos et al., 2007; Lozano et al., 2013; Chen et al., 2014). In İçer and Gengeç (2014), the IF of each data was calculated using the selected IMFs and the features were then extracted. Using SVM as a classifier and the IF mean as the feature, that study obtained a maximum accuracy of 93% for three classes of data. Meanwhile, in Chen et al. (2014), EMD was used to separate crackles from the other components of the breathing sound.
Their method was able to distinguish Velcro crackles from other crackles with an accuracy of 92.20±1.80% (Chen et al., 2014). Compared with this previous research, our proposed method is much simpler because the features are taken from the IMFs directly, without computing the IF of the signal. The number of features used in this study after feature reduction was 10, with an accuracy of 98.8%. This is a modest number of features, given that each lung sound recording contains up to 30,000 samples.
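For comparison with the EMD sketch, the coarse-grained procedure of Costa et al. (2002) used in the multiscale scheme can be illustrated as follows. This is a minimal sketch in which non-overlapping windows of length τ are averaged; trailing samples that do not fill a complete window are dropped, which is an assumption of this example, and `coarse_grain` is an illustrative name.

```python
import numpy as np


def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale` (Costa et al., 2002)."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // scale) * scale  # drop the incomplete trailing window
    return x[:n].reshape(-1, scale).mean(axis=1)


signal = [1, 2, 3, 4, 5, 6, 7]
y2 = coarse_grain(signal, 2)  # windows (1,2), (3,4), (5,6)
y3 = coarse_grain(signal, 3)  # windows (1,2,3), (4,5,6)
```

Each output sample replaces τ input samples by their mean, so the scaled signal is shorter than the input and the original samples cannot be recovered from it, in contrast to the IMFs and residue of EMD.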
Feature selection in this research was performed by reducing the number of IMFs sequentially. Selecting the dominant features using feature subset selection is left for a future study. EMD itself continues to be developed, for example as Ensemble EMD (EEMD) (Zhaohua and Huang, 2009) and the Complete Ensemble EMD with Adaptive Noise (CEEMDAN) (Colominas et al., 2014). Future work could use these EMD variants for signal decomposition, and other methods of measuring signal complexity could be combined with EMD.

Conclusion
Hjorth descriptor measurements on IMFs produce excellent features for lung sound classification. Using the 10 IMFs produced by EMD, 30 features were obtained and classified using MLP with 3-fold CV. The results showed that activity is the most dominant parameter, so the features can be reduced to only 10 without loss of accuracy. The results indicate that the proposed method is promising for further development in automatic lung sound analysis.