Fractal Dimension for Lung Sound Classification in Multiscale Scheme

: Lung sound is a biological signal with the information of respiratory system health. Health lung sound can be differentiated from other pathological sounds by auscultation. This difference can be objectively analyzed by a number of digital signal processing techniques. One method in analyzing the lung sound is signal complexity analysis using fractal dimension. To improve the accuracy of lung sound classification, Fractal Dimension (FD) is calculated in the multiscale signal using the coarse-grained procedure. The combination of FD and multiscale process generates the more comprehensive information of lung sound. This study used seven types of FD and three types of the classifier. The result showed that Petrosian C in signal with the scale of 1-5 and SVM with fine Gaussian kernel had the highest accuracy of 99% for five classes of lung sound data. The proposed method can be used as an alternative method for computerized lung sound analysis to assist the doctors in the early diagnosis of lung disease.


Introduction
Auscultation is an important procedure to establish the diagnosis of various lung disorders (Sarkar et al., 2015). The sound of airflow through respiratory tract is listened by a doctor using a stethoscope to analyze sound that is different from normal. This process is highly dependent upon the doctor's skill as it requires more practices for years (Abbas and Fahim, 2010). Various digital signal processing techniques, therefore, have been developed to analyze the lung sound quantitatively.
Lung sound is produced from air turbulence in the respiratory tract. Kitaoka proposed a 3D model of branches in lung that have self-similarity property (Kitaoka et al., 1999). So that lung has a finite fractal structure. As the lung structure has fractal property (Suki et al., 2003); thus, the lung sound follows the power law distributions (Ahlstrom et al., 2006). Gnitecki and Mousavi (2005) tested it using Katz Fractal Dimension (KFD), Sevcik Fractal Dimension (SVD) and Variance Fractal Dimension (VFD). They stated that lung sound has a part of fractal properties. The test of lung sound fractality using the degree of similarity (H) was conducted in by Rizal et al. (2018) where normal lung sound and crackle had H value ≈ 1, indicating that it had a strong indication to have fractal properties.
The utilization of fractal dimension in lung sound analysis has been conducted in previous studies. KFD, SVD and VFD were used to identify the bronchial provocation in lung sound that had produced a True Positive (TP) value up to 90.3% (Gnitecki et al., 2004). Fractal Dimension (FD) also was used to detect the peak of explosive lung sound, such as crackle and squawk (Hadjileontiadis and Rekanos, 2003). The result showed that FD worked properly in the detection of crackle or squawk in lung sound or bowel sound. For a similar detection, a combination of FD and wavelet produced a higher accuracy and more resistant to noise (Hadjileontiadis, 2005a;2005b). Meanwhile, a combination of FD and Empirical Mode Decomposition (EMD) can be implemented in denoising the explosive lung sound (Hadjileontiadis, 2007), noise removal in ECG signal (Agrawal and Gupta, 2013) and iris identification (Chen et al., 2013).
Apart from fractal properties, lung sound as a biological signal has some multiscale properties (Costa et al., 2005). The multiscale analysis of biological signal is often used in ECG signal (Costa et al., 2005), EEG signal (Lin et al., 2014), microvascular blood flow (Humeau-Heurtier et al., 2014), or lung sound (Rizal et al., 2016b). To the best of our knowledge, no study analyzes the combination of FD and multiscale analysis for lung sound analysis. Hence, this study conducted the feature extraction of lung sound using FD in multiscale signal. FD used in this study included Box-Counting FD (BCFD) (Mandelbrot, 1985), Katz FD (KFD) (Katz, 1988), Sevcik FD (SFD) (Sevcik, 1998), Variance FD (VFD) (Kinsner, 1994), Higuchi FD (HFD) (Higuchi, 1988) and Petrosian FD (Petrosian C and Petrosian D) (Petrosian, 1995). Meanwhile, the coarsegrained procedure was used for multiscale process (Costa et al., 2005). The FD of lung sound signal in each scale was used as a feature for classification. By implementing SVM, K-NN and MLP as a classifier, we found that Petrosian C FD and MLP produced the highest accuracy of 5-class lung sound classification.
This paper is organized as follow. In next section, we presents the theory of fractal and coarse-grained procedure as multiscale method used in this study. We described lung sound data, features extraction method and classifier in Material and Method Section. Subsequent Section discusses the result of the experiment. Meanwhile, conclusion of this paper is presented in Conclusion Section.

Fractal Theory and Coarse-Grained Procedure
One parameter used to define signal complexity by chaos approach is by implementing fractal dimension, which can be translated as the appearance dimension of self-similarity; the repetitive pattern of signals in some different patterns (Mandelbrot, 1985). The more selfsimilarity signal pattern appears, the higher the fractal dimension value would be. The value of the fractal dimension is not an integer as a Euclidean dimension that has 1, 2 and 3 dimensions for the line, area and space, respectively. For 1-dimension signal, the fractal dimension has a value of 1 ≤ FD < 2 where the more complex signal leads to the higher value of its FD (closer to 2) (Sevcik, 1998).
As lung structure has a self-similarity (Kitaoka et al., 1999), so we can expect that the lung sound also has a self-similarity property. The pulmonary sound produced by different disorders will have different properties, it is estimated that this can be seen through the fractal dimensions of each type of lung sound. Several FD calculations used in this study are explained in the following subsections.

Box-Counting Method
One of the earliest fractal dimension calculation techniques is box-counting (BC) method, which uses curve properties when filled by boxes (Mandelbrot, 1985). In this approach, the curve is covered by a set of boxes and the number of boxes of a particular size is calculated to obtain a total number of boxes required to cover all parts of the curve. If the size of the box is close to zero, the whole curve will be covered by boxes and it can be expressed mathematically as Equation 1: where, N(r) refers to the number of boxes size r required covering all parts of the curve. Practically, BC method estimates a fractal dimension by calculating the number of boxes with various sizes required covering all parts of the curve. Then, D B is calculated by observing straight line in the logarithmic plot of N(r) to r. It can be expressed mathematically as Equation 2: where, C is constant and D B is a gradient of the logarithmic graph of N(r) to r. This method is commonly known as the grid method and it requires a long computing time.

Katz Method
Katz Fractal Dimension (KFD) of a curve in series with length = N is defined as in Equation 3 (Katz, 1988 where, KFD is a fractal dimension in Katz method, while Lc is total curve length that can be calculated using Equation 4: where, dist (i,i +1) is a distance of two sequential points, while d in Equation 3 is the maximum distance or the curve diameter. This value can be obtained by calculating a distance between curve original point and the longest length of the curve, as expressed in Equation 5:

Sevcik Method
Fractal dimension calculation using Sevcik method (SVD) in a curve with N length can be expressed as Equation 6 (Sevcik, 1998).
Lc in Equation 6 is total length as expressed in Equation 4. Another variation of Sevcik method is by using normalization in x and y-axis before implementing LC and SVD calculation. Normalization in the x-axis is expressed in Equation 7: where, x i is the initial value in x-axis while x max is the maximum value of x i . Meanwhile, normalization in yaxis is expressed in Equation 8: where, y i is the initial value in x axis, y min is the minimum value of y and y max is the maximum value of y.

Variance Method
Variance Fractal Dimension (VFD) of signal s(t) is calculated using Hurst Exponent (H) as expressed in Equation 9: where, H is signal smoothness. In Equation 9, (∆s) ∆t can be calculated with s(t 2 )-s(t 1 ) and ∆t = t 2 -t 1 . With Equation 10, VFD can be calculated as Equation 10: where, E is Euclidean dimension where for 1-dimension signal, E value is 1. Therefore, Equation 10 can be simplified as Equation 11: VFD calculation can use varied values of ∆t as required. When separating signal and noise, ∆t = 1 (1 data sample), while for separating several data components, ∆t > 1.

Higuchi Method
Higuchi method (HFD) is one of fractal dimension calculation algorithms frequently used for biomedical signal analysis (Higuchi, 1988). The advantages of Higuchi method are that it has high accuracy and is efficient for fractal dimension calculation. A signal with N number of samples can be converted into a series of signal with length = k and varied resolution, as expressed in Equation 12: where, m is original time indication (m =1, 2,…, k). Then, curve length m k X , l m (k) is defined as in Equation 13: Notation a represents floor(a) where (N-m/k)k is normalization factor. The curve length of each k interval is expressed in Equation 14: Fractal dimension is obtained by a slope between plot ln(L(k)) and ln(1/k)). The value obtained by relation L(k)∝k −D with Higuchi Fractal Dimension (HFD) = D.

Petrosian C and Petrosian D Methods
Petrosian algorithm calculates the fractal dimension of a signal by converting the signal series into binary series (Petrosian, 1995). This algorithm has several variations with several different methods to convert the signal into binary series. In Petrosian C algorithm, the difference of sequential signals is calculated by ∆s(t) = s(t+1) -s(t). If ∆s(t)> 0, then ∆s(t) value will be set as 1, while, the value of ∆s(t) will be set as -1 if ∆s(t)<0. Therefore, binary series of '1' and '-1' is generated. ∆s(t) = s(t+1) -s(t) is also calculated in Petrosian D algorithm. If ∆s(t) > standard deviation of s(t), then the value of binary series is '1'. In contrast, if ∆s(t) < standard deviation of s(t), then the value of binary series is '-1'. Petrosian Fractal Dimension (PFD) is calculated using Equation 15: where, n is the signal length and N ∆ is the sign change of binary series.

The Coarse-Grained Procedure
The coarse-grained procedure is a multiscale analysis method that is used in multiscale entropy in (Costa et al., 2005). This procedure can be expressed mathematically as in Equation 16: where, τ is the scale, ( ) j y τ is the output signal of coarsegrained procedure in τ scale and x i is an input signal. Coarse-grained procedure in τ scale is a process of calculating the average of τ number of sequential data to generate a new signal. For example, in scale τ = 1, and so on. The length of data in ( ) j y τ decreas to reach N/τ.

Materials and Methods
Lung sound has two properties that would be explored in this research that is multiscale and fractal. For that in this study characteristic that would be used as a differentiation between the sound of one lung and the other is a multiscale fractal dimension. The process undertaken in this study is as in Fig. 1. In the lung sound, the multiscale process was done to break the lung sound into new signals with the different scale. The fractal dimension was calculated on the signal to characterize the signal. The next process was the classification to find out the accuracy of the designed method. The next subsection describes each step of Fig. 1 in detail.

Lung Sound Data
Lung sound data was obtained from various data on the internet (The Auscultation Assistant -Breath Sounds, no date; Ward, 2005). The data was used in our previous research (Rizal et al., 2016b;2017). Data was comprised of 5 classes, which could be classified into one normal lung sound class and four abnormal lung sound data. Abnormal lung sound consisted of wheeze, crackle, pleural rub and stridor. There were 22 data of normal lung sound, 18 data of wheeze sound, 21 data of crackle sound, 18 data of pleural rub sound and 20 data of stridor sound. The length of sound recording data was one breathing cycle with a sampling frequency of 8000 Hz. Lung sound was normalized to eliminate any variation caused by the recording process. Normalization process used in this study were zero mean normalization to set mean value to 0 and amplitude normalization, as expressed in Equation 17 and 18: where, y(i) and x(i) are respective output and input signals, while N is signal length.
The example of wheeze and normal lung sounds and spectrums are displayed in Fig. 2. In normal sound, there was a clear gap between inspiration and expiration phase. Most of the frequency spectrum was located in frequency less than 1000 Hz. In contrast, there was no clear gap between breathing phases in wheeze sound. The dominant frequency was less than 400 Hz, which is a pitch higher than normal sound.

Feature Extraction
The feature extraction process consists of two steps, coarse-grained procedure to break the signal into multiple scales and calculate the fractal dimension of the multiscale process signal. In the first step we used the coarse-grained procedure to form new signal with different scale as in Equation 6. This study used the scale τ of 1-20. In the next process FD was calculated for each output signal of the multiscale process. Fractal dimension methods used in this study included BCFD, KFD, SFD, VFD, HFD, Petrosian C and Petrosian D. As this study used the scale in the range of 1-20, there were 20 FD values generated as the lung sound features. This scale was then reduced in some steps to observe the effect of scale on accuracy. Scales used in this study were in the range of 1-20, 1-15, 1-10, 1-5, 1-4, 1-3, 1-2 and 1. On a high scale, the scale was reduced by 5 in every step since signal variance in high scale was very low, meaning that information contained in the signal was relatively low (Humeau-Heurtier, 2015). Scale 1 represented the original signal used as a reference whether the multiscale process with coarse-grained procedure had higher accuracy compared to the one without the coarse-grained procedure.

Classifier
This study used several classifiers in the classification process. Classifiers tested in this study were SVM with its various kernels (Theodoridis and Koutroumbas, 2010), K-NN with its variations and Multilayer Perceptron (MLP).
SVM was initially intended to classify linear problems by finding the best hyperplane to separate the two data classes (Cortes and Vapnik, 1995). Hyperplane can be a straight line or a surface that can separate two data classes. The best hyperplane is obtained by finding the maximum margin between two sets of objects of different classes. The margin, in this case, is the closest distance between the hyperplane and the nearest data in each class called the support vector. In its development SVM can be used to solve non-linear problems using a technique called kernel trick. There are two methods of kernel trick used in this study. The first is a polynomial

Amplitude
Amplitude Amplitude kernel covering quadratic SVM and cubic SVM. The second method is the radial base kernel function which consists of fine, medium and coarse Gaussian SVM.
K-Nearest Neighbor is a nonparametric classification method that calculates the similarity of test data with K training data stored previously (Bugdol and Mitas, 2014). The similarity of data, for example, is done by distance measurement. In the K number of training data closest to the test data, the most data class is selected. In this research, five KNN types with different K values are used in accordance with KNN type characteristics. At KNN fine used K = 1, KNN medium used K = 10 while in cosine KNN used K = 10 by using cosine distance. Meanwhile, the cubic KNN uses K = 10 with cubic spacing and weighted distance with K = 10 used in weighted KNN.
Multilayer Perceptron (MLP) is one of the architectures of artificial neural networks that are widely used to solve classification problems (Palaniappan, 2010). MLP is the simplest form of an artificial neural network. MLP consists of three layers of the input layer, hidden layer and output layer. The input layer contains the neurons with the same number of features to be classified and the output layer contains the neurons of the classes to be recognized. The number of neurons in the hidden layer is determined by trial and error. Simply MLP can be explained by the model of one neuron as in Equation 19: where, x is the input signal and w is the weighting representing synaptic modulation, i.e., how strong or how many neurotransmitters are affected by the input signal. While z is the number of responses that affect the neuron. The output of the neuron is expressed by the activation function which is with the input of the total weighted response of the input signal expressed by y = f (z). The most straightforward function of y is a linear function that is y = z while the activation function which is often used is the sigmoid function.
The three-fold cross-validation (three-fold CV) was used to validate the classifier performance in the classification process. Three-fold CV was selected because there were 18-22 of data for each class, meaning that there were 6-8 data for each class in one data set. Performance parameter used in this process was accuracy, expressed as: % 100% number of correct classified data accuracy number of data = × (20)

Results and Discussion
Results of the coarse grained procedure for normal and wheeze lung sound are displayed in Fig. 3. Generally, the shape of both signals remained the same. The change mainly occurred in the number of sample data from N to N/τ, which reduced the signal variance. The value of τ was inversely proportional to the variance, as illustrated in Fig. 4. This in turns caused some changes in FD as shown in Fig. 5-11.      As seen in Fig. 5-11, the value of FD was in the range of 1-2, except for KFD with FD in the range of 2 -7. This out of range value was caused by a weakness in the calculation of KFD as reported by Castiglioni (2010). As the objective of this study was the utilization of FD as features, then the precision of FD was not considered as a focus. Other methods with different FD were Pet C and Pet D, which were extremely small and relatively constant. This was caused by the process of FD calculation that considered a sign change as in Eq. 15 As seen in Fig. 5-11, the value of FD was in the range of 1-2, except for KFD with FD in the range of 2-7. This out of range value was caused by a weakness in the calculation of KFD as reported by Castiglioni (2010). As the objective of this study was the implementation of FD as features, then the precision of FD was not considered as a focus. Other methods with different FD were Pet C and Pet D, which were extremely small and relatively constant. This was caused by the process of FD calculation that considered a sign change as in Equation 15. FD value tends to increase as the scale increases and then at some point, the value decreases. The decrease of FD indicates a drop of signal complexity caused by the coarse-grained procedure. The coarse-grained procedure reduces the signal variance that subsequently reduces signal complexity, which influences FD value on each scale. In this study, FD change patterns became the features of each lung sound.
However, this tendency does not seem applicable to HFD, as its FD value increases when the scale increases. HFD depends on the value of Kmax where Kmax value is directly proportional to FD. In the coarse-grained procedure, the data length N decreased and Kmax was constant. This was equivalent to when N value was constant with the increase of Kmax and then HFD would increase.
Wheeze had the highest value of FD among other lung sound classes. Meanwhile, normal lung sound had lower FD compared to wheeze but higher than other three classes, except for stridor on BC FD. Wheeze had higher FD for being more complex than other signals. It had a musical pattern, high pitch and continued (Bohadana et al., 2014), while other lung pathological sounds were the discontinued signals.
The accuracy using various classifiers for each FD is displayed in Table 1-7. It can be seen that the results mostly indicated that the highest accuracy was achieved after the number of scales was reduced. This means that a large number of scales have no guarantee for the highest accuracy. The highest accuracy was achieved by Pet C on a scale of 1-5 using fine Gaussian SVM with 99% accuracy. The next highest accuracy was VFD with 98% accuracy on 15 scales using quadratic SVM, Pet D with 97% accuracy on 10 scales using MLP and SFD with 97% accuracy on a scale of 1-10 using cubic SVM. The summary of results and comparison between single scale and multiscale FD can be seen in Table 8. There was a significant increase in accuracy between FD measurement in the original signal and FD on a multiscale signal. The recommended features are also only five compared to one feature on regular FD measurements. Table 9 shows the comparison between the proposed method and the multiscale method for lung sound analysis in previous studies. It is seen that the proposed method has yielded higher accuracy compared to the previous methods. FDs on multiscale signals for lung voice analysis resulted in higher accuracy than Hjorth descriptor on multiscale signal (Rizal et al., 2016b), the multiscale gray-level difference (Rizal et al., 2016a) and multiscale entropy (Charleston-Villalobos et al., 2013). Another advantage of the proposed method is that it is a parameter-free to calculate FD.                 The proposed method in this study is FD calculation in the multiscale signal, which is different from the multiscale fractal dimension that is explained in various papers. Rahmad et al. (2014) used Multiscale Fractal Dimension (MFD) is explained as follows, Boulignd-Minkowski defined FD as in Equation 20 (Tricot, 1995): where, A(r) is the number of calculated elements and r is counting window. Then, multiscale fractal is defined as: where, u(t) is log(A(r)) and t is log(r) and D(t) or Multiscale Fractal Dimension (MFD) is first derivative of logarithmic curves (Florindo et al., 2012).
Other MFD approach was explained by Moisy (2008). BCFD was conducted in various ranges of r that yielded some values of D(r). In this approach, multiscale manipulation was applied to FD calculation, not to the signal. Using lung sound data in this study, the method yielded 55.6% accuracy in Cubic SVM.
The result showed that proposed FD calculation on the multiscale signal generated a better result than other similar multiscale. The advantage of the proposed method was that it only had one FD feature calculated in one signal, but the calculation was performed repetitively by the desired scale. For HFD, Kmax value influences generated FD that in turns produced the different accuracy. This study did not discuss the effect of noise on accuracy. Future work will focus on adding noise to lung sound.

Conclusion
This paper described the implementation of FD as a feature for lung sound classification. FD was calculated on various scales of the signal as the part of the coarsegrained procedure. Signal change in coarse-grained procedure created changes in FD value. Changes in FD value on various scales were used to differentiate one class data and the others. Petrosian C obtained accuracy of 99% in the scale of 1-5 with fine Gaussian SVM as a classifier. The result was obtained from 99 data consist of five classes. This finding is expected to aid doctors in the lung sound analysis using auscultation. This proposed method can be used in other biological signals, such as EEG or EMG that has the fractal properties. The multiscale method used in this study can still be improved to increase accuracy. Moreover, a variation of FD methods or other signal complexity calculation methods can be explored in future work.