Grasshopper Optimization Algorithm-Generative Adversarial Network for Lung Cancer Detection and Classification

Abstract: Lung cancer is one of the deadliest diseases worldwide. The survival rate is low because lung cancer is difficult to detect before it reaches an advanced stage, when symptoms appear, so early diagnosis is of great importance for treatment. Existing Convolutional Neural Network (CNN) based deep learning methods suffer from the tuning problem, that is, choosing a set of hyperparameters for the learning algorithm, and from outliers that affect the classification result. Therefore, the present research work uses the Grasshopper Optimization Algorithm (GOA), which is effective for solving global unconstrained and constrained optimization problems, to tune the hyperparameters. Additionally, training is performed using a Generative Adversarial Network (GAN) model, which controls the behavior of the classifier during training and has a significant impact on performance. The results show that the proposed method achieves an accuracy of 98.89%, compared with existing models such as KNG-CNN (87.3%), mask region-based CNN (97.68%), Transferable Texture CNN (96.69%), Fuzzy Particle Swarm Optimization (FPSO) CNN (95.62%) and the E-CNN method (97%).


Introduction
Lung cancer caused 1.8 million deaths in 2020, with 2.1 million new cases reported. Lung cancer is caused by the uncontrolled abnormal growth of lung parenchyma cells (Naik and Edla, 2021). Lung cancer research helps in developing better treatments, which improve the quality of life and survival of patients. The prospect that earlier diagnosis increases the survival of patients living with the disease has motivated this research work (Hadavi et al., 2020). The lungs contain a large number of thin-walled alveoli, which form an enormous surface area for gas exchange. According to the World Health Organization (WHO), detecting lung cancer at an early stage results in a 90% survival rate (Kris et al., 2017). Thus, diagnosing the disease at an early stage using examination approaches such as X-ray, Magnetic Resonance Imaging (MRI) or Computed Tomography (CT) scans is important. Chest X-ray radiography and CT are the most familiar anatomic imaging techniques for recognizing lung diseases (Lyu et al., 2020). Physicians and radiologists examine CT images to recognize the disease from its visual appearance, since the morphological patterns describe the severity of the disease and help track its clinical course (Singh and Gupta, 2019). Therefore, various deep learning techniques have been developed; CNNs in particular have recently been used with promising results for classifying lung nodule images into malignant or benign classes (Nasrullah et al., 2019). CNNs use convolutional layers to extract features, and the features extracted by the last CNN layer are complex. Edge features also show considerable complexity when the middle pooling layers are used (Snoeckx et al., 2018).
Texture analysis performed on the entire image is not very efficient compared with the analysis of patterns of lower complexity (Moninuola et al., 2021).

Thus, the proposed method uses a GAN layer that explores the image properties efficiently and feasibly (Jia et al., 2017). The hyperparameters directly control the behavior of the training algorithm and have a significant impact on the performance of the model (Suji et al., 2020). Classifying lung tumor tissue as benign or malignant is difficult because of the tuning problem of choosing a set of hyperparameters for the learning algorithm and because of outliers that affect the classification result (Zhang et al., 2020; Banerjee et al., 2017; Fedorov et al., 2020). To solve this issue, a GAN model is used to control the behavior of the classifier during training, together with a hyperparameter optimization approach based on the Grasshopper Optimization Algorithm, to classify lung cancer CT images as malignant or benign. The results show that the proposed method achieves an accuracy of 98.89%, compared with existing models such as KNG-CNN (87.3%), mask region-based CNN (97.68%), Transferable Texture CNN (96.69%), Fuzzy Particle Swarm Optimization (FPSO) CNN (95.62%) and the E-CNN method (97%).

Related Works
This section reviews existing techniques for lung cancer detection. Jena and George (2020) developed a deep learning approach using an efficient Kernel-based Non-Gaussian CNN (KNG-CNN) for tumor classification. The model took its input images from the Lung Image Database Consortium image collection (LIDC-IDRI). These images were pre-processed using Contrast Limited Adaptive Histogram Equalization (CLAHE) and segmented using a Region of Interest (ROI) technique. The resulting segments were fed to the KNG-CNN model, which effectively distinguished malignant from benign growths in the lung CT images. However, in a few cases the classification results were less than optimal in terms of cancer prediction and kernel approximation. Hu et al. (2020) developed a CNN model that combined Mask R-CNN with an ANN using supervised and unsupervised methods. The model used the CNN to reduce the number of images needed during training and testing. It performed automated segmentation of the lungs in CT images using a specialized CNN-based Mask R-CNN model to map the lung regions, which were then classified using supervised and unsupervised machine learning. The classifiers used were Naive Bayes, Support Vector Machine (SVM), Gaussian Mixture Models (GMMs) and K-Means clustering. However, the model did not consider the labels of each data point, which would have resulted in better performance. Ali et al. (2020) developed a CNN with an energy layer that extracts texture features for lung nodule classification from CT images. The automated model segmented lung nodules in the CT images using a CNN-based Mask R-CNN designed for lung region mapping. The model showed improvement in accuracy and other metrics. However, classification was time-consuming because of the large dataset, and specialized hardware was required to expedite the training process.
Asuntha and Srinivasan (2020) developed a deep learning model using a fuzzy particle swarm optimization approach for detecting and diagnosing lung cancer. The model provided an optimized system for lung CT imaging and used a CNN to reduce the computational complexity. It used several feature extraction techniques, including wavelet-transform-based features, Scale Invariant Feature Transform (SIFT), Local Binary Pattern (LBP), Zernike moments, geometric and volumetric features and Histogram of Oriented Gradients (HOG) features, which were fed to the Fuzzy Particle Swarm Optimization (FPSO) approach to select the best features. The selected features were classified using a deep learning CNN, which reduced the computational complexity of the model. However, the model failed during segmentation and showed slow convergence while searching for features. Harsono et al. (2020) developed a deep transfer learning model, I3DR-Net, for lung cancer detection. I3DR-Net is a one-stage detector that addresses the problem of lung nodule detection and classification. The model combined the weights of an Inflated 3D ConvNet backbone pre-trained on natural images with a feature pyramid network. The deep transfer learning model effectively classified lung cancer images into malignant or benign tumors. However, the model was computationally expensive, which made the process technically and economically challenging. Kasinathan et al. (2019) developed an automated lung cancer diagnosis system that performed segmentation and classification using a CNN-based contour model. The model was evaluated on the LIDC-IDRI dataset and generated accurate 3D lesions for lung tumor CT images.
Feature extraction was performed on the 3D images, and the deformation process quantified the centroid displacement and classified the lung nodules as malignant or benign tumors. However, the intensity-based features consumed more computational time and were sensitive to speed. Tiwari et al. (2021) detected lung cancer nodules using the Mask-3FCM and TWEDLNN algorithms. The model performed segmentation using Contrast Enhancement (CE) and a Geometric mean based Otsu Thresholding (GOT) approach, based on Modified Clip limit-based Contrast Limited Adaptive Histogram Equalization (MC-CLAHE). Features were then extracted and the MU-based FCM algorithm was used to detect the lung nodules. The model was more accurate than the existing approaches, but further tasks were required to predict the lung nodules for examination. Nazir et al. (2021) developed an effective segmentation approach with pre-processing for lung cancer detection based on CT images. The model was based on image fusion for lung cancer segmentation, with the aim of optimizing the diagnosis of lung cancer. Decomposition was performed by fusing the Laplacian Pyramid and Adaptive Sparse Representation (ASR). However, the fusion rule and the layer details still needed to be investigated, and evaluation on large and diverse datasets was required to establish robustness.

Proposed Methodology
The block diagram of the proposed method is shown in Fig. 1. The steps involved in the proposed hyperparameter optimization are explained in the following subsections.

Data Collection
LIDC-IDRI is the dataset used for lung cancer detection; it consists of thoracic CT scans with annotated lesions. Seven academic centers and eight medical imaging companies collaborated to create the dataset, which consists of 1018 cases. Each subject has images from a clinical thoracic CT scan together with an associated Extensible Markup Language (XML) annotation file. The radiologists rate the malignancy of each pulmonary nodule on a scale from 1 to 5. Ratings 1 to 3 are treated as benign and assigned class 0. Ratings 4 and 5 are treated as malignant and assigned class 1. The LIDC-IDRI dataset used by the proposed method consists of 244,527 images, including digital radiography and computed radiography output (Armato, 2015). The XML file records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule > or = 3 mm," "nodule <3 mm," and "non-nodule > or = 3 mm"). In the second phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to reach a final opinion. Figure 2 shows sample images taken from the LIDC-IDRI dataset. These images and their annotation data are passed to the pre-processing stage.
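The labeling rule described above can be illustrated with a short sketch. The function name `malignancy_to_class` is hypothetical and introduced only for illustration; how the ratings of several radiologists are combined into one rating per nodule is not specified here and is left out.

```python
def malignancy_to_class(rating):
    """Map a radiologist malignancy rating (1-5) to a binary class label."""
    if not 1 <= rating <= 5:
        raise ValueError(f"malignancy rating must be in 1..5, got {rating}")
    # Ratings 1-3 are treated as benign (class 0), ratings 4-5 as malignant (class 1).
    return 0 if rating <= 3 else 1


ratings = [1, 2, 3, 4, 5]
labels = [malignancy_to_class(r) for r in ratings]
print(labels)  # [0, 0, 0, 1, 1]
```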

Denoising using Gaussian Filter
The obtained images now undergo image normalization, which maps the pixel intensity values into a common range. A Gaussian filter is used for this step. The Gaussian filter is a linear filter that blurs the image to reduce its noise level. Because the filter also blurs edges and reduces contrast, it is applied to all pixels with the aim of removing noise while keeping the edges as sharp as possible. The Gaussian Kernel Coefficients (GKC) are sampled from the 2D Gaussian function given in Eq. (1):
$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \quad (1)$$
where $\sigma$ is the standard deviation.
The GKC are sampled values of the 2D Gaussian function, evaluated using Eq. (2), where $\sigma$ is the standard deviation of the distribution and the mean of the distribution is assumed to be zero. The continuous function is discretized, and the discrete values obtained from the Gaussian function are stored as the kernel coefficients. The sample images are shown in Fig. 3.
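As an illustration of how kernel coefficients can be sampled from a zero-mean 2D Gaussian and applied to an image, a minimal pure-Python sketch follows. The kernel size, the value of σ, the border handling (edge replication) and the function names `gaussian_kernel` and `gaussian_blur` are assumptions made for this example and are not taken from the proposed method.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample a size x size kernel from the zero-mean 2D Gaussian and normalize it."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    # Normalizing so the coefficients sum to 1 preserves overall brightness and
    # makes the constant prefactor 1 / (2 * pi * sigma^2) unnecessary.
    return [[v / total for v in row] for row in kernel]


def gaussian_blur(image, kernel):
    """Filter a 2D grayscale image (list of rows), replicating border pixels."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr = min(max(r + dy, 0), h - 1)  # clamp row index at the border
                    cc = min(max(c + dx, 0), w - 1)  # clamp column index at the border
                    acc += image[rr][cc] * kernel[dy + half][dx + half]
            out[r][c] = acc
    return out


kernel = gaussian_kernel(5, 1.0)
noisy = [[10, 10, 200, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
smoothed = gaussian_blur(noisy, kernel)
```

A larger σ spreads the weights over more neighbors, which removes more noise but also blurs edges more strongly.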

Segmentation using Otsu's Thresholding Morphological Process for Nodule Identification
The important regions are segmented using a thresholding technique, which removes the residual part accurately. For this purpose, multi-level Otsu thresholding separates the pixels of the input image into classes according to their gray-level intensity values. The threshold values are calculated using multi-Otsu thresholding for the desired number of classes. The class probabilities that weight each class are calculated using Eq. (3):
$$q_k = \sum_{i = t_{k-1} + 1}^{t_k} P(i), \quad k = 1, \ldots, n \quad (3)$$
where $t_1, \ldots, t_{n-1}$ are the threshold values, $t_0 = 0$, $t_n = I$ is the maximum gray level, $q_1, \ldots, q_n$ are the class weights and $P(i)$ is the probability of a pixel having gray level $i$. In the two-class case, the classes correspond to the foreground and background pixels.
The means of the classes are given in Eq. (4):
$$\mu_k = \frac{1}{q_k} \sum_{i = t_{k-1} + 1}^{t_k} i \, P(i) \quad (4)$$
where $\mu_k$ is the average gray level of class $k$.
A morphological operation applies a structuring element to an input image and produces an output image of the same size.
Each pixel of the output image is determined from the corresponding pixel of the input image and its neighbors, which reconstructs the shapes in the input image. The segmented images are shown in Fig. 3.
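To make the thresholding step concrete, the following sketch implements the two-class special case of Otsu's method, which selects the threshold that maximizes the between-class variance computed from the class weights and class means. It is a simplification: the multi-level variant used for segmentation searches over several thresholds at once. The function name `otsu_threshold` and the toy pixel values are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Class 0 holds gray levels 0..t and class 1 holds gray levels t+1..levels-1.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    prob = [h / total for h in hist]
    global_mean = sum(i * p for i, p in enumerate(prob))

    best_t, best_var = 0, -1.0
    w0 = 0.0   # weight (probability) of class 0
    cum = 0.0  # cumulative first moment: sum of i * P(i) for i <= t
    for t in range(levels - 1):
        w0 += prob[t]
        cum += t * prob[t]
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue  # one class is empty, so the variance is undefined
        mu0 = cum / w0
        mu1 = (global_mean - cum) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Toy example: a dark background (level 10) and a bright region (level 200).
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]  # 1 marks the bright (foreground) pixels
```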

Hyperparameter Optimization using Grasshopper Optimization Algorithm
After removing the noise from the LIDC-IDRI images, the proposed hyperparameter optimization is carried out to tune the hyperparameters for cancer classification. Each convolution layer has weights that are adjusted during training. These weights are the model parameters, whereas the hyperparameters are the settings that affect the behavior of the model during training.

Grasshopper Optimization Algorithm
The main aim of the grasshopper optimization approach is to model the movements of individual grasshoppers in order to solve an optimization problem, balancing exploration and exploitation of the search space. The mathematical model of the GOA is shown in Eq. (5):
$$X_i = r_1 S_i + r_2 G_i + r_3 A_i \quad (5)$$
where $X_i$ is the position of the $i$-th grasshopper, $S_i$ is the social interaction, $G_i$ is the gravity force on the $i$-th grasshopper and $A_i$ is the wind advection. $r_1$, $r_2$ and $r_3$ are random numbers in [0, 1], which provide random weights for the distinct factors in Eq. (5). Equation (6) defines the social interaction, which is obtained from the social forces:
$$S_i = \sum_{j = 1, j \neq i}^{N} s(d_{ij}) \, \hat{d}_{ij} \quad (6)$$
where $N$ is the number of grasshoppers, $d_{ij}$ is the distance between the $i$-th and $j$-th grasshoppers and $\hat{d}_{ij}$ is the unit vector from the $i$-th to the $j$-th grasshopper. The social force $s(r)$ is given in Eq. (7):
$$s(r) = f e^{-r/l} - e^{-r} \quad (7)$$
In Eq. (7), $f$ is the intensity of attraction and $l$ is the attractive length scale. Equations (5) to (7) describe the movement of the grasshoppers used to solve the optimization problem. If the second and third terms of Eq. (5) are of less significance, the model is written as shown in Eq. (8):
$$X_i^d = c \left( \sum_{j = 1, j \neq i}^{N} c \, \frac{ub_d - lb_d}{2} \, s\left(\left|x_j^d - x_i^d\right|\right) \frac{x_j - x_i}{d_{ij}} \right) + \hat{T}_d \quad (8)$$
Here, $ub_d$ and $lb_d$ are the upper bound and lower bound in the $d$-th dimension and $\hat{T}_d$ is the value of the best solution found so far in the $d$-th dimension.
$c$ represents the coefficient for shrinking the comfort, repulsion and attraction zones. The GOA has a strong tendency to move quickly toward the current optimal value, so the factor is chosen as 1, which is important for finding the optimal balance between the targeted and random walks of the search agents. The optimal hyperparameter values obtained by combining GOA with the GAN approach help to avoid the problem of overfitting.
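The update rule of the simplified GOA model can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the values f = 0.5 and l = 1.5, the linear decay of c from `c_max` to `c_min`, the population size and the clamping to the bounds are common GOA defaults assumed here, and the distance rescaling used in some reference implementations is omitted. The objective `toy_validation_loss` is a hypothetical stand-in for training a classifier and returning its validation loss.

```python
import math
import random


def social_force(r, f=0.5, l=1.5):
    """Social force s(r) = f * exp(-r / l) - exp(-r)."""
    return f * math.exp(-r / l) - math.exp(-r)


def goa_minimize(objective, lb, ub, n_agents=20, n_iter=50,
                 c_max=1.0, c_min=1e-5, seed=0):
    """Minimize objective over the box [lb, ub] with a basic GOA loop."""
    rng = random.Random(seed)
    dim = len(lb)
    pos = [[rng.uniform(lb[d], ub[d]) for d in range(dim)] for _ in range(n_agents)]
    fitness = [objective(p) for p in pos]
    best = min(range(n_agents), key=lambda i: fitness[i])
    target, target_fit = list(pos[best]), fitness[best]

    for it in range(n_iter):
        # c shrinks the comfort, repulsion and attraction zones over time.
        c = c_max - it * (c_max - c_min) / n_iter
        new_pos = []
        for i in range(n_agents):
            step = [0.0] * dim
            for j in range(n_agents):
                if j == i:
                    continue
                dist = math.dist(pos[i], pos[j])
                if dist == 0.0:
                    continue  # coincident agents exert no directed force
                for d in range(dim):
                    gap = pos[j][d] - pos[i][d]
                    step[d] += (c * (ub[d] - lb[d]) / 2.0
                                * social_force(abs(gap)) * gap / dist)
            # Move relative to the best solution so far and clamp to the bounds.
            new_pos.append([min(max(c * step[d] + target[d], lb[d]), ub[d])
                            for d in range(dim)])
        pos = new_pos
        for p in pos:
            fit = objective(p)
            if fit < target_fit:
                target, target_fit = list(p), fit
    return target, target_fit


def toy_validation_loss(params):
    """Hypothetical objective over (log10 learning rate, dropout rate)."""
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2


best_params, best_loss = goa_minimize(toy_validation_loss, lb=[-5.0, 0.0], ub=[-1.0, 0.8])
print(best_params, best_loss)
```

In a real tuning run each objective evaluation would train and validate the classifier, so the population size and iteration count are limited mainly by training cost.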

Classification using Generative Adversarial Network (GAN)
The best hyperparameters obtained are now fed as input to the deep learning GAN approach for lung CT image classification. The GAN is based on game theory, in which two machine learning models compete. A GAN consists of two networks that are trained together: the generator is given a vector of random values as input and generates data with the same structure as the training data, while the discriminator is given batches containing observations from both the training data and the generator output and classifies each observation as real or generated. Both models are implemented using neural networks. The generator defines $p_{model}(x)$ implicitly. It is not required to evaluate the density function $p_{model}$ explicitly; instead, it draws samples from $p_{model}$, starting from a prior distribution over the vectors $z$ that serve as its input. The generator function is denoted $G(z; \theta^{(G)})$, where $\theta^{(G)}$ is the set of learnable parameters that defines the generator's strategy in the game. The input vector $z$ can be thought of as a source of randomness, analogous to the seed of a pseudo-random number generator in an otherwise deterministic system. The discriminator is trained as a binary classifier, and two versions of the generator cost are available. Training a GAN involves both the discriminator and the generator networks: real data are drawn continuously from the training set, and random data are generated continuously throughout the training process. The discriminator is trained in the same way as a deep neural network classifier. When the discriminator is trained to assign an input to the real class, the input is drawn from the training set.
The training process also uses fake data, constructed by first sampling a random vector $z$ from the prior distribution over the latent variables of the model. The sample generated by the generator is defined as $x = G(z)$. The function $G$ is simply a function, represented by a neural network, that transforms the random, unstructured vector $z$ into structured data intended to be statistically indistinguishable from the training data.
The discriminator separates the fake data from the real data and is trained to assign the data to the benign or malignant class. The input images are fed to a dropout layer followed by a Leaky ReLU layer. The resulting features are then processed by a batch normalization layer and, finally, a convolution layer generates the output. Backpropagation makes it possible to obtain the derivatives of the discriminator's output with respect to its input. The output of the trained generator is provided as input for the malignant class. Training the discriminator is much the same as training any other binary classifier, with the exception that the generated data come from a distribution that changes constantly as the generator learns. The learning process of the generator is unusual in that it is not given specific targets for its outputs; it is simply rewarded for producing outputs that fool its constantly changing opponent. The proposed GAN-based classification algorithm classifies the lung CT images as benign or malignant.
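The binary-classifier cost of the discriminator and the two versions of the generator cost mentioned above can be written down in a few lines. The sketch below uses the standard GAN cost functions (the discriminator cross-entropy, the zero-sum minimax generator cost and the non-saturating generator cost) and assumes these are the costs intended; it operates on plain lists of discriminator outputs and does not model the network architecture. All function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy cost of the discriminator.

    d_real: discriminator outputs D(x) in (0, 1) for real samples.
    d_fake: discriminator outputs D(G(z)) in (0, 1) for generated samples.
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return -0.5 * real_term - 0.5 * fake_term


def generator_loss_minimax(d_real, d_fake, eps=1e-12):
    """Zero-sum version: the generator cost is the negative discriminator cost."""
    return -discriminator_loss(d_real, d_fake, eps)


def generator_loss_non_saturating(d_fake, eps=1e-12):
    """Heuristic version: maximize log D(G(z)) instead of minimizing log(1 - D(G(z)))."""
    return -0.5 * sum(math.log(max(p, eps)) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# A confident, correct discriminator has a lower cost.
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
```

The non-saturating version is often preferred in practice because its gradient does not vanish when the discriminator rejects generated samples with high confidence.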

Results
The proposed GOA-GAN model is evaluated in terms of sensitivity, accuracy, F-score and specificity, which indicate how well the model generalizes. The proposed model addresses the problem of hyperparameter optimization on the LIDC-IDRI dataset. The simulations were run on an Intel Core i7 processor with a 2 GHz clock speed and 48 GB of RAM. The present research work uses 80% of the images for training and 20% of the images for testing. k-fold cross-validation is performed with k = 5. The training data are fed to the classifier and the evaluation is carried out on the testing data.
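With k = 5, each fold holds out 20% of the images for testing and keeps 80% for training, which matches the split described above. A minimal sketch of generating such folds is given below; the function name `k_fold_indices`, the shuffling seed and the sample count of 100 are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(100, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))  # 80 20 for every fold
```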

Performance Measures
The performance of the proposed method is evaluated using the following parameters:

• Accuracy
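Accuracy measures the exactness of the values predicted by the machine learning model. It is expressed mathematically as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives, respectively.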

• Recall
Recall is the proportion of actual positives that are correctly predicted, as shown in Eq. (13):
$$\text{Recall} = \frac{TP}{TP + FN} \quad (13)$$
where $TP$ is the number of true positives and $FN$ is the number of false negatives.
Table 1 shows the results obtained for the proposed GOA-GAN method in terms of accuracy, sensitivity, specificity, MCC and F-score on the LIDC-IDRI dataset. Several optimization approaches were evaluated and compared with the proposed combination of GOA and GAN. The existing algorithms Grey Wolf Optimization (GWO), Ant Lion Optimization (ALO) and Whale Optimization Algorithm (WOA) were evaluated alongside the proposed GOA technique. GWO showed a low accuracy of 60%, as it faced problems during local search and showed a slow convergence rate. The ALO algorithm required a longer run time for the optimization process and reached an accuracy of only 90.35%. Similarly, WOA showed slow convergence and fell easily into local optima, giving a lower accuracy of 95.78%. The present research work used a hybrid model in which GOA tunes a GAN classifier. The optimization approach, which mimics the behavior of grasshoppers to determine the feature subset, worked well and improved the accuracy to 98.89%. The evaluation of GOA against the existing optimization approaches is shown in Fig. 4.
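The measures reported in the tables can be computed from the four confusion-matrix counts. The sketch below uses the standard definitions of accuracy, recall (sensitivity), specificity, F-score and the Matthews Correlation Coefficient (MCC); the definitions of the measures other than accuracy and recall are assumed to be the standard ones, and the counts in the example are hypothetical rather than results of this work.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute common binary classification measures from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes are present
    and both classes are predicted at least once.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)       # also called sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }


# Hypothetical counts for a test set of 200 images.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```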
The evaluation of GAN against the existing classification approaches is shown in Fig. 5. Table 2 shows the performance evaluation of the classification approaches in terms of accuracy, sensitivity, specificity, MCC and F-score. AlexNet showed a lower accuracy of 96.9% because of data duplication caused by overlapping pixel blocks, and it also required more memory. Similarly, the GoogLeNet classifier, used with a resized pre-trained model, obtained an accuracy of 92.9%, which is better than that of VGG. The Inception model, with its limited divergence, obtained a lower accuracy of 90.78%. The VGG network has large weights, whereas the GAN model reduced the size and weight of the data, which increased its energy efficiency, and obtained an accuracy of 98.89%.
Table 3 shows the comparative analysis of the proposed method with the existing models. KNG-CNN (Jena and George, 2020) showed classification results that were less than optimal for cancer prediction and kernel approximation, with an accuracy of 87.3%. Mask Region-Based Convolutional Neural Networks (Hu et al., 2020) did not consider the labels of each data point, which would have resulted in better performance, and obtained an accuracy of 97.68%. The Transferable Texture Convolutional Neural Network (Ali et al., 2020) was time-consuming when classifying large datasets and obtained an accuracy of 96.69%.

Discussion
The FPSO-CNN model (Asuntha and Srinivasan, 2020) failed during segmentation and showed slow convergence while searching for features, obtaining an accuracy of 95.62%. The Enhanced-CNN (E-CNN) method (Harsono et al., 2020) used intensity-based features that consumed more computational time and were sensitive to speed, obtaining an accuracy of 97%. Similarly, the Mask-3FCM and TWEDLNN approach (Tiwari et al., 2021) obtained an accuracy of 96%, a specificity of 97% and a sensitivity of 94% because of the complexity of the problem, and the Laplacian Pyramid (LP) decomposition algorithm with Adaptive Sparse Representation developed by Nazir et al. (2021) obtained a sensitivity of 98% and a specificity of 89%.
The proposed GOA-GAN method showed an accuracy of 98.89%, a sensitivity of 98.67% and a specificity of 97.98%, which shows that the model improved its performance by using GAN. The model decreased its weight and size, thereby lowering the computation of the model.

Conclusion
The present research work performed training using the Generative Adversarial Network (GAN) model, which directly controlled the behavior of the classifier during training, while the Grasshopper Optimization Algorithm (GOA) tuned the hyperparameters, which had a significant impact on the performance of the trained model. The proposed GOA-GAN model decreased the size and weight of the model, which lowered its computation and overcame the problem of complexity. This is because the GAN model controls the behavior of the classifier during training and GOA is highly effective for solving global unconstrained and constrained optimization problems. The GAN layer used in the proposed method explored the image properties efficiently and feasibly. Classifying benign and malignant tumor tissue in the lung is a difficult task because of the tuning problem of choosing a set of hyperparameters for the learning algorithm and because of outliers that affect the classification result. The results showed that the proposed method achieves an accuracy of 98.89%, compared with existing models such as KNG-CNN (87.3%), mask region-based CNN (97.68%), Transferable Texture CNN (96.69%), Fuzzy Particle Swarm Optimization (FPSO) CNN (95.62%) and the E-CNN method (97%). In the future, classifying the detected nodules by risk, as benign, slow-growing or aggressive cancer, would better guide physicians in evaluating every case.