A Backward Regressed Capsule Neural Network for Plant Leaf Disease Detection

Abstract: This study investigated the introduction of backward regression coupled with DenseNet features in a Capsule Neural Network (CapsNet) for plant leaf disease classification. Plant diseases are considered one of the main factors influencing food production; fast crop disease detection and recognition are therefore important in enhancing food security interventions. CapsNets have successfully been adopted for plant leaf disease classification; however, backpropagation of signals to preceding layers is still a challenge due to low gradient flow. In addition, parameter and computational complexities exist due to complex features. Therefore, this study implemented a loop connectivity pattern to improve gradient flow in the convolution layer and backward regression for feature selection. Testing our framework on the standard Plant Village (PV) dataset, comprising ten tomato classes with 9080 images, we observed a 99.7% F1 score with backward regression and an 87% F1 score without it. Additionally, CapsNet with backward regression showed relatively higher and more stable accuracy when sensitivity analysis was performed by varying the testing and training dataset percentages. In comparison, Support Vector Machines (SVM), Artificial Neural Networks (ANN), AlexNet, ResNet, VGGNet, Inception V3, and VGG 16 deep learning approaches scored 84.5, 88.6, 99.3, 97.87, 99.14, and 98.2%, respectively. These findings indicate that the introduction of backward regression of features in the CapsNet model may be a decent and, in most cases, superior and less expensive alternative for phrase categorization models based on CNNs and RNNs. Therefore, the accuracy of plant disease detection may be enhanced even further through the fusion of several classifiers and the integration of a backward regressed capsule neural network.


Introduction
The basic goal of smart farming is to develop innovative solutions for the future sustainability of mankind (Patil and Kumar, 2020). However, plant disease is detrimental to this goal as it destroys crops or diminishes their overall quality. In addition, the use of pesticides to control the spread of plant diseases renders the soil contaminated, which after some time becomes unsuitable for sowing and planting (Jadhav et al., 2021). Therefore, in addition to other challenges, plant diseases contribute significantly to food insecurity, malnutrition, and poverty in Africa, where most people depend on agriculture (Hasan et al., 2020;Li et al., 2020).
Different plant species are affected by different plant diseases caused by factors related to climate change, soil and plant nutrients, pests, and organic soil content, among others (Barbedo, 2020). So far, different techniques have been used to recognize plant diseases (Barbedo, 2020). With the increasing use of smartphones and internet services, mobile phones can easily be used to detect plant diseases. Smartphones have high-resolution cameras and they can also be used to perform computational tasks (Jadhav et al., 2020).

Manual plant disease recognition methods (Jogekar and Tiwari, 2021) are widespread but limited, ineffective, costly, and time-consuming; hence, automatic and efficient recognition methods may be an alternative. Convolutional Neural Networks (CNNs), including deep models such as ResNet (He et al., 2016), GoogleNet, VGG (Kim and Rhee, 2018), and AlexNet, have been applied elsewhere in an attempt to solve these problems (Fuentes et al., 2018;Kwabena et al., 2020b;Zhang et al., 2019;Kwabena et al., 2020a). They have achieved impressive results in this domain but tend to be 'data-hungry', invariant, and vulnerable to problems that can easily lead to misclassifications (Al-Furas et al., 2019). State-of-the-art CNNs use pooling, which not only leads to the loss of important features but also increases the number of parameters and the complexity of models (Al-Furas et al., 2019). A survey by Patrick et al. (2022) establishes that, because of these limitations, Capsule Neural Networks (CapsNets) perform better than traditional CNNs. This is largely because CapsNets can connect spatial data and convolution layers, which makes them efficient for image classification.
CapsNets have been widely used as plant disease classifiers. When integrated with VGG-16 (OxfordNet), CapsNets can reduce over-fitting and improve detection accuracy (Simonyan and Zisserman, 2014;Patrick et al., 2022). CapsNet architectures require encoding of the image input and computation of the class probability using the SoftMax method. Hinton et al. (2018) showed that the use of small training datasets negatively impacts accuracy rates and, similarly, limits the effectiveness of the training model. In contrast, Ferentinos (2018) notes that CapsNet architectures are effective when used on small image datasets. Plant leaves can also be classified with other methods such as Support Vector Machines (SVM) (Poojary and Shabari, 2018;Das et al., 2020), the Euclidean classifier (KNN), and the Artificial Neural Network (ANN). Even though CapsNets do well on small data sizes, they have difficulty recognizing images with complex backgrounds (Patrick et al., 2022). The current state-of-the-art CapsNet has a very weak gradient flow because it is difficult for signals to be back-propagated (Jogekar and Tiwari, 2021). CapsNets also lack a technique for efficient parameter selection, which leads to computational inefficiency due to limited feature diversification. The deeper a CapsNet becomes, the more complex it gets (Patrick et al., 2022).
In an attempt to strengthen CapsNet, this study presents the following contributions: (1) We introduce loop connectivity to CapsNet in place of single normal convolutions to ease signal backpropagation and improve gradient flow. This minimizes errors during classification.
(2) We introduce backward regression as a feature selection approach to capture significant parameters for further processing. This process was done to reduce the computational complexities and promote parameter efficiency in the model.

Related Work
Several studies have been conducted on plant disease detection. Sullca et al. (2019) used computer vision and machine learning to identify illnesses in blueberry leaves. Kumar and Vani (2019) used a CNN to identify tomato leaf diseases. Their CNN framework had two components. The first component was a feature extraction model made up of four convolution layers with the ReLU activation function and max pooling, while the second was made up of a flattening layer and two dense layers. The activation for the second component was Softmax. However, only a small number of diseases were considered in the study because the data-gathering technique was laborious and time-consuming. CNNs have also been used in a transfer learning framework to detect different types of tomato diseases and pests (Llorca et al., 2018). That study concluded that an increase in the number of tomato disease classes reduced over-fitting problems. Another CNN-based study by Ferentinos (2018) examined plant disease classification using the Plant Village dataset and determined that transfer learning techniques were more accurate than "from scratch" learning techniques. The results showed that CNN models could be adopted for different plant species and that the architectures are more effective when used on large plant datasets.
According to Lowe et al. (2017), CNN architecture requires a substantial training time for the neurons, but the method is widely recognized due to its high classification accuracy. Ferentinos (2018) proposed a deep learning method that uses a large training image dataset collected from different geographical locations and cultivation conditions to increase accuracy potential. This showed that the training model is highly dependent on the volume and quality of the input data for better outputs. Pöpperli et al. (2019) further argue that CapsNet models perform better than CNN when an absolute value is used as the input data.
Other plant disease detection models in the literature have shown promising results. However, most of them are deep, complicated, invariant, not resilient, underperforming, and lacking in adaptability. They also fail to capture color, texture, spatial relationships, and deformation. Due to these flaws, CapsNets were developed with the ability to encode spatial information, texture, color, and deformation. Capsules are ideally suited for crop disease detection because texture and orientation play important roles in recognizing leaf sections that do not correspond to the rest of the leaf. As a result, we propose an enhanced backward regressed capsule neural network for plant disease diagnosis, which is particularly beneficial for inputs with uncertain probability distributions. Thus, the main contributions of this study are to strengthen gradient flow through the use of loop connectivity, to promote computational and parameter efficiency through feature diversification, to maintain low complexity by using a combination of complex and simple features, and lastly to use backward regression to select significant feature maps after the first convolution, which reduced the number of parameters in the model.

Data
Despite the limited number of leaf disease categories and plant disease images, most studies have used the Plant Village (PV) dataset (Brahimi et al., 2017;Mohanty et al., 2016;Barbedo, 2018). The PV dataset is categorized into different classes of plant diseases. Studies have shown that one specific plant may be recorded in different plant disease classes within the PV dataset. At the same time, similar plant diseases having the same common name may equally be recorded in a different class of plant diseases. These variations are evident in the works of authors who used CNN architectures for crop disease classification (Mohanty et al., 2016;Lowe et al., 2017;Too et al., 2019;Dou et al., 2019). Therefore, we adopted images from the PV database in our study. A total of 9080 tomato images were subdivided into ten classes, namely: mosaic virus, bacterial spot, early blight, late blight, leaf mold, healthy, septoria spot, target spot, yellow leaf curl, and spider mite, as collected by Hasan et al. (2019).

Overview of Capsule Neural Networks (CapsNets)
Capsules are groups of firmly and deeply fixed neurons while Capsule Neural Networks (CapsNets) are collections of capsules. The current state-of-the-art comprises a single convolution layer, a primary capsule layer, and a digit layer (Sabour et al., 2017). The input layer encompasses the pre-processing procedures where an image's size is modified to 28 × 28. The hidden layer comprises a convolutional layer with a kernel of magnitude 9 × 9 with a stride of one for feature extraction, followed by a ReLU function that helps in feature activation. The next layer is the primary capsule layer which has a kernel size of 9 × 9 with a stride of two and deals with feature map tensors. We introduced backward regression after the primary CapsNets layer to select only the significant features. The selection of features through regression is aimed at reducing complexities in the model. The output from the primary capsule layer is also activated by a ReLU function (Sabour et al., 2017). It contains a decoder network with three dense layers and a routing by agreement algorithm proposed by Hinton et al. (2018). Brahimi et al. (2017) used the algorithm to detect movements in movies.
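As a quick check of the layer sizes described above, the following Python sketch computes the spatial size of the feature maps produced by the two 9 × 9 convolutions on a 28 × 28 input. It assumes no padding, which the text does not state explicitly; the function name is illustrative only.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Side length of a square feature map after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


input_size = 28                                                    # pre-processed image side
conv1_size = conv_output_size(input_size, kernel=9, stride=1)      # first convolution
primary_size = conv_output_size(conv1_size, kernel=9, stride=2)    # primary capsule layer

print(conv1_size, primary_size)  # prints "20 6"
```

Under these assumptions the first convolution yields 20 × 20 maps and the primary capsule layer yields 6 × 6 maps.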
Let $i$ denote a lower-level capsule used to store image data and let $u_i$ be its output. Its prediction $\hat{u}_{j|i}$ of the higher-level capsule $j$ is computed as:

$$\hat{u}_{j|i} = W_{ij} u_i \tag{1}$$

where $W_{ij}$ is a weighting matrix learned through backpropagation, while $\hat{u}_{j|i}$ denotes the vector that capsule $i$ uses for the prediction of the $j$th capsule. The role of each capsule is to predict the output of the higher-level capsules. The coupling coefficients increase if the prediction conforms to the output of the higher-level capsule (Patrick et al., 2022). Equation (2) (Sabour et al., 2017) shows the SoftMax function that is used to calculate the coupling coefficients:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})} \tag{2}$$

The variables in Eq. (2) encompass the coupling coefficient $c_{ij}$ and the log probabilities $b_{ij}$, which are set to zero at the start of routing by agreement; the sum over $b_{ik}$ is a normalization term that ensures that all output values are within the range of 0 and 1, i.e., a valid probability distribution. The log probabilities are significant in determining whether the lower-level capsule $i$ can be coupled with the higher-level capsule $j$. Using Eq. (2), the input vector to the higher capsules is calculated as:

$$s_j = \sum_{i} c_{ij} \hat{u}_{j|i} \tag{3}$$

where $s_j$ is the total input of capsule $j$, whose output is $v_j$. Since the probability of existence is represented by the length of the output vector, short vectors need to be reduced to an almost zero value while long vectors are increased to a near one value. To achieve this we used Eq. (4):

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \frac{s_j}{\|s_j\|} \tag{4}$$

where $v_j$ is the vector output of capsule $j$ and $s_j$ is its total input, while $b_{ij}$ is updated during routing by agreement between $v_j$ and $\hat{u}_{j|i}$. This follows the rule that says, 'if two vectors agree, the inner product will be large'. The agreement $a_{ij}$, which drives the updates of the log probabilities $b_{ij}$ and the coupling coefficients $c_{ij}$, is calculated as:

$$a_{ij} = v_j \cdot \hat{u}_{j|i} \tag{5}$$

To describe the whole routing procedure for computation of the high-level vector, Eq. (2) and (6) are used.
Various properties of the entities of an image, such as size, orientation, and position, are encapsulated by the directions of vectors to allow the capsules to learn the relationships between features within an image. The loss function is used to equalize the values between zero and one. Each capsule $k$ in the last layer is associated with the loss function $l_k$, computed as:

$$l_k = T_k \max(0, m^{+} - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^{-})^2$$

where $T_k$ is equal to one if a digit of class $k$ is present and zero otherwise, with $m^{+} = 0.9$ and $m^{-} = 0.1$. The $\lambda$ down-weighting of the loss for absent digit classes stops the initial learning from shrinking the lengths of the activity vectors of all the digit capsules.
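The routing procedure and loss described above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the study: the toy prediction vectors are made up, the function names are hypothetical, and the default `lam=0.5` is an assumption taken from Sabour et al. (2017), since this section does not state the value of λ.

```python
import math


def squash(s):
    """Eq. (4): keep the direction of s and map its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Eq. (2): turn the log probabilities of one lower capsule into coupling coefficients."""
    top = max(logits)
    exps = [math.exp(b - top) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector of lower capsule i for higher capsule j."""
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # log probabilities start at zero
    for _ in range(iterations):
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_upper):
            # total input of capsule j: coupling-weighted sum of the predictions
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_lower):
            for j in range(n_upper):
                # agreement: inner product of the output v_j and the prediction
                b[i][j] += sum(v[j][d] * u_hat[i][j][d] for d in range(dim))
    return v, c


def margin_loss(lengths, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sum of the per-class margin losses; lam=0.5 is an assumed default."""
    total = 0.0
    for k, length in enumerate(lengths):
        t_k = 1.0 if k == target else 0.0
        total += t_k * max(0.0, m_plus - length) ** 2
        total += lam * (1.0 - t_k) * max(0.0, length - m_minus) ** 2
    return total


# Toy example: three lower capsules mostly agree on higher capsule 0.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.0, -0.1]],
]
v, c = route(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
print(lengths, margin_loss(lengths, target=0))
```

In this toy input the predictions for capsule 0 point in nearly the same direction, so its output vector ends up longer than that of capsule 1, and the coupling coefficients shift toward capsule 0 as routing proceeds.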

Backward Regression
To eliminate feature maps that did not contain any significant information for the classification, backward regression was used (Fig. 1). To begin with, the significance level of the model was set to 0.05, corresponding to the 95% confidence level; the model was then fitted with all independent variables included. The predictor with the highest p-value was then identified. If that p-value exceeded the significance level, the feature was rejected and removed from the dataset for failing to satisfy the 95% confidence level, and the model was fitted again. If the highest p-value in the set was less than the significance level, the comparison stopped and the remaining feature maps were forwarded to the primary capsule for further processing. This was repeated until only the significant features remained. For example, consider our model m with a total of n predictors/features, i.e., x = x1, x2, ..., xn. The role of our backward regression is to estimate the significant features to classify k classes as:

$$y(m_k) = b_0 + b_1 x_1 + \dots + b_n x_n + \epsilon$$

where $y(m_k)$ are the chosen significant features for model k, $b_0$ is the y-intercept, $b_1, \dots, b_n$ are the slope parameters, and $\epsilon$ is the error term. The backward regression process iterates over k models, determining suitable features for each model. For instance, k = 10 classes form ten models, m1 to m10; if each class is executed individually with, say, n = 10 predictors, then removing one useless predictor from a model leaves nine. The original tree before regression is shown by Eq. (8). The best model among the configurations in Eq. (9) was chosen based on $R^2$.
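The elimination loop described above can be sketched as follows. This is an illustrative, stdlib-only Python sketch and not the study's implementation: each column of `X` stands in for a scalar summary of one feature map (an assumption), the data are synthetic, the helper names are hypothetical, and the p-values use a normal approximation to the t-distribution, which is only reasonable for large samples. The threshold `alpha=0.05` corresponds to a 95% confidence level.

```python
import random
from statistics import NormalDist


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_p_values(X, y):
    """Fit y = b0 + sum_i b_i x_i + eps and return approximate slope p-values."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((t - sum(bi * xi for bi, xi in zip(beta, r))) ** 2
              for r, t in zip(rows, y))
    sigma2 = rss / (n - p)
    z = NormalDist()  # normal approximation to the t-distribution (assumption)
    p_vals = []
    for a in range(1, p):
        t_stat = beta[a] / (sigma2 * inv[a][a]) ** 0.5
        p_vals.append(2.0 * (1.0 - z.cdf(abs(t_stat))))
    return p_vals


def backward_eliminate(X, y, alpha=0.05):
    """Drop the least significant predictor until all p-values are <= alpha."""
    keep = list(range(len(X[0])))
    while keep:
        p_vals = ols_p_values([[row[j] for j in keep] for row in X], y)
        worst = max(range(len(keep)), key=lambda i: p_vals[i])
        if p_vals[worst] <= alpha:
            break  # every remaining predictor is significant
        del keep[worst]
    return keep


# Synthetic data: only columns 0 and 2 influence the response.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3.0 * r[0] - 2.0 * r[2] + rng.gauss(0, 0.5) for r in X]
kept = backward_eliminate(X, y)
print(kept)
```

The two informative columns survive elimination; whether a pure-noise column is also retained depends on chance at the chosen significance level, which is why the threshold matters. A statistics package that reports exact t-based p-values would be preferable in practice.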

The Improved Capsule Neural Network Model
We used three convolutions and one primary capsule to capture diversified features where the convolutions followed the loop connectivity pattern of DenseNets (Fig. 2). The loop connectivity pattern was used to strengthen gradient flow by making it easy to propagate signals to earlier layers more directly, contributing to the parameter and computational efficiency through feature concatenation. It helped maintain low complexity in the model through the use of both complex and simple features.
The overall architecture contains three convolutional subnets which are arranged in a looping manner. Feature maps from each subnet are grouped like those in the work done by Jégou et al. (2017). To build primary capsules in the subnets, this study used convolutional layers with 3 × 3 kernels with a stride of 1, 5 × 5 kernels with a stride of 2, and 9 × 9 kernels with a stride of 2, respectively. This study also used padding to ensure that the feature maps were equal in size. Figure 2 depicts the proposed architecture that we adopted. After each subnet, backward regression was used to select significant features only. Backward regression reduces the number of parameters, hence minimizing computational complexity. The features from the subnets were squashed to form the PrimaryCaps layer. Routing by agreement (Jogekar and Tiwari, 2021) was used. We did not encounter any limitations in the methods used for the work.
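The DenseNet-style connectivity that this section builds on can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the proposed network: each channel is reduced to a single number, and `toy_layer` is a hypothetical stand-in for a learned convolution. The point it shows is the bookkeeping: every layer receives the channel-wise concatenation of the block input and all earlier outputs, so simple early features and complex later features are both passed on.

```python
def dense_block(input_channels, layers):
    """Apply layers with dense connectivity (channel-wise concatenation)."""
    features = list(input_channels)
    for layer in layers:
        new_channels = layer(features)       # layer sees all earlier features
        features = features + new_channels   # concatenate, do not overwrite
    return features


def toy_layer(growth):
    """Hypothetical stand-in for a convolution that emits `growth` channels.

    Each new channel is the mean of the incoming channels; a real layer
    would apply learned filters instead.
    """
    def layer(channels):
        mean = sum(channels) / len(channels)
        return [mean] * growth
    return layer


block_input = [1.0, 3.0]                      # two input channels
output = dense_block(block_input, [toy_layer(2) for _ in range(3)])
print(len(output), output)
```

With two input channels and three layers that each add two channels, the block ends with 2 + 3 × 2 = 8 channels, and the original inputs are still present at the front of the list.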

Results
The capsule neural network is based on parameters like momentum, batch size, learning rate, dropout, and learning rate decay. Because neural networks deal with inputs of the same size (Ferentinos, 2018), this study tuned the dataset parameters by resizing the input images to 28 × 28. Plant leaf disease classification was done using the improved Capsule neural network model and the conventional CapsNet. Confusion matrices, F1 scores, and graphical representations were used to assess the accuracy of each model. The F1-score was computed as:

$$F1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

Table 1 summarizes the disease classification accuracy based on the F1 score, precision, support, and recall metrics. The precision for the Tomato mosaic virus class was relatively low compared to the other classes of tomato leaf diseases. Most of the model's misclassifications for this class were noted to be false positives. The recall for the Early blight class was relatively low compared to the other classes of tomato leaf diseases: only 90% of the actual Early blight cases were correctly classified. The model achieved perfect recall for the Target Spot class. Figure 3 displays the accuracy and loss obtained from the backward regressed model, while Fig. 4 shows the accuracy obtained from the conventional CapsNet model without backward regression. Overall, the backward regressed CapsNet has a 99.9% F1-score while the conventional CapsNet attained 87%. Table 2 shows the sensitivity analysis of our new approach vis-à-vis the conventional CapsNet with no feature selection. CapsNet with regressed feature selection shows relatively higher accuracy than CapsNet without feature selection.
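For reference, the per-class precision, recall, F1 score, and support reported in tables of this kind can be derived from a confusion matrix as in the Python sketch below. The 2 × 2 matrix is a made-up toy example, not data from this study, and the function name is illustrative.

```python
def per_class_metrics(confusion):
    """confusion[t][p] = number of samples of true class t predicted as class p."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # missed samples of class k
        fp = sum(confusion[t][k] for t in range(n)) - tp     # wrongly assigned to class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall,
                        "f1": f1, "support": tp + fn})
    return metrics


toy_confusion = [
    [9, 1],   # true class 0: 9 correct, 1 predicted as class 1
    [3, 7],   # true class 1: 3 predicted as class 0, 7 correct
]
for k, m in enumerate(per_class_metrics(toy_confusion)):
    print(k, m)
```

In the toy matrix, class 0 has high recall (0.9) but lower precision (0.75) because three samples of class 1 are wrongly assigned to it, which is the false-positive pattern that lowers precision.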
We further compared the CapsNet models with different deep learning models namely: SVM, Artificial Neural Networks (ANN), AlexNet, ResNet, VGGNet, GoogleNet, and InceptionNet V3, and observed results as shown in Table 3. For this experiment, we used a PV dataset with a ratio of 20:80% for testing and training respectively. Our model had the best testing and training accuracy followed by AlexNet, Inception V3, ResNet, ANN, Normal CapsNet and the last one was SVM.
Kurup et al. (2019) used a capsule neural network model to diagnose plant diseases. They used a dataset of 54,306 images and obtained an accuracy of 94 percent. Verma et al. (2020) used transfer learning to create capsule networks for the classification of potato diseases and compared their performance to several well-known pre-trained CNN models, notably ResNet18, VGG16, and GoogLeNet. Colored images of healthy and diseased leaves from the PlantVillage dataset were utilized to train the models. With 91.83% accuracy, CapsNet demonstrated comparable performance to state-of-the-art CNN models. Kwabena et al. (2020a) suggest the application of Gabor and Capsule networks to distinguish hazy, distorted, and previously unseen tomato and citrus disease images. The suggested model achieves a test accuracy of 98.13%. According to the researchers, the technique may be applied to other crops and might be a valuable tool for detecting unseen plant diseases under poor weather and lighting conditions. Mensah et al. (2021) used the squared Euclidean distance, sigmoid function, and a 'simple-squash' function instead of the dot product, SoftMax normalizer, and squashing function present in the dynamic routing method. Extensive trials on three datasets revealed that the proposed model improves test accuracy consistently across the datasets while also allowing for an increase in the number of routing iterations with no performance impact. On the tomato dataset, the suggested model beat a baseline CapsNet by 8.37 percent, with an overall test accuracy of 98.80 percent, equivalent to state-of-the-art models on the same datasets. Samin et al. (2021) built a deep learning architecture (CapPlant) that uses plant photos to detect whether a plant is healthy or infected. The prediction procedure does not need handcrafted features; rather, the architecture extracts representations from the incoming data automatically.
To extract and categorize features, many convolutional layers are used. The last convolutional layer in CapPlant is replaced with a capsule layer that incorporates orientational and relative spatial relationships between distinct components of a plant in an image to predict diseases more precisely. The suggested architecture is validated using the PlantVillage dataset, which includes over 50,000 images of healthy and diseased plants. When compared to existing plant disease classification models, the CapPlant model showed significant gains in prediction accuracy. The model obtained an overall test accuracy of 93.01 percent, with an F1 score of 93.07 percent. We compared our work with these state-of-the-art CapsNet approaches and observed that our work displayed a higher accuracy. This suggests that CapsNet performs better when improved through regression and the use of dense connectivity loops.

Discussion
A backward regressed capsule neural network for plant leaf disease detection was proposed in this study. The PV dataset was split in an 80:20% ratio for training and testing for all the models. The margin and reconstruction losses make up the loss function that was used to train the model. The loss function's default settings for m+, m−, and λ were kept in this implementation. The Capsule models were trained using three rounds of routing. Random adjustments to parameter values and intermediate layers were made in all of the models to see how responsive each model is to the modifications. The CNN models performed worse than the proposed Capsule models as a result of these adjustments. The CapsNet models' performance was unaffected by changes in momentum, batch size, learning rate, dropout, and learning rate decay. The number of routing iterations was the single most critical hyperparameter that substantially influenced the performance of the CapsNet models, with three iterations yielding the reported performance values. To show the CapsNets' versatility and resilience, the input images were resized from 256 × 256 to 48 × 48, 68 × 68, and 224 × 224 before the models were trained. The CapsNet models generated fairly constant results when the picture size was increased. Increasing the picture size to 224 × 224, however, needed more computing resources and training time and hence was not feasible for this investigation.
We recommend the adoption of the DenseNet architecture (Jégou et al., 2017) to strengthen the gradient, maintain low feature complexity, and provide computational and parameter efficiency. We recommend the use of dense connectivity and backward regression because they allowed us to reduce model complexity in terms of time and number of parameters; hence, a record accuracy of 99.9% was observed. Pleiss et al. (2017) demonstrated that the use of DenseNet connectivity provides more accurate results because it is easier to propagate error signals more directly to earlier layers. To enhance the CapsNet, this study also used backward regression to reduce the number of parameters and ease computation in the model. Unlike pooling, which discards data regardless of its importance, regression retains only data with vital information. This idea improved the accuracy of the model, enhanced computational efficiency, and reduced complexities related to computation, as depicted in Table 3. Sensitivity analysis of the model with different portions of training data also illustrates the robustness of CapsNet with regression compared to CapsNet without regression (Table 2). Our model retained a higher accuracy overall; although the value dropped when the train-test ratio was 90-10%, it was still higher than that of CapsNet with no regression. A model's accuracy is generally expected to decrease with a decrease in training data, but our model showed a stable trend. In this study, the PV dataset with 10 disease classes was used with the CapsNet model with backward regression. The PV provides a standard dataset that has been used to train and test several models with success (Ferentinos, 2018). Compared to Ferentinos (2018), we used a small plant database to develop the model. Ferentinos (2018) used five CNN architectures: AlexNetOWTBn, VGG, GoogLeNet, AlexNet, and Overfeat.
On the original images, these architectures had success rates of 99.44, 99.48, 97.27, 99.06, and 98.96%, respectively. The success rates on the pre-processed images were 99.07% (AlexNetOWTBn), 98.87% (VGG), 97.06% (GoogLeNet), 98.64% (AlexNet), and 98.26% (Overfeat). These results are supported by Chaki and Parekh [56], who argued that the accuracy and success rate of plant leaf image detection should normally fall between 90 and 100%. This means that neural networks effectively use leaf features for plant disease classification. Although this study used a relatively smaller dataset than Ferentinos (2018), the difference in accuracy was negligible; hence, there is no major effect if a large or small dataset is used. The use of a large dataset is essential regardless of any augmentation techniques and transfer learning systems used in the model (Llorca et al., 2018). However, according to Barbedo (2020), creating a large database with all the plant species and their related plant diseases is impractical, a matter echoed by Kamilaris and Prenafeta-Boldu (2018). Therefore, novel models should ideally use moderate to small training datasets with perceivable minimum misclassification and high accuracy. Future efforts could also focus on architecture optimization and on models that can utilize real-time smartphone models for plant disease classification in a bid to balance training data needs.

Conclusion
The adoption of the DenseNet intuition based on loop connectivity patterns promoted strong gradient flow through easy error signal propagation, parameter and computational efficiency through channel-wise concatenation, and low complexity through the use of both complex and simple features. In addition, dynamic routing eased predictability in the model after the primary capsule level. Further, backward regression in every subnet reduced the number of parameters by selecting the significant features and discarding those that were of no use to the model, all while making a positive impact on detection accuracy.