Experimental Evaluation of Coffee Leaf Disease Classification and Recognition Based on Machine Learning and Deep Learning Algorithms

Coffee plant diseases pose a significant threat to world coffee production, and the greatest challenge is to detect them as early as possible to save the crop. Traditional methods rely mostly on visual observation, which often leads to diagnostic errors. Machine learning has emerged as an alternative for identifying plant diseases automatically. Our study implements a robust method for the classification and recognition of coffee leaf diseases using both classical machine learning and deep learning methods, including a custom CNN that we designed. These methods were evaluated on the Arabica coffee leaf dataset known as JMuBEN. The accuracy of the classical machine learning methods ranged from 81.03 to 100%, with the best performance obtained by SVM and Random Forest. In comparison, the deep learning methods achieved between 97.37 and 100%, with our custom CNN reaching the best accuracy together with the lowest loss, 0.013%. Accuracy, precision, recall, and MCC were used as performance indicators to support these results.


Introduction
Agriculture covers both the cultivation of land to obtain crops and the raising of animals, known as livestock breeding. It is constantly confronted with diseases that can jeopardize production. Diseases negatively affect plants and animals and hinder market access for agricultural products. Among them, plant diseases are a real obstacle to yield and quality for producers (Crop Diseases, 2021). There are various categories of plant diseases, including fungal, oomycete, hyphomycete, bacterial, and viral (Lu et al., 2021). Coffee is one of the most consumed products, with a world production of over 7 million tons per year (Planetoscope-Statistiques, 2021). Kenya, the second largest African producer of Arabica coffee, is confronted with diseases that attack coffee leaves and reduce its production (Quels sont les meilleurs cafés d'Afrique, 2020). These diseases are Cerscospora, coffee rust, and Phoma (Jepkoech et al., 2021). Identifying diseases and knowing when and how to deal with them effectively is an ongoing challenge for growers. Disease symptoms can appear on the leaves at any stage of the plant's growth.
Early detection and recognition of diseases can minimize crop losses. Manual detection of plant diseases is not reliable enough and is very difficult on large farms (Giraddi et al., 2020). Computer-vision-based detection would enable experts to take preventive measures at an early stage. The detection and classification of coffee leaf diseases therefore have important economic and technical significance in agriculture. In recent years, computer vision has been used for object detection and image classification and has made tremendous progress (Jain et al., 2022).
Several techniques have been proposed for plant disease diagnosis. Ramcharan et al. (2017) used transfer learning to train a deep learning model that recognizes three diseases and two types of pest damage, with an overall accuracy of 93%. Saleem et al. (2020) implemented the SSD model and showed that it had the best average precision of all the deep neural networks they compared, namely 73.07%. Afifi et al. (2020) recommend Convolutional Neural Networks (CNNs) for their ability to reuse previously learned weights; they selected three deep learning architectures, ResNet18, ResNet34, and ResNet50, to build two baseline models, a Triplet network and a Deep Adversarial Metric Learning (DAM) network. These methods achieve 81 to 99% accuracy. Xiao et al. (2020) propose a convolutional neural network, in this case ResNet50, with a method based on two separate data sets that include both source and feature images. For leaf blight affecting the crown, leaf, and fruit, the classification accuracy is 100%; for grey mold and powdery mildew, it is 98% in both cases. Yang et al. (2021) used a fusion module that merges the image features detected by the convolutional imbalance module and ensures feature extraction from the unbalanced data set. Their findings showed that their method outperformed state-of-the-art techniques, with an accuracy of 97.58% on a set of images of rice pests and diseases. Shao et al. (2021) proposed a method that combines the LC-FCN model, based on pretrained models, with the watershed algorithm for dense rice image recognition, with an accuracy of 89.88%. Dewi and Utomo (2021) used wavelets to denoise the images of the data set and then classified them with a convolutional neural network, obtaining an accuracy of 97%. Yu et al. (2019) propose a new approach for detecting apple leaf diseases using deep learning that takes regions of interest into account. They designed two sub-networks. First, the input image is divided into three parts: the background, the foliage, and the disease spots on the leaf, which form the region of interest. Pre-trained networks are then trained separately with a new set containing class information. This method, named ROI-aware DCNN, gives an accuracy of 84.3%. Yu and Son (2020) proposed LSA-Net to recognize foliar diseases of apple trees. LSA-Net consists of two sub-networks, a feature segmentation sub-network and a classification sub-network, and gave an accuracy of 89.4%. Ahmed and Reddy (2021) presented a mobile system that automates the diagnosis of leaf diseases. The system uses convolutional neural networks as the underlying deep learning algorithm to classify 38 disease categories and achieves an overall accuracy of 94%. Velásquez et al. (2020) built a diagnostic model of the development stage of coffee leaf rust (CLR) in crops of the Coffea arabica variety Caturra, using remote sensing with drone-mounted multispectral cameras and deep learning techniques. Their diagnostic model obtained an F1 score of 77.50%.
Our study first applies classical machine learning algorithms, namely K-Nearest Neighbors (K-NN), Support Vector Machine (SVM), Logistic Regression, Random Forest, Multi-layer Perceptron Classifier (MLPClassifier), Decision Tree Classifier (DTClassifier), and Gaussian Naive Bayes (GNB). It then applies deep learning methods, specifically the convolutional neural networks MobileNet, VGG-19, Inception-V3, DenseNet-201, ResNet-50, and our custom CNN. Finally, a comparison shows which models in our study are the most reliable.

Dataset
The database used for our study is the Arabica coffee leaf dataset called JMuBEN. It consists of coffee leaf images from a plantation in Kirinyaga County, Kenya. The first folder includes 7682 images of Cerscospora, the second contains 8337 images of rust, and the third has 6572 images of Phoma. We also added a folder of 8927 healthy images from the JMuBEN2 folder. In total, the dataset of our study contains 31518 images of coffee leaves divided into four classes, namely Phoma, Cerscospora, Rust, and healthy (Jepkoech et al., 2021). Figure 1 shows images of different coffee leaf diseases.
Our dataset will be divided as shown in Table 1 below:

Machine Learning
Machine learning is a subfield of artificial intelligence that gives a machine the capacity to learn without being explicitly programmed for a specific task. It is generally divided into three categories, namely supervised, unsupervised, and reinforcement learning (Shobha and Rangaswamy, 2018).

Supervised Learning
In supervised learning, the system is guided in its learning: it is presented with input data together with the outputs that we want to obtain. The system then generalizes what it has learned to data it has not seen. The algorithm develops a function that predicts the output accurately from the input data. Supervised learning is divided into two categories:
- Regression: the output variable is a numerical value.
- Classification: the output variable is a category.
The main algorithms are random forests, decision trees, the k-Nearest Neighbor (k-NN) method, linear regression, naive Bayesian classification, Support Vector Machine (SVM), logistic regression, and gradient boosting.

Unsupervised Learning
In unsupervised learning, the system receives no help in determining the structure of the input data. The algorithm seeks to discover features in the data to achieve a certain goal. This approach is also called feature learning. The main algorithms are K-Means, hierarchical clustering, and dimensionality reduction.

Reinforcement Learning
In reinforcement learning, the system learns by interacting with the surrounding environment. At any given moment, the system observes the current state and all possible actions. It performs one of the actions and receives a feedback signal that notifies it of its new state and the associated reward. Through iteration, the system should be able to determine automatically the ideal behavior (the one that maximizes the rewards) for a specific context. This type of learning is also called "semi-supervised" in the sense that the reward indicates the right result to achieve.
For our study, we use supervised machine learning algorithms: Support Vector Machine (SVM), k-Nearest Neighbors (K-NN), Random Forest (RF), Logistic Regression (LR), Multi-layer Perceptron Classifier (MLP), Decision Tree Classifier (DTClassifier), and Gaussian Naive Bayes (GaussianNB).

Support Vector Machine (SVM)
SVMs are used for classification, regression, and outlier detection. In a high- or infinite-dimensional space, the SVM constructs a hyperplane or set of hyperplanes that can be used for classification, regression, and other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class, since in general the larger the margin, the lower the generalization error of the classifier (Support Vector Machines, 2021). SVM solves the following equation:
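For reference, and assuming the problem meant here is the standard soft-margin (C-SVC) primal described in the cited scikit-learn documentation, it can be written as below. The symbols are notation introduced for this sketch: $x_i$ are the training vectors, $y_i \in \{-1, 1\}$ their labels, $\phi$ the feature map, $C > 0$ the regularization parameter, and $\zeta_i$ the slack variables.

```latex
\min_{w,\, b,\, \zeta} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \zeta_i
\quad \text{subject to} \quad
y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \zeta_i, \qquad \zeta_i \ge 0, \quad i = 1, \dots, n
```

The first term maximizes the margin, while the second penalizes training points that fall inside the margin or on the wrong side of the hyperplane.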

K-Nearest Neighbors (K-NN)
K-NN is a non-parametric classification algorithm that is based on training data. It uses a distance calculation to measure the similarity of the test data to the data used for training. The test data is then classified by a majority vote of its k nearest neighbors in the training set (Akbulut et al., 2017). The distance used is the Euclidean distance, which for two feature vectors $x$ and $y$ of dimension $n$ is $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$. When using K-NN, we considered k = 5 neighbors.

Random Forest (RF)
The RF is an algorithm based on an ensemble of independent decision trees. Each tree has an autonomous view of the problem due to a double random draw, namely tree bagging and feature sampling. The decisions of all trees are combined, and the decision taken by the RF for test data is the vote of all trees. The error rate of Random Forest is a function of the correlation between any two trees and of the accuracy of each tree (Alam and Vuong, 2013).
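Both k-NN and Random Forest end in a majority vote. The k-NN case can be sketched in a few lines of plain Python. The function names, the toy feature values, and the choice of k = 3 in the example are assumptions made for illustration, not the configuration used in this study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=5):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query and keep the k closest.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D features (hypothetical values, not taken from JMuBEN).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["healthy", "healthy", "healthy", "rust", "rust", "rust"]
prediction = knn_predict(train_X, train_y, (0.15, 0.15), k=3)
```

In practice the feature vectors would be extracted from the leaf images, and a library implementation with an efficient neighbor search would normally replace the full sort used here.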

Logistic Regression (LR)
LR is a binomial regression model. The aim is to fit a simple mathematical model as closely as possible to numerous real observations. Logistic regression is a special case of a generalized linear model to which the sigmoid function is applied. To produce the logistic regression equation, the maximum likelihood ratio is used to determine the statistical significance of the variables (Kurt et al., 2008).
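As an illustration of the sigmoid step, the binary logistic model can be written as follows. The notation is introduced here for the sketch and is not taken from the original: $x$ is the feature vector, $w$ the weight vector, and $b$ the bias.

```latex
P(y = 1 \mid x) = \sigma\left(w^{T} x + b\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

The linear combination $w^{T} x + b$ can take any real value, and the sigmoid maps it to a probability between 0 and 1.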

Multi-layer Perceptron Classifier (MLP Classifier)
The MLP Classifier is an artificial neural network algorithm organized into multiple layers, within which information flows only from the input layer to the output layer. The MLPClassifier can be defined as a multi-layered directed graph with several fully connected hidden layers. Supervised learning with the backpropagation algorithm is most often used to train the MLPClassifier. The MLPClassifier is an evolution of the single-layer perceptron that corrects its weaknesses (Wan et al., 2018).

Decision Tree Classifier (DT Classifier)
DTClassifier is a technique that builds a decision tree from training data. A decision tree is a predictive model that maps observations about an item to conclusions about its target value. In the tree structure, leaves represent classifications (also called labels), non-leaf nodes represent features, and branches represent the conjunctions of features that lead to a classification. Decision tree classifiers are also known as multilevel classifiers (Delbarre et al., 2021).

Gaussian Naive Bayes (GNB)
Naïve Bayes is a classifier based on a generative model with a fast learning and testing process. Bayesian classifiers work on the basis of Bayes' rule and probability theorems. A simplified version of the Bayesian classifier, called naive Bayes, uses two assumptions. Gaussian naive Bayes classification is a case of the naive Bayes method that assumes a Gaussian distribution of the attribute values given the class label (Jahromi and Taheri, 2017). Its general formula is:
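Assuming the usual Gaussian likelihood for each attribute, the formula can be written as below. The notation is introduced for this sketch: $x_i$ is the value of attribute $i$, $y$ the class label, and $\mu_{y,i}$ and $\sigma_{y,i}^{2}$ the mean and variance of attribute $i$ within class $y$, both estimated from the training data.

```latex
P(x_i \mid y) = \frac{1}{\sqrt{2 \pi \sigma_{y,i}^{2}}}
\exp\left( -\frac{(x_i - \mu_{y,i})^{2}}{2 \sigma_{y,i}^{2}} \right),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The product over attributes reflects the naive assumption that attributes are conditionally independent given the class.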

Deep Learning
Deep learning is a machine learning method that uses artificial neural networks. It is used in many applications, ranging from natural language processing to computer vision. There are different types of deep learning architectures, namely Deep Neural Networks (DNN), Deep Belief Networks (DBN), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). Deep learning has been successfully applied in several research areas, including agriculture, health, education, and the environment (Santos et al., 2019). In our study, we use convolutional neural networks. The CNN is a deep learning algorithm that is widely used in computer vision as well as in natural language processing, speech recognition, face recognition, etc. It is composed of three main types of layers, namely the convolution layer, the pooling layer, and the fully connected layers (Davuluri and Rengaswamy, 2022).

Convolution Layer
The convolution layer is a key element of convolutional neural networks. Its role is to detect the presence of features in the images it receives. To do so, it performs convolution filtering: it computes the convolution of each image with each filter. The result is an activation map, which locates the features in the image.
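The filtering step can be illustrated with a minimal sketch in plain Python. The image values and the vertical-edge filter below are hypothetical and chosen by hand; in a CNN the filter weights are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the activation map.

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x5 grayscale patch with a vertical edge between columns 2 and 3.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Hand-picked vertical-edge filter (an assumption for illustration).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
activation_map = conv2d_valid(image, kernel)
```

The activation map is zero where the window covers only the flat region and positive where it overlaps the edge, which is how the map locates the feature.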

Pooling Layer
Pooling is an operation that reduces the size of the feature maps produced by the convolution phase. It slices the image into regular cells and keeps the maximum value within each cell. The most common choices are adjacent, non-overlapping cells of 2 × 2 pixels, or cells of 3 × 3 pixels spaced with a stride of 2 pixels. Generally, this layer is positioned between two convolution layers. The output consists of the same feature maps, reduced in size.
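A minimal sketch of max pooling in plain Python follows. The function name and the feature-map values are assumptions made for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size cell, moving by `stride` pixels."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map)  # non-overlapping 2x2 cells
```

With the default 2 × 2 cells, the 4 × 4 map is reduced to 2 × 2 while the strongest response in each cell is kept.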

Fully Connected Layer
Fully connected layers receive a vector as input and produce a vector as output. They apply a linear combination and an activation function to the input values. They classify the images received as input by providing a vector whose size equals the number of classes in our problem. Each element of the vector indicates the probability that the data belongs to a given class (Aburass et al., 2022).
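The classification step can be sketched as a linear combination followed by a softmax activation. The weights, bias, and input below are hypothetical values, and the choice of softmax as the output activation is an assumption made for illustration.

```python
import math


def dense(x, weights, bias):
    """Linear combination: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-feature input and 4 output classes
# (for example Phoma, Cerscospora, Rust, healthy).
x = [0.5, -1.0, 2.0]
weights = [
    [0.2, 0.1, 0.0],
    [0.0, 0.3, 0.1],
    [0.5, -0.2, 0.4],
    [-0.1, 0.0, 0.2],
]
bias = [0.0, 0.1, -0.1, 0.0]
probs = softmax(dense(x, weights, bias))
predicted_class = probs.index(max(probs))
```

The output vector has one entry per class, and the predicted class is the index of the largest probability.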

MobileNet
MobileNet uses depthwise separable convolutions, which reduce the number of parameters with little loss of accuracy and thus allow for lightweight deep neural networks. It is built from two main types of layers: pointwise layers (pw) and depthwise layers (dw). MobileNet has been made freely available by Google. The depthwise layers are convolutions with a 3 × 3 kernel and the pointwise layers are convolutions with a 1 × 1 kernel. These layers use the rectified linear unit (ReLU) activation function and batch normalization. MobileNet has 19 depthwise layers (Attallah, 2021).
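The parameter saving can be checked with simple arithmetic. The sketch below counts weights only (biases are ignored), and the layer sizes are hypothetical values chosen for illustration.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
ratio = separable / standard
```

For these sizes the separable layer needs 8768 weights instead of 73728. The ratio equals 1/c_out + 1/k², roughly 12% here.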

VGG-19
VGG-19 is a variant of VGGNet, the multilayer deep neural network created by the Visual Geometry Group at the University of Oxford. VGG-19 consists of 19 layers, with convolutional layers that use 3 × 3 kernels and are interleaved with max-pooling layers. It has two fully connected layers, each with 4096 nodes, followed by a Softmax layer; together they form the classifier (Zheng et al., 2018).

Inception-V3
Inception is a convolutional neural network widely used in classification tasks. The Inception structure is at the heart of the GoogLeNet network. Several versions of Inception exist, namely Inception v1 (2014), Inception v2 (2015), Inception v3 (2015), Inception v4 (2016), and Inception-ResNet (2016). The Inception module typically contains three different convolution sizes and a max-pooling operation. For the network output of the previous layer, the channel is aggregated after the convolution operation, and then the non-network elements are pooled (Sharma, 2022).
Compared to previous versions (Inception v1 and v2), the network structure of Inception v3 uses a convolution kernel factorization method that splits large convolutions into smaller ones. For example, a 3 × 3 convolution is split into a 3 × 1 convolution and a 1 × 3 convolution. This factorization reduces the number of parameters, speeds up the training of the network, and allows spatial features to be extracted more efficiently (Dong et al., 2020).
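The idea behind the factorization can be illustrated with a small sketch. It uses a hand-picked 3 × 3 kernel that is exactly the product of a 3 × 1 and a 1 × 3 kernel, which is an assumption made so that both paths give identical results. In Inception v3 the two small layers are learned directly and are not constrained to reproduce a given 3 × 3 kernel.

```python
def conv2d_valid(image, kernel):
    """No-padding, stride-1 sliding-window filter (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


col = [[1], [2], [1]]  # 3x1 kernel (hypothetical values)
row = [[1, 0, -1]]     # 1x3 kernel (hypothetical values)
# Outer product: the 3x3 kernel that the two small kernels reproduce together.
full = [[c[0] * r for r in row[0]] for c in col]

# Arbitrary 5x5 integer test image.
image = [[(3 * r + 5 * c) % 7 for c in range(5)] for r in range(5)]

direct = conv2d_valid(image, full)
factorized = conv2d_valid(conv2d_valid(image, col), row)

params_full = 3 * 3        # 9 weights
params_factorized = 3 + 3  # 6 weights
```

Both paths yield the same 3 × 3 output, while the factorized version uses 6 weights instead of 9 per input-output channel pair.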

DenseNet-201
DenseNet is short for Dense Convolutional Network. It uses fewer parameters than a conventional CNN because it does not learn redundant feature maps. DenseNet layers have 12 filters. DenseNet has four different variants. In our study, we used DenseNet201. Each layer has direct access to the original input image and to the gradients of the loss function. The computational cost is considerably reduced, which makes DenseNet one of the best choices for image classification (Rahman et al., 2020).

Proposed Custom CNN
We propose a custom CNN that is less complex but very efficient. It is composed of two convolutional layers with 5 × 5 kernels and four convolutional layers with 3 × 3 kernels. Each pair of convolutional layers is followed by a max-pooling layer that reduces the size of the images, and the whole is connected to a fully connected layer of 128 units. Figure 2 shows the architecture of the proposed CNN.

Architecture of Our Method
The architecture of our method is presented in three main steps:
Step 1: Pre-processing and division of our dataset. For data pre-processing, we applied histogram equalization to the images to expand the grey-level distribution to the range 0 to 255. The images are also normalized to the range [0, 1].
Step 2: Training of the algorithms mentioned above on our dataset.
Step 3: Classification and recognition of the diseases.
Figure 3 shows the general architecture of our study.
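The pre-processing of Step 1 can be sketched in plain Python as follows. This is a minimal version of the classic histogram equalization formula for an 8-bit grayscale image, written under the assumption that a standard implementation was used; the function names and the sample patch are hypothetical.

```python
def equalize_histogram(gray, levels=256):
    """Histogram-equalize an 8-bit grayscale image given as a list of rows."""
    pixels = [p for row in gray for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in gray]
    # Lookup table mapping each old grey level to its equalized level.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in gray]


def normalize(gray, max_value=255):
    """Scale pixel values to the range [0, 1]."""
    return [[p / max_value for p in row] for row in gray]


# Hypothetical low-contrast patch with grey levels squeezed into 100..130.
patch = [
    [100, 110, 120, 130],
    [100, 110, 120, 130],
]
equalized = equalize_histogram(patch)
scaled = normalize(equalized)
```

After equalization, the four grey levels of the patch are spread over the full 0 to 255 range, and normalization then maps them to values between 0 and 1.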

Experimental Setup
The performance of the seven machine learning models and six deep learning models for coffee leaf disease detection and recognition was evaluated with the following metrics: precision, mean square error, recall, F1 score, and Matthews Correlation Coefficient. We use the accuracy and loss curves for the deep learning models, and K-fold cross-validation and the ROC curve for the machine learning models. The experiments were conducted using the Python programming language on a DELL desktop computer equipped with a 2.90 GHz Intel(R) Core i7-10700 CPU, 32 GB RAM, and an NVIDIA Quadro P400 GPU.

a) Parameters Setting
Machine Learning
K-Fold cross-validation is a technique used to evaluate the performance of machine learning or deep learning models in a robust way. It divides the data set into k parts of approximately equal size. Each part is used in turn for testing while the remaining parts are used for training. This process is repeated k times, and the performance is then measured as the average over all test sets (Wong and Yeh, 2019). In our study, k = 10. Figure 4 illustrates the K-Fold procedure.
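The splitting procedure can be sketched in plain Python. The function names are hypothetical, and the sketch splits the indices in order; shuffling or stratifying by class before splitting is common in practice and is not shown.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, evaluate, k=10):
    """Average the score returned by `evaluate(train, test)` over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Example with 25 samples and k = 10, as an illustration only.
folds = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so every sample is used once for testing and k - 1 times for training.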
The Receiver Operating Characteristic (ROC) curve was used to evaluate the performance of classification algorithms. It provides a graphical representation of a classifier's performance, rather than a single value like most other metrics (Delbarre et al., 2021).

Deep Learning
The models were trained using Stochastic Gradient Descent (SGD) as the optimizer, with a momentum of 0.9 and a learning rate of 0.0001. The learning rate controls the size of the updates applied to the weight parameters to reduce the network loss function. The maximum number of epochs was set to 20 and a batch size of 20 was used in this experiment. Table 2 summarizes the hyperparameters used in our study (Yee-Rendon et al., 2021).
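One common formulation of the SGD update with momentum can be sketched as follows, using the learning rate and momentum stated above as defaults. The function name and the toy loss are assumptions made for illustration, and deep learning frameworks differ slightly in how they define the velocity term.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.0001, momentum=0.9):
    """One SGD-with-momentum update; returns the new weights and velocity."""
    # The velocity accumulates a decaying sum of past gradient steps.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Minimise the toy loss f(w) = w^2 (gradient 2w), starting from w = 1.
weights, velocity = [1.0], [0.0]
for _ in range(3):
    grads = [2.0 * w for w in weights]
    weights, velocity = sgd_momentum_step(weights, grads, velocity)
```

With such a small learning rate each step moves the weight only slightly, and the momentum term makes successive steps in the same direction progressively larger.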

Evaluation Metrics
To evaluate the performance of the models in our study, we use several evaluation metrics, namely accuracy, precision, mean square error, recall, the F1 score, and the Matthews Correlation Coefficient. They are defined as follows:
- Accuracy is a performance measure that indicates how often the system places the data in the correct class.
- The mean square error of an estimator is the average of the squared errors, that is, the mean squared difference between the estimated values and the true values.
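The metrics named above can be computed with the standard formulas, sketched here for the binary case from confusion-matrix counts (true and false positives and negatives). The counts and labels are hypothetical, and for a four-class problem such as this one the same formulas would typically be applied per class in a one-vs-rest fashion and then averaged, which is an assumption about the setup rather than a detail given in the text.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical counts and labels, not results from this study.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
mse = mean_squared_error([1, 0, 1, 1], [1, 0, 0, 1])
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it remains informative when the classes are unbalanced.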

Results
In this section, we present the classification and recognition results according to two scenarios: the first uses the seven machine learning algorithms mentioned above, and the second uses the five deep learning algorithms together with our custom CNN.

Machine Learning Scenario Results
In this section, we present the results of the trained models. Table 3 presents the test results of each model on the test data, as described in the general architecture of our study above. The results range from 81.03% to 100%, with SVM and RF both recording the best performance and GNB the worst.

Deep Learning Scenario Results
We now present the test results of the deep learning models. Table 5 presents the accuracy and loss of the deep learning models for the validation and test stages. The validation loss varies between 0.014% and 9.83%, with our custom CNN recording the best performance. The validation accuracy varies between 97.33% and 100%, with our custom CNN and MobileNet both recording the best performance. The test loss varies between 0.013% and 9.68%, with our custom CNN again performing best, and the test accuracy varies between 97.37% and 100%, with our custom CNN and MobileNet both recording the best performance. We present the loss and accuracy curves and the confusion matrix in Table 6 below. The proposed CNN model is less complex and faster in terms of training time per epoch, with an average of 559 s, compared with 667 s for MobileNet, 824 s for Inception V3, 2129 s for ResNet50, 2601 s for VGG19, and 2981 s for DenseNet 201. It also minimizes the mean square error better than all the other methods used in our study.

Performance Metrics
Tables 7 and 8 present the respective metrics of the machine learning and deep learning models that consolidate the performances obtained in the two scenarios above.

Discussion
Coffee plant diseases are a major threat to agriculture in general and to its productivity in particular, causing significant economic losses.
Several studies on the detection of plant diseases have been conducted using machine learning. The most widely used technique is deep learning, and more precisely convolutional neural networks, which are very efficient for image-based plant disease classification. We have also highlighted the performance of classical machine learning models. This study showed that the method used is very effective for the classification and recognition of coffee plant diseases from images.
For the classical machine learning models, the best results, 100%, are obtained with SVM and RF. Concerning the deep learning algorithms used in our study, our model gives the best results for both the validation data and the test data. It obtains the lowest values of the loss function, 0.014% for validation and 0.013% for the test. In terms of accuracy, MobileNet, like our custom CNN, reaches the best performance for both validation and test, with a value of 100%.

Conclusion
In this study, we used several approaches for the classification and recognition of coffee leaf diseases. Both classical machine learning and deep learning methods were used to implement our method.
The evaluation of these approaches showed the effectiveness of the SVM and Random Forest methods, with 100% accuracy among the classical machine learning models, while MobileNet and our custom CNN showed their effectiveness with 100% accuracy among the deep learning models. Our custom CNN recorded a lower loss than MobileNet, which indicates that our model is the most robust.
SVM and Random Forest performed as well as the deep learning models. In the future, we can implement a robust method for uncontrolled environments by coupling our method with segmentation to better focus on the diseased part of the plant.