Diabetic Retinopathy Detection using Deep Learning Techniques

Corresponding Author: Santhi Balachandran, SASTRA Deemed University, Thanjavur, India. Email: shanthi@cse.sastra.edu

Abstract: Diabetic Retinopathy is an eye condition induced by diabetes that damages the blood vessels in the retinal region; the area covered with lesions of varying magnitude determines the severity of the disease. It is one of the leading causes of blindness among the working population. A variety of factors are observed to play a role in a person developing this disease, with stress and prolonged diabetes being two of the most critical. If not detected early, the disease can lead to permanent impairment of vision. If detected in advance, the rate of impairment can be brought down or averted. However, it is not easy to detect the presence of this disease, given the time-consuming and tedious process of diagnosis. Presently, digital color photographs are evaluated manually by trained clinicians to observe the presence of lesions caused by vascular abnormalities, which are the major effect of Diabetic Retinopathy. This method, although fairly accurate, proves to be costly and slow. The delay brings out the need to automate the diagnosis, which will, in turn, have a significant positive impact on the health sector. In recent times, the adoption of AI in disease diagnosis has shown promising and reliable results, and this serves as the motivation for this paper. The paper employs Deep Learning methodologies for automatic detection of Diabetic Retinopathy, reaching a maximum accuracy of 80%, compared with a maximum accuracy of only 48% from traditional Machine Learning approaches on the same IDRiD Disease Grading Dataset (training set: 413 images with 5 levels of DR; test set: 103 images with 5 levels of DR). The dataset contains digital fundus images covering the different levels of Diabetic Retinopathy, distributed across discrete severity classes.


Introduction
Computer-aided medicine, digital health consultancies and health monitoring systems are some of the buzzwords that have gained wide attention recently. People can now access treatment and diagnosis from their homes with a single tap, thanks to the connectivity and computational infrastructure that has given rise to the digital era we live in. While common illnesses or mild infections can often be treated without consulting a qualified doctor, some severe conditions still require considerable effort from the medical community, wherein technology can only aid the process but cannot act independently.
With the development of Artificial Intelligence, technology can independently assess a patient's health and diagnose the problem quickly from the patient's past history and the data tagged with it. Diabetic Retinopathy is one such problem that this paper tries to address.
Diabetic Retinopathy is caused as an outcome of diabetes damaging the blood vessels present on the interior of the retina, leading to leakage of fluids and blood into the tissues neighboring it. The leakage develops soft exudates (also referred to as cotton wool spots), hard exudates, microaneurysms and hemorrhages. It is the primary source of vision impairment amongst the working population.
This paper aims at developing a deep learning based automated DR detection model from a frugal amount of data. The performance of the DL models is compared with that of traditional algorithms, with and without image processing techniques such as filtering and smoothing. Since DL is a black-box approach, it is augmented by inputting feature-engineered images to give the model a head start in performance.

Literature Survey
Earlier research on Diabetic Retinopathy detection used SVM to classify Non-Proliferative Diabetic Retinopathy (NPDR) into different grades. In Carrera et al. (2017), the implementation began with the extraction of features such as blood vessel density, number of microaneurysms and density of hard exudates. The classification was done in two phases, namely, NPDR detection and grade classification within NPDR. The results were benchmarked against the Decision Trees (DT) approach, obtaining an average accuracy of 85% and a maximum sensitivity of 95%. Another paper on proliferative DR (Adarsh and Jeyakumari, 2013) leverages multiclass SVM (one against all) to classify the severity of DR into normal, mild, moderate, severe and proliferative. The approach included manual feature extraction, which was extensively based on morphological operations, and features for the model were selected after performing a statistical significance test. The average accuracy and sensitivity were 95% and 91%, respectively. Rubya and Pintu (2019) incorporated a fuzzy logic classifier to detect different stages of DR, resulting in up to 95% accuracy. Jothi and Jayaram (2019) developed a method to classify the blood vessels that potentially lead to DR and other retinal vascular defects using the Frangi filter technique. Both of the above papers used retinal fundus image data from the DIARETDB and STARE databases. Karim et al. (2019) used MATLAB's Neural Network Pattern Recognition tool to investigate DR symptoms by detecting microaneurysms. The analysis was benchmarked against traditional ML methods such as Naïve Bayes and SVM.
Recent progress involves research on automated computer-aided diagnosis using Deep Learning approaches to detect and grade Diabetic Retinopathy in advance from digital fundus images. Balyen and Peto (2019) discuss the general influence of AI, ML and DL on the field of ophthalmology. Cheung et al. (2019) suggest the potential of DL in the detection of DR and Diabetic Macular Edema (DME) from optical coherence tomography and digital fundus images. Recent trends in research show that DR detection is done predominantly using Deep Learning techniques, the reason being the ease of obtaining results with less preprocessing effort. Papers employing the CNN approach have reported performance superior to that of traditional ML algorithms. Xu et al. (2017) used a deep CNN with data augmentation techniques to classify fundus images as DR or normal, achieving an accuracy of approximately 95%. This approach proved to be more efficient than the XGBoost model with feature-extracted inputs. Islam et al. (2018) and Pratt et al. (2016) also developed CNN based models to identify microaneurysms (MA) using Kaggle datasets, achieving a sensitivity of 95% and above. Wan et al. (2018) incorporate transfer learning on popular CNN architectures for enhanced performance in DR image classification.
Similar to the above approach, Maya and Adarsh (2019) use a CNN to detect bright lesions in the fundus images. The main difference between the method of Andonová et al. (2017) and the former is that the image is converted into smaller blocks instead of being used whole, followed by preprocessing and transformation (such as Adaptive Histogram Equalization, Gaussian noise, gray scaling and green channel extraction), and the blocks are subsequently fed into a CNN architecture with 4 convolution layers and 2 fully connected layers for classification. Validation is done by comparing against expert-marked fundus images. Hemanth et al. (2019) combine Deep Learning with traditional image processing methods for DR diagnosis from retinal fundus images. The research findings above indicate that Deep Learning techniques outperform other approaches for automated and efficient detection of DR, and Thanati et al. (2019) consolidate some of the recent approaches. The review paper by Bellemo et al. (2019) extensively discusses the increased usage of emerging technologies like Artificial Intelligence in national screening programmes for Diabetic Retinopathy around the globe.
The paper presented here also follows a CNN based approach with tuning of hyperparameters, while emphasizing primarily fewer layers and fewer trainable parameters to achieve better and more reliable results. Although the papers referenced above employed deep learning paradigms for detecting DR, they either used deeper networks with a large number of parameters or consumed very large datasets. The procedure is discussed in detail in the subsequent sections.

Methodology
This paper mainly concentrates on the application of Convolutional Neural Networks (CNN) in the diagnosis of Diabetic Retinopathy. The salient features of a CNN, or of any Deep Learning paradigm, are automatic feature extraction and efficient computation. These advantages provided the motivation to pursue this approach in this paper.
A CNN model is composed of a convolution layer, a pooling layer and an activation layer in multiple combinations of stacks. The input data fed into the network will be convolved with various filters. These filters are akin to traditional image processing filters, except that they are learned automatically instead of being defined explicitly.
The 413 images from the training set were used in the development and validation of the model. The model was tested on the 103-image test set.
The problem architecture is split into multiple phases, as shown in Fig. 1.

Pre-Processing
Images from the training dataset (samples in Fig. 2) are read using the OpenCV library. This library is essential for image processing, as it contains rich built-in features for rapid processing. The input images, as seen from the algorithm's perspective, are just matrices of pixel values of dimension (height, width, number of channels). Here, channels refer to RGB layers.
The first pre-processing step is the reshaping of the input data. The images acquired are of size 4200×2800 pixels and occupy 0.5 MB of memory space each. Processing them at full size demands too much RAM and results in slower computation. Hence, the images are downsized to 256×256 pixels.
Images from the training set have inherent variance, and feeding the input data directly to the CNN results in slower convergence and unexpected results. Hence, following the reshaping, the images are normalized.
Finally, the processed images are split into training and validation sets in a 4:1 ratio (80%:20%). The validation set is used to gauge the capacity of the model.
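The three pre-processing steps can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: nearest-neighbour sampling in NumPy stands in for the OpenCV resize call, scaling to [0, 1] is an assumed form of normalization (the paper does not state which one was used), and the function names, the random seed and the small synthetic image are hypothetical.

```python
import numpy as np

def downsize_nearest(image, size=256):
    """Downsize an (H, W, C) image to (size, size, C) by nearest-neighbour sampling."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]

def normalize(image):
    """Scale 8-bit pixel values to [0, 1] (assumed normalization scheme)."""
    return image.astype(np.float32) / 255.0

def train_val_split(n_samples, val_fraction=0.2, seed=0):
    """Return shuffled index arrays for an 80:20 train/validation split."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]

# Small synthetic stand-in for a fundus photograph (real images are far larger).
fake_image = np.random.default_rng(1).integers(0, 256, size=(420, 280, 3), dtype=np.uint8)
processed = normalize(downsize_nearest(fake_image))

# Split the 413 training images into training and validation indices.
train_idx, val_idx = train_val_split(413)
```

With 413 images and a 20% validation fraction, this particular rounding gives 330 training and 83 validation images; a different rounding rule would shift the counts by one.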

Data Augmentation
Any deep learning model is data hungry. To expand the data, a data augmenter is used to generate more samples without the need to collect more actual data. The Keras library has built-in functionality to perform this operation. Some augmentation techniques include image rotation (to induce rotational invariance), horizontal flipping, scaling, zooming in, cropping and translation. A recent paper (Frid-Adar et al., 2018) uses state-of-the-art Generative Adversarial Networks (GANs) to create new data. The next section discusses the methodologies in designing a Deep Convolutional Neural Network.

This paper applies the following data augmentation procedures: image rotation within a range of 25° in either direction, image shearing with a range of 0.2 radians, zooming with a range of 0.2, horizontal flipping, and random shifting of images along the height and width axes with a range of 0.1 each. A fill mode of 'nearest' was chosen for the pixels exposed by shifting, and image normalization across features was applied.
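The augmentation settings listed above could be collected as keyword arguments, as in the sketch below. The paper only says that Keras functionality was used, so treating them as arguments to the legacy Keras `ImageDataGenerator` class is an assumption, as is reading "normalization across features" as the two `featurewise_*` options.

```python
# Keyword arguments mirroring the augmentation settings described in the text.
# Assumption: they target the legacy Keras ImageDataGenerator class.
augmentation_config = {
    "rotation_range": 25,          # degrees, either direction
    "shear_range": 0.2,            # shear intensity as quoted in the text
    "zoom_range": 0.2,
    "horizontal_flip": True,
    "height_shift_range": 0.1,     # fraction of image height
    "width_shift_range": 0.1,      # fraction of image width
    "fill_mode": "nearest",        # how pixels exposed by a shift are filled
    "featurewise_center": True,             # assumed reading of "normalization across features"
    "featurewise_std_normalization": True,  # assumed, as above
}
```

Where that class is available, the dictionary would typically be passed as `ImageDataGenerator(**augmentation_config)`. The feature-wise options require the generator to be fitted on the training images first so that the dataset mean and standard deviation are known.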

Deep CNN Design
Following the pre-processing step, the CNN is designed. As mentioned before, it is a combination of stacks of layers that perform the necessary mathematical computation (Fig. 6).
The convolution layer is the primary entity that transforms an image in order to learn its features. The convolution operation is performed as shown in Fig. 3. This operation results in a matrix of lower or equal dimension. In a Deep CNN context, the filters/kernels are parameters that are learned by a suitable backpropagation algorithm.
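A minimal NumPy sketch of the operation shows why the output has a lower or equal dimension. The kernel here is a hand-written vertical-edge filter chosen for illustration; in a CNN the kernel values would be learned. Like most deep learning libraries, the sketch slides the kernel without flipping it (cross-correlation).

```python
import numpy as np

def conv2d(image, kernel, padding="valid"):
    """Slide a 2-D kernel over a 2-D image and sum the element-wise products."""
    kh, kw = kernel.shape
    if padding == "same":
        # Zero-pad so the output keeps the input's height and width (odd kernel sizes).
        image = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # illustrative vertical-edge filter

valid = conv2d(image, kernel)               # no padding: 5x5 input gives 3x3 output
same = conv2d(image, kernel, "same")        # zero padding: output stays 5x5
```

Without padding, a 3×3 kernel shrinks each spatial dimension by 2; with zero padding the input size is preserved.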
The pooling layer subsequently decreases the dimension of the data, as represented in Fig. 4. Conventionally max pooling and average pooling are used in CNN to obtain more stable results.
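As a rough illustration of the two pooling variants, the sketch below applies non-overlapping 2×2 windows to a small feature map. The window size and the example values are arbitrary choices, not taken from the paper.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Apply non-overlapping size x size pooling to a 2-D feature map."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size   # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]], dtype=float)

max_pooled = pool2d(fmap, mode="max")       # [[4, 2], [2, 8]]
avg_pooled = pool2d(fmap, mode="average")   # [[2.5, 1.0], [0.75, 6.5]]
```

Each 2×2 window collapses to a single value, so a 4×4 map becomes 2×2.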
Batch Normalization is the layer preceding the activation layer; it reduces the internal covariate shift (Ioffe and Szegedy, 2015). This helps to speed up training.
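The training-time computation of batch normalization can be sketched as below: each feature is standardized over the mini-batch and then rescaled by a learnable scale and shift. The default values of `gamma`, `beta` and `eps` and the toy batch are illustrative assumptions; at inference time, frameworks use running averages instead of batch statistics.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta

# Three samples with two features on very different scales.
batch = np.array([[1.0, 200.0],
                  [2.0, 220.0],
                  [3.0, 240.0]])
normalized = batch_norm(batch)
```

After normalization both features have approximately zero mean and unit variance, regardless of their original scale.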
The activation layer performs the role of non-linear function mapping that enables the learning of complex features. It only fires those neurons that observe a particular feature in that layer (feature maps in every layer are shown in Fig. 6). Rectified Linear Unit (ReLU), Sigmoid and Scaled Exponential Linear Unit (SELU) are some of the activation functions used in this paper. Softmax is generally used as the last layer, which outputs a multiclass predicted label.
Dropout is a regularization technique used to reduce overfitting to the training data. A dropout layer is introduced in the architecture, which randomly prunes off neurons during training.
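For reference, the activation functions named above and a simple form of dropout can be written in a few lines of NumPy. This is a sketch under stated assumptions: the SELU constants are the standard published values, the "inverted" dropout formulation is the common framework convention, and the logits and the dropout rate of 0.5 are made-up illustrative values.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(logits):
    shifted = logits - np.max(logits)   # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def dropout(x, rate, rng):
    """Inverted dropout: zero out a fraction `rate` of units and rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])   # made-up scores for the 5 DR grades
probs = softmax(logits)                          # class probabilities summing to 1
dropped = dropout(np.ones(1000), rate=0.5, rng=np.random.default_rng(0))
```

The predicted grade is the index of the largest probability. Dropout is applied only during training and is disabled at prediction time.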
Several of the above building blocks constitute a CNN model for image classification. The output from the stack of CNN layers is flattened, that is, converted from a tensor of shape (height, width, number of channels) into a vector of shape (height * width * number of channels, 1). A fully connected layer follows the flattened layer.
Once the layers are constructed, the weights (parameters) and biases of the neurons are randomly initialized and the model is trained to learn them. A relevant loss function, Binary Cross Entropy, was defined to measure the error in prediction. Optimizers enable the learning of parameters that minimize the cost, and they determine the speed of learning. The optimizers Adam, Adagrad and RMSProp were tried in this paper and their results are discussed later in this paper.
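The loss named above can be illustrated with a small pure-Python function. The one-hot label and the two prediction vectors are made-up values, and averaging the loss over the five output units is an assumed convention (it is how deep learning frameworks commonly apply this loss to one-hot targets).

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

one_hot_label = [0, 0, 1, 0, 0]               # hypothetical image of grade 2
confident = [0.01, 0.02, 0.90, 0.05, 0.02]    # made-up output close to the label
unsure = [0.20, 0.20, 0.20, 0.20, 0.20]       # made-up uniform output

loss_confident = binary_cross_entropy(one_hot_label, confident)
loss_unsure = binary_cross_entropy(one_hot_label, unsure)
```

A prediction close to the true label gives a smaller loss than a uniform guess, and this difference is the signal the optimizer uses to update the weights.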

Hyperparameter Tuning and Selection
Modifying certain entities of the previous component, such as the number of CNN blocks, the activation functions, the filter size, the type of pooling, the optimizer and its initial learning rate, resulted in varying model performance (Table 3), validated against the validation/test set. The hyperparameter settings that gave the best results were adopted for the final model (Table 4). Referring to other papers and analyzing the dataset helped to narrow down the combinations of hyperparameters to experiment with.
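One simple way to organize such experiments is to enumerate candidate configurations and keep the one with the best validation score, as sketched below. The search space is a hypothetical placeholder, not the grid used in the paper, and `evaluate` stands for whatever routine trains a model and returns its validation accuracy.

```python
from itertools import product

# Hypothetical search space; the values are placeholders, not the paper's grid.
search_space = {
    "num_conv_blocks": [2, 3, 4],
    "activation": ["relu", "selu", "sigmoid"],
    "pooling": ["max", "average"],
    "optimizer": ["adam", "adagrad", "rmsprop"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def enumerate_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

def select_best(configs, evaluate):
    """Return the configuration with the highest score from `evaluate`."""
    return max(configs, key=evaluate)

configs = list(enumerate_configs(search_space))   # 3 * 3 * 2 * 3 * 3 combinations
```

An exhaustive grid grows multiplicatively with each added option, which is why narrowing the candidates beforehand, as described above, keeps the number of training runs manageable.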

Prediction and Validation
During the training phase, the data were fed into the model in mini-batches of 32 images per iteration, and the weights were learned over 20 epochs. Once the model has been trained to the best achievable accuracy, it is ready to be used for prediction on unseen data. Testing was performed on the test dataset, with accuracy as the performance evaluation metric. A detailed analysis of the deep learning results and a comparison with traditional ML algorithms are given in the subsequent section.
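The batching arithmetic can be made concrete with a short sketch. The training-set size of 330 is an assumption (roughly 80% of the 413 images after the validation split); with an augmenting generator the number of steps per epoch may be configured differently.

```python
import math

def iterate_minibatches(indices, batch_size):
    """Yield consecutive slices of `indices`, each at most `batch_size` long."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]

n_train = 330      # assumed: about 80% of the 413 training images
batch_size = 32    # mini-batch size stated in the text
epochs = 20        # number of epochs stated in the text

batches = list(iterate_minibatches(list(range(n_train)), batch_size))
steps_per_epoch = math.ceil(n_train / batch_size)   # weight updates per epoch
total_updates = steps_per_epoch * epochs            # weight updates over the full run
```

Under these assumptions each epoch consists of 11 weight updates (ten full batches and one batch of 10 images), giving 220 updates over the 20 epochs.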

Results and Discussion
Initial attempts involved modeling the dataset using standard Machine Learning techniques: Logistic Regression (LR), Linear Discriminant Analysis, k-NN, Decision Tree, Random Forest, Support Vector Machines and Naïve Bayes. In this paper, an ensemble learning method is used in ML modeling with a k-fold cross-validation approach. The dataset with extracted Gray-Level Co-Occurrence Matrix (GLCM) features, divided into k equal subsets, was used in the model training. That is, each ML model is trained k times, with k-1 buckets as the training set and the remaining bucket as the validation set. The output evaluation metric, accuracy, is considered for analysis. Two different approaches were implemented, as follows.
a. GLCM feature extraction from the raw input image without any processing (that is, without applying any filter, in particular the Gaussian filter). Box plots were plotted to explore the accuracy ranges of the models. From Fig. 5a, it can be inferred that a simple Logistic Regression model performed better than the rest of the models, with Random Forest performing almost equally well.
The average accuracies in Table 1 substantiate this observation: the LR model has the maximum average accuracy, approximately 48%. The standard deviation denotes the generalization capability of a model; the lower the value, the better the generalization.
b. GLCM feature extraction from the input image after applying the Gaussian filter. Box plots were plotted to explore the accuracy ranges of the models. From Fig. 5b, it can be inferred that a simple Logistic Regression model performed better than the rest of the models, with Random Forest performing almost equally well.
The average accuracies in Table 2 support this observation: the LR model has the maximum average accuracy, approximately 49%. The standard deviation denotes the generalization capability of a model; the lower the value, the better the generalization.
The above results imply that standard ML techniques did not yield accurate predictions and that performance did not improve despite feature engineering (application of filters) on the dataset. Since the ML models failed to deliver results for this dataset, Deep Learning techniques were implemented. Table 3 provides the outcomes of the different architectures experimented with.
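To make the ML baseline procedure more concrete, the sketch below computes a GLCM and one texture feature for a tiny image and generates k-fold train/validation splits. It is illustrative only: the paper does not state which GLCM features or which value of k were used, so the contrast feature, the horizontal pixel offset and k = 10 are assumptions, and the 4×4 image is a toy example.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for pixel pairs offset by (dy, dx)."""
    counts = np.zeros((levels, levels))
    h, w = image.shape
    for r in range(h - dy):
        for c in range(w - dx):
            counts[image[r, c], image[r + dy, c + dx]] += 1
    return counts / counts.sum()

def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared gray-level difference."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k folds; yield (train, validation) index lists."""
    folds = [list(range(n_samples))[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

tiny = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 2, 2, 2],
                 [2, 2, 3, 3]])   # toy image with 4 gray levels
p = glcm(tiny, levels=4)
contrast = glcm_contrast(p)
splits = list(k_fold_indices(n_samples=413, k=10))
```

Each model would be trained on the `train` indices of a split and scored on its validation indices; the mean and standard deviation of the k accuracies are the kind of figures reported in Tables 1 and 2. Since 413 is not divisible by 10, the folds here differ in size by at most one image.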
A maximum accuracy of 80% was attained with the architecture containing a total of 5 CNN layers (4 with ReLU activation, 1 with Softmax activation), trained using the Adam optimizer initialized with a learning rate of 10⁻³ for faster convergence. The resulting architecture contains fewer parameters than the others and uses the hyperparameter combination selected for the final model (Table 4).
The inherent feature extraction quality of a convolutional neural network produced greater than 70% accuracy in most cases with a simpler architecture and no additional feature engineering phase. The primary rationale behind this paper is to propose a lightweight and straightforward Deep Learning architecture and to benchmark its performance against conventional Machine Learning solutions. Such a model can be deployed on low-memory devices like basic smartphones, with less computational complexity and higher throughput.

Conclusion
From the results obtained, it can be inferred that the deep learning models have been successful in capturing the underlying pattern in the data, which explains their increased performance in comparison to the machine learning models.
A variety of machine learning models were tried and the corresponding results were plotted. Random Forest models capture the underlying distribution of the images in the dataset but are not able to match the performance of the deep learning models. The maximum accuracy obtained from an ML model was 48% after cross-validation and feature engineering, whereas the accuracy of the DL models tried was greater than 70% in most cases, with a maximum of 80% without much complexity. This corresponds to a relative improvement of about 67% over the best ML accuracy, achieved with less pre-processing effort, which makes the deep learning model a clear preference over traditional algorithms for automated DR detection.
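The 67% figure is a relative improvement, not a difference in percentage points. It can be checked from the two reported accuracies; the variable names below are illustrative.

```python
ml_best_accuracy = 0.48   # best traditional ML model, as reported
dl_best_accuracy = 0.80   # best CNN configuration, as reported

absolute_gain = dl_best_accuracy - ml_best_accuracy   # difference in accuracy
relative_gain = absolute_gain / ml_best_accuracy      # gain relative to the ML baseline
```

The absolute gain is 32 percentage points; dividing by the ML baseline of 48% gives roughly 0.67, which is the reported 67% improvement.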
The originality of our approach lies in the frugal design of a lightweight Deep CNN architecture (fewer than 6 layers) and suitable hyperparameter settings, with efficient usage of the IDRiD dataset, which offers very little data for a data-intensive deep learning approach. In comparison, other research papers used around 80000 images to develop a deep learning based solution. The emphasis of our solution is a trade-off between data size and performance using a lightweight Deep Learning model.
Thus, automated diagnosis of Diabetic Retinopathy from digital fundus images was explored, with the aim of preventing vision loss from a disease that is widespread in the community. It can assist doctors in early diagnosis and in suggesting precautionary treatment for vision impairment. Further, it can give patients confidence in the curability of this disease with a higher success rate.

Future Scope
Further, the solution can be improved to achieve higher accuracy and to reduce false positives and false negatives by employing automatic hyperparameter tuning methods such as meta-learning and evolutionary search in the parameter space.
Additional reinforcements can be made with the confluence of highly performing Deep Learning and Machine Learning models.