Deployment of CNN on colour fundus images for the automatic detection of glaucoma

The detection of glaucoma has become critical, as it has emerged as the second leading cause of visual impairment worldwide. At present, most algorithms in use rely on pre-trained deep neural networks to produce the best results. However, high computational time and complexity, together with the need for a large database, make glaucoma detection difficult. With these constraints in mind, this paper proposes a new convolutional neural network architecture, ProspectNet, which achieves better accuracy with less computational time and complexity when tested against two pre-trained networks: VGG16 and DenseNet121. The data set is an amalgamation of two publicly available datasets, DRISHTI-GS and the Glaucoma Dataset (Kaggle), comprising ocular colour fundus images of glaucomatous as well as normal eyes. ProspectNet achieved an average AUC (area under the curve) of 0.991 and a specificity and precision of 0.98. Confusion matrices are also plotted to illustrate the new architecture's efficacy. These outcomes demonstrate that ProspectNet is a robust alternative to other state-of-the-art algorithms for a medium-sized dataset. The paper examines three distinct architectures for glaucoma detection. One advantage of our approach is that no special feature selection, such as detailed measurements of particular traits like the structure of the optic nerve head, is necessary.


INTRODUCTION
Glaucoma is an eye disease caused by damage to the optic nerve. This damage frequently results from high intraocular pressure and, less often, from severe trauma to the eye or other injuries. It is often hereditary, passed from parents to offspring. The disease is associated with old age, and it is irreversible. Usually both eyes are affected, although one eye is generally affected more severely than the other. A substantial share of the world's population (estimated at 64.3 million in 2013, rising to 76 million in 2020 and 111.8 million in 2040) is affected by glaucoma. Statistics gathered by Bourne et al. (2021) show that at least 2.2 billion individuals worldwide suffer from some form of vision impairment, and at least 1 billion of those have a condition that could have been avoided or is yet unresolved. Halpern and Grosskreutz (2002) state that glaucoma poses an even bigger public health concern than cataracts, since the blindness it causes is permanent. Detection of this disease is difficult since no symptoms are seen in the early stages. In most cases, ophthalmologists treat all common eye conditions, including glaucoma, diabetic retinopathy, and age-related macular degeneration. A recent study of the ophthalmology workforce by Resnikoff et al. (2020), which covered 198 countries (94% of the world's population), found a substantial shortage of ophthalmologists, both at present and projected into the future. Since glaucoma causes optic nerve head damage and, as a result, visual field defects, the best screening for glaucoma is the detection of characteristic changes in the optic nerve structure (Krishnan and Faust, 2013). However, because there are not enough trained specialists, devising an automatic detection system is almost essential. Globally, work on the detection of glaucoma started as an extension of the applications of image processing, and glaucoma detection is a demanding image-processing application. Artificial intelligence (AI) was combined with image-processing techniques to obtain more accurate results, and standard processing techniques such as the wavelet transform were adapted to the task.
The paper is organised as follows: Section 2 presents related work on the detection of glaucoma. Section 3 provides a full description of the dataset and of the convolutional networks that were used. Section 4 presents the results of the CNNs on the dataset, along with the ROC curves and confusion matrices. Section 5 gives the conclusion and describes future work.

RELATED WORKS
A convolutional neural network (CNN) is a deep learning algorithm that takes an image as input, assigns importance to different aspects of the image, and differentiates one image from another (Li et al., 2018). This work uses convolutional neural networks to distinguish between fundus images of glaucomatous and normal eyes. The pre-processing needed by a convolutional neural network is much lower than that needed by other classification algorithms. A typical ConvNet comprises several convolution layers (Diaz-Pinto et al., 2019; Gómez-Valverde et al., 2019), accompanied by filters capable of retrieving the features required for classification. A substantial image collection, with more than 14000 images, is needed in order to fine-tune these networks.
For instance, Kolář and Jan (2008) detected glaucoma on the basis of a fractal description followed by classification. Fractal dimensions can be used as features for detecting retinal nerve fibre losses, which are an indication of a glaucomatous eye. Maheswari et al. (2017) accomplished the same objective by using an LS-SVM (Least Squares Support Vector Machine) to rank the correntropy features extracted by the EWT (Empirical Wavelet Transform). Meanwhile, another school of thought proposed analysing the cup-to-disc ratio (CDR). The CDR expresses the proportion of the disc occupied by the cup. For a normal eye, the CDR should lie between 0.3 and 0.5. With progressive neuro-retinal degeneration, the ratio increases. Vision is totally lost at a CDR of around 0.8. Additionally, the method put forward in this paper restricts the extraction area by excluding the blood vessel region, and sample images of these structures are manually collected. Mishra et al. (2011) propose a segmentation technique based on adaptive thresholding, which uses features acquired from the image, such as the mean and standard deviation, to eliminate data from the red and green channels of a fundus image and obtain an image containing just the optic nerve head region in both channels. The optic disc is segmented from the red channel and the optic cup from the green channel (Issac et al., 2015). However, their method failed when tested on low-contrast images due to the small dataset used. Novel techniques based on corner thresholding and point-contour joining have also been proposed in the literature to build smooth contours of the optic disc. This algorithm tracks blood vessels inside the disc region, identifies the points at which the vessels first bend from the optic disc boundary, and connects them to obtain the contour of the optic cup (Soorya et al., 2018).
A neuro-fuzzy strategy was likewise used to extract features such as the cup-to-disc (c/d) ratio, the ratio of the distance between the optic disc centre and the optic nerve head to the diameter of the optic disc, and the ratio of the blood vessel area on the inferior-superior side to the blood vessel area on the nasal-temporal side. These features are validated by classifying normal and glaucoma images using a neural network classifier (Nayak et al., 2009). Researchers have also tried unsupervised learning to segment the optic nerve head. A template-matching, texture-based and model-based approach has been followed by Mvoulana et al. (2019). Gradually, deep learning (DL) came into the picture, and several algorithms have been proposed. CNNs are known for their capacity to learn highly discriminative features from raw pixel intensities (Diaz-Pinto et al., 2019). Diaz-Pinto et al. (2019) use five ImageNet-trained models (VGG16, VGG19, InceptionV3, ResNet50 and Xception) for automatic glaucoma assessment using fundus images. The outcomes suggest that ImageNet-trained models are a strong option for automatic glaucoma screening systems. Phan et al. (2019) examined the use of DCNNs, and the three DCNNs they evaluated showed areas under the curve (AUCs) of 0.9 or more. They also noted that the factor influencing discriminating ability is image quality rather than image size. Gómez-Valverde et al. (2019) report strong performance using a transfer learning scheme with VGG19, achieving an AUC of 0.94 with sensitivity and specificity similar to those of the expert evaluators in the study. There have been further developments in this area, and end-to-end deep convolutional neural network models have been developed. Here, the learning technique consists of three phases: OD (optic disc) and PC (physiological cup) segmentation, morphometric feature estimation and glaucoma detection, which are trained sequentially using a curriculum learning strategy (Perdomo et al., 2018).
A few papers discuss using and modifying pre-trained neural networks. One such case is the use of a GoogleNet Inception v3 pre-trained model for transfer learning, in which the data were trained on an existing pre-trained model. The last classification layer of the Inception v3 model was changed to fit the classification needs and then fine-tuned using the data. For back-propagation, the Adam optimizer, an adaptive learning rate technique, was used as the optimization function, while cross entropy was used as the loss function. Ahn et al. (2018) reported experimental results showing that deep networks achieved better classification accuracy after the integration of handcrafted features, e.g., the scale-invariant feature transform (Li et al., 2019). Although these techniques had a maximum accuracy of 0.8284, they performed particularly poorly in segmenting and detecting lesions, indicating that this was a highly difficult task. Technical and clinical viewpoints on constructing a DL framework to address those needs, and the possible difficulties for clinical adoption, are discussed by Nath and Dandapat (2012). AI and DL have gradually started to play a critical part in clinical ophthalmology practice, with implications for screening, diagnosis and follow-up of the major causes of vision disability in the setting of ageing populations around the world (Haleem et al., 2013; Ting et al., 2019). However, the authors merely use "a deep learning approach" and comparable terms to refer to all methods, rather than discussing particular deep learning techniques or architectures. A study has demonstrated that a deep learning framework can identify glaucoma with high sensitivity and specificity. However, high or pathological myopia and physiologic cupping lead to false-positive results (Li et al., 2018). In this work, the authors used smaller data sets at the expense of decreased performance.
The first step in detecting glaucoma is to capture good-quality images of the eye. The images can be colour fundus images or images obtained through Optical Coherence Tomography (OCT). It should be noted that a single OCT machine can cost up to $15000. This work aims to bring early detection of glaucoma to the masses, especially in remote places where even consulting a doctor is a luxury. Thus, colour fundus images have been used. Many researchers are using CNNs to identify glaucoma with the greatest possible accuracy. Dedicated Python libraries make this arduous task easier. Inspired and motivated by all of this, we have designed an algorithm using deep neural networks.
Currently, most algorithms in use rely on pre-trained deep neural networks to produce the best results. However, high computational time and complexity, together with the need for a large database, make glaucoma detection difficult. Most current algorithms have been found to perform worse in patients who have several disorders, such as glaucoma. Therefore, deep learning-based technologies can improve the performance of specialists treating patients who have a variety of eye disorders. The majority of currently used approaches rely on optic cup and disc-based factors such as the cup-to-disc ratio. These methods are sensitive to fundus image quality and may not be able to handle visual noise, raising concerns about their reliability. This research work proposes a new CNN architecture, ProspectNet, which achieves better accuracy with less computational time and complexity than the existing architectures.

PROPOSED METHOD
The paper examines three distinct glaucoma detection architectures. Our method has the benefit of not requiring any particular feature selection. The models were trained using a variety of datasets. The datasets include photos with various levels of illuminance, contrast, image resolution and colour, as well as other heterogeneities. Because of this broad training dataset, the models are expected to be robust to such variations.

Materials and Methods
This study uses 1065 retinal colour fundus images from two publicly accessible datasets: DRISHTI-GS (Sivaswamy et al., 2015) and the Glaucoma dataset (obtained from Kaggle). The images are labelled in two categories, glaucoma and normal. Fig. 1 shows examples of the images used in the experiment.

Fig. 1. Examples of colour fundus images from the dataset incorporated
DRISHTI-GS (Sivaswamy et al., 2015) comprises a total of 101 retinal colour fundus images: 71 glaucomatous images and 30 normal images. This dataset is freely accessible and was acquired and annotated by Aravind Eye Hospital, Madurai, India (Sivaswamy et al., 2014). The dataset comprises images from Indian subjects. The other set of images was obtained from a publicly available data source, Kaggle, and is labelled as the Glaucoma dataset. This dataset contains a total of 964 images, of which 450 are glaucomatous and 514 are normal. The details of the datasets used are given in Table 1. The data were divided into training and test sets in the ratio of 4:1.

Pre-Processing
In order to reduce computational time and facilitate better results, the images were pre-processed using binary masking (Diaz-Pinto et al., 2019; Phan et al., 2019). This technique defines the Region of Interest (ROI) by assigning a binary value to each image pixel. The image was converted into greyscale for effective processing. A binary mask was created by classifying each pixel as belonging to either the background (pixel value 0) or the region of interest (pixel value 1). The extracted ROI was an image with a fixed resolution of 224 × 224, which was fed as input to the algorithms used. Extracting the ROI accelerated glaucoma diagnosis for all the algorithms considered.
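The masking step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold value, the luminance weights and the helper names (`to_greyscale`, `binary_mask`, `crop_to_roi`) are hypothetical, and the final resize to 224 × 224 is omitted.

```python
def to_greyscale(rgb_image):
    """Convert a nested list of (r, g, b) pixels to greyscale intensities."""
    # ITU-R BT.601 luminance weights (an assumption; the paper does not state them).
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def binary_mask(grey_image, threshold=20.0):
    """Label each pixel 1 (region of interest) or 0 (background).

    The threshold is a hypothetical value chosen for illustration.
    """
    return [[1 if value > threshold else 0 for value in row]
            for row in grey_image]


def crop_to_roi(image, mask):
    """Crop the image to the bounding box of the pixels labelled 1."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows or not cols:
        return []  # no region of interest found
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4 x 4 "fundus image": a bright 2 x 2 patch on a black background.
black, bright = (0, 0, 0), (200, 120, 60)
image = [
    [black, black, black, black],
    [black, bright, bright, black],
    [black, bright, bright, black],
    [black, black, black, black],
]
grey = to_greyscale(image)
mask = binary_mask(grey)
roi = crop_to_roi(grey, mask)
print(len(roi), "x", len(roi[0]), "region of interest")
```

In practice the same logic would normally be applied with an image library to full-resolution fundus photographs before resizing; the nested-list version is only meant to make the three steps explicit.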

Convolutional Neural Networks
The study employs two pre-trained convolutional neural networks, DenseNet121 and VGG16. In addition, a newly developed convolutional network, ProspectNet, is proposed. Using pre-trained networks makes a problem easier to solve, as they have already been trained on a larger database, thus reducing computational cost (Gómez-Valverde et al., 2019). The accuracy of the pre-trained networks and of the newly proposed network was compared. A brief discussion of the pre-trained CNNs and the newly proposed CNN follows.

Model Using VGG16
VGG16 is a CNN architecture proposed by Simonyan and Zisserman (2014) from the University of Oxford. VGG16, a convolutional neural network with 16 weight layers, achieves a top-5 accuracy of about 92.7% when trained on ImageNet. ImageNet is a huge dataset containing about 14 million images belonging to a large variety of classes. The VGG16 architecture used in this work is shown in Fig. 2.
VGG16 takes a fixed-size 224 × 224 RGB image as input to its first convolution layer. The sequential model comprises the following layers:
Two convolution layers of 64 channels of 3 × 3 dimension followed by one max pooling layer of 2 × 2 pixel window.
Two convolution layers of 128 channels of 3 × 3 dimension followed by one max pooling layer of 2 × 2 pixel window.
Three convolution layers of 256 channels of 3 × 3 dimension followed by one max pooling layer of 2 × 2 pixel window.
Three convolution layers of 512 channels of 3 × 3 dimension followed by one max pooling layer of 2 × 2 pixel window; this set of layers is repeated once more.
The 3 × 3 filter has a very small receptive field, the smallest that can capture the notion of left, right, up, down and centre (Simonyan and Zisserman, 2014). The max pooling layers perform spatial pooling after those convolution layers that are followed by a max pooling layer. The last max pooling layer is followed by a flatten layer, which converts the two-dimensional feature maps into a vector. This vector is then passed to a stack of three fully connected layers. For the VGG16 used in this study, the first two fully connected layers have 4096 channels each. The third layer performs a binary classification between glaucoma and normal, and therefore has a single output. The hidden layers use the rectified linear unit (ReLU) non-linearity. This activation function was chosen because it has yielded the best results on similar problems.
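To make the layer list concrete, the short sketch below traces the feature-map size through the five convolution blocks described above. It assumes the standard VGG16 settings (1-pixel padding on the 3 × 3 convolutions, so they preserve spatial size, and 2 × 2 max pooling with stride 2); the names `VGG16_BLOCKS` and `trace_vgg16` are introduced here for illustration only.

```python
# (number of 3 x 3 convolution layers, output channels) for each VGG16 block,
# as listed in the text; each block ends with a 2 x 2 max pooling layer.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_vgg16(input_size=224):
    """Return (spatial size, channels) of the feature map after each block."""
    size = input_size
    shapes = []
    for _num_convs, channels in VGG16_BLOCKS:
        # The padded 3 x 3 convolutions keep the spatial size unchanged;
        # the 2 x 2 max pooling with stride 2 halves it.
        size //= 2
        shapes.append((size, channels))
    return shapes


shapes = trace_vgg16()
final_size, final_channels = shapes[-1]
# Length of the vector produced by the flatten layer.
flattened_length = final_size * final_size * final_channels
# Convolution layers plus the three fully connected layers.
weight_layers = sum(n for n, _ in VGG16_BLOCKS) + 3

for block_index, (size, channels) in enumerate(shapes, start=1):
    print(f"block {block_index}: {size} x {size} x {channels}")
print("flattened vector length:", flattened_length)
print("weight layers:", weight_layers)
```

The count of 13 convolution layers plus 3 fully connected layers is where the 16 in VGG16 comes from.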

Fig. 2. Architecture of VGG16
Equation 1 is the mathematical expression of the ReLU.
f(x) = max(0, x) (1)
The model used Adam optimization and binary cross entropy as the loss function, with the sigmoid function as the activation of the final layer (Gómez-Valverde et al., 2019). Equation 2 gives the mathematical expression for the sigmoid activation function.
σ(x) = 1/(1 + e^(-x)) (2)
Dropout and data augmentation were also used. The data augmentation included random rotation and random vertical and horizontal flips. The developed model was trained for about 30 epochs, and the training was carried out in a Google Colaboratory notebook.
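The two activation functions and the loss named in this section can be written out directly. The sketch below uses only the standard definitions; the clipping constant `eps` and the toy logits are assumptions added to keep the example numerically safe and self-contained.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_cross_entropy(labels, probabilities, eps=1e-7):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Toy example: raw outputs (logits) of the final layer for three images,
# with label 1 = glaucoma and 0 = normal.
logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probabilities = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probabilities)
print("probabilities:", [round(p, 3) for p in probabilities])
print("loss:", round(loss, 4))
```

The loss shrinks as the predicted probability moves toward the true label, which is the quantity the Adam optimizer minimizes during training.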

Model Using DenseNet121
The problem of glaucoma classification was also addressed by using another CNN, DenseNet121.
DenseNet121 is the simplest of the DenseNet architectures trained on ImageNet. In a dense CNN, each layer is connected to every other layer in a feed-forward fashion (Huang et al., 2017). In a DenseNet, the feature maps of all preceding layers are used as inputs to each layer, and its own feature maps are used as inputs to all subsequent layers. The architecture of DenseNet used for the classification of glaucoma is shown in Fig. 3. The architecture comprises the following layers:
One convolution layer of 7 × 7 dimension, followed by one max pooling layer of 3 × 3 pixel window and then the first dense block.
One convolution layer of 1 × 1 dimension followed by one average pooling layer of 2 × 2 pixel window and then the second dense block.
One convolution layer of 1 × 1 dimension followed by one average pooling layer of 2 × 2 pixel window and then the third dense block.
One convolution layer of 1 × 1 dimension followed by one average pooling layer of 2 × 2 pixel window and then the fourth dense block.
A dense block consists of a set of convolution layers with 3 × 3 filters and another set of convolution layers with 1 × 1 filters. A convolution layer and an average pooling layer, with ReLU as the activation function, make up a transition block. The dense blocks differ in the number of filters used. There are 121 layers in total (4 convolution layers outside the dense blocks, 1 fully connected layer at the end, and 4 dense blocks containing 6, 12, 24 and 16 convolution units respectively, each unit having both a 1 × 1 and a 3 × 3 convolution), which explains the 121 in DenseNet121. A dropout layer is placed before the fully connected layer. The fully connected layer uses a sigmoid activation function and has 1 channel for classifying glaucomatous and normal images. The model was compiled using the Adam optimizer with binary cross entropy as the loss function and was trained for 30 epochs.
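The layer count quoted above can be checked with a few lines of arithmetic. The constant names are introduced here for illustration; the numbers are those given in the text.

```python
# Number of convolution units in each of the four dense blocks.
DENSE_BLOCK_UNITS = [6, 12, 24, 16]
# Each unit holds one 1 x 1 and one 3 x 3 convolution.
CONVS_PER_UNIT = 2
# The initial 7 x 7 convolution plus the three 1 x 1 transition convolutions.
CONVS_OUTSIDE_BLOCKS = 4
# The final fully connected (classification) layer.
FULLY_CONNECTED = 1

convs_in_blocks = sum(units * CONVS_PER_UNIT for units in DENSE_BLOCK_UNITS)
total_layers = convs_in_blocks + CONVS_OUTSIDE_BLOCKS + FULLY_CONNECTED

print("convolutions inside dense blocks:", convs_in_blocks)
print("total weight layers:", total_layers)
```

Pooling, dropout and activation layers carry no trainable weights and are not counted in this total.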

Proposed Model using ProspectNet
A new CNN architecture was developed and tested for the problem of glaucoma diagnosis. The network architecture, shown in Fig. 5, takes an input image of 224 × 224 and feeds it to a convolution layer. The architecture comprises the following layers:
One convolution layer of 64 channels of 3 × 3 dimension followed by one max pooling layer of 3 × 3 pixel window.
One convolution layer of 32 channels of 3 × 3 dimension followed by one max pooling layer of 3 × 3 pixel window.
One flatten layer followed by three fully connected layers having 128, 64 and 1 channel respectively.
The third fully connected layer has a single channel, as required for the binary glaucoma diagnosis. The model is compiled using the Adam optimiser with binary cross entropy as the loss function. The model was trained for 30 epochs.
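The layer list above is enough to estimate the size of the network, provided a few unstated details are fixed. The sketch below is a rough estimate, not the authors' configuration: it assumes unpadded stride-1 convolutions, non-overlapping 3 × 3 pooling (stride equal to the window), a three-channel input, and one bias per filter or unit. The function names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution (assumption)."""
    return size - kernel + 1


def pool_output(size, window=3):
    """Spatial size after non-overlapping max pooling (assumption)."""
    return size // window


def prospectnet_summary(input_size=224, input_channels=3):
    """Return (final feature-map size, flattened length, trainable parameters)."""
    size, channels, params = input_size, input_channels, 0
    for filters in (64, 32):  # the two convolution layers in the text
        params += (3 * 3 * channels + 1) * filters  # weights plus biases
        size = pool_output(conv_output(size))
        channels = filters
    flattened = size * size * channels  # output of the flatten layer
    features = flattened
    for units in (128, 64, 1):  # the three fully connected layers
        params += (features + 1) * units  # weights plus biases
        features = units
    return size, flattened, params


final_size, flattened_length, parameter_count = prospectnet_summary()
print("final feature map:", final_size, "x", final_size)
print("flattened vector length:", flattened_length)
print("trainable parameters:", parameter_count)
```

Under these assumptions almost all of the parameters sit in the first fully connected layer, so different padding or pooling strides would change the total considerably.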

RESULTS AND DISCUSSION
In order to evaluate the performance of the CNN architectures on glaucoma diagnosis, Receiver Operating Characteristic (ROC) analysis (Gómez-Valverde et al., 2019) was performed, and the architectures were compared on several performance metrics. Each architecture was fed with the same dataset and trained for 30 epochs. The dataset was divided using the train-test split method, with 25 percent of the images from each set assigned to the test set.
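A split of this kind can be sketched as follows. The text gives both a 4:1 ratio and a 25 percent test share, so the test share is left as the parameter `test_fraction`; reading "each set" as each class is likewise an assumption, as are the function name and the fixed seed.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices), drawing test_fraction of each class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Class counts from the dataset description:
# 71 + 450 = 521 glaucomatous images and 30 + 514 = 544 normal images.
labels = ["glaucoma"] * 521 + ["normal"] * 544
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training images,", len(test_idx), "test images")
```

Sampling within each class keeps the glaucoma-to-normal proportion roughly equal in the training and test sets.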
After training, the models were applied to the test set, and the predictions obtained were tabulated in the form of confusion matrices, as shown in Fig. 6. The true positive (tp), true negative (tn), false positive (fp) and false negative (fn) values were obtained from the confusion matrices in order to calculate the performance metrics used to compare the CNNs. Computational time, accuracy, precision, AUC, sensitivity and specificity were taken as the performance metrics. VGG16, pre-trained on ImageNet, gave a validation accuracy of about 80.28% when trained on this dataset, while DenseNet121 and ProspectNet performed considerably better, with accuracies of 96.25% and 96.63% respectively.
The sensitivity (true positive rate, tpr) is defined as the number of true positives divided by the sum of the number of true positives and false negatives, as represented in Equation 3. Similarly, the specificity (true negative rate, tnr) is the number of true negatives divided by the sum of the true negatives and the false positives, as shown in Equation 4. Sensitivity gives the percentage of glaucomatous cases correctly predicted, and specificity does the same for normal cases.
Sensitivity (tpr) = tp/(tp + fn) (3)
Specificity (tnr) = tn/(tn + fp) (4)
For VGG16, the sensitivity was found to be 0.85 and the specificity 0.79. DenseNet121 showed better metrics, with both sensitivity and specificity of 0.96. The newly proposed model, ProspectNet, gave a promising result, with a sensitivity of 0.95 and a specificity of 0.98. The performance of the models was further visualized by plotting the ROC curve. The ROC graph is a two-dimensional representation with sensitivity on the Y axis and 1 - specificity (the false positive rate) on the X axis. The AUC was obtained from the ROC curve and served as another metric for performance comparison. According to the graphs, ProspectNet gave the best result, with an AUC of 0.991, while DenseNet121 gave an AUC of 0.990 and VGG16 an AUC of 0.892. The ROC curves of the three models are shown in Fig. 7.
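The metrics in Equations 3 and 4, together with the AUC, can be computed from labels and model outputs as sketched below. The toy labels and scores are invented for illustration and are not results from the paper; the 0.5 decision threshold and the function names are assumptions. The AUC is computed through its rank interpretation (the probability that a randomly chosen glaucomatous image scores higher than a randomly chosen normal one), which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = glaucoma."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)  # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data: four glaucomatous and four normal images with sigmoid outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)   # Equation 3
specificity = tn / (tn + fp)   # Equation 4
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
auc = roc_auc(y_true, scores)
print(sensitivity, specificity, precision, accuracy, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes the ranking quality over all thresholds, which is why both are reported.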
It should be noted that, besides its specificity and accuracy, ProspectNet yielded lower computational complexity and time than VGG16 and DenseNet121. Furthermore, ProspectNet has a lighter architecture (just 96 pre-trained layers) than DenseNet121, which has 121 pre-trained layers, and this was expected to result in improved accuracy with faster computing. Table 2 compares the parameters under observation for all three models that were tested.
The above-mentioned results confirm that CNNs yield high specificity and sensitivity values even when trained on a medium-sized data set. The novelty of the approach lies in the high accuracy obtained despite the absence of specified lesion-based features and of a homogeneous data set. The images used in this work do not come from the same source and do not share the same size or format.

CONCLUSION AND FUTURE WORK
This paper examines three distinct glaucoma detection architectures (VGG16, DenseNet121, and ProspectNet). Our method has the benefit of not requiring any particular feature selection. The majority of currently used approaches rely on optic cup and disc-based factors such as the cup-to-disc ratio. These methods are sensitive to fundus image quality and may not be able to handle visual noise, raising concerns about their reliability. We introduced a novel approach to glaucoma classification that applies deep CNNs to colour fundus images.
The three networks were examined using a variety of performance measures and computing time on a specific dataset. It is evident that the VGG16 model is unable to deliver useful results for the data set used. Although DenseNet121 performs only marginally worse than ProspectNet, it is necessary to consider processing time and complexity when deciding which method is superior. Since DenseNet121 includes 121 pretrained layers, the algorithm using it is considerably more computationally complex. As can be seen from Table 2, ProspectNet performs better in terms of glaucoma classification accuracy and specificity, as well as processing efficiency.
The limiting factor of this study is its heavy dependence on the availability of labelled data. The success of deep neural networks in achieving high accuracy relies largely on labelled datasets, which poses a difficulty, as annotating eye images is time consuming and expensive. Additionally, the algorithm is limited to binary classification and could be extended to multi-class classification. Another avenue for future improvement would be to add more advanced transfer learning approaches for improved classification results.
The application of deep learning methods to large-scale fundus photo benchmarks appears to be a promising topic for future research in this field. Additionally, there are few openly available image benchmarks, and most researchers use their own fundus image benchmarks to judge the quality of their own work. A substantial, readily available image benchmark is required in order to evaluate future research in this field. This can help create an effective CAD (computer-aided diagnosis) system for glaucoma detection and will be useful in comparing the performance of future studies evaluated on the same data set.

ACKNOWLEDGMENT
The authors would like to thank the Department of Science and Technology (DST) for funding support for the project, sanction number IDP/MED/53/2016, under Biomedical Device and Technology (BDTD). We gratefully acknowledge the resources of the School of Electrical Engineering, VIT, Vellore, which provided the information sources and ideas for this paper. The authors also express their sincere thanks to Dr. Ravikiran, Medevplus, for providing specialized data sources and guidance.

Fig. 3. Architecture of DenseNet used for glaucoma classification
The CNN utilized in the study takes an input image of size 224 × 224 and passes it to the sequential model of the DenseNet121 neural network. The architecture of DenseNet121 is shown in Fig. 4.

Table 1. Database showing the number of images belonging to each of the normal and glaucomatous subsets, along with the image formats

Table 2. Summary of the performance parameters and model attributes of the different CNNs incorporated

Bressler, N.M., Webster, D.R., Abramoff, M. 2019. Deep learning in ophthalmology: The technical and clinical considerations. Progress in Retinal and Eye Research, 72, 100759.