·Meta-Analysis·
Current
applications of machine learning in the screening and diagnosis of glaucoma: a
systematic review and Meta-analysis
Patrick
Murtagh1, Garrett Greene2, Colm O'Brien1
1Department of Ophthalmology, Mater
Misericordiae University Hospital, Eccles Street, Dublin D07 R2WY, Ireland
2RCSI Education and Research Centre,
Beaumont Hospital, Dublin D05 AT88, Ireland
Correspondence to: Patrick Murtagh. 123 Melvin
Road, Terenure, Dublin D6W FN29, Ireland. murtagp@tcd.ie
Abstract
AIM: To compare the effectiveness of machine learning applied to two well described imaging modalities, ocular coherence tomography (OCT) and fundal photography, in terms of diagnostic accuracy in the screening and diagnosis of glaucoma.
METHODS: A systematic search of
Embase and PubMed databases was undertaken up to 1st of February
2019. Articles were identified alongside their reference lists and relevant
studies were aggregated. A Meta-analysis of diagnostic accuracy in terms of
area under the receiver operating characteristic curve (AUROC) was performed. For the studies
which did not report an AUROC, reported sensitivity and specificity values were
combined to create a summary ROC curve which was included in the Meta-analysis.
RESULTS: A total of 23 studies
were deemed suitable for inclusion in the Meta-analysis. This included 10
papers from the OCT cohort and 13 from the fundal photos cohort. Random effects
Meta-analysis gave a pooled AUROC of 0.957 (95%CI=0.917 to 0.997) for fundal
photos and 0.923 (95%CI=0.889 to 0.957) for the OCT cohort. The slightly higher
accuracy of fundal photo methods is likely attributable to the much larger
database of images used to train the models (59 788 vs 1743).
CONCLUSION: No demonstrable difference was found between the diagnostic accuracy of the two modalities. The ease of access and lower cost associated with fundal photo acquisition make it the more appealing option for screening on a global scale; however, further studies need to be undertaken, owing largely to the poor study quality associated with the fundal photography cohort.
KEYWORDS: machine learning;
glaucoma; ocular coherence tomography; fundal photography; diagnosis;
Meta-analysis
DOI:10.18240/ijo.2020.01.22
Citation:
Murtagh P, Greene G, O’Brien C. Current applications of machine learning in the
screening and diagnosis of glaucoma: a systematic review and Meta-analysis. Int
J Ophthalmol 2020;13(1):149-162
INTRODUCTION
Glaucoma is a term used to describe a group of optic
neuropathies which cause damage to retinal ganglion cells, and is the second
leading cause of permanent blindness in developed countries[1].
The damage caused by glaucoma is irreversible and therefore early detection and
treatment is vital to halt visual damage[2]. From
a global perspective, the number of people diagnosed with the disease is
expected to increase from 76 million in 2020 to 112 million in 2040[3]. Glaucoma is usually related to an increase in
intraocular pressure which leads to stress induced damage on the retinal
ganglion cells resulting in a characteristic appearance of the optic nerve head
and associated visual field defects[4]. Pressure
lowering medications and/or surgery can be used to halt its progression,
especially if it is detected at an early stage. However, the disease has an insidious onset, and patients can therefore remain asymptomatic for many years before they attend for investigation and/or treatment. Early detection is
essential to ensure patients continue to have an adequate quality of life and
allow them to retain their independence and the ability to drive[5]. The economic and social impact that glaucomatous optic
neuropathy can have on society has been well described[6].
The pathogenesis and progression of glaucoma is still poorly understood[7].
Currently there are numerous methods used to diagnose and
screen for glaucoma, however these techniques are expensive, time consuming,
require skilled operators and are manual[8]. Four
modalities are routinely used; perimetry to detect a visual field defect,
pachymetry to detect corneal thickness, tonometry to measure intraocular
pressure and fundoscopy to examine the optic nerve head. The prevalence of glaucoma increases significantly with advancing age[9], and the population over the age of 50 is projected to double over the next 20y[10]. It is therefore
imperative that an efficient screening and/or diagnosing system is established
to halt both the disease burden and the burden on ophthalmic departments.
Glaucomatous optic neuropathy can result in a thinning of
the retinal nerve fibre layer (RNFL) and an associated enlargement of the
cup-to-disc ratio (CDR). Peripapillary atrophy is also a well-known sign
associated with glaucoma[11], however this can be
seen in other ocular pathologies such as high myopia[12].
Confusingly, individuals with myopia have an increased risk of developing
glaucoma[13].
The Anderson Patella criteria are the gold standard for manifest glaucoma diagnosis. They state that, for a diagnosis to be made, the following must be seen on a 30-2 Humphrey visual field test (Humphrey Field Analyser, Carl Zeiss Meditec, Dublin, California): 1) an abnormal glaucoma hemi-field test; 2) three or more contiguous non-edge points depressed with P<5%, at least one of which is depressed with P<1%; 3) these findings demonstrated on two or more field tests[14].
Ocular coherence tomography (OCT) is a non-invasive
imaging technique which provides micrometre resolution cross sectional views of
the retina[15]. It can be utilised to assess RNFL
thinning around the optic nerve head and macular area. Fundal photography is
another imaging technique which takes photographs of the inner retina mainly
using a widefield fundus camera. Parameters including the size of the optic
nerve head and the CDR, alongside peripapillary atrophy, vessel branching and
tortuosity can be examined using fundal photos[16].
OCT scanners can interpret these parameters alongside RNFL thickness. OCT scans
are accurate, reproducible and not patient dependent. They can supply information about change in thickness of the RNFL and can be used to differentiate glaucomatous from non-glaucomatous eyes[17].
RNFL thickness as determined by OCT scans has shown a high correlation with the
functional status of the optic nerve[18].
Fundal photos[19] and OCT
scanning techniques[20] have proven to be useful
screening and diagnosing modalities for glaucoma. A contrast exists between the two modalities: fundal photos offer ease of access and speed of acquisition, whereas OCT scanning carries a higher cost and requires user expertise. A fundal camera avoids the price associated with more expensive diagnostic equipment and is useful in settings where such equipment is simply not available (e.g., lower income countries). The average cost of an OCT scanner is approximately
$40 000 whereas one can augment their smartphone with a lens to aid in the
acquisition of basic fundal photos[21]. Fundal
photos can be used concomitantly to diagnose other ocular pathologies such as
age related macular degeneration (ARMD) or diabetic retinopathy (DR)[22].
Artificial intelligence (AI), in particular “machine learning”, has seen a recent upsurge in medicine, including ophthalmology, and is currently being developed as a screening and diagnostic tool for many ophthalmic conditions. Machine learning refers to any process in
which an algorithm is iteratively improved or “trained” in performing a task,
usually a classification or identification task, by repeated exposure to many
examples, known as the training data or training set. The trained algorithm can
then be tested by measuring its performance in classifying novel unseen data
(the test set).
In particular, “supervised” learning algorithms have
proven highly successful in automating binary classification tasks, such as
determining the presence or absence of a pathology. “Supervised learning”
refers to training regimes in which the algorithm is given both the input (e.g.,
an OCT or fundus image) and the correct output (e.g., the correct
diagnosis) for each element in the training set. In this way, the algorithm
implicitly learns a mathematical function which maps each input to the correct
output. If new data is applied to this function, the machine learning algorithm
should be able to classify it correctly. Machine learning can be utilised when we cannot directly express how a problem should be solved as an explicit algorithm, but can instead present the machine with both positive and negative examples and allow it to identify a function for itself. As a result, the validity of a
machine learning algorithm depends heavily on the size and quality of the
training data, and so validation of algorithms is highly important to ensure
that the results will generalise. Steps for constructing an AI model include
pre-processing the raw data, training the model, validating it and then testing
it[23].
Machine Learning Algorithms
Artificial neural networks Artificial neural network (ANN) models are inspired by
the structure of the brain, in particular the human visual system, making them
highly useful in automated image analysis. ANNs consist of many simple
simulated processing units (“neurons”) connected in one or more layers. Neurons
receive input from preceding layers, combine the inputs according to simple
summation rules, and generate an output which is fed forward to the next layer.
The lowest layer of the network represents the input (e.g., image pixel
values), while the final layer represents the output or classification. Inputs
from one neuron to another are “weighted”, with values analogous to synaptic
weights in neural connections. As the algorithm is trained, these weights are
updated according to simple feedback rules to improve the accuracy of the
classification.
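As a concrete illustration, the sketch below trains a small feedforward network in Python with scikit-learn. This is a minimal sketch only: neither the library nor the data come from the reviewed studies, and the synthetic features simply stand in for image-derived inputs.

```python
# A minimal ANN sketch (Python/scikit-learn); all data and parameter
# choices are illustrative, not taken from any study in this review.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))           # 200 "eyes", 64 input features each
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy labels: 1 = glaucoma, 0 = healthy

# One hidden layer of 32 "neurons"; connection weights are updated by
# backpropagation, the feedback rule described above.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
model.fit(X, y)
print(model.predict(X[:5]))              # classify the first five inputs
```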
Support vector machine Support vector machines (SVMs) apply a multi-dimensional
transform to the input data (image pixels). The algorithm then attempts to
identify the hyperplane in this higher-dimensional space which best separates
the training data into the desired categories (e.g., glaucomatous and
non-glaucomatous). The further away the data points lie from the plane, the more confident the model is that it has identified them correctly[24]. The algorithm’s objective is to find the plane with the greatest margin, i.e., the greatest distance between the plane and the nearest points, so that it achieves the greatest accuracy.
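A minimal sketch of this idea, again in Python/scikit-learn with synthetic stand-in features (the kernel and parameters are illustrative, not drawn from any reviewed study):

```python
# A minimal SVM sketch; the RBF kernel performs the higher-dimensional
# transform implicitly, and C trades margin width against training errors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))           # extracted image features (toy)
y = (X[:, 0] > 0).astype(int)            # toy labels

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# decision_function returns each point's signed distance from the separating
# hyperplane: larger magnitude = more confident classification.
print(clf.decision_function(X[:5]))
```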
Random forest
Random forest (RAN) uses multiple
non-correlated decision trees. Each decision tree predicts an output, and the output predicted most frequently is taken as the most likely to be correct. In essence, the outcome which gets the greatest
number of “votes” from multiple non-correlated prediction models is the answer
which is presumed to be the most accurate[25].
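A minimal illustrative sketch of this voting behaviour (Python/scikit-learn, synthetic data):

```python
# A minimal random forest sketch: many decorrelated decision trees vote,
# and the majority class is returned. Data and parameters are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 2] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# predict_proba exposes the "vote": the fraction of trees choosing each class.
print(forest.predict_proba(X[:3]))
```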
K-nearest neighbour
This is an algorithm that works on the principle that data with similar characteristics will lie in close proximity to each other. For a new piece of data, the algorithm determines how close it lies to pieces of predesignated data and then infers whether the new data point has a positive or negative value[26].
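A minimal sketch of the same idea (Python/scikit-learn, synthetic data; k=5 is an illustrative choice):

```python
# A minimal k-nearest neighbours sketch: a new point is assigned the
# majority label of the k closest training points.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 nearest neighbours
knn.fit(X, y)
print(knn.predict(X[:5]))
```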
Validation Validation is a process in which the trained model is
evaluated with a testing data set. It is used to determine how well the
algorithm can classify images that it has never seen before. Cross validation
is a commonly utilised method of testing the validity of machine learning
algorithms to reduce the risk of overfitting. Overfitting occurs when a machine learning algorithm learns the details and noise in the training set too well, which subsequently impacts negatively on the classification of future data[27]. The most commonly applied cross-validation method is
“K-fold cross-validation”. In this method, the dataset is randomly split into k
subsets of equal size. Common choices of k are 5 or 10 (5-fold or 10-fold cross
validation). The training is then performed using k-1 of the subsets as the
training set, and the remaining 1 subset as the test set. This process is
repeated k times, leaving a different subset out of the training each time. The
final estimate of the model’s accuracy is given by pooling the results of the
validation in each of the k subsets. Although k-fold cross-validation uses the
same data for both training and testing, an individual data point is never
included in both the training and test sets for a particular iteration,
reducing the likelihood of overfitting. This approach has proven effective at avoiding the overfitting or underfitting of data[28].
It has been stated that cross validation is a better method for testing and
training than random allocation[29]. Random
allocation is when the training and validation set are randomly split, with one
section used for training and another used for validation.
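A minimal sketch of 5-fold cross-validation in Python/scikit-learn, assuming synthetic data: each fold is held out once as the test set while the model trains on the remaining four, and the per-fold scores are pooled.

```python
# A minimal 5-fold cross-validation sketch; data and model are illustrative.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=cv, scoring="roc_auc")
print(scores, scores.mean())   # per-fold AUROCs and the pooled estimate
```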
Ophthalmology and machine learning Machine learning is a technology that is still in its
embryonic stage. Deep learning (a subset of machine learning focusing on ANN),
which only found its feet in the 2000s, is a technology with widespread use in
modern society including speech recognition, real time language translation
and, most notably, image recognition[30]. Its
transition to medical imaging analysis was an obvious step.
Ophthalmology is an ideal specialty for the implementation
of machine learning due to the ability to obtain high resolution images of the
posterior of the eye in the form of fundal photos or OCT scans. These are
non-invasive techniques with no radiation or potential for harm. A recent study[31] examining the ability of a machine learning algorithm
to identify referable retinal diseases from OCT scans revealed that its success
was comparable with clinical retinal specialists. It demonstrated that it could
work in a real-world setting, with the benefit of being able to diagnose
multiple pathologies.
Multiple machine learning parameters are currently being
assessed by numerous authors to aid in glaucoma diagnosis. There is debate over
which screening framework is the most sensitive and/or specific. On review of
the literature, the most utilised screening modalities include either fundal
photographs (which incorporates optic nerve head assessment and retinal
vascular geometry) or OCT imaging. As previously stated, both techniques offer the advantage of quick assessment time; however, the OCT scanner is significantly more expensive than the fundal camera and requires a skilled operator.
The literature has indicated that there are many studies
examining the efficacy of these modalities in the screening/diagnosing of
glaucoma, but none have compared the sensitivity and specificity of these tests against one another. To facilitate the mass screening of glaucoma, it
would be beneficial to identify the most appropriate diagnostic test and to
date the literature has failed to examine this.
Study aims and objectives To determine the diagnostic accuracy, in terms of sensitivity and specificity and/or area under the receiver operating characteristic curve (AUROC), of machine learning (including, but not restricted to, SVM, ANN, convolutional neural network (CNN), K-nearest neighbour, least square SVM (LS-SVM), naïve Bayes and sequential minimal optimisation) in diagnosing glaucoma and identifying those at risk. The two imaging modalities to be examined are fundal photographs (of the optic disc and retinal vessels) and OCT imaging.
These will be compared to the current reference test
which is defined by the Anderson Patella criteria[14]
for glaucoma diagnosis. Referable glaucomatous optic neuropathy is a term used
when there is an increased CDR and therefore an associated suspicion of
glaucoma. Not all suspicious discs are glaucomatous and a proportion of the
normal population will have an increased CDR of greater than 0.7[32]. Functional damage, as represented by specific loss
of peripheral visual field, has been demonstrated to be the most crucial
component in the diagnosis of glaucoma[33].
In essence, we hope to elucidate from the general
population (Population) which method of machine learning screening for glaucoma
(Experimental test) is most accurate when we compare it to the gold standard
test which is perimetry as defined by the Anderson Patella criteria (Reference
test). This will allow us to determine the most appropriate imaging modality to
utilise in terms of the automation of the mass screening of glaucoma.
MATERIALS AND METHODS
Search Strategy
A search of PubMed and Embase was
undertaken up to the first of February 2019. The search terms used in PubMed
included (“glaucoma”[MeSH Terms] OR “glaucoma”[All Fields] OR
“glaucomatous”[All Fields]) AND (“machine
learning”[All Fields] OR “deep learning”[All Fields] OR “Computer Aided”[All])
AND (“diagnosis”[All Fields] OR “detection”[All Fields] OR “Screening”[All
fields]). The search terms in Embase included (‘glaucoma’/exp OR glaucoma) AND
(‘machine learning’ OR ‘deep learning’ OR ‘computer aided diagnosis’) AND
(‘diagnosis’ OR ‘detection’ OR ‘screening’).
The retrieved studies were imported into RevMan 5
(version 5.3. Copenhagen: The Nordic Cochrane Center, the Cochrane
Collaboration, 2014). All duplicates were deleted. The titles and abstracts of
the remaining articles were reviewed by two authors (Murtagh P and Greene G)
and those that did not meet the inclusion criteria were removed. For
completion, the reference lists from the selected studies were also examined.
Inclusion and Exclusion Criteria Machine learning in diagnostic imaging is a field which
is still in its infancy, and there were therefore a limited number of robust
papers on the subject. Inclusion criteria consisted of all observational
studies examining machine learning in the diagnosis and/or screening of
glaucoma involving fundal photographs and OCT imaging. Exclusion criteria
included studies which used human interpretation of fundal images, those whose
machine learning was based on perimetry, those only associated with diabetic
macular oedema or ARMD, participants under 18 years old and those with
neurological or other disorders which may confound visual field results. Some
of these studies did not define their diagnostic criteria but stated that the
diagnosis of glaucoma was made by an ophthalmologist.
Data Extraction
For each study we recorded the
name of the principal author, year of publication, the number of eyes involved
in the study (both glaucomatous and healthy), the machine learning classifier
used (if multiple were utilised, the classifier with the most favourable result
was taken), how the classifier was trained and tested, their definition of
glaucoma diagnosis, make of OCT scanner in the cohort that used OCT and their
results.
Measurements of Diagnostic Accuracy Results were recorded in the papers as either the AUROC
or in terms of sensitivity and specificity. A receiver operating characteristic
(ROC) curve is a statistical representation which demonstrates the diagnostic
ability of a binary classifier at varying discrimination thresholds[34]. A ROC curve is generated by plotting true positive
rate against false positive rate, i.e., (1-specificity) on the x-axis and sensitivity on the y-axis. The AUROC informs us about the ability of the model
to distinguish between different classes. It is the outcome measure most used
to assess the reliability of a machine learning diagnosis. The results range
from 0.5-1, the closer the result is to one, the better the performance of the
machine learning model[35]. Sensitivity is the
proportion of true positives that are correctly identified by the test.
Specificity is the proportion of true negatives that are correctly identified
by the test[36]. Although these two outcome
parameters are not directly comparable, we used a hierarchical summary ROC model to convert the reported sensitivity and specificity values into a summary AUROC for the papers that did not report one (see Statistical Analysis below). ROC curves illustrate sensitivity and specificity
at different cut-off values. If only sensitivity and specificity are stated in
the studies, then there must be a single cut-off value being used, but this is
not always stated (and may not be known, since it is sometimes a hidden
parameter of the machine learning model).
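A minimal sketch of these three measures in Python/scikit-learn (synthetic scores; the 0.5 cut-off is an illustrative threshold, not one used in the reviewed studies):

```python
# Computing AUROC, sensitivity and specificity from classifier scores.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(5)
y_true = rng.integers(0, 2, size=200)                 # 1 = glaucoma
scores = y_true + rng.normal(scale=0.8, size=200)     # imperfect toy scores

print("AUROC:", roc_auc_score(y_true, scores))        # threshold-free measure

# Sensitivity and specificity require choosing a single cut-off value.
y_pred = (scores > 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity:", tp / (tp + fn))                 # true positive rate
print("specificity:", tn / (tn + fp))                 # true negative rate
```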
Assessment of Study Quality All studies available were observational studies and
therefore there was no defined standard evaluation of bias. We consequently
established an adapted scoring system based on the Newcastle-Ottawa Scale (NOS)[37]. Each study was assessed on the following criteria:
1) sample size (greater than 100, 1 point; less than 100, 0 points); 2)
validation technique (cross validation, 1 point; other, 0 points); 3) unique
database (unique database, 1 point; previously utilised database, 0 points); 4)
their definition of glaucoma and diagnostic criteria (Anderson Patella
criteria, 1 point; other, 0 points); 5) inclusion of a confidence intervals
(CIs) around reported outcomes (yes, 1 point; no, 0 points), and 6) their
interpretation and reporting of results (AUROC, 1 point; other, 0 points).
Studies which scored four points or greater were deemed to be of a higher
methodological standard.
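The six-item checklist can be expressed compactly as a scoring function. The sketch below is illustrative only; the field names are hypothetical, not taken from our data extraction.

```python
# A minimal sketch of the modified NOS checklist as code (Python).
# Field names are hypothetical placeholders for the extracted study data.
def quality_score(study: dict) -> int:
    points = [
        study["sample_size"] > 100,                # 1) adequate sample size
        study["validation"] == "cross",            # 2) cross validation used
        study["unique_database"],                  # 3) unique dataset
        study["diagnosis"] == "anderson_patella",  # 4) rigorous diagnosis
        study["reports_ci"],                       # 5) CI around outcomes
        study["reports_auroc"],                    # 6) AUROC reported
    ]
    return sum(points)                             # >= 4: higher standard

example = dict(sample_size=152, validation="cross", unique_database=True,
               diagnosis="anderson_patella", reports_ci=True, reports_auroc=True)
print(quality_score(example))  # 6, matching e.g. Bizios et al 2010 in Table 4
```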
Statistical Analysis
Results of studies employing
fundal images and OCT were extracted using Cochrane RevMan software (version
5.3. Copenhagen: The Nordic Cochrane Center, the Cochrane Collaboration, 2014)
and Meta-analysis was performed to compare the accuracy of diagnosis. In the
majority of studies, diagnostic accuracy was summarised by the AUROC. Summary
estimates of the combined AUROC for each imaging methodology were estimated by
inverse-variance weighted Meta-analysis following the method of Zhou et al[38].
However, several studies employing fundal images reported
only a single sensitivity and specificity point. A single summary AUROC value
for these studies was derived by estimating a Hierarchical Summary ROC curve
(HSROC)[39]. The HSROC was estimated by
hierarchical logistic regression using the Metandi and Midas[40]
packages in Stata version 15.0 (StataCorp, College Station, TX, USA). This
summary AUROC value and associated standard error (SE) were included in the
Meta-analysis of fundal image studies.
Comparison of accuracy of fundal image and OCT studies
was performed by comparing pooled estimates and SE obtained through
Meta-analysis of each cohort. Significant difference was defined by a P
value less than 0.05.
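For illustration, the sketch below implements generic inverse-variance pooling with a DerSimonian-Laird random-effects adjustment in Python/NumPy. It is not the exact RevMan or Zhou et al procedure used here, but applied to the three fundal-photo inputs in Table 5 it reproduces the pooled random-effects AUROC of approximately 0.957.

```python
# A generic inverse-variance meta-analysis sketch with a DerSimonian-Laird
# random-effects adjustment; illustrative only, not the paper's software.
import numpy as np

def pool(estimates, ses):
    est, se = np.asarray(estimates), np.asarray(ses)
    w = 1.0 / se**2                                  # fixed-effect weights
    fixed = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - fixed) ** 2)               # Cochran's Q
    k = len(est)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_re = 1.0 / (se**2 + tau2)                      # random-effects weights
    pooled = np.sum(w_re * est) / np.sum(w_re)
    return pooled, np.sqrt(1.0 / np.sum(w_re))       # estimate and its SE

# The three fundal-photo inputs from Table 5 give a pooled AUROC of ~0.957:
print(pool([0.910, 0.986, 0.979], [0.002, 0.001, 0.014]))
```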
Potential Confounders Potential confounders include the same data set being
used by different studies, training and testing on the same data set,
discrepancy in glaucoma diagnosis, concomitant neurological disorders which may
confound results, use of crowdsourcing platforms and focus on computer methods
as opposed to clinical outcomes.
RESULTS
Selection Process and Search Results Our search parameters returned a total of 131 papers from
PubMed and 154 from Embase giving us a total of 285 studies. The titles and
abstracts of these studies were reviewed, duplicates were removed, alongside
the papers that did not fulfil the inclusion or exclusion criteria, and 36
papers were deemed suitable for review in full text. Following comprehensive
appraisal, a total of 23 papers were deemed suitable for inclusion in this
Meta-analysis. This consisted of 13 papers which examined machine learning in
the diagnosis of glaucoma using fundal photos and 10 using OCT technology. All
studies were population based observational studies. Figure 1 outlines the
selection process.
Figure 1 Flow diagram depicting the selection process for
inclusion in the Meta-analysis.
Table 1 tabulates the data with regard to machine learning
and fundal images. Ten of the thirteen studies were from Asia, nine of which
were from India[41-49] and one
from South Korea[50]. Of the remaining three
studies, one utilised a dataset from Germany[51],
one used fundal photos from two previous American studies[52]
(the African Descent and Glaucoma Evaluation Study and the Diagnostic
Innovations in Glaucoma Study) and the comprehensive study by Li et al[53] used the large online dataset Labelme (a
crowdsourcing platform for labelling fundal photographs). Of the studies
undertaken in India, six utilised the Kasturba Medical College dataset[42-44,46,48-49] and two used the Venu Eye Research Centre dataset[45,47] but using different machine
learning algorithms. The studies were published between 2009 and 2018. A total
of 59 788 eyes were included in the studies, 39 745 coming from a single study[53].
Table 1 Machine learning studies using fundal images

Paper | Classifier | Number, age | Training and testing | Results | Glaucoma diagnosis | Database
Nayak et al 2009[42] | ANN | 61, | 46 images used for training, 15 images used for testing | AUROC 0.984 (sensitivity 100%, specificity 80%), no CI | Ill-defined but by an ophthalmologist | Kasturba Medical College, Manipal, India
Bock et al 2010[51] | SVM | 575, | 5-fold cross validation | AUROC 0.88, P<0.07, sensitivity 73%, specificity 85% | Ill defined, stated gold standard | Erlangen Glaucoma Registry, Germany
Acharya et al 2011[43] | SVM | 60, | 5-fold cross validation | 91% accuracy, no CI, stated P significant if <0.05 | Ill defined | Kasturba Medical College, Manipal, India
Mookiah et al 2012[44] | SVM | 60, | 3-fold stratified cross validation | Accuracy 93.33%, sensitivity 86.67%, specificity 93.33%, AUROC 0.984, no CI, stated P significant if <0.05 | Ill-defined but by an ophthalmologist | Kasturba Medical College, Manipal, India
Chakrabarty et al 2016[41] | CNN | 314, | 1926 to train, 314 to test | AUROC 0.792 | Gold standard, diagnosed by 4 glaucoma specialists | Aravind Eye Hospital, Madurai and Coimbatore, India
Issac et al 2015[45] | SVM | 67, | Leave-one-out cross validation | Accuracy 94.11%, sensitivity 100%, specificity 90%, no CI, P significant if less than 0.05 | Ill-defined but by an ophthalmologist | Venu Eye Research Centre, New Delhi, India
Maheshwari et al 2017[46] | SVM | Two databases, 60, | Threefold and tenfold cross validation | Accuracy 98.33%, sensitivity 100%, specificity 96.67%, no CI, P significant if less than 0.05 | Ill-defined but by an ophthalmologist | Medical Images Analysis Group, Kasturba Medical College, Manipal, India
Singh et al 2016[47] | SVM | 63, | Leave-one-out cross validation, 44 to train, 19 to check | Accuracy 95.24%, sensitivity 96.97%, specificity 93.33%, no CI, P significant if less than 0.05 | Ill-defined but by an ophthalmologist | Venu Eye Research Centre, New Delhi, India
Maheshwari et al 2017[48] | LS-SVM | 488, | Threefold and tenfold cross validation | Accuracy 94.79%, sensitivity 93.62%, specificity 95.88% | Ill-defined but by an ophthalmologist | Kasturba Medical College, Manipal, India
Raghavendra et al 2018[49] | SVM | 1426, 589 N | 70% training, 30% testing, repeated 50 times with random training and testing partitions | Accuracy 98.13%, sensitivity 98%, specificity 98.3%, no CI, P significant if less than 0.05 | Ill-defined but by an ophthalmologist | Kasturba Medical College, Manipal, India
Ahn et al 2018[63] | CNN | 1542, | Randomly partitioned into 754 training, 324 validation and 464 test datasets | AUROC 0.94, accuracy 87.9%, no CI | Ill-defined but likely Anderson Patella criteria | Kim’s Eye Hospital, Seoul, South Korea
Christopher et al 2018[52] | CNN | 14822, | 10-fold cross validation | AUROC 0.91 (CI 0.90-0.91) | Independent masked graders | ADAGES study, New York and Alabama; DIGS study, California
Li et al 2018[53] | CNN | 39745, | 8000 images as the validation set and 31745 images as the training set | AUROC 0.986 (95%CI 0.984-0.988) | Grading by trained ophthalmologists | Labelme dataset

G: Glaucoma; N: Normal; AUROC: Area under the receiver operating characteristics curve; CI: Confidence interval; CNN: Convolutional neural network; ANN: Artificial neural network; SVM: Support vector machine; LS-SVM: Least squares support vector machine; ADAGES: African Descent and Glaucoma Evaluation Study; DIGS: Diagnostic Innovations in Glaucoma Study.
Table 2 illustrates the data with regard to machine learning and OCT imaging techniques. A total of ten studies were published between 2005 and 2019. Five of the studies are from the USA[54-58], two are from Japan[50,59], two are from Brazil[60-61] and the remaining study is from Sweden[62]. There was no overlap between the studies as regards data sets. Three studies used the Stratus OCT[54-55,62], three studies used the Cirrus OCT (one standard definition[60] and two high definition[56,61]), Topcon OCT was used in two[50,57], and the RS 3000[59] and Spectralis[58] were used in one study each. A total of 1743 eyes were included in the OCT studies.
Table 2 Machine learning studies using OCT imaging

Paper | Classifier | Number | Training and testing | Results | OCT scanner | Glaucoma diagnosis | Database
Burgansky-Eliash et al 2005[54] | Multiple-take SVM | 89, | Six fold validation, leave one out | AUROC 0.981, no CI | Stratus OCT | Anderson Patella criteria | Recruitment of subjects, Pennsylvania
Bowd et al 2008[55] | RVM | 225, | Tenfold cross validation | AUROC 0.809, no CI | Stratus OCT | Anderson Patella criteria | Observational cross sectional study, California
Bizios et al 2010[62] | SVM | 152, | Tenfold cross validation | AUROC 0.977, CI 0.959-0.999 | Stratus OCT | Anderson Patella criteria | Observational cross sectional study, citizens of Malmo, Sweden
Barella et al 2013[60] | RAN | 103, | Tenfold cross validation resampling | AUROC 0.877, CI 0.810-0.944 | Cirrus SD OCT | Anderson Patella criteria | Glaucoma Service UNICAMP, Brazil, prospective observational cross sectional
Silva et al 2013[61] | RAN | 110, | Tenfold cross validation | AUROC 0.807, CI 0.721-0.876 | Cirrus HD OCT | Anderson Patella criteria | Glaucoma Service UNICAMP, Brazil, observational cross sectional
Xu et al 2013[56] | Boosted logistic regression | 192, | Normative database, tenfold cross validation | AUROC 0.903, no CI | Cirrus HD OCT | Anderson Patella criteria | PITT trial, Pennsylvania
Muhammad et al 2017[57] | CNN | 102, | Pretrained, leave-one-out cross validation | AUROC 0.945, CI 0.925-0.965 | Topcon OCT | Anderson Patella criteria | From previous study for OCT and early glaucoma diagnosis, New York
Asaoka et al 2019[59] | SVM | 178, | Pre-training, glaucoma OCT database | AUROC 0.937, CI 0.906-0.968 | RS 3000 | Anderson Patella criteria | Japanese Archives of Multicentral Images of Glaucomatous OCT database, Japan
Christopher et al 2018[58] | PCA | 235, | Leave-one-out approach | AUROC 0.95, CI 0.92-0.98 | Spectralis OCT | Ill defined | DIGS dataset, California
An et al 2019[50] | CNN | 357, | Tenfold cross validation | AUROC 0.963, SD 0.029 | Topcon OCT | Anderson Patella criteria | Observational cross sectional study, Japan

AUROC: Area under the receiver operating characteristics curve; CI: Confidence interval; SD: Standard deviation; CNN: Convolutional neural network; SVM: Support vector machine; RVM: Relevance vector machine; RAN: Random forest; PCA: Principal component analysis; DIGS: Diagnostic Innovations in Glaucoma Study; OCT: Ocular coherence tomography.
Assessment of Study Quality An assessment of study quality can be seen in Tables 3 and 4 with regard to the fundal photo and OCT groups respectively. We defined a superior methodology as a score of four or greater. It can be observed that the OCT group has a superior methodological standard to the fundal photo group. All of the OCT studies scored four points or greater, while only 5 of the 13 (38.46%) studies in the fundal photo group achieved this score.
Table 3 Assessment of study quality using modified NOS with respect to fundal images

Paper | Sample size | Validation technique | Unique database | Definition of glaucoma | CI | AUROC | Total
Nayak et al 2009[42] | | | | | | X | 1
Bock et al 2010[51] | X | X | X | X | | X | 5
Acharya et al 2011[43] | | X | | | | | 1
Mookiah et al 2012[44] | | X | | | | X | 2
Chakrabarty et al 2016[41] | X | | X | X | | X | 4
Issac et al 2015[45] | | X | | | | | 1
Maheshwari et al 2017[46] | X | X | | | | | 2
Singh et al 2016[47] | | X | | | | | 1
Maheshwari et al 2017[48] | X | X | | | | | 2
Raghavendra et al 2018[49] | X | | | | | | 1
Ahn et al 2018[63] | X | | X | X | | X | 4
Christopher et al 2018[52] | X | X | X | | X | X | 5
Li et al 2018[53] | X | | X | | X | X | 4

X indicates that the criterion was met (1 point). CI: Confidence interval; AUROC: Area under the receiver operating characteristics curve; NOS: Newcastle-Ottawa Scale; OCT: Ocular coherence tomography.
Table 4 Assessment of study quality using modified NOS with respect to OCT scans

Paper | Sample size | Validation technique | Unique database | Definition of glaucoma | CI | AUROC | Total
Burgansky-Eliash et al 2005[54] | | X | X | X | | X | 4
Bowd et al 2008[55] | X | X | X | X | | X | 5
Bizios et al 2010[62] | X | X | X | X | X | X | 6
Barella et al 2013[60] | X | X | | X | X | X | 4
Silva et al 2013[61] | X | X | | X | X | X | 5
Xu et al 2013[56] | X | X | X | X | | X | 5
Muhammad et al 2017[57] | X | X | X | X | X | X | 6
Asaoka et al 2019[59] | X | | X | X | X | X | 5
Christopher et al 2018[58] | X | X | X | | | X | 4
An et al 2019[50] | X | X | X | X | X | X | 6

X indicates that the criterion was met (1 point). CI: Confidence interval; AUROC: Area under the receiver operating characteristics curve; NOS: Newcastle-Ottawa Scale; OCT: Ocular coherence tomography.
Definition of Glaucoma Definition of glaucoma diagnosis varied between the studies. In the fundal photo cohort, the majority[42-49] were ill defined but stated that the diagnosis was made by an ophthalmologist, three[41,51,63] stated that the diagnosis was gold standard and likely the Anderson Patella criteria, and the remaining two studies[52-53] used trained independent masked graders. In the OCT group, nine of the ten studies[50,54-57,59-62] had their glaucoma diagnosis defined by the Anderson Patella criteria. In the remaining paper[58], the diagnosis was ill-defined but stated to be by two independent masked graders.
Machine Learning Classifier As regards the machine learning classifier, SVM[43-47,49-51] was used in seven of the fundal photo group, neural networks were used in five[41-42,52-53,63], and LS-SVM[48] was used in one. In the OCT cohort, the classifiers used were more varied. Three studies utilised SVM[54,59,62]. RAN[60-61] and CNN[50,57] were utilised by two studies each. Principal component analysis[58] and relevance vector machine[55] were used in one study each, and the final study[56] employed boosted logistic regression.
Validation Training and testing protocols are outlined in Tables 1
and 2. Cross validation was the most common method with tenfold, fivefold,
threefold and leave one out cross validation accounting for the practices in
nine[46,48,50,52,55-56,60-62], two[43,51],
one[44] and five[45,47,54,57-58]
of the studies respectively. A random partitioning of training and testing
occurred in four of the studies[49,53,59,63].
Meta-Analysis
Since AUROC was the most widely
reported and informative measure of diagnostic precision employed in the
included studies, Meta-analysis was performed based on pooling AUROC estimates
for each cohort.
Nine of the fundal image studies did not report an AUROC,
but gave only a single value of sensitivity and specificity[42-49,51]. In order to include these
studies in the larger Meta-analysis, we obtained a single pooled AUROC value by
estimating a HSROC. This is shown in Figure 2. The area under the summary ROC curve was calculated to be 0.979 (95%CI: 0.887-0.996). Studies which only reported an AUROC but did not
include an estimate of variance or uncertainty (i.e., SE or CI) could
not be included in the Meta-analysis.
Figure 2 An HSROC estimated by pooling results of the nine fundal photo studies that did not report an AUROC value (sensitivity and specificity only) Area under the summary ROC curve: 0.979; 95%CI: 0.887-0.996.
Tables 5 and 6 outline the results of the Meta-analysis
in terms of the fundal photo cohort (with the HSROC addition) and the OCT
cohort respectively.
Table 5 Meta-analysis, AUROC and estimated HSROC of studies relating to fundal photos

Study | AUROC | SE | 95%CI | z | P | Weight fixed (%) | Weight random (%)
Christopher et al 2018[52] | 0.910 | 0.00200 | 0.906 to 0.914 | | | 19.92 | 34.92
Li et al 2018[53] | 0.986 | 0.00100 | 0.984 to 0.988 | | | 79.67 | 35.01
Others (HSROC estimate) | 0.979 | 0.0140 | 0.952 to 1.000 | | | 0.41 | 30.08
Total (random effects) | 0.957 | 0.0204 | 0.917 to 0.997 | 46.9 | <0.001 | 100.00 | 100.00

AUROC: Area under the receiver operating characteristic curve; CI: Confidence interval; HSROC: Hierarchical summary receiver operating characteristic curve; SE: Standard error.
Table 6 Meta-analysis and AUROC of studies relating to OCT studies

Study | AUROC | SE | 95%CI | z | P | Weight fixed (%) | Weight random (%)
Burgansky-Eliash et al 2005[54] | 0.981 | 0.0330 | 0.916 to 1.000 | | | 2.12 | 8.69
Bowd et al 2008[55] | 0.817 | 0.0300 | 0.758 to 0.876 | | | 2.56 | 9.19
Bizios et al 2010[62] | 0.977 | 0.0102 | 0.957 to 0.997 | | | 22.16 | 12.12
Barella et al 2013[60] | 0.877 | 0.0342 | 0.810 to 0.944 | | | 1.97 | 8.49
Silva et al 2013[61] | 0.807 | 0.0395 | 0.730 to 0.884 | | | 1.48 | 7.64
Xu et al 2013[56] | 0.903 | 0.0472 | 0.810 to 0.996 | | | 1.04 | 6.54
Muhammad et al 2017[57] | 0.945 | 0.0102 | 0.925 to 0.965 | | | 22.16 | 12.12
Asaoka et al 2019[59] | 0.937 | 0.0158 | 0.906 to 0.968 | | | 9.22 | 11.45
Christopher et al 2018[58] | 0.950 | 0.0153 | 0.920 to 0.980 | | | 9.85 | 11.52
An et al 2019[50] | 0.963 | 0.00917 | 0.945 to 0.981 | | | 27.44 | 12.22
Total (random effects) | 0.923 | 0.0174 | 0.889 to 0.957 | 53.1 | <0.001 | 100.00 | 100.00

OCT: Ocular coherence tomography; AUROC: Area under the receiver operating characteristic curve; CI: Confidence interval; SE: Standard error.
It can be seen that there is no statistically significant difference in machine learning diagnostic accuracy between fundal photos and OCT images in diagnosing or screening for glaucoma. The total AUROC, in terms of
random effects, for the fundal photo cohort was calculated to be 0.957 (CI:
0.917 to 0.997, P<0.001) and 0.923 (CI: 0.889 to 0.957, P<0.001)
for the OCT cohort.
Figures 3 and 4 are Forest plots depicting a graphical
representation of weight and AUROC of both the fundal image cohort and the OCT
cohort respectively.
Figure 3 Forest plot of the AUROC of the fundal images
cohort.
Figure 4 Forest plot of the AUROC of the OCT cohort.
Funnel Plots for risk of bias were also performed for the
two cohorts and are outlined in Figures 5 and 6. The fundal image group only
has three points and therefore it has an ill-fitting funnel plot.
Figure 5 Funnel plot for fundal image studies.
Figure 6 Funnel plot for OCT studies.
The OCT funnel plot includes every study and therefore provides a better assessment of study bias than the fundal image plot. The expected asymmetry of these plots is discussed further under Publication Bias below.
Tests for heterogeneity were also performed and these are
outlined in Table 7. The I² value for the fundal image cohort and the OCT cohort is 99.83% and 81.66% respectively, which is indicative of a high level of heterogeneity.
Table 7 Test for heterogeneity for fundal image studies and OCT studies

Parameters | Q | Significance level | I² (inconsistency) | 95%CI for I²
Fundal image studies | 1155.5417 | P<0.0001 | 99.83% | 99.77 to 99.87
OCT studies | 49.0860 | P<0.0001 | 81.66% | 67.42 to 89.68
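For illustration, Cochran's Q and I² can be computed directly from the per-study AUROCs and SEs. The Python/NumPy sketch below is not the software used in the analysis, but applied to the three fundal-photo inputs of Table 5 it approximately reproduces the fundal image values of Q≈1155.5 and I²≈99.8% reported in Table 7.

```python
# Cochran's Q and I-squared from per-study estimates and SEs; illustrative.
import numpy as np

def heterogeneity(estimates, ses):
    est, se = np.asarray(estimates), np.asarray(ses)
    w = 1.0 / se**2
    pooled = np.sum(w * est) / np.sum(w)   # fixed-effect pooled value
    q = np.sum(w * (est - pooled) ** 2)    # Cochran's Q
    df = len(est) - 1
    i2 = max(0.0, (q - df) / q) * 100.0    # % of variance from heterogeneity
    return q, i2

print(heterogeneity([0.910, 0.986, 0.979], [0.002, 0.001, 0.014]))
```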
A comparison of sample size (numbers used for validation) versus diagnostic accuracy (AUROC) was performed to examine whether any correlation existed.
Table 8 outlines the result of a Meta-analysis of the fundal image group without the study by Li et al[53]. This study was excluded due to its very large sample size and the disproportionate effect it may have on the outcome of the analysis. It can be seen that the total AUROC in terms of random effects has decreased from 0.957 to 0.942, a difference of 0.015.
Table 8 Meta-analysis, AUROC and estimated HSROC of studies relating to fundal photos with the exclusion of the Li study

Study | AUROC | SE | 95%CI | z | P | Weight fixed (%) | Weight random (%)
Christopher et al 2018[52] | 0.910 | 0.00200 | 0.906 to 0.914 | | | 98.00 | 54.06
Others (HSROC estimate) | 0.979 | 0.0140 | 0.952 to 1.000 | | | 2.00 | 45.94
Total (random effects) | 0.942 | 0.0242 | 0.894 to 0.989 | 38.858 | <0.001 | 100.00 | 100.00

AUROC: Area under the receiver operating characteristic curve; CI: Confidence interval; HSROC: Hierarchical summary receiver operating characteristic curve; SE: Standard error.
DISCUSSION
Meta-Analysis
The findings of this
Meta-analysis have indicated that there is no statistically significant
difference with respect to machine learning between fundal photos and OCT
images in diagnosing or screening for glaucoma.
The total AUROC, in terms of random effects, for the
fundal photo cohort was calculated to be 0.957 (95%CI: 0.917 to 0.997, P<0.001)
and 0.923 (95%CI: 0.889 to 0.957, P<0.001) for the OCT cohort.
Although there is a difference of 0.034 between the two results, the CIs of
both groups overlap and there is no significant difference in diagnostic
accuracy between the two cohorts (P=0.34; t-test based on pooled
AUROC values and SE).
Sample Size There is a notable discrepancy between the sample sizes
of the OCT group (n=1743) and the fundal images group (n=59 788).
Although the number of studies is approximately on par (10 studies for OCT and
13 for fundal photos), there is over a 30-fold increase in the numbers of eyes
participating in the fundal photo group in comparison to the OCT group.
However, the majority (39 745) of these eyes come from a single study[53]. If we remove this study from our Meta-analysis, as seen in Table 8, the AUROC in terms of random effects is 0.942, leaving a difference of just 0.019 between the two groups. It is known that machine learning and deep learning are techniques that benefit from large databases for training; hence the bigger the training database, the more accurate the model[64]. This can be illustrated with our data. Figure 7 depicts a scatter plot of validation numbers versus AUROC. A linear trend line shows that for increasing validation numbers there is an associated rise in diagnostic accuracy. A similar trend can be seen in the funnel plot with regard to the OCT cohort: there are multiple studies with low sample sizes and low accuracy, and accuracy improves as the sample size gets bigger (smaller SE and hence higher accuracy).
Figure 7 Graph of number used for validation versus
AUROC.
One reason for the variance in numbers is the ease of acquisition of fundal photos in comparison to OCT scans. Large online data sets such as the “Labelme” dataset (http://www.labelme.org/) exemplify this ease of acquisition.
Data Sets There is substantial overlap between the datasets used in
the fundal imaging cohort. Six studies in this group used the Kasturba Medical
College dataset[42-44,46,48-49] and two used the Venu Eye
Research Centre dataset[45,47].
Although they used different machine learning models, three of the studies[43-44,48] had the same number of participants, both glaucomatous and healthy, and it is possible that they used exactly the same fundal photos, as the number of images the dataset contains is not stated. This does not skew the findings but potentially hampers the power of these studies. It is also seen that there is a heavy Asian majority with regard to the number of papers in the fundal photo cohort. Ten of the thirteen papers came from Asia, nine of them from India. There is an obvious population-based bias in this group, and the same machine learning technique may not be comparable to glaucoma diagnosis in a different ethnic cohort[66]. There is a greater ethnic diversity observed in the
OCT study groups. These are mainly population based studies and by their design
should limit the effect of selection bias.
Validation As stated in the results section above, cross validation
was the most utilised method of training and testing with some form of it being
used in seventeen out of the twenty-three studies. It has been stated that
cross validation is a better method for testing and training than random
allocation which occurred in four of the studies[29].
This is due to the fact that when random sampling is used, there is a chance
that the sampling set does not contain the disease or features associated with
the disease process. Four of our studies[49,53,59,63] used random sampling; although this may initially appear an adequate training process, the more robust technique of cross validation might have made their models more accurate.
Machine Classifier
The studies used a range of
different classifiers or machine learning algorithms. The most commonly used
algorithm was SVM being utilised in 10 of the studies. This is useful in
classifying linear features and as such is the option of choice in classifying
fundal images[67]. Problems arise with this
classifier when non-linear features are extracted, such as those employed in OCT scanning techniques. Hence more complex classifiers, e.g., CNN and ANN, may be more appropriate when interpreting these scans, and this is reflected
in the data.
The algorithms used in these studies were solely
constructed to aid in the diagnosis of glaucoma. This is usually a binary
classification as outlined with the abundant use of the SVM classifier.
However, in the clinical setting many patients can suffer from a multitude of
eye pathologies. Cataract, for instance, is an extremely common finding
especially in the elderly. Significant cataract can hamper the acquisition of
fundal photos and OCT images. It can also increase the amount of “noise” in the
attained scans making it more difficult for the algorithm to interpret it[64]. Patients may also have features of DR and/or ARMD,
pathologies which are very common in the aging population. A deep learning
algorithm was previously developed[68] and tested
on a small cohort (n=60) which aimed to detect a range of retinal
disease from fundal images. Accuracy dropped from 87.4% in the cohort which had
DR alone to 30.5% when multiple aetiologies were included. However, the small
dataset used for 10 identifiable diseases is likely to introduce bias into the results.
Glaucoma Diagnosis
Glaucoma diagnosis is a
multifactorial process. It is a significant proportion of the workload of
general ophthalmologists. In order to make a definitive diagnosis, perimetry,
fundoscopy, gonioscopy and tonometry must all be undertaken. There is
significant variance in the agreed diagnosis of glaucoma in the above studies.
Many have ill-defined diagnostic criteria but state that it was established
by an ophthalmologist or multiple masked graders. The reason for this
inconsistency is that a significant proportion of the reviewed papers were
published in journals with an interest in computer methods as opposed to
clinical and ophthalmological findings. Their definitions are vaguer than those
published in the clinical journals. The variation in robust diagnoses can also be observed between the fundal image and OCT cohorts, with all but one[58] of the OCT group having their glaucoma diagnosis
defined by the Anderson Patella criteria.
Incorporation of other patient parameters into the model
process, e.g., age, smoking status, intraocular pressure, visual field
testing has been shown to increase diagnostic accuracy[67],
although the incorporation of perimetry is likely to prove to be a time and
resource heavy inclusion parameter.
Methodological Quality The methodological quality of the studies was assessed using a modified NOS. It can be seen that the OCT group has a higher standard of quality than the fundal image group. Only 5 of the 13 (38.46%) studies in the fundal image group received a methodological score of 4 or greater on our modified scale. This could be a potential confounder with respect to the results of the analysis, but because the studies with a low score account for only 3.82% (2285 of 59 788) of the total number of eyes in the fundal photo group, their effect is unlikely to be statistically significant in the pooled analysis.
Publication Bias
The funnel plots, outlined in
Figures 5 and 6, indicate that there is a low risk of publication bias, especially in the OCT group. Only three points appear on the fundal image funnel plot, so it is difficult to draw conclusions about the bias of those studies. It can be seen that a larger sample size (and as such a smaller SE) gives a greater degree of accuracy. Due to the fact that, unlike with
standard diagnostic tests, diagnostic accuracy is expected to increase with
sample size in machine learning studies, one would expect funnel plots in
machine learning Meta-analysis to be asymmetric, with the majority of studies
falling in the lower left quadrant. A large number of studies falling to the
bottom-right would be suggestive of publication bias or perhaps overfitting of
machine learning models.
Heterogeneity
There is a large degree of
heterogeneity as outlined in Table 7. The I² value for the
fundal image cohort and the OCT cohort is 99.83% and 81.66% respectively. This
is not surprising given the different methods, sample sizes and algorithms used
in each.
Glaucoma Prevalence
The prevalence of primary open
angle glaucoma in the general population is approximately 2% over the age of 40
and increases with age to affect 4% of those over 80[69].
On review of our data sets, it is seen that the proportion of glaucomatous eyes in our cohorts ranged from 23.35%[53] to 77.08%[56], with an average of 53.05%±11.66% (SD). This indicates that, during the training and testing process, the algorithm observes glaucomatous eyes on average around thirteen times more frequently than it would in the general population. Validation on such datasets
may lead to an increase in false positives either by the algorithm “expecting”
to have more positive results than it has or secondary to overfitting of
potential disease characteristics.
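A short worked example makes the point. Assuming an illustrative operating point of 95% sensitivity and 95% specificity (not a figure from any reviewed study), the positive predictive value collapses when moving from dataset prevalence to population prevalence:

```python
# Positive predictive value at different prevalences (Bayes' rule).
# The 95%/95% operating point is an illustrative assumption only.
def ppv(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence               # true positive rate overall
    fp = (1 - specificity) * (1 - prevalence)   # false positive rate overall
    return tp / (tp + fp)

sens, spec = 0.95, 0.95
print(ppv(sens, spec, 0.53))   # ~0.96 at the average dataset prevalence
print(ppv(sens, spec, 0.02))   # ~0.28 at ~2% population prevalence
```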
Unsupervised Machine Learning Although our studies solely examined supervised machine learning, unsupervised machine learning in the form of deep learning, which generates decisions based on high dimensional representations and neural networks that humans cannot readily interpret, is likely to be the next step in the evolution of computer aided diagnosis. We fundamentally do not know how such models make their judgments[70]. Areas of the image can be highlighted, but often they are not associated with the pathological process as we understand it. This may help us to look for new aetiologies of retinal disease. However, this is termed a “black box”, as we do not fully understand how the algorithms come to their conclusions.
Challenges In a recent review by Ting et al[71], a number of potential challenges for AI
implementation into clinical practice were identified. The algorithm requires a
large number of pathological images to train. The sharing of images between
centres is currently an ethical grey area but to ensure adequate classification
by the algorithm, data must be shared between centres. This includes a range of
data from diverse populations. Rare ocular diseases may also prove an issue, as
there may not be enough images to adequately train an algorithm to recognise
these diseases.
Limitations Limitations of our study include the high prevalence of articles in computer methods journals as opposed to clinical journals, especially in the fundal photo group. These articles were more preoccupied with how the algorithm and classifier functioned from a computer science point of view, and their definitions of glaucoma were not as robust as those in the clinical journals. Although several studies scored low on our assessment of quality, they had to be included due to the paucity of papers on the subject.
The use of the same database and crowdsourcing material
has the potential to bias the results of the Meta-analysis. Ideally all studies
would have the same definition of glaucoma, use a separate training, validation
and testing set and use cross validation.
In conclusion, OCT scanning provides micrometre resolution of the RNFL, and one would assume that this should translate into a more accurate screening and diagnostic tool. However, we have demonstrated that the literature to date has failed to corroborate this with respect to machine learning. The ease of access and lower cost associated with fundal photo acquisition make it the more appealing option for screening on a global scale; however, further studies need to be undertaken on both groups, owing largely to the poor study quality associated with the fundal photography cohort.
Machine learning in the screening and diagnosis of ocular disease is a very appealing prospect. It can take pressure off
ophthalmology departments and allow a greater throughput of the population to
enjoy a better vision related quality of life. However, care should be taken in
interpreting these findings. During an ocular assessment, ophthalmologists take
in a holistic view of the patient, including past medical history and
medications, and may undertake multimodal imaging, e.g., angiograms and/or
perimetry to come to a complete diagnosis. We know that patients greatly value
their interaction with their doctor[72] and the
“human-touch” associated with it, something which could become extinct with
the advent of AI. Another cause of concern is the “black box” nature of
decision making in unsupervised machine learning. With so much emphasis
nowadays on evidence-based medicine, it may not yet be time to place all blind
trust in machines.
ACKNOWLEDGEMENTS
Conflicts of Interest: Murtagh P, None; Greene G, None; O’Brien
C, None.
REFERENCES