TY - JOUR
T1 - Accuracy of Deep Learning Algorithms for the Diagnosis of Retinopathy of Prematurity by Fundus Images
T2 - A Systematic Review and Meta-Analysis
AU - Zhang, Jingjing
AU - Liu, Yangyang
AU - Mitsuhashi, Toshiharu
AU - Matsuo, Toshihiko
N1 - Publisher Copyright:
© 2021 Jingjing Zhang et al.
PY - 2021
Y1 - 2021
N2 - Background. Retinopathy of prematurity (ROP) occurs in preterm infants and may contribute to blindness. Deep learning (DL) models have been used for ophthalmologic diagnoses. We performed a systematic review and meta-Analysis of published evidence to summarize and evaluate the diagnostic accuracy of DL algorithms for ROP by fundus images. Methods. We searched PubMed, EMBASE, Web of Science, and Institute of Electrical and Electronics Engineers Xplore Digital Library on June 13, 2021, for studies using a DL algorithm to distinguish individuals with ROP of different grades, which provided accuracy measurements. The pooled sensitivity and specificity values and the area under the curve (AUC) of summary receiver operating characteristics curves (SROC) summarized overall test performance. The performances in validation and test datasets were assessed together and separately. Subgroup analyses were conducted between the definition and grades of ROP. Threshold and nonthreshold effects were tested to assess biases and evaluate accuracy factors associated with DL models. Results. Nine studies with fifteen classifiers were included in our meta-Analysis. A total of 521,586 objects were applied to DL models. For combined validation and test datasets in each study, the pooled sensitivity and specificity were 0.953 (95% confidence intervals (CI): 0.946-0.959) and 0.975 (0.973-0.977), respectively, and the AUC was 0.984 (0.978-0.989). For the validation dataset and test dataset, the AUC was 0.977 (0.968-0.986) and 0.987 (0.982-0.992), respectively. In the subgroup analysis of ROP vs. normal and differentiation of two ROP grades, the AUC was 0.990 (0.944-0.994) and 0.982 (0.964-0.999), respectively. Conclusions. Our study shows that DL models can play an essential role in detecting and grading ROP with high sensitivity, specificity, and repeatability. The application of a DL-based automated system may improve ROP screening and diagnosis in the future.
AB - Background. Retinopathy of prematurity (ROP) occurs in preterm infants and may contribute to blindness. Deep learning (DL) models have been used for ophthalmologic diagnoses. We performed a systematic review and meta-Analysis of published evidence to summarize and evaluate the diagnostic accuracy of DL algorithms for ROP by fundus images. Methods. We searched PubMed, EMBASE, Web of Science, and Institute of Electrical and Electronics Engineers Xplore Digital Library on June 13, 2021, for studies using a DL algorithm to distinguish individuals with ROP of different grades, which provided accuracy measurements. The pooled sensitivity and specificity values and the area under the curve (AUC) of summary receiver operating characteristics curves (SROC) summarized overall test performance. The performances in validation and test datasets were assessed together and separately. Subgroup analyses were conducted between the definition and grades of ROP. Threshold and nonthreshold effects were tested to assess biases and evaluate accuracy factors associated with DL models. Results. Nine studies with fifteen classifiers were included in our meta-Analysis. A total of 521,586 objects were applied to DL models. For combined validation and test datasets in each study, the pooled sensitivity and specificity were 0.953 (95% confidence intervals (CI): 0.946-0.959) and 0.975 (0.973-0.977), respectively, and the AUC was 0.984 (0.978-0.989). For the validation dataset and test dataset, the AUC was 0.977 (0.968-0.986) and 0.987 (0.982-0.992), respectively. In the subgroup analysis of ROP vs. normal and differentiation of two ROP grades, the AUC was 0.990 (0.944-0.994) and 0.982 (0.964-0.999), respectively. Conclusions. Our study shows that DL models can play an essential role in detecting and grading ROP with high sensitivity, specificity, and repeatability. The application of a DL-based automated system may improve ROP screening and diagnosis in the future.
UR - http://www.scopus.com/inward/record.url?scp=85113180393&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113180393&partnerID=8YFLogxK
U2 - 10.1155/2021/8883946
DO - 10.1155/2021/8883946
M3 - Review article
AN - SCOPUS:85113180393
SN - 2090-004X
VL - 2021
JO - Journal of Ophthalmology
JF - Journal of Ophthalmology
M1 - 8883946
ER -