
Multi-class classification of pathological myopia based on fundus photography

    https://doi.org/10.1142/S1793545824500160

    Abstract

    Pathological myopia (PM) is a severe ocular disease that can lead to blindness. As a traditional noninvasive diagnostic method, fundus color photography (FCP) is widely used in detecting PM due to its high fidelity and precision. However, manual examination of fundus photographs for PM is time-consuming and prone to high error rates, and existing automated detection technologies have not yet addressed the detailed classification of different stages of PM lesions. In this paper, we propose an intelligent system based on ResNet101 that diagnoses PM by multi-class classification of FCPs at different lesion stages. The system subdivides PM into eight subcategories, aiming to enhance the precision and efficiency of the diagnostic process. It achieved an average accuracy of 98.86% in the detection of PM, with an area under the curve (AUC) of 98.96%. For the eight subcategories of PM, the detection accuracy reached 99.67%, with an AUC of 99.98%. Compared with other widely used multi-class models such as VGG16, Vision Transformer (ViT), and EfficientNet, this system demonstrates higher accuracy and AUC. The system is designed to be easily integrated into existing clinical diagnostic tools, providing an efficient solution for large-scale PM screening.

    1. Introduction

    Pathological myopia (PM)1 is a vision condition characterized by extreme nearsightedness and poses a significant and growing concern for young people worldwide.2 In recent years, its prevalence among adolescents and young adults has been increasing.3 PM is defined as a refractive error of −6.00 diopters or worse, which significantly increases the risk of developing various ocular complications, such as retinal detachment, glaucoma, and macular degeneration.4 These conditions can lead to vision impairment or blindness, impacting the quality of life of young individuals. Preventive measures, early intervention, and education about eye health are essential to mitigate the potential long-term impact on the vision and overall well-being of youth worldwide.

    Fundus color photography (FCP)5 is a medical imaging technique used to capture color images of the posterior segment of the eye, including the retina, optic nerve, choroid, and other critical structures. This technology plays a crucial role in ophthalmic diagnosis and the monitoring of eye conditions. FCP is a non-invasive examination technique that is typically safe and painless for patients. It allows ophthalmologists to gain a better understanding of ocular conditions, facilitating early intervention and treatment to preserve vision and overall eye health. The manifestations of PM in FCPs vary with the progression of the condition. The traditional diagnostic approach is manual diagnosis by physicians, which is inefficient and consumes medical resources. With the development of artificial intelligence (AI), deep learning has matured to the point where it can automatically diagnose and grade the severity of fundus diseases.6,7,8

    In recent years, deep learning techniques, such as EfficientNet, ResNet, VGG, and Vision Transformer, have been extensively investigated for the automated detection of PM. Among these, Du et al.9 employed the EfficientNet network model to train on 7020 fundus images, achieving an accuracy of 87.53%. Tang et al.10 combined ResNet-50 with DeepLabv3+ to classify myopic maculopathy, achieving an accuracy of 90.55%. Wu et al.11 employed ResNet-34 with an atrous spatial pyramid pooling (ASPP) module to classify PM using a dataset combining fundus photographs with OCT images, achieving an accuracy of 91.67%. To further increase diagnostic accuracy, large FCP datasets were adopted to address the problem of data scarcity. Among these studies, Lu et al.12 trained a Faster R-CNN on 32,010 fundus images of PM, achieving an AUC and accuracy of 97.36%. Li et al.13 utilized a dual-stream Deep Convolutional Neural Network (DCNN-DS) model, obtaining a sensitivity of 93.3% and a specificity of 99.6% for detecting PM.

    Besides the basic diagnosis of PM, efforts have been dedicated to categorizing PM into different stages of the disease. Choi et al.14 utilized classical models as backbone architectures and employed both single-column and multi-column convolutional neural network (CNN) models to screen PM in OCT images, dividing it into three general categories. Wang et al.15 used EfficientNet to classify PM into four different lesion stages, achieving an accuracy of 95%. However, these studies performed only a coarse PM classification. Clinically, diagnosing PM is complex due to the diversity of disease stages and the frequent co-occurrence of other retinal pathologies. Therefore, to accurately diagnose different stages of the disease, a more detailed classification is necessary.

    In this study, a multi-class classification of PM was performed. To be consistent with clinical practice, which demands both diagnosis and classification, two non-overlapping datasets were constructed: one for the preliminary diagnosis of PM and another for its detailed classification. The preliminary diagnosis differentiates between different stages of PM lesions, while the detailed classification identifies the specific types of PM lesions, providing more comprehensive diagnostic support for doctors. For the multi-class diagnosis task, ResNet101,16 which has been shown to provide satisfactory results in medical classification,17,18 was used. ResNet101 introduces residual connections, which solve the problem of vanishing gradients and enable the network to support more than 100 layers without performance degradation. This structural design ensures finer feature extraction capabilities, making the model particularly suitable for complex image processing tasks. As the results show, the preliminary classification achieved an accuracy of 98.86% in the testing phase, and the detailed classification achieved an even higher accuracy of 99.67%.
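    To make the residual mechanism concrete, the following is a minimal PyTorch sketch of a generic residual block; it illustrates the skip connection F(x) + x rather than the exact bottleneck block used in ResNet101, and the channel width is an arbitrary illustration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generic residual block: output = ReLU(F(x) + x).

    The identity path lets gradients bypass the convolutions,
    which is what allows very deep networks such as ResNet101
    to train without vanishing gradients.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # skip connection
```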

    2. Materials and Methods

    2.1. Dataset

    In this study, the dataset was sourced from the Shanghai Eye Disease Prevention and Treatment Center (SEDPTC), China. Physicians employed the TOPCON DRI Triton device to obtain 45° FCPs centered on the macula from participants. To ensure high-quality image data, all images in which the macular region was not fully visible or was obscured by more than 50% were excluded. A total of 7114 images with clear visibility of the macula were selected for the development of the model. The dataset was meticulously classified and annotated by three independent retinal specialists, whose diagnostic evaluations were carefully documented. These evaluations were utilized for a comparative analysis against the performance of the artificial intelligence model. In instances of diagnostic disagreement, a consensus was reached through discussion with an additional senior expert, thereby confirming the final diagnoses. This rigorous approach ensures a robust dataset for advancing the accuracy of AI-driven diagnostic tools in ophthalmology. In this study, the use of retinal fundus images was approved by the Ethics Committee of Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, and adhered to the tenets of the Declaration of Helsinki (Approval ID: No. 2015KY156). Written informed consent was obtained from all participants.

    To effectively distinguish PM from other retinal diseases, Dataset 1 was developed. This dataset includes a coarse classification of PM and other ocular diseases, enabling effective diagnosis and preliminary categorization of PM among a wide array of retinal conditions. This approach not only enhances diagnostic accuracy but also streamlines the diagnostic process. To enable detailed classification of PM, we constructed Dataset 2, which includes four types of early-stage PM and four types of late-stage PM. Early-stage PM primarily encompasses four stages of diffuse choroidal thinning, while late-stage PM includes four stages of patchy atrophy. This classification aids in the precise diagnosis and study of the progression of PM. It is important to note that the images in Dataset 2 are entirely distinct from those in Dataset 1, ensuring the independence and diversity of the data across both datasets.

    2.1.1. Dataset 1

    Dataset 1 contains 6365 FCPs of PM at different stages of pathological change. The dataset was classified using the META-PM (Meta-Analysis of Pathologic Myopia) classification rules, with minor modifications for individual images based on the risk of disease progression and its impact on vision. In this study, a severely tessellated fundus was defined as equal to or more severe than Grade 2. The dataset was divided into five groups: A0: normal or mildly tessellated fundus, A1: severely tessellated fundus, A2: early PM, A3: advanced PM, and Other: other types of diseases (shown in Fig. 1). Details are given in Table 1.

    Fig. 1.

    Fig. 1. Pathologic myopic fundus color photographs. (a) Normal or mildly tessellated fundus, (b) severely tessellated fundus, (c) early pathologic myopia, (d) advanced pathologic myopia, and (e) other diseases.

    Table 1. Detailed classification of myopic maculopathy.

    Myopic maculopathy                          Definition
    Normal fundus or mild tessellated fundus    No macular lesions, or blurred choroidal vessels outside the macular region (Fig. 1(a))
    Severe tessellated fundus                   Equal to or more severe than Grade 2 (Fig. 1(b))
    Early-stage PM                              Diffuse chorioretinal atrophy or lacquer cracks alone (Fig. 1(c))
    Advanced-stage PM                           Patchy chorioretinal atrophy, macular atrophy, myopic CNV, or Fuchs’ spot (Fig. 1(d))
    Other                                       Other categories of fundus diseases (Fig. 1(e))

    2.1.2. Dataset 2

    Dataset 2 includes 581 fundus photographs depicting early PM with diffuse chorioretinal myopic degeneration, and 168 photographs showing patchy atrophy associated with late PM.

    Diffuse chorioretinal myopic degeneration is an ophthalmic condition closely related to PM, characterized by extensive abnormal changes in the choroid and retina at the back of the eye. In cases of myopia, excessive elongation of the ocular axis leads to abnormal stretching and deformation of the intraocular retina and choroid, which may evolve into diffuse chorioretinal degeneration. This condition results in a thinner, distorted, and constricted ocular fundus structure, potentially causing various complications such as changes in the retinal pigment epithelium, retinal tears, or detachments. Consequently, patients may experience progressively worsening vision, visual field defects, flashes of light within the eye, floaters, or the presence of blurred dark shadows. These 581 fundus photographs showing diffuse chorioretinal myopic degeneration were classified into four different stages of lesion development, as shown in Figs. 2(a)–2(d).

    Fig. 2.

    Fig. 2. Dataset 2: (a)–(d) Diffuse choroidal retinal images, (e)–(h) Patchy atrophy fundus color photographs.

    Patchy atrophy represents a more severe retinal condition than diffuse chorioretinal myopic degeneration. In cases of PM, the excessive elongation of the ocular axis results in persistent traction and deformation of the macular region, leading to cellular degeneration and dystrophy within the macula, culminating in areas of patchy atrophy. Patients with this condition not only suffer from severe refractive errors but also from macular degeneration, further increasing the risk of visual impairment. The presence of patchy atrophy can significantly exacerbate visual disabilities in patients with PM, causing symptoms such as blurred central vision, visual distortion, and reduced contrast, which, in severe cases, may lead to complete blindness. To facilitate in-depth analysis, 168 fundus images of PM showing patchy atrophy were divided into four categories, as illustrated in Figs. 2(e)–2(h). These categories include:

    (1) Patchy atrophy with a single lesion located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane (Fig. 2(e)).
    (2) Patchy atrophy with a single lesion and associated macular atrophy, not involving the macula (Fig. 2(f)).
    (3) Patchy atrophy with multiple lesions located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane (Fig. 2(g)).
    (4) Patchy atrophy with multiple lesions and associated macular atrophy, not involving the macula (Fig. 2(h)).

    2.1.3. Image preprocessing

    To enhance the accuracy of the proposed classification model, a series of image preprocessing steps was performed on the original fundus photographs. These steps were grouped into several modules: removal of the unnecessary black background, data augmentation, data normalization, and standardization. The objective of removing the black background was to eliminate black edges that could interfere with key feature detection, retaining only the effective areas of the image. Specifically, a mask of the colored regions, containing the color information and the corresponding coordinates, was extracted from the RGB images by setting the pixel values of the background to zero. Subsequently, the OpenCV toolkit was used to traverse the pixel information and determine the bounding box of the image. Based on this bounding box, the RGB image was cropped to include only the essential visual information.
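    A minimal sketch of this cropping step with OpenCV is shown below; the function name and the darkness threshold are illustrative assumptions, not taken from the paper.

```python
import cv2
import numpy as np

def crop_black_border(image_bgr: np.ndarray, threshold: int = 10) -> np.ndarray:
    """Crop away the black background surrounding the fundus disc."""
    # Build a mask of the colored (non-black) region.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mask = (gray > threshold).astype(np.uint8)
    # Determine the bounding box of the non-zero pixels.
    coords = cv2.findNonZero(mask)
    x, y, w, h = cv2.boundingRect(coords)
    # Keep only the essential visual information.
    return image_bgr[y:y + h, x:x + w]
```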

    During the data augmentation phase, various techniques were employed to expand the dataset, such as horizontal and vertical flipping, contrast enhancement, and grayscale conversion. In the subsequent normalization step, the pixel values of the augmented images were scaled to the range 0 to 1 in preparation for model training. Finally, prior to training the deep learning model, the z-score method was applied to standardize the image data, ensuring that the model could learn and generalize more effectively.
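    A torchvision pipeline along these lines is sketched below; the flip probabilities, jitter strength, and normalization statistics (ImageNet values) are assumptions for illustration, not the study's exact settings.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                 # input size from Table 2
    transforms.RandomHorizontalFlip(),             # horizontal flipping
    transforms.RandomVerticalFlip(),               # vertical flipping
    transforms.ColorJitter(contrast=0.2),          # contrast enhancement
    transforms.RandomGrayscale(p=0.1),             # occasional grayscale conversion
    transforms.ToTensor(),                         # scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # z-score standardization
                         std=[0.229, 0.224, 0.225]),
])
```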

    Data augmentation techniques were employed to enhance both datasets, with the distribution of data for each disease category shown in Fig. 3.

    Fig. 3.

    Fig. 3. Distribution of the datasets. (a) Distribution of Dataset 1. 0: normal or mildly tessellated fundus, 1: severely tessellated fundus, 2: early PM, 3: advanced PM, 4: other types of diseases. (b) Distribution of the diffuse choroidal retinal images. 0–3: four different stages of diffuse choroidal retinal degeneration. (c) Data distribution of the four types of patchy atrophy. 0: patchy atrophy with a single lesion located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane. 1: patchy atrophy with a single lesion and associated macular atrophy, not involving the macula. 2: patchy atrophy with multiple lesions located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane. 3: patchy atrophy with multiple lesions and associated macular atrophy, not involving the macula.

    2.2. Methods

    2.2.1. Model

    In this study, ResNet-101 was utilized for the multi-class classification of fundus images. Since Dataset 1 contains complex data and redundant features from various stages of PM lesions, there is a risk of overfitting; therefore, Gaussian noise was introduced into the parameters of the final layer. Gaussian noise has been shown to enhance the performance of deep architectures,19,20,21 and introducing it into model parameters can improve generalization on new data by preventing the model from relying too heavily on specific features of the training data. In this study, adding Gaussian noise to ResNet101 effectively prevented overfitting, thereby enhancing the overall performance of the model.21,22 Random Gaussian noise with a standard deviation of 0.3 was added using “torch.randn_like(x)”. This function generates a noise tensor with the same shape as the input tensor x, following a standard normal distribution; the noise tensor is then multiplied by the standard deviation and added to the input tensor x. Figure 4 displays the architecture of the model. Given the small size of Dataset 2, Gaussian noise was not added to the parameters of the model when training on it.
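    A minimal sketch of this noise injection is shown below, consistent with the torch.randn_like(x) description above; the exact position of the noise relative to the fully connected layer and the use of pretrained weights are implementation assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class NoisyResNet101(nn.Module):
    """ResNet-101 with Gaussian noise injected before the final classifier."""

    def __init__(self, num_classes: int = 5, noise_std: float = 0.3):
        super().__init__()
        self.backbone = models.resnet101(weights="IMAGENET1K_V1")  # pretrained weights assumed
        in_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()        # strip the original classifier
        self.noise_std = noise_std
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        feats = self.backbone(x)
        if self.training:
            # randn_like draws standard-normal noise of the same shape;
            # scaling by noise_std gives sigma = 0.3 as in the paper.
            feats = feats + self.noise_std * torch.randn_like(feats)
        return self.fc(feats)
```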

    Fig. 4.

    Fig. 4. Multi-classification model for PM.

    As shown in Fig. 4, the structure of ResNet-101 includes a convolutional layer, a max pooling layer, four residual blocks, Gaussian noise, and a fully connected layer (FC). This configuration effectively supports deep network learning, enhancing the model’s processing capability and accuracy. This model can accurately identify and classify various lesions in fundus images, greatly supporting clinical diagnostics.

    2.2.2. Experimental environment

    Experiments were performed on a Dell Precision 5820 workstation equipped with an Nvidia GeForce RTX 3090 GPU (24 GB of memory) and an Intel Xeon W-2245 CPU, running Windows 10 with Python 3.9. PyTorch was used as the deep learning framework.

    2.2.3. Evaluation standard

    To assess the performance of the model, metrics were employed, such as accuracy, confusion matrix, Receiver Operating Characteristic (ROC) curve, and Area Under the Curve (AUC) for model evaluation. Accuracy is the simplest and most intuitive evaluation metric. It represents the proportion of samples that the model correctly classifies out of the total number of samples, demonstrating the model’s classification accuracy across the entire dataset.

    The formulas are as follows:

    \[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \tag{1} \]
    \[ \text{Recall} = \frac{TP}{TP + FN}, \tag{2} \]
    \[ \text{Precision} = \frac{TP}{TP + FP}, \tag{3} \]
    \[ \text{F1\_Score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}. \tag{4} \]
    TP (True Positives) is the number of samples correctly classified as the positive class by the model. TN (True Negatives) is the number of samples correctly classified as the negative class by the model. FP (False Positives) is the number of samples incorrectly classified as the positive class by the model. FN (False Negatives) is the number of samples incorrectly classified as the negative class by the model.

    The F1 Score combines Precision and Recall, serving as a balanced metric, especially useful for handling imbalanced class situations. It ranges from 0 to 1, with higher values indicating a better balance between Precision and Recall in the model.
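    As a minimal sketch, these metrics can be computed with scikit-learn as follows; the label and score arrays below are illustrative placeholders, not the study's data.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Illustrative placeholders for a 5-class problem (as in Dataset 1).
y_true = np.array([0, 1, 2, 3, 4, 0, 1, 2])
rng = np.random.default_rng(0)
y_score = rng.random((len(y_true), 5))
y_score /= y_score.sum(axis=1, keepdims=True)   # per-class probabilities
y_pred = y_score.argmax(axis=1)

print("accuracy :", accuracy_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("AUC (OvR):", roc_auc_score(y_true, y_score, multi_class="ovr", average="macro"))
```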

    2.2.4. Loss function

    Cross-entropy loss is applicable to multi-class classification problems where samples can be categorized into multiple distinct classes. It can handle relationships between multiple classes to determine the most probable class and assess the model’s performance on each class.

    The cross-entropy loss function is used to measure the difference between the model’s output probability distribution and the true labels. It is composed of the negative log-likelihood loss (NLL Loss) and the softmax function. Equation (5) is the cross-entropy loss function, Eq. (6) is the NLL Loss function, and Eq. (7) is the softmax function. The formulas are as follows:

    \[ \text{loss}(p, q) = -\sum_{i=1}^{n} p(x_i) \log\!\left( \frac{\exp(x_i)}{\sum_{j} \exp(x_j)} \right), \tag{5} \]
    \[ H(p, q) = -\sum_{i=1}^{n} p(x_i) \log\big(q(x_i)\big), \tag{6} \]
    \[ \text{softmax}(x_i) = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)}. \tag{7} \]

    Here, \(p\) represents the true distribution, \(q\) represents the predicted distribution, and \(x\) is the vector of predicted logits, \(x = [x_1, x_2, x_3, x_4]\), whose number of elements equals the number of categories. The index \(i\) denotes the actual label of the sample; for example, \(x_i\) refers to the \(i\)-th element of the vector.
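    This decomposition can be verified directly in PyTorch, where CrossEntropyLoss is the composition of LogSoftmax and NLLLoss; the logits and labels below are illustrative placeholders.

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 8)              # batch of 4 samples, 8 classes (as in Dataset 2)
labels = torch.tensor([0, 3, 5, 7])     # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
assert torch.allclose(ce, nll)          # identical by construction, Eqs. (5)-(7)
```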

    2.2.5. Experimental parameters

    In this experiment, several parameters were set to optimize the training results. StepLR was used to dynamically adjust the learning rate, decreasing it by a fixed factor at a fixed step size to help the model converge more stably. Specifically, StepLR was set to multiply the learning rate by 0.1 every 10 epochs, gradually reducing it. To prevent overfitting, the L2 regularization weight was set to 1e-4. Table 2 shows the specific optimizer and parameter settings, as well as some experimental data.
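    A minimal sketch of these settings (Adam, weight decay 1e-4, StepLR with step size 10 and factor 0.1) follows; the stand-in model and the empty training loop are placeholders, not the study's code.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 5)   # stand-in for the ResNet101 model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)  # lr x0.1 every 10 epochs

for epoch in range(100):
    # ... one pass over the training data would go here ...
    optimizer.step()       # placeholder step so the scheduler order is valid
    scheduler.step()
```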

    Table 2. Experimental parameters.

    Experimental parameter         Value
    Image size                     224×224
    Learning rate                  0.0001
    Epochs                         100
    Batch size                     10
    Loss function                  Cross-entropy loss
    L2 regularization              Weight_decay = 1e-4
    Learning rate optimization     StepLR
    Optimizer                      Adam
    Training time for Dataset 1    13′45″/epoch
    Training time for Dataset 2    2′29″/epoch

    3. Results

    3.1. Ablation experiments

    The standard ResNet101 was first employed on both datasets. When training on Dataset 1 with ResNet101, slight overfitting was observed, characterized by a significant gap between the accuracy on the training set and on the validation set. To mitigate this risk, Gaussian noise was introduced into the model.

    The integration of Gaussian noise produced tangible effects. The results of the ablation experiments presented in Table 3 clearly indicate that incorporating Gaussian noise effectively reduced overfitting, thereby enhancing the generalization capability of the model. This improvement, in turn, bolstered the model’s stability and reliability. These findings provide substantial support and guidance for clinical application.

    Table 3. Ablation experiments on Dataset 1.

    Model                                Val-Acc    Train-Acc
    ResNet101                            81.10      98.99
    ResNet101 + salt-and-pepper noise    73.20      90.20
    ResNet101 + Gaussian noise           98.93      99.31

    3.2. Comparative experiments

    In the comparative experiments, classic deep learning models were evaluated on both Dataset 1 and Dataset 2, including ResNet18, ResNet50, ResNet101, VGG,23 ViT,24 Swin-T,25 and EfficientNet.26 These are established deep learning models for multi-class image classification and have demonstrated excellent performance in previous studies. The results of these models on the test datasets are enumerated in Table 4. Our model demonstrated excellent accuracy, outperforming the VGG model by 7.87% and EfficientNet by 1.28%. Compared to the attention-based ViT and Swin-T models, it achieved higher accuracy by 44.25% and 43.24%, respectively. Additionally, our model surpassed the other models in terms of the AUC (Area Under the Curve) metric. Notably, the ResNet101 model integrated with Gaussian noise demonstrated the best performance when trained on Dataset 1. This finding emphasizes the pivotal role of Gaussian noise in enhancing model performance, indicating its effectiveness in boosting the accuracy and generalization capabilities of neural networks.

    Table 4. Comparative experiments on Dataset 1.

    Model           AUC      ACC      F1
    VGG16           93.27    90.99    91.35
    EfficientNet    98.52    97.58    97.50
    ViT             56.29    54.61    52.70
    Swin-T          57.43    55.62    58.41
    Ours            98.96    98.86    98.84

    When trained on Dataset 2, the results on the test set are shown in Table 5. Compared to its series counterparts, ResNet18 and ResNet50, ResNet101 improved accuracy by 0.30% and 0.15%, respectively. Compared to the CNN-based VGG and EfficientNet, the accuracy increased by 0.88% and 0.37%, respectively. Against the attention-based ViT and Swin-T models, the accuracy improvements were 42.94% and 39.28%, respectively. ResNet101 achieved a high accuracy of 99.67% and an exceptional Area Under the Curve (AUC) of 99.98%, surpassing the results of previous studies and demonstrating superior performance across architectures.

    Table 5. Comparative experiments on Dataset 2.

    Model           AUC      ACC      F1
    ResNet18        99.98    99.37    99.66
    ResNet50        99.97    99.52    99.70
    ResNet101       99.98    99.67    99.73
    VGG16           99.65    98.79    99.20
    EfficientNet    99.53    99.30    99.63
    ViT             60.45    56.73    57.26
    Swin-T          67.75    60.39    61.54

    3.3. Experimental results

    For Dataset 1, the ResNet-101 model integrated with Gaussian noise was employed. As depicted in Fig. 5(a), the changes in accuracy and loss over the 100 training epochs are illustrated. During the initial 10 epochs, the model experiences a rapid decrease in loss accompanied by a sharp increase in accuracy. After the 20th epoch, both the loss and accuracy saturate, indicating that the model converged to a position close to the optimal solution. These results demonstrate the effectiveness of incorporating Gaussian noise in improving the model’s training dynamics and achieving early stabilization.

    Fig. 5.

    Fig. 5. Training set accuracy curve and loss curve. (a) Dataset 1 (different stages of lesion PM) and (b) Dataset 2 (early and late PM).

    For Dataset 2, the plain ResNet101 was trained for 100 epochs. As depicted in Fig. 5(b), the graph chronicles the changes in accuracy and loss over the training period. Within the first 10 epochs, the loss decreases rapidly and subsequently stabilizes after the 10th epoch, reaching an optimal solution. The accuracy rises rapidly before the 10th epoch and remains stable thereafter. These results underscore the efficacy of Dataset 2 and the model in achieving rapid convergence to a stable state.

    The horizontal axis of the ROC curve represents the False Positive Rate (FPR), or the rate of false alarms, which indicates the proportion of negative samples incorrectly identified as positive among all negative samples. The vertical axis of the ROC curve represents the True Positive Rate (TPR), also known as recall or sensitivity, which measures the proportion of positive samples correctly identified as positive out of all positive samples. The shape of the ROC curve and the area under the curve (AUC) provide an intuitive display of the model’s performance: the closer the curve approaches the top left corner, and the higher the AUC value, the better the model’s performance. A ROC curve that hugs the top left corner signifies that the model achieves a high recall rate while maintaining a low false positive rate, indicating excellent performance. An AUC value close to 1 denotes outstanding model performance. Ideally, the ROC curve should be smooth, without sharp angles or fluctuations, reflecting the model’s robustness against noise and sample variability.
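    A minimal sketch of plotting one such per-class ROC curve with scikit-learn follows; the binary labels and scores are illustrative placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve, auc

# Illustrative placeholders: binary indicators and scores for one class.
y_true_bin = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.8, 0.4, 0.1, 0.6, 0.3])

fpr, tpr, _ = roc_curve(y_true_bin, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.4f}")
plt.plot([0, 1], [0, 1], "k--", label="chance")   # diagonal reference line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()
```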

    In Fig. 6(a), the ROC curve of Dataset 1 in the designed model exhibits a distinct inflection point near the upper left corner, indicating a significant improvement in model performance. This is generally a positive feature, reflecting the stability of the model’s predictions. The AUC values have reached over 97%, denoting superior performance. In Fig. 6(b), the ROC curve and AUC value demonstrate outstanding performance. These results thoroughly showcase the outstanding capabilities of the dataset and model.

    Fig. 6.

    Fig. 6. ROC curves and AUC values for Dataset 1 (different stages of lesion PM) and Dataset 2 (early and late PM).

    The confusion matrix plays a crucial role in deep learning, particularly when evaluating the performance of classification models. By presenting the model’s predictive results alongside the actual labels in a matrix format, the confusion matrix enables us to intuitively grasp the model’s classification capabilities across different categories. This matrix is an N×N square matrix, where N represents the total number of categories. Each row corresponds to the actual category and each column to the category predicted by the model. Each element within the matrix denotes the number of samples located at the intersection of the corresponding actual and predicted categories.
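    A minimal sketch of constructing and displaying such a confusion matrix with scikit-learn follows; the labels are illustrative placeholders.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Illustrative placeholders for a small 3-class problem.
y_true = [0, 1, 2, 2, 0, 1, 2, 0]
y_pred = [0, 1, 2, 1, 0, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
ConfusionMatrixDisplay(cm).plot()
plt.show()
```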

    In Fig. 7, the concentration of data points along the main diagonal of the confusion matrices demonstrates the models’ high accuracy, with a low rate of misclassifying samples of one category as another. These figures allow for a comprehensive assessment of performance across different categories, illustrating that ResNet101 achieves high accuracy, recall, precision, and balanced F1 scores on both Dataset 1 and Dataset 2. These comprehensive metrics indicate good classification performance.

    Fig. 7.

    Fig. 7. Confusion matrix. (a) Dataset 1 (different stages of lesion PM). (b) Dataset 2 (early and late PM).

    To gain a more detailed understanding of the model’s performance, the precision for each disease in Dataset 1 was calculated. As shown in Table 6, the “Advanced-stage PM” category had the highest precision at 99.79%, while the “Severe tessellated fundus” category had the lowest precision, but still reached 97.75%.

    Table 6. The precision for each class in Dataset 1.

    Class                                        Precision (%)
    Normal fundus or mild tessellated fundus     99.56
    Severe tessellated fundus                    97.75
    Early-stage PM                               99.16
    Advanced-stage PM                            99.79
    Other                                        98.04

    Additionally, Table 7 shows the precision for each class in Dataset 2. Among the eight classes, five achieved a precision of 100%, with the lowest being 98.81%, demonstrating the excellent performance of the model.

    Table 7. The precision for each class in Dataset 2.

    Class                                         Precision (%)
    Diffuse choroidal retinal images 0            100
    Diffuse choroidal retinal images 1            99.41
    Diffuse choroidal retinal images 2            98.81
    Diffuse choroidal retinal images 3            100
    Patchy atrophy fundus color photographs 0     100
    Patchy atrophy fundus color photographs 1     100
    Patchy atrophy fundus color photographs 2     98.86
    Patchy atrophy fundus color photographs 3     100

    4. Discussion

    Pathological myopia (PM) is an ocular condition that can lead to severe visual problems, so early diagnosis and classification of PM are crucial. In this study, we utilized the ResNet101 deep learning model to classify PM in retinal fundus images. To precisely differentiate between the various stages of PM lesions, we constructed two datasets: Dataset 1 for coarse classification of PM, and Dataset 2 for fine classification, which, to the best of our knowledge, had not been attempted in previous studies.

    In this paper, three novel contributions were proposed: (1) We established two fundus color photograph datasets, where Dataset 1 covered various stages of pathological myopia (PM) lesions, and Dataset 2 provided a detailed subclassification of early- and late-stage PM. This classification approach aligns more closely with clinical needs and offers more precise diagnostic support for doctors. (2) ResNet101 was applied to these two independent PM datasets, achieving outstanding results and establishing an automated classification system specifically for PM. (3) Gaussian noise was integrated into the parameters of ResNet101, effectively reducing the risk of overfitting. Compared to other deep learning methods, the ResNet101 model exhibits exceptional image classification performance, effectively extracting complex features from retinal fundus images. With further technological advances, ResNet101 and its extensions are expected to play a greater role in ophthalmic healthcare, significantly enhancing the efficiency of early diagnosis and disease monitoring.

    On the classification of PM, we observed that the risk of overfitting increases with the depth of the model. To address this issue, we introduced Gaussian noise. However, the applicability of this method to other FCP datasets remains to be verified. In the future, we plan to conduct experiments on more fundus color photograph datasets to validate the generality of the method. Moreover, Dataset 2 lacks detail in PM classification; hence, we will further refine the classification within this dataset and conduct in-depth studies on different stages of lesion development. Additionally, we will delve deeper into analyzing both Dataset 1 and Dataset 2, focusing on extracting features from fundus vasculature and detecting lesion areas to provide more accurate diagnoses from a vascular perspective.

    5. Conclusion

    In summary, we proposed an automated multiclassification method for PM based on FCPs. Coarse classification across various stages of PM and eight detailed classifications of early and late PM were performed. ResNet101 was applied to achieve an average accuracy of 98.86% for coarse classification, and 99.67% for fine classification. This method effectively provides a tool for early detection of PM and demonstrates significant potential in the field of ophthalmic diagnostics.

    Acknowledgments

    This work was supported by the National Natural Science Foundation of China (62175156), the Science and Technology Innovation Project of the Shanghai Science and Technology Commission (22S31903000), and the Collaborative Innovation Project of the Shanghai Institute of Technology (XTCX2022-27).

    Conflict of Interest

    Dr. Zhao reports grants from the Collaborative Innovation Project of the Shanghai Institute of Technology (XTCX2022-27), grants from the National Natural Science Foundation of China (62175156), and grants from the Science and Technology Commission of Shanghai Municipality (22S31903000), during the conduct of the study. The authors declare that there are no conflicts of interest relevant to this paper.

    ORCID

    Jiaqing Zhao  https://orcid.org/0009-0000-2946-7518

    Guogang Cao  https://orcid.org/0000-0001-5991-1210

    Jiangnan He  https://orcid.org/0000-0002-4192-546X