
Multi-class classification of pathological myopia based on fundus photography

    https://doi.org/10.1142/S1793545824500160

    Abstract

    Pathological myopia (PM) is a severe ocular disease that can lead to blindness. As a traditional noninvasive diagnostic method, fundus color photography (FCP) is widely used in detecting PM due to its high fidelity and precision. However, manual examination of fundus photographs for PM is time-consuming and prone to high error rates, and existing automated detection technologies have not yet addressed the detailed classification of different stages of PM lesions. In this paper, we propose an intelligent system based on ResNet101 that diagnoses PM by multi-class classification of FCPs at different lesion stages. The system subdivides PM into eight subcategories, aiming to enhance the precision and efficiency of the diagnostic process. It achieved an average accuracy of 98.86% in the detection of PM, with an area under the curve (AUC) of 98.96%. For the eight subcategories of PM, the detection accuracy reached 99.67%, with an AUC of 99.98%. Compared with other widely used multi-class models such as VGG16, Vision Transformer (ViT), and EfficientNet, this system demonstrates higher accuracy and AUC. The system is designed to be easily integrated into existing clinical diagnostic tools, providing an efficient solution for large-scale PM screening.

    1. Introduction

    Pathological myopia (PM)1 is a vision condition characterized by extreme nearsightedness and poses a significant and growing concern for young people worldwide.2 In recent years, its prevalence among adolescents and young adults has been increasing.3 PM is defined as a refractive error of −6.00 diopters or worse, which significantly increases the risk of developing various ocular complications, such as retinal detachment, glaucoma, and macular degeneration.4 These conditions can lead to vision impairment or blindness, impacting the quality of life of young individuals. Preventive measures, early intervention, and education about eye health are essential to mitigate the potential long-term impact on the vision and overall well-being of youth worldwide.

    Fundus color photography (FCP)5 is a medical imaging technique used to capture color images of the posterior segment of the eye, including the retina, optic nerve, choroid, and other critical structures. This technology plays a crucial role in ophthalmic diagnosis and the monitoring of eye conditions. FCP is a non-invasive examination technique that is typically safe and painless for patients. It allows ophthalmologists to gain a better understanding of ocular conditions, facilitating early intervention and treatment to preserve vision and overall eye health. The manifestations of PM in FCPs vary with the progression of the condition. The traditional diagnostic approach is manual diagnosis by physicians, which is inefficient and consumes medical resources. With the development of artificial intelligence (AI), deep learning has matured to the point where it can automatically diagnose and grade the severity of fundus diseases.6,7,8

    In recent years, deep learning techniques, such as EfficientNet, ResNet, VGG, and Vision Transformer, have been extensively investigated for the automated detection of PM. Among these, Du et al.9 employed the EfficientNet network model to train on 7020 fundus images, achieving an accuracy of 87.53%. Tang et al.10 combined ResNet-50 with DeepLabv3+ to classify myopic maculopathy, achieving an accuracy of 90.55%. Wu et al.11 employed ResNet-34 with an atrous spatial pyramid pooling (ASPP) module to classify PM using a dataset combining fundus photographs with OCT images, achieving an accuracy of 91.67%. To further increase diagnostic accuracy, large FCP datasets were adopted to address the problem of data scarcity. Among these studies, Lu et al.12 trained a Faster R-CNN on 32,010 fundus images of PM, achieving an AUC and accuracy of 97.36%. Li et al.13 utilized a dual-stream Deep Convolutional Neural Network (DCNN-DS) model, obtaining a sensitivity of 93.3% and a specificity of 99.6% for detecting PM.

    Besides the basic diagnosis of PM, efforts have been dedicated to categorizing PM into different stages of the disease. Choi et al.14 utilized classical models as backbone architectures and employed both single-column and multi-column convolutional neural network (CNN) models to screen PM in OCT images, dividing it into three general categories. Wang et al.15 used EfficientNet to classify PM into four different lesion stages, achieving an accuracy of 95%. However, these studies performed only a coarse PM classification. Clinically, diagnosing PM is complex due to the diversity of disease stages and the frequent co-occurrence of other retinal pathologies. Therefore, to accurately diagnose different stages of the disease, a more detailed classification is necessary.

    In this study, a multi-class classification of PM was performed. To be consistent with clinical practice, which demands both diagnosis and classification, two non-overlapping datasets were constructed: one for the preliminary diagnosis of PM and another for its detailed classification. The preliminary diagnosis differentiates between different stages of PM lesions, while the detailed classification identifies the specific types of PM lesions, providing more comprehensive diagnostic support for doctors. For the multi-class diagnosis task, ResNet101,16 which has been shown to provide satisfactory results in medical classification,17,18 was used. ResNet101 introduces residual connections, which solve the problem of vanishing gradients and enable the network to support more than 100 layers without performance degradation. This structural design ensures finer feature extraction capabilities, making the model particularly suitable for complex image processing tasks. As the results show, the preliminary classification achieved an accuracy of 98.86% in the testing phase, and the detailed classification achieved an even higher accuracy of 99.67%.
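    To make the residual mechanism concrete, the following is a minimal PyTorch sketch of a generic residual block; it illustrates the skip connection F(x) + x rather than the exact bottleneck block used in ResNet101, and the channel width is an arbitrary illustration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generic residual block: output = ReLU(F(x) + x).

    The identity path lets gradients bypass the convolutions,
    which is what allows very deep networks such as ResNet101
    to train without vanishing gradients.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # skip connection
```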

    2. Materials and Methods

    2.1. Dataset

    In this study, the dataset was sourced from the Shanghai Eye Disease Prevention and Treatment Center (SEDPTC), China. Physicians employed the TOPCON DRI Triton device to obtain 45° FCPs centered on the macula from participants. To ensure high-quality image data, all images in which the macular region was not fully visible or was obscured by more than 50% were excluded. A total of 7114 images with clear visibility of the macula were selected for the development of the model. The dataset was meticulously classified and annotated by three independent retinal specialists, whose diagnostic evaluations were carefully documented. These evaluations were utilized for a comparative analysis against the performance of the artificial intelligence model. In instances of diagnostic disagreement, a consensus was reached through discussion with an additional senior expert, thereby confirming the final diagnoses. This rigorous approach ensures a robust dataset for advancing the accuracy of AI-driven diagnostic tools in ophthalmology. In this study, the use of retinal fundus images was approved by the Ethics Committee of Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, and adhered to the tenets of the Declaration of Helsinki (Approval ID: No. 2015KY156). Written informed consent was obtained from all participants.

    To effectively distinguish PM from other retinal diseases, Dataset 1 was developed. This dataset includes a coarse classification of PM and other ocular diseases, enabling effective diagnosis and preliminary categorization of PM among a wide array of retinal conditions. This approach not only enhances diagnostic accuracy but also streamlines the diagnostic process. To enable detailed classification of PM, we constructed Dataset 2, which includes four types of early-stage PM and four types of late-stage PM. Early-stage PM primarily encompasses four stages of diffuse choroidal thinning, while late-stage PM includes four stages of patchy atrophy. This classification aids in the precise diagnosis and study of the progression of PM. It is important to note that the images in Dataset 2 are entirely distinct from those in Dataset 1, ensuring the independence and diversity of the data across both datasets.

    2.1.1. Dataset 1

    Dataset 1 contains 6365 FCPs of PM at different stages of pathological change. The dataset was classified using the META-PM (Meta-Analysis of Pathologic Myopia) classification rules, with minor modifications for individual images based on the risk of disease progression and its impact on vision. In this study, a severely tessellated fundus was defined as equal to or more severe than Grade 2. The dataset was divided into five groups: A0: normal or mildly tessellated fundus, A1: severely tessellated fundus, A2: early PM, A3: advanced PM, and Other: other types of diseases (shown in Fig. 1). Details are given in Table 1.

    Fig. 1.

    Fig. 1. Pathologic myopic fundus color photographs. (a) Normal or mildly tessellated fundus, (b) severely tessellated fundus, (c) early pathologic myopia, (d) advanced pathologic myopia, and (e) other diseases.

    Table 1. Detailed classification of myopic maculopathy.

    Myopic maculopathy                          Definition
    Normal fundus or mild tessellated fundus    No macular lesions, or blurred choroidal vessels outside the macular region (Fig. 1(a))
    Severe tessellated fundus                   Equal to or more severe than Grade 2 (Fig. 1(b))
    Early-stage PM                              Diffuse chorioretinal atrophy or lacquer cracks alone (Fig. 1(c))
    Advanced-stage PM                           Patchy chorioretinal atrophy, macular atrophy, myopic CNV, or Fuchs’ spot (Fig. 1(d))
    Other                                       Other categories of fundus diseases (Fig. 1(e))

    2.1.2. Dataset 2

    Dataset 2 includes 581 fundus photographs depicting early PM with diffuse chorioretinal myopic degeneration, and 168 photographs showing patchy atrophy associated with late PM.

    Diffuse chorioretinal myopic degeneration is an ophthalmic condition closely related to PM, characterized by extensive abnormal changes in the choroid and retina at the back of the eye. In cases of myopia, excessive elongation of the ocular axis leads to abnormal stretching and deformation of the intraocular retina and choroid, which may evolve into diffuse chorioretinal degeneration. This condition results in a thinner, distorted, and constricted ocular fundus structure, potentially causing various complications such as changes in the retinal pigment epithelium, retinal tears, or detachments. Consequently, patients may experience progressively worsening vision, visual field defects, flashes of light within the eye, floaters, or the presence of blurred dark shadows. These 581 fundus photographs showing diffuse chorioretinal myopic degeneration were classified into four different stages of lesion development, as shown in Figs. 2(a)–2(d).

    Fig. 2.

    Fig. 2. Dataset 2: (a)–(d) Diffuse choroidal retinal images, (e)–(h) Patchy atrophy fundus color photographs.

    Patchy atrophy represents a more severe retinal condition than diffuse chorioretinal myopic degeneration. In cases of PM, the excessive elongation of the ocular axis results in persistent traction and deformation of the macular region, leading to cellular degeneration and dystrophy within the macula, culminating in areas of patchy atrophy. Patients with this condition not only suffer from severe refractive errors but also from macular degeneration, further increasing the risk of visual impairment. The presence of patchy atrophy can significantly exacerbate visual disabilities in patients with PM, causing symptoms such as blurred central vision, visual distortion, and reduced contrast, which, in severe cases, may lead to complete blindness. To facilitate in-depth analysis, 168 fundus images of PM showing patchy atrophy were divided into four categories, as illustrated in Figs. 2(e)–2(h). These categories include:

    (1) Patchy atrophy with a single lesion located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane (Fig. 2(e)).
    (2) Patchy atrophy with a single lesion and associated macular atrophy, not involving the macula (Fig. 2(f)).
    (3) Patchy atrophy with multiple lesions located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane (Fig. 2(g)).
    (4) Patchy atrophy with multiple lesions and associated macular atrophy, not involving the macula (Fig. 2(h)).

    2.1.3. Image preprocessing

    To enhance the accuracy of the proposed classification model, a series of image preprocessing steps was performed on the original fundus photographs. These steps were grouped into several modules: removal of the unnecessary black background, data augmentation, data normalization, and standardization. The objective of removing the black background was to eliminate black edges that could interfere with key feature detection, retaining only the effective areas of the image. Specifically, a mask of the colored regions, containing the color information and the corresponding coordinates, was extracted from the RGB images by setting the pixel values of the background to zero. Subsequently, the OpenCV toolkit was used to traverse the pixel information and determine the bounding box of the image. Based on this bounding box, the RGB image was cropped to include only the essential visual information.
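    A minimal sketch of this cropping step with OpenCV is shown below; the function name and the darkness threshold are illustrative assumptions, not taken from the paper.

```python
import cv2
import numpy as np

def crop_black_border(image_bgr: np.ndarray, threshold: int = 10) -> np.ndarray:
    """Crop away the black background surrounding the fundus disc."""
    # Build a mask of the colored (non-black) region.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mask = (gray > threshold).astype(np.uint8)
    # Determine the bounding box of the non-zero pixels.
    coords = cv2.findNonZero(mask)
    x, y, w, h = cv2.boundingRect(coords)
    # Keep only the essential visual information.
    return image_bgr[y:y + h, x:x + w]
```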

    During the data augmentation phase, various techniques were employed to expand the dataset, such as horizontal and vertical flipping, contrast enhancement, and grayscale conversion. In the subsequent normalization step, the pixel values of the augmented images were scaled to the range 0 to 1 in preparation for model training. Finally, prior to training the deep learning model, the z-score method was applied to standardize the image data, ensuring that the model could learn and generalize more effectively.
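    A torchvision pipeline along these lines is sketched below; the flip probabilities, jitter strength, and normalization statistics (ImageNet values) are assumptions for illustration, not the study's exact settings.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                 # input size from Table 2
    transforms.RandomHorizontalFlip(),             # horizontal flipping
    transforms.RandomVerticalFlip(),               # vertical flipping
    transforms.ColorJitter(contrast=0.2),          # contrast enhancement
    transforms.RandomGrayscale(p=0.1),             # occasional grayscale conversion
    transforms.ToTensor(),                         # scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # z-score standardization
                         std=[0.229, 0.224, 0.225]),
])
```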

    Data augmentation techniques were employed to enhance both datasets, with the distribution of data for each disease category shown in Fig. 3.

    Fig. 3.

    Fig. 3. Distribution of the datasets. (a) Distribution of Dataset 1. 0: normal or mildly tessellated fundus, 1: severely tessellated fundus, 2: early PM, 3: advanced PM, 4: other types of diseases. (b) Distribution of the diffuse choroidal retinal images. 0–3: four different stages of diffuse choroidal retinal degeneration. (c) Data distribution of the four types of patchy atrophy. 0: patchy atrophy with a single lesion located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane. 1: patchy atrophy with a single lesion and associated macular atrophy, not involving the macula. 2: patchy atrophy with multiple lesions located in the periphery of the macula, not involving the macula, characterized by defects in the peripheral Bruch’s membrane plane. 3: patchy atrophy with multiple lesions and associated macular atrophy, not involving the macula.

    2.2. Methods

    2.2.1. Model

    In this study, ResNet-101 was utilized for the multi-class classification of fundus images. Since Dataset 1 contains complex data and redundant features from various stages of PM lesions, there is a risk of overfitting; therefore, Gaussian noise was introduced into the parameters of the final layer. Gaussian noise has been shown to enhance the performance of deep architectures,19,20,21 and introducing it into model parameters can improve generalization on new data by preventing the model from relying too heavily on specific features of the training data. In this study, adding Gaussian noise to ResNet101 effectively prevented overfitting, thereby enhancing the overall performance of the model.21,22 Random Gaussian noise with a standard deviation of 0.3 was added using “torch.randn_like(x)”. This function generates a noise tensor with the same shape as the input tensor x, following a standard normal distribution; the noise tensor is then multiplied by the standard deviation and added to the input tensor x. Figure 4 displays the architecture of the model. Given the small size of Dataset 2, Gaussian noise was not added to the parameters of the model when training on it.
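    A minimal sketch of this noise injection is shown below, consistent with the torch.randn_like(x) description above; the exact position of the noise relative to the fully connected layer and the use of pretrained weights are implementation assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class NoisyResNet101(nn.Module):
    """ResNet-101 with Gaussian noise injected before the final classifier."""

    def __init__(self, num_classes: int = 5, noise_std: float = 0.3):
        super().__init__()
        self.backbone = models.resnet101(weights="IMAGENET1K_V1")  # pretrained weights assumed
        in_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()        # strip the original classifier
        self.noise_std = noise_std
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        feats = self.backbone(x)
        if self.training:
            # randn_like draws standard-normal noise of the same shape;
            # scaling by noise_std gives sigma = 0.3 as in the paper.
            feats = feats + self.noise_std * torch.randn_like(feats)
        return self.fc(feats)
```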

    Fig. 4.

    Fig. 4. Multi-classification model for PM.

    As shown in Fig. 4, the structure of ResNet-101 includes a convolutional layer, a max pooling layer, four residual blocks, Gaussian noise, and a fully connected layer (FC). This configuration effectively supports deep network learning, enhancing the model’s processing capability and accuracy. This model can accurately identify and classify various lesions in fundus images, greatly supporting clinical diagnostics.

    2.2.2. Experimental environment

    Experiments were performed on a Dell Precision 5820 workstation equipped with an Nvidia GeForce RTX 3090 GPU (24 GB of memory) and an Intel Xeon W-2245 CPU, running Windows 10 with Python 3.9. PyTorch was used as the deep learning framework.

    2.2.3. Evaluation standard

    To assess the performance of the model, metrics were employed, such as accuracy, confusion matrix, Receiver Operating Characteristic (ROC) curve, and Area Under the Curve (AUC) for model evaluation. Accuracy is the simplest and most intuitive evaluation metric. It represents the proportion of samples that the model correctly classifies out of the total number of samples, demonstrating the model’s classification accuracy across the entire dataset.

    The formulas are as follows:

    \[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \tag{1} \]
    \[ \text{Recall} = \frac{TP}{TP + FN}, \tag{2} \]
    \[ \text{Precision} = \frac{TP}{TP + FP}, \tag{3} \]
    \[ \text{F1\_Score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}. \tag{4} \]
    TP (True Positives) is the number of samples correctly classified as the positive class by the model. TN (True Negatives) is the number of samples correctly classified as the negative class by the model. FP (False Positives) is the number of samples incorrectly classified as the positive class by the model. FN (False Negatives) is the number of samples incorrectly classified as the negative class by the model.

    The F1 Score combines Precision and Recall, serving as a balanced metric, especially useful for handling imbalanced class situations. It ranges from 0 to 1, with higher values indicating a better balance between Precision and Recall in the model.
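    As a minimal sketch, these metrics can be computed with scikit-learn as follows; the label and score arrays below are illustrative placeholders, not the study's data.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Illustrative placeholders for a 5-class problem (as in Dataset 1).
y_true = np.array([0, 1, 2, 3, 4, 0, 1, 2])
rng = np.random.default_rng(0)
y_score = rng.random((len(y_true), 5))
y_score /= y_score.sum(axis=1, keepdims=True)   # per-class probabilities
y_pred = y_score.argmax(axis=1)

print("accuracy :", accuracy_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("AUC (OvR):", roc_auc_score(y_true, y_score, multi_class="ovr", average="macro"))
```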

    2.2.4. Loss function

    Cross-entropy loss is applicable to multi-class classification problems where samples can be categorized into multiple distinct classes. It can handle relationships between multiple classes to determine the most probable class and assess the model’s performance on each class.

    The cross-entropy loss function is used to measure the difference between the model’s output probability distribution and the true labels. It is composed of the negative log-likelihood loss (NLL Loss) and the softmax function. Equation (5) is the cross-entropy loss function, Eq. (6) is the NLL Loss function, and Eq. (7) is the softmax function. The formulas are as follows:

    \[ \text{loss}(p, q) = -\sum_{i=1}^{n} p(x_i) \log\!\left( \frac{\exp(x_i)}{\sum_{j} \exp(x_j)} \right), \tag{5} \]
    \[ H(p, q) = -\sum_{i=1}^{n} p(x_i) \log\big(q(x_i)\big), \tag{6} \]
    \[ \text{softmax}(x_i) = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)}. \tag{7} \]

    Here, \(p\) represents the true distribution, \(q\) represents the predicted distribution, and \(x\) is the vector of predicted logits, \(x = [x_1, x_2, x_3, x_4]\), whose number of elements equals the number of categories. The index \(i\) denotes the actual label of the sample; for example, \(x_i\) refers to the \(i\)-th element of the vector.
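    This decomposition can be verified directly in PyTorch, where CrossEntropyLoss is the composition of LogSoftmax and NLLLoss; the logits and labels below are illustrative placeholders.

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 8)              # batch of 4 samples, 8 classes (as in Dataset 2)
labels = torch.tensor([0, 3, 5, 7])     # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
assert torch.allclose(ce, nll)          # identical by construction, Eqs. (5)-(7)
```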

    2.2.5. Experimental parameters

    In this experiment, several parameters were set to optimize the training results. StepLR was used to dynamically adjust the learning rate, decreasing it by a fixed factor at a fixed step size to help the model converge more stably. Specifically, StepLR was set to multiply the learning rate by 0.1 every 10 epochs, gradually reducing it. To prevent overfitting, the L2 regularization weight was set to 1e-4. Table 2 shows the specific optimizer and parameter settings, as well as some experimental data.
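    A minimal sketch of these settings (Adam, weight decay 1e-4, StepLR with step size 10 and factor 0.1) follows; the stand-in model and the empty training loop are placeholders, not the study's code.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 5)   # stand-in for the ResNet101 model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)  # lr x0.1 every 10 epochs

for epoch in range(100):
    # ... one pass over the training data would go here ...
    optimizer.step()       # placeholder step so the scheduler order is valid
    scheduler.step()
```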

    Table 2. Experimental parameters.

    Experimental parameter         Value
    Image size                     224×224
    Learning rate                  0.0001
    Epochs                         100
    Batch size                     10
    Loss function                  Cross-entropy loss
    L2 regularization              Weight_decay = 1e-4
    Learning rate optimization     StepLR
    Optimizer                      Adam
    Training time for Dataset 1    13′45″/epoch
    Training time for Dataset 2    2′29″/epoch

    3. Results

    3.1. Ablation experiments

    The standard ResNet101 was first employed on both datasets. When training on Dataset 1 with ResNet101, slight overfitting was observed, characterized by a significant gap between the accuracy on the training set and on the validation set. To mitigate this risk, Gaussian noise was introduced into the model.

    The integration of Gaussian noise produced tangible effects. The results of the ablation experiments presented in Table 3 clearly indicate that incorporating Gaussian noise effectively reduced overfitting, thereby enhancing the generalization capability of the model. This improvement, in turn, bolstered the model’s stability and reliability. These findings provide substantial support and guidance for clinical application.

    Table 3. Ablation experiments on Dataset 1.

    Model                                Val-Acc    Train-Acc
    ResNet101                            81.10      98.99
    ResNet101 + salt-and-pepper noise    73.20      90.20
    ResNet101 + Gaussian noise           98.93      99.31

    3.2. Comparative experiments

    In the comparative experiments, classic deep learning models were evaluated on both Dataset 1 and Dataset 2, including ResNet18, ResNet50, ResNet101, VGG,23 ViT,24 Swin-T,25 and EfficientNet.26 These are established deep learning models for multi-class image classification and have demonstrated excellent performance in previous studies. The results of these models on the test datasets are enumerated in Table 4. Our model demonstrated excellent accuracy, outperforming the VGG model by 7.87% and EfficientNet by 1.28%. Compared to the attention-based ViT and Swin-T models, it achieved higher accuracy by 44.25% and 43.24%, respectively. Additionally, our model surpassed the other models in terms of the AUC (Area Under the Curve) metric. Notably, the ResNet101 model integrated with Gaussian noise demonstrated the best performance when trained on Dataset 1. This finding emphasizes the pivotal role of Gaussian noise in enhancing model performance, indicating its effectiveness in boosting the accuracy and generalization capabilities of neural networks.

    Table 4. Comparative experiments on Dataset 1.

    Model           AUC      ACC      F1
    VGG16           93.27    90.99    91.35
    EfficientNet    98.52    97.58    97.50
    ViT             56.29    54.61    52.70
    Swin-T          57.43    55.62    58.41
    Ours            98.96    98.86    98.84

    When trained on Dataset 2, the results on the test set are shown in Table 5. Compared to its series counterparts, ResNet18 and ResNet50, ResNet101 improved accuracy by 0.30% and 0.15%, respectively. Compared to the CNN-based VGG and EfficientNet, the accuracy increased by 0.88% and 0.37%, respectively. Against the attention-based ViT and Swin-T models, the accuracy improvements were 42.94% and 39.28%, respectively. ResNet101 achieved a high accuracy of 99.67% and an exceptional Area Under the Curve (AUC) of 99.98%, surpassing the results of previous studies and demonstrating superior performance across architectures.

    Table 5. Comparative experiments on Dataset 2.

    Model           AUC      ACC      F1
    ResNet18        99.98    99.37    99.66
    ResNet50        99.97    99.52    99.70
    ResNet101       99.98    99.67    99.73
    VGG16           99.65    98.79    99.20
    EfficientNet    99.53    99.30    99.63
    ViT             60.45    56.73    57.26
    Swin-T          67.75    60.39    61.54

    3.3. Experimental results

    For Dataset 1, the ResNet-101 model integrated with Gaussian noise was employed. As depicted in Fig. 5(a), the changes in accuracy and loss over the 100 training epochs are illustrated. During the initial 10 epochs, the model experiences a rapid decrease in loss accompanied by a sharp increase in accuracy. After the 20th epoch, both the loss and accuracy saturate, indicating that the model converged to a position close to the optimal solution. These results demonstrate the effectiveness of incorporating Gaussian noise in improving the model’s training dynamics and achieving early stabilization.

    Fig. 5.

    Fig. 5. Training set accuracy curve and loss curve. (a) Dataset 1 (different stages of lesion PM) and (b) Dataset 2 (early and late PM).

    For Dataset 2, the plain ResNet101 was trained for 100 epochs. As depicted in Fig. 5(b), the graph chronicles the changes in accuracy and loss over the training period. Within the first 10 epochs, the loss decreases rapidly and subsequently stabilizes after the 10th epoch, reaching an optimal solution. The accuracy rises rapidly before the 10th epoch and remains stable thereafter. These results underscore the efficacy of Dataset 2 and the model in achieving rapid convergence to a stable state.

    The horizontal axis of the ROC curve represents the False Positive Rate (FPR), or the rate of false alarms, which indicates the proportion of negative samples incorrectly identified as positive among all negative samples. The vertical axis of the ROC curve represents the True Positive Rate (TPR), also known as recall or sensitivity, which measures the proportion of positive samples correctly identified as positive out of all positive samples. The shape of the ROC curve and the area under the curve (AUC) provide an intuitive display of the model’s performance: the closer the curve approaches the top left corner, and the higher the AUC value, the better the model’s performance. A ROC curve that hugs the top left corner signifies that the model achieves a high recall rate while maintaining a low false positive rate, indicating excellent performance. An AUC value close to 1 denotes outstanding model performance. Ideally, the ROC curve should be smooth, without sharp angles or fluctuations, reflecting the model’s robustness against noise and sample variability.
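    A minimal sketch of plotting one such per-class ROC curve with scikit-learn follows; the binary labels and scores are illustrative placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve, auc

# Illustrative placeholders: binary indicators and scores for one class.
y_true_bin = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.8, 0.4, 0.1, 0.6, 0.3])

fpr, tpr, _ = roc_curve(y_true_bin, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.4f}")
plt.plot([0, 1], [0, 1], "k--", label="chance")   # diagonal reference line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()
```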

    In Fig. 6(a), the ROC curve of Dataset 1 in the designed model exhibits a distinct inflection point near the upper left corner, indicating a significant improvement in model performance. This is generally a positive feature, reflecting the stability of the model’s predictions. The AUC values have reached over 97%, denoting superior performance. In Fig. 6(b), the ROC curve and AUC value demonstrate outstanding performance. These results thoroughly showcase the outstanding capabilities of the dataset and model.

    Fig. 6.

    Fig. 6. ROC curves and AUC values for Dataset 1 (different stages of lesion PM) and Dataset 2 (early and late PM).

    The confusion matrix plays a crucial role in deep learning, particularly when evaluating the performance of classification models. By presenting the model’s predictive results alongside the actual labels in a matrix format, the confusion matrix enables us to intuitively grasp the model’s classification capabilities across different categories. This matrix is an N×N square matrix, where N represents the total number of categories. Each row corresponds to the actual category and each column to the category predicted by the model. Each element within the matrix denotes the number of samples located at the intersection of the corresponding actual and predicted categories.
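    A minimal sketch of constructing and displaying such a confusion matrix with scikit-learn follows; the labels are illustrative placeholders.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Illustrative placeholders for a small 3-class problem.
y_true = [0, 1, 2, 2, 0, 1, 2, 0]
y_pred = [0, 1, 2, 1, 0, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
ConfusionMatrixDisplay(cm).plot()
plt.show()
```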

    In Fig. 7, the concentration of data points along the main diagonal of the confusion matrices demonstrates the models’ high accuracy, with a low rate of misclassifying samples of one category as another. These figures allow for a comprehensive assessment of performance across different categories, illustrating that ResNet101 achieves high accuracy, recall, precision, and balanced F1 scores on both Dataset 1 and Dataset 2. These comprehensive metrics indicate good classification performance.

    Fig. 7.

    Fig. 7. Confusion matrix. (a) Dataset 1 (different stages of lesion PM). (b) Dataset 2 (early and late PM).

    To gain a more detailed understanding of the model’s performance, the precision for each disease in Dataset 1 was calculated. As shown in Table 6, the “Advanced-stage PM” category had the highest precision at 99.79%, while the “Severe tessellated fundus” category had the lowest precision, but still reached 97.75%.

    Table 6. The precision for each class in Dataset 1.

    Class                                        Precision (%)
    Normal fundus or mild tessellated fundus     99.56
    Severe tessellated fundus                    97.75
    Early-stage PM                               99.16
    Advanced-stage PM                            99.79
    Other                                        98.04

    Additionally, Table 7 shows the precision for each class in Dataset 2. Among the eight classes, five achieved a precision of 100%, with the lowest being 98.81%, demonstrating the excellent performance of the model.

    Table 7. The precision for each class in Dataset 2.

    Class                                         Precision (%)
    Diffuse choroidal retinal images 0            100
    Diffuse choroidal retinal images 1            99.41
    Diffuse choroidal retinal images 2            98.81
    Diffuse choroidal retinal images 3            100
    Patchy atrophy fundus color photographs 0     100
    Patchy atrophy fundus color photographs 1     100
    Patchy atrophy fundus color photographs 2     98.86
    Patchy atrophy fundus color photographs 3     100

    4. Discussion

    Pathological myopia (PM) is an ocular condition that can lead to severe visual problems, so early diagnosis and classification of PM are crucial. In this study, we utilized the ResNet101 deep learning model to classify PM in retinal fundus images. To precisely differentiate between the various stages of PM lesions, we constructed two datasets: Dataset 1 for coarse classification of PM, and Dataset 2 for fine classification, which, to the best of our knowledge, had not been attempted in previous studies.

    In this paper, three novel contributions were proposed: (1) We established two fundus color photograph datasets, where Dataset 1 covered various stages of pathological myopia (PM) lesions, and Dataset 2 provided a detailed subclassification of early- and late-stage PM. This classification approach aligns more closely with clinical needs and offers more precise diagnostic support for doctors. (2) ResNet101 was applied to these two independent PM datasets, achieving outstanding results and establishing an automated classification system specifically for PM. (3) Gaussian noise was integrated into the parameters of ResNet101, effectively reducing the risk of overfitting. Compared to other deep learning methods, the ResNet101 model exhibits exceptional image classification performance, effectively extracting complex features from retinal fundus images. With further technological advances, ResNet101 and its extensions are expected to play a greater role in ophthalmic healthcare, significantly enhancing the efficiency of early diagnosis and disease monitoring.

    On the classification of PM, we observed that the risk of overfitting increases with the depth of the model. To address this issue, we introduced Gaussian noise. However, the applicability of this method to other FCP datasets remains to be verified. In the future, we plan to conduct experiments on more fundus color photograph datasets to validate the generality of the method. Moreover, Dataset 2 lacks detail in PM classification; hence, we will further refine the classification within this dataset and conduct in-depth studies on different stages of lesion development. Additionally, we will delve deeper into analyzing both Dataset 1 and Dataset 2, focusing on extracting features from fundus vasculature and detecting lesion areas to provide more accurate diagnoses from a vascular perspective.

    5. Conclusion

    In summary, we proposed an automated multiclassification method for PM based on FCPs. Coarse classification across various stages of PM and eight detailed classifications of early and late PM were performed. ResNet101 was applied to achieve an average accuracy of 98.86% for coarse classification, and 99.67% for fine classification. This method effectively provides a tool for early detection of PM and demonstrates significant potential in the field of ophthalmic diagnostics.

    Acknowledgments

    This work was supported by the National Natural Science Foundation of China (62175156), the Science and Technology Innovation Project of the Shanghai Science and Technology Commission (22S31903000), and the Collaborative Innovation Project of the Shanghai Institute of Technology (XTCX2022-27).

    Conflict of Interest

    Dr. Zhao reports grants from the Collaborative Innovation Project of the Shanghai Institute of Technology (XTCX2022-27), grants from the National Natural Science Foundation of China (62175156), and grants from the Science and Technology Commission of Shanghai Municipality (22S31903000), during the conduct of the study. The authors declare that there are no conflicts of interest relevant to this paper.

    ORCID

    Jiaqing Zhao  https://orcid.org/0009-0000-2946-7518

    Guogang Cao  https://orcid.org/0000-0001-5991-1210

    Jiangnan He  https://orcid.org/0000-0002-4192-546X