Accuracy improvement for classifying retinal OCT images by diseases using deep learning-based selective denoising approach

    https://doi.org/10.1142/S1793545823500086

    Abstract

    In ophthalmology, retinal optical coherence tomography (OCT) images with noticeable structural features help identify human eyes as healthy or diseased. Artificial intelligence (AI) has recently made this recognition process automatic. However, speckle noise in the original retinal OCT images reduces the accuracy of disease classification. This study presents a time-saving, deep learning-based approach that improves classification accuracy by removing the noise from the original dataset. First, four pre-trained convolutional neural networks (CNNs) from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) were trained to classify the original images into two categories: noise reduction required (NRR) and noise-free (NF) images. Among the CNNs, VGG19_BN performed best, with 98% accuracy and 99% recall. Then, we used the block-matching and 3D filtering (BM3D) algorithm to denoise the NRR images. The denoised NRR images and the NF images form the processed dataset, whose image quality is markedly improved and thus better suited to training models. The original and processed datasets were each used to train the four pre-trained CNNs to evaluate the effectiveness of our proposed approach. The results show that the performance of the CNNs trained with the processed dataset improves by an average of 2.04%, 5.19%, and 5.10% in overall accuracy (OA), Macro F1-score, and Micro F1-score, respectively. For DenseNet161 in particular, the OA improves to 98.14%. Our proposed method demonstrates its effectiveness in improving classification accuracy and offers a new solution for reducing the time consumed by denoising large datasets.

    1. Introduction

    Due to its noninvasive, high-resolution, real-time imaging, optical coherence tomography (OCT) has been widely used in fundamental research and clinical applications.1 In ophthalmology especially, OCT provides in vivo retinal imaging and has helped ophthalmologists achieve many breakthroughs, for example, discovering glaucomatous structural damage through changes in the optic nerve and macula,2 assessing microvascular changes in the retinochoroidal layers and optic nerve head,3 and visualizing vasculature perfusion in three dimensions at the capillary level.4 Recently, a new trend has emerged in OCT research: the combination of OCT with artificial intelligence (AI). AI has already demonstrated its effectiveness in several medical fields, such as ophthalmology, radiology, and pathology.5 With the advent and growing use of OCT generating massive amounts of data, researchers worldwide have tried to adopt AI to analyze this plethora of information and assist clinical decision-making in ophthalmology.6 As a subfield of AI, deep learning with more complex artificial neural networks has sparked tremendous interest in recent years.7,8

    According to purpose, applications of deep learning in ophthalmology can be divided into segmentation, prediction, and classification. Segmentation detects and delineates anatomical structures or pathological areas in retinal images; it helps ophthalmologists quantitatively analyze disease conditions and make more accurate judgments. Liu et al. extracted the foveal avascular zone (FAZ) from optical coherence tomography angiography (OCTA) images using an adaptive watershed algorithm.9 Zheng et al. proposed a deep learning-based algorithm to segment and quantify the choroidal thickness (CT) and vasculature.10 Prediction refers to forecasting treatment results and examining whether an indicator is valuable; typically, it plays a role in disease prognosis and research on structure-function correlation. Prahs et al. trained a convolutional neural network (CNN) with 183,402 retinal OCT B-scans to predict whether a patient with neovascular age-related macular degeneration (AMD) should receive an anti-vascular endothelial growth factor (anti-VEGF) injection within the following 21 days.11 Classification refers to sorting retinal images into categories according to the type of disease or the stage of the same disease. Kermany et al. used a small quantity of collected retinal images to fine-tune pre-trained models through transfer learning and successfully classified central retinal B-scans into four categories: early AMD, choroidal neovascularization (CNV), diabetic macular edema (DME), and normal.12 Gulshan et al. fine-tuned pre-trained models with about 100,000 color fundus photograph (CFP) images and screened for diabetic retinopathy (DR) with high accuracy.13 Jiang et al. classified OCT images into AMD, DME, and normal using a vision transformer.14

    Regarding retinal disease classification, there has long been strong interest in improving accuracy. For a deep learning model, performance on binary classification is typically better than on multi-class classification. With this in mind, Kim and Tran transformed the retinal disease classification problem into multiple binary classification problems.15 Experimental results show a classification accuracy as high as 98.7%. Nevertheless, this approach requires more memory, higher hardware specifications, and more training time. Due to interference in the acquisition environment, OCT images contain speckle noise.16 Deep learning models trained with noisy images tend to show poor classification accuracy.17 Denoising strategies mainly fall into two kinds: denoising all images indiscriminately, or manually selecting and denoising only the images that need it. The former takes much time, and processing high-quality images with denoising algorithms can destroy details in those images; the latter is labor-intensive and slow. Some teams have tried to increase classification accuracy by improving the quality of the images in the dataset; however, deleting numerous low-quality images significantly reduces the size of the dataset. Thus, we attempted another method: improving the quality of the images in the dataset without sacrificing its scale.

    In this study, we propose a time-saving denoising approach based on deep learning to improve the accuracy of retinal disease classification. We aim to remove the noise from the original poor-quality retinal OCT images rather than discard them. We collected and collated 114,214 retinal OCT images covering five categories (AMD, CNV, DME, drusen, and normal). With a fraction of these original images, we first trained a CNN to classify images into noise reduction required (NRR) and noise-free (NF) images based on their noise levels. This CNN was then applied to the original dataset to select the NRR images. Four pre-trained CNNs (DenseNet161, InceptionResNetV2, SE-ResNet152, and VGG19_BN) were selected for the experiment, each having achieved excellent performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). VGG19_BN, with 98% accuracy and 99% recall, performed best among them. After that, we used the block-matching and 3D filtering (BM3D) algorithm to remove noise from the NRR images.18,19 The denoised NRR images and the NF images make up the processed dataset. Finally, to demonstrate the effectiveness of our proposed method, we trained the four CNNs with the original and the processed dataset, respectively, to classify retinal OCT images by disease, and compared their performance. Results show that the four CNNs improve by an average of 2.04%, 5.19%, and 5.10% in overall accuracy (OA), Macro-F1, and Micro-F1, respectively. Remarkably, the OA of DenseNet161 rises to 98.14%. Regarding speed, our method takes only 20 min to sort more than 110k retinal OCT images into NRR and NF images, far faster than manual screening. Compared with denoising all images indiscriminately, our approach reduces the image processing time by 81.12%.

    2. Materials and Methods

    2.1. Datasets

    In this study, we focus on the classification of four retinal diseases. We collected and collated 114,214 retinal OCT images, of which 25,380, 37,944, 13,149, 9316, and 28,425 are AMD, CNV, DME, drusen, and normal, respectively. All the images come from the following three open-access datasets. From Dataset 1, we collected 88,628 retinal OCT images, including 37,944 CNV images, 13,149 DME images, 9316 drusen images, and 28,219 normal images.12 From Dataset 2, we collected 55 AMD and 206 normal images.20 From Dataset 3, hosted on the open-access data science platform Kaggle, we collected 25,325 AMD images.

    2.2. Methods

    Previous studies have already demonstrated that removing noise from OCT images effectively improves subsequent image analysis.21 Due to speckle noise caused by interference in the measurement environment, a fraction of raw OCT images tend to be of poor quality, which hinders classification. One way to improve the quality of images in the dataset is to remove noisy images directly; however, this reduces the scale of the dataset. Another option is to remove the noise from poor-quality images. Denoising approaches are commonly used to remove speckle noise from OCT images in the post-processing stage.22,23

    The higher the image quality, the higher the accuracy of disease identification.24 Adopting this idea, we used post-processing denoising to improve OCT image quality. Only a few low-quality images need denoising; the rest, the majority of retinal images, are highly recognizable with apparent characteristics. Denoising all images indiscriminately would waste a massive amount of time, and unnecessarily denoising high-quality images would destroy details. For these two reasons, we trained a deep learning model to determine whether an image needs denoising. The experimental procedure is shown in Fig. 1. Our approach contains the following steps:

    Fig. 1.

    Fig. 1. Schematic description of the deep learning-based selective denoising approach.

    2.2.1. Step 1: Classifying retinal OCT images into NRR and NF images based on deep learning

    We first chose a subset of OCT images from the original dataset. For each category (AMD, CNV, DME, drusen, and normal), there are 100 NRR and 100 NF images (1000 images in total). Then, we divided these images into training, validation, and test sets in a ratio of 6:2:2. The partitioning results are shown in Table 1. Subsequently, we used these sets to train CNNs to classify retinal OCT images into NRR and NF images. After training was completed, we comprehensively compared the models using multiple evaluation indicators commonly used in deep learning. Finally, we chose the best-performing model to filter the NRR images out of all 114,214 images in the original dataset.

    Table 1. The subset for training CNNs to screen out NRR images.

    Datasets          NRR images    NF images
    Train set         300           300
    Validation set    100           100
    Test set          100           100

    It should be noted that when determining whether an image belongs to the NRR or NF class, we adopted an objective metric to evaluate the magnitude of noise. This metric comes from a patch-based noise level estimation algorithm that has already demonstrated sound performance in terms of accuracy and stability.25
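
    Below is a minimal sketch of how this labeling rule and the 6:2:2 split could be reproduced. Two points are assumptions on our part: the paper uses the patch-based estimator of Liu et al. (Ref. 25), for which scikit-image's wavelet-based estimate_sigma serves here only as a stand-in, and the threshold of 1 mirrors the dividing line in Fig. 3, whose absolute scale depends on the estimator used.

```python
# Sketch of the Step 1 labeling rule and 6:2:2 split.
# estimate_sigma is a stand-in for the patch-based noise level
# estimator of Ref. 25; the threshold of 1 follows Fig. 3 but its
# scale is estimator-dependent (assumption).
import random
from pathlib import Path

import cv2
from skimage.restoration import estimate_sigma


def noise_level(path):
    """Estimate the noise level of a grayscale OCT B-scan."""
    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    return estimate_sigma(img)


def label_and_split(image_dir, threshold=1.0, seed=0):
    """Label images as NRR (level > threshold) or NF, then split 6:2:2."""
    paths = sorted(Path(image_dir).glob("*.jpeg"))  # file pattern assumed
    labeled = [(p, "NRR" if noise_level(p) > threshold else "NF")
               for p in paths]
    random.Random(seed).shuffle(labeled)
    n = len(labeled)
    train = labeled[: int(0.6 * n)]
    val = labeled[int(0.6 * n): int(0.8 * n)]
    test = labeled[int(0.8 * n):]
    return train, val, test
```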

    2.2.2. Step 2: Denoising NRR images for improving the quality of images in the dataset

    With the CNN selected above, we can quickly and accurately pick out the NRR images from numerous retinal OCT images, eliminating the need for blind denoising of all images. In this step, we remove the noise from the selected NRR images. The BM3D algorithm is an image-denoising approach based on enhanced sparse representation in the transform domain.18 It enhances the sparsity of images by grouping similar 2D blocks into 3D arrays, which are processed by 3D transformation, shrinkage of the transform spectrum, and inverse 3D transformation. The 3D estimates of the image are composed of jointly filtered image blocks; in this process, details are revealed and the features in each image block are well preserved while the noise is attenuated. The BM3D algorithm has been compared with other image-denoising approaches on OCT images: results show that it outperforms the nonlocal means (NLM) algorithm and a CNN with perceptually-sensitive loss (CNN-SSIM) in terms of peak signal-to-noise ratio (PSNR) and mean-squared error (MSE).19 This demonstrates that BM3D's denoising quality is better than that of the others and that BM3D preserves the most detailed information in the image. The speckles in images denoised by BM3D are removed effectively, and the boundaries between different tissues are distinct and effortless to observe. Therefore, we chose the BM3D algorithm and applied it to the selected NRR images. Finally, the denoised NRR images and the NF images were integrated into the processed dataset. We believe steps 1 and 2 significantly improve the quality of the images in the dataset, paving the way for improving the models' accuracy in retinal disease classification.
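
    As a concrete illustration of this step, the sketch below denoises one NRR image with the open-source bm3d Python package. The paper does not name its BM3D implementation or parameter settings, so the package, the sigma_psd value, and the intensity scaling are assumptions.

```python
# Sketch of Step 2: BM3D denoising of a single NRR image, assuming
# the open-source `bm3d` package (pip install bm3d). sigma_psd is an
# assumed noise standard deviation on a [0, 1] intensity scale.
import cv2
import numpy as np
import bm3d


def denoise_nrr(path, sigma_psd=0.1):
    """Denoise one grayscale NRR image and return it as uint8."""
    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    img = img.astype(np.float32) / 255.0
    denoised = bm3d.bm3d(img, sigma_psd=sigma_psd,
                         stage_arg=bm3d.BM3DStages.ALL_STAGES)
    return (np.clip(denoised, 0.0, 1.0) * 255.0).astype(np.uint8)
```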

    2.2.3. Step 3: Retinal disease classification based on deep learning

    To demonstrate the advantage of the processed dataset over the original dataset, we trained CNNs with these two datasets to see which achieves better image classification results. We aim to divide retinal OCT images into five categories: four retinal diseases and the normal condition. Thus, we divided the original and processed datasets into training, validation, and test sets in the same proportions, as shown in Table 2. To comprehensively evaluate the performance of the CNNs, we selected a variety of deep learning evaluation indicators, including accuracy (AC), recall (RE), specificity (SP), precision (PR), OA, and F1-score. The definitions of these evaluation metrics are given in Sec. 2.4. In addition, we plotted the accuracy curve, receiver operating characteristic (ROC) curve, confusion matrix, and Grad-CAM heat map of each model.

    Table 2. Dataset for training CNNs to classify retinal OCT images based on different diseases.

    Class     Training set    Validation set    Test set
    AMD       23,980          1000              400
    CNV       36,544          1000              400
    DME       11,749          1000              400
    Drusen    7916            1000              400
    Normal    27,025          1000              400

    2.3. Transfer learning

    Training a deep learning model from scratch takes a lot of data and time. In some cases, the data needed for training are valuable and challenging to collect. Thus, there is demand for high-performance models pre-trained with data that are more easily obtained from other fields.26 Transfer learning was first proposed in machine learning and then applied to deep learning. It is a training strategy that improves the performance of models on new tasks by transferring knowledge from related tasks that have already been learned.27 The ILSVRC is one of the most sought-after and authoritative academic competitions in the field of machine vision and has produced many high-performance CNNs in recent years.28 CNNs have already demonstrated their capability in solving object detection problems in retinal images, and especially their advantages in image classification.29 Thus, we selected four pre-trained CNNs (DenseNet161, InceptionResNetV2, SE-ResNet152, and VGG19_BN) as experimental models from the networks that have performed well in the annual ILSVRC. These models are described below, followed by a loading sketch.

    DenseNet161: The Dense Convolutional Network (DenseNet) connects every layer to every other layer in a feed-forward fashion.30 An L-layer DenseNet has L(L+1)/2 direct connections, (L+1)/2 times as many as a traditional L-layer CNN. Such a densely connected structure gives DenseNet the following advantages: moderating the vanishing-gradient problem, promoting feature propagation, and decreasing the number of parameters.

    InceptionResNetV2: InceptionResNetV2 introduces residual connections into the Inception architecture, which speeds up training and significantly improves single-frame recognition performance on the ILSVRC 2012 classification task.31

    SE-ResNet152: SE-ResNet152 is a representative CNN built on the residual framework, which eases the training of deep networks.32 Formulating the layers as learning residual functions alleviates the vanishing-gradient problem, and deep networks using the residual framework perform well on ILSVRC with low complexity. The Squeeze-and-Excitation (SE) block improves network performance at minimal additional computational cost by recalibrating channel-wise feature responses.33

    VGG19_BN: The small convolution kernels allow VGG to extend its depth to 16–19 layers.34 Another advantage of the small kernels is a significant reduction in the number of parameters. Networks based on this architecture achieved excellent results in the ImageNet Challenge 2014.
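
    The following PyTorch sketch shows how two of these pre-trained networks can be loaded and their classification heads replaced for transfer learning. torchvision ships densenet161 and vgg19_bn; InceptionResNetV2 and SE-ResNet152 would come from a third-party package (e.g., timm or pretrainedmodels), which the paper does not specify, so only the two torchvision models are shown.

```python
# Minimal transfer-learning sketch: load ILSVRC-pretrained weights and
# swap the classification head (2 classes for NRR/NF, 5 for diseases).
import torch.nn as nn
from torchvision import models


def build_densenet161(num_classes):
    """DenseNet161 with its single-Linear classifier replaced."""
    model = models.densenet161(pretrained=True)
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model


def build_vgg19_bn(num_classes):
    """VGG19_BN; its head is the last layer of a Sequential classifier."""
    model = models.vgg19_bn(pretrained=True)
    model.classifier[6] = nn.Linear(model.classifier[6].in_features,
                                    num_classes)
    return model
```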

    2.4. Evaluation metrics

    Figure 2 shows the evaluation metrics used in this study. AC, RE, SP, and PR reflect the model's performance on one specific class. OA does not focus on performance in each category; instead, it reflects the model's accuracy on the entire test set. The F1-score, also known as the balanced F-score, is the harmonic mean of PR and RE.35 The Macro F1-score is the average of the F1-scores of all classes; it treats each category equally, regardless of the amount of data in each category.36 The Micro F1-score pools the predictions of all classes before computing the score, making it suitable for imbalanced data. A short sketch of computing these metrics follows Fig. 2.

    Fig. 2.

    Fig. 2. Descriptions of evaluation metrics.
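
    For concreteness, the sketch below computes OA, Macro F1, and Micro F1 from predicted and true labels. scikit-learn is an assumption; the paper does not name its metrics tooling.

```python
# Hedged sketch of the headline metrics, assuming scikit-learn.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)


def summarize(y_true, y_pred):
    """OA is plain accuracy over the whole test set; Macro F1 averages
    per-class F1 equally; Micro F1 pools all decisions first."""
    return {
        "OA": accuracy_score(y_true, y_pred),
        "Macro F1": f1_score(y_true, y_pred, average="macro"),
        "Micro F1": f1_score(y_true, y_pred, average="micro"),
        "Macro RE": recall_score(y_true, y_pred, average="macro"),
        "Macro PR": precision_score(y_true, y_pred, average="macro"),
    }
```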

    3. Results

    3.1. Experimental setup

    The hardware configuration used for this experiment is an Intel Core i5-11400F processor (2.60 GHz) and an NVIDIA GeForce RTX 3080. The experimental software environment is PyTorch 1.11.0, NVIDIA CUDA Toolkit 11.3, and Windows 10 Professional. We fine-tuned the pre-trained models with the following configuration: stochastic gradient descent (SGD) with learning rate = 0.001 (multiplied by 0.1 every seven steps), momentum = 0.9, Nesterov momentum = True, batch size = 32, and epochs = 25.
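
    The sketch below reproduces this stated configuration in PyTorch. The data loaders, the cross-entropy loss, the device handling, and the reading of "every seven steps" as every seven epochs are assumptions not detailed in the paper.

```python
# Fine-tuning sketch matching the stated setup: SGD, lr=0.001 decayed
# by 0.1 every 7 epochs (assumed reading of "steps"), momentum=0.9,
# Nesterov=True, 25 epochs. Loss and loaders are assumptions.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR


def fine_tune(model, train_loader, val_loader, epochs=25, device="cuda"):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.001,
                          momentum=0.9, nesterov=True)
    scheduler = StepLR(optimizer, step_size=7, gamma=0.1)
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
        # Per-epoch validation accuracy (the curves plotted in Fig. 7).
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                preds = model(images.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        print(f"epoch {epoch + 1}: val acc {correct / total:.4f}")
    return model
```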

    3.2. Results of classifying retinal OCT images into NRR and NF images

    3.2.1. Verifying the validity of classifying the images in the subset into NRR and NF images based on deep learning

    We calculated the noise levels of the images in the subset mentioned in Step 1 using the noise level estimation algorithm, as shown in Fig. 3. Each blue asterisk indicates an NRR image, and each orange cross indicates an NF image. The red line marks a noise level of 1. As seen in the figure, the noise levels of NRR images are greater than 1, while those of NF images are no larger than 1. These images were used to train the CNNs to determine whether an image is an NRR or NF image.

    Fig. 3.

    Fig. 3. Noise levels of the images in the subset: (a) Original image and (b) zoomed image.

    The results of training the CNNs to classify the images in the subset into NRR and NF images are shown in Table 3. The AC of all four CNNs exceeded 95%, which shows that transfer learning achieved good results. VGG19_BN performs better than the other models on these indicators, with AC, RE, and F1-score reaching 98.00%, 99.00%, and 98.02%, respectively.

    Table 3. Results of classifying retinal OCT images into NRR and NF images by CNNs.

    Network             AC (%)    RE (%)    SP (%)    PR (%)    F1-score (%)
    DenseNet161         96.50     97.00     96.00     96.04     96.52
    InceptionResNetV2   95.50     94.00     97.00     96.91     95.43
    SE-ResNet152        97.50     99.00     96.00     96.12     97.54
    VGG19_BN            98.00     99.00     97.00     97.06     98.02

    Figure 4 shows the results of classifying representative retinal OCT images using the trained VGG19_BN. The images classified as NRR contain plenty of noise, making it hard to distinguish the various layers of the fundus structure. Such images are not conducive to training deep learning models, so they need to be denoised. Conversely, there is almost no noticeable noise in the images classified as NF, and the tissue layers are clearly visible. Denoising such an image would blur the boundaries between different tissues and lose some detailed information.

    Fig. 4.

    Fig. 4. Schematic diagram of the effect of classifying retinal OCT images into NRR and NF images with VGG19_BN.

    3.2.2. Applying the verified method to the original dataset

    The results of classifying the 114,214 images in the original dataset into NRR and NF images using the trained VGG19_BN are shown in Table 4. Of the 114,214 retinal OCT images, 21,558 are classified as NRR and 92,656 as NF. The screening results show that less than 20% of the images need to be denoised. The efficiency of deep learning is impressive: VGG19_BN processed 200 images within five seconds at 98% AC and took only 20 min to classify all 114,214 images. Compared with manually selecting images, our approach has significant advantages in accuracy and speed. Compared with denoising all images indiscriminately, our method saves 81.12% of the processing time and avoids losing detailed information in high-quality images due to post-processing.

    Table 4. Results of classifying images in the original dataset by the VGG19_BN.

    Total images    NRR images    NF images
    114,214         21,558        92,656

    3.3. Denoising NRR images to improve the quality of the images in the dataset

    Figure 5(a) shows a representative raw NRR image and its denoising result from the BM3D algorithm. The speckle noise in the raw image is well suppressed. We randomly selected 50 NRR images from each type of retinal OCT image, along with their denoised counterparts, and calculated the noise levels of these images using the noise level estimation algorithm, as shown in Fig. 5(b). Each blue asterisk indicates an image before denoising, and each orange cross indicates an image after denoising. The red horizontal line marks a noise level of 1. As shown in the figure, the noise levels are prominently reduced after denoising, to no more than 1, indicating that the quality of the NRR images has been significantly improved. After processing the original dataset with steps 1 and 2, we obtained a dataset with higher image quality (the processed dataset), removing the obstacles to improving the AC of retinal disease classification.

    Fig. 5.

    Fig. 5. NRR images before and after denoising by BM3D algorithm: (a) Representative NRR images and (b) noise levels of the examples for NRR images.

    3.4. Improvement in the accuracy of retinal disease classification

    To prove that our approach ameliorates model performance on retinal disease classification, we selected four CNNs and trained them with the original and processed datasets, respectively. The results are shown in Table 5. DenseNet161 trained with the processed dataset achieves 100% RE for 'normal', 25.5% higher than when trained with the original dataset. Trained with the processed dataset, VGG19_BN attains 99.81% SP for DME. In terms of PR, InceptionResNetV2 trained with the processed dataset performs best, with a PR of 99.24% for DME. The performance enhancement is most prominent for SE-ResNet152, whose OA, Macro-F1, and Micro-F1 scores improve by 2.74%, 7.24%, and 6.85%, respectively. DenseNet161 trained with the processed dataset achieves the highest Macro-F1, Micro-F1, and OA: 95.34%, 95.35%, and 98.14%, respectively.

    Table 5. Results of classifying retinal OCT images by diseases using the CNNs trained with the original or processed dataset.

    Network             Dataset             Class     RE (%)    SP (%)    PR (%)    AC (%)    Macro F1 (%)    Micro F1 (%)    OA (%)
    DenseNet161         Original dataset    AMD       95.50     93.06     77.48     93.55     90.37           90.40           96.16
                                            CNV       98.25     97.12     89.52     97.35
                                            DME       96.50     99.62     98.47     99.00
                                            Drusen    87.25     99.62     98.31     97.15
                                            Normal    74.50     98.56     92.83     93.75
                        Processed dataset   AMD       91.75     98.88     95.32     97.45     95.34           95.35           98.14
                                            CNV       98.25     96.94     88.91     97.20
                                            DME       97.25     99.75     98.98     99.25
                                            Drusen    89.50     99.75     98.90     97.70
                                            Normal    100.00    98.88     95.69     99.10
    InceptionResNetV2   Original dataset    AMD       90.00     99.75     98.90     97.80     91.98           91.80           96.72
                                            CNV       98.75     91.87     75.24     93.25
                                            DME       89.00     99.81     99.16     97.65
                                            Drusen    82.25     99.50     97.63     96.05
                                            Normal    99.00     98.81     95.42     98.85
                        Processed dataset   AMD       92.25     98.50     93.89     97.25     94.82           94.85           97.94
                                            CNV       98.75     96.56     87.78     97.00
                                            DME       98.25     99.81     99.24     99.50
                                            Drusen    85.50     99.75     98.84     96.90
                                            Normal    99.50     98.94     95.90     99.05
    SE-ResNet152        Original dataset    AMD       96.75     88.75     68.25     90.35     87.33           87.75           95.10
                                            CNV       97.50     97.94     92.20     97.85
                                            DME       98.00     99.56     98.25     99.25
                                            Drusen    91.25     99.44     97.59     97.80
                                            Normal    55.25     99.00     93.25     90.25
                        Processed dataset   AMD       90.50     98.81     95.01     97.15     94.57           94.60           97.84
                                            CNV       98.25     96.31     86.95     96.70
                                            DME       99.25     99.44     97.78     99.40
                                            Drusen    86.25     99.56     98.01     96.90
                                            Normal    98.75     99.12     96.58     99.05
    VGG19_BN            Original dataset    AMD       96.25     89.62     69.87     90.95     88.21           88.40           95.36
                                            CNV       96.75     97.94     92.14     97.70
                                            DME       97.25     99.56     98.23     99.10
                                            Drusen    90.50     99.50     97.84     97.70
                                            Normal    61.25     98.88     93.16     91.35
                        Processed dataset   AMD       89.25     98.19     92.49     96.40     93.93           93.95           97.58
                                            CNV       99.00     96.19     86.65     96.75
                                            DME       97.25     99.81     99.23     99.30
                                            Drusen    85.50     99.62     98.28     96.80
                                            Normal    98.75     98.62     94.72     98.65

    Figure 6 shows the ROC curves and the area under the ROC curve (AUC) values for the four CNNs trained with the original and processed datasets. The ROC curve reflects the trade-off between the model's sensitivity and specificity as the classification threshold varies. The micro-average and macro-average ROC curves reflect this trend from the perspective of all samples pooled together and of each category averaged, respectively. The AUC is the area enclosed between the ROC curve and the coordinate axes. DenseNet161 trained with the processed dataset performs best, with a 99.54% micro-average AUC and a 99.67% macro-average AUC. Figure 7 shows the accuracy curves for the four CNNs trained with the processed dataset; the blue lines represent the training set, and the yellow lines the validation set. Each training session comprises 25 epochs. Considering the performance of each model under these evaluation indicators, we believe DenseNet161 performs best among the four CNNs.

    Fig. 6.

    Fig. 6. ROC curves and the AUC values for (a) DenseNet161, (b) InceptionResNetV2, (c) SE-ResNet152, and (d) VGG19_BN.
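
    A sketch of the micro-average ROC/AUC computation behind curves like these is given below, assuming scikit-learn and one-vs-rest binarization of the five class labels; the scores argument stands for the model's per-class softmax outputs.

```python
# Hedged sketch of a micro-average ROC/AUC, assuming scikit-learn.
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize


def micro_roc(y_true, scores, n_classes=5):
    """One-vs-rest micro-average: flatten the binarized labels and the
    per-class scores, then compute a single ROC curve and its AUC."""
    y_bin = label_binarize(y_true, classes=list(range(n_classes)))
    fpr, tpr, _ = roc_curve(y_bin.ravel(), np.asarray(scores).ravel())
    return fpr, tpr, auc(fpr, tpr)
```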

    Fig. 7.

    Fig. 7. Accuracy curves for (a) DenseNet161, (b) InceptionResNetV2, (c) SE-ResNet152, and (d) VGG19_BN.

    Figure 8 shows the confusion matrices of DenseNet161. The test set contains 2000 retinal OCT images, 400 per category. As seen in the figure, DenseNet161 trained with the original dataset mistakenly classifies 100 normal images as AMD, which is the immediate cause of the low RE for normal in Table 5. SE-ResNet152 and VGG19_BN have the same problem: trained with the original dataset, they erroneously classify 178 and 154 normal images as AMD, respectively. However, the CNNs trained with the processed dataset all avoid this problem. Since the quantities of AMD and normal images in the training dataset are close, this cannot be attributed to dataset imbalance. We believe the main cause is the low quality of the original AMD images, which makes it hard for the CNNs to learn to distinguish the images. After the images' quality is improved by our approach, the CNNs trained with the processed dataset no longer commit this fault. This shows that the sensitivity of the CNNs is improved, which is of great significance in disease classification.

    Fig. 8.

    Fig. 8. Confusion matrices for DenseNet161 trained using the original or processed dataset.
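
    Confusion matrices like those in Fig. 8 can be produced as sketched below; scikit-learn and matplotlib are assumptions, as the paper does not name its plotting tools.

```python
# Hedged sketch for plotting a 5-class confusion matrix.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

CLASSES = ["AMD", "CNV", "DME", "Drusen", "Normal"]


def plot_confusion(y_true, y_pred, title):
    """Draw the confusion matrix for one model/dataset combination."""
    disp = ConfusionMatrixDisplay.from_predictions(
        y_true, y_pred, display_labels=CLASSES)
    disp.ax_.set_title(title)
    plt.show()
```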

    Gradient-weighted class activation mapping (Grad-CAM) produces 'visual explanations' for the decisions of CNN-based models.37 A coarse localization map highlighting the regions crucial to the target concept in the input image is obtained by analyzing the final convolutional layer of the model. Figure 9 shows the Grad-CAMs for DenseNet161. Different colors represent regions with different weights, decreasing from red to blue. As seen in the figure, DenseNet161 pays the most attention to the prominent lesion area, which is consistent with clinicians' judgment in the actual diagnostic process.

    Fig. 9.

    Fig. 9. Grad-CAMs for DenseNet161 showing (a) AMD, (b) CNV, (c) DME, (d) Drusen, and (e) Normal.
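
    A compact Grad-CAM sketch following the recipe of Selvaraju et al. (Ref. 37) is given below. The hook-based implementation and the choice of model.features as the final convolutional block of torchvision's DenseNet161 are our assumptions, not the paper's code.

```python
# Minimal Grad-CAM via forward/backward hooks (Ref. 37): weight each
# channel of the last conv activations by its spatially averaged
# gradient, sum, ReLU, and upsample to image size.
import torch
import torch.nn.functional as F


def grad_cam(model, image, target_layer, class_idx=None):
    """Return a normalized heat map over `image` (1xCxHxW)."""
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: gradients.append(go[0]))
    try:
        logits = model(image)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        weights = gradients[0].mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * activations[0]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                            align_corners=False)
        return (cam / cam.max().clamp(min=1e-8)).squeeze().detach()
    finally:
        h1.remove()
        h2.remove()


# Usage sketch for DenseNet161: heat = grad_cam(model, img, model.features)
```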

    4. Discussion

    The quality of the images in the training dataset is a crucial factor in a deep learning model's image classification performance; low-quality images erect obstacles to model training. A portion of OCT images are of poor quality, containing plenty of speckle noise due to interference in the acquisition environment, and ameliorating image quality has been proven valid for increasing classification AC. Thus, we proposed a deep learning-based selective denoising approach to improve the quality of the images in the dataset. We processed the original dataset with our method and obtained a dataset consisting of high-quality images (the processed dataset). By randomly selecting 250 NRR images and comparing their noise levels before and after denoising, we find that the noise levels of NRR images are reduced by 92.13% on average. Consequently, the OA of the CNNs trained with the processed dataset improves by an average of 2.04%.

    Deep learning has received much attention in image denoising.38 In some application scenarios, deep learning-based denoising algorithms have outperformed traditional non-AI algorithms. In future work, we will try to denoise retinal OCT images using deep learning. We will also consider augmenting the dataset used to train the NRR-screening models to further improve classification accuracy.

    5. Conclusions

    In this study, a time-saving selective denoising approach based on deep learning is proposed to improve the accuracy of retinal disease classification by removing the noise from poor-quality retinal OCT images. Our approach includes three steps: (1) train a deep learning model to quickly and accurately screen out low-quality, noisy images from the large-scale original retinal OCT dataset; (2) use the BM3D algorithm to denoise these images to improve the quality of the images in the dataset; (3) use the processed dataset to train models to classify retinal OCT images by disease. We introduced a patch-based noise level estimation algorithm to determine the magnitude of noise in retinal OCT images and thus whether denoising is required, and divided the datasets for training the NRR-screening CNNs on this basis. DenseNet161, InceptionResNetV2, SE-ResNet152, and VGG19_BN were selected as experimental networks from CNNs that have participated in the ILSVRC over the years. Results show that, by ameliorating image quality in the dataset, the performance of the CNNs trained with the processed dataset improves by an average of 2.04%, 5.19%, and 5.10% in OA, Macro-F1, and Micro-F1 scores, respectively. In addition, our approach improves the sensitivity of the CNNs. Among these models, DenseNet161 trained with the processed dataset performed best, with 98.14% OA, 95.34% Macro F1-score, and 95.35% Micro F1-score. This suggests that our approach efficiently ameliorates deep learning models' performance in retinal disease classification. Our approach also has prominent advantages in speed: compared with denoising all images indiscriminately, it saves 81.12% of the processing time.

    Acknowledgment

    Lantian Hu and Ruixiang Guo contributed equally to this work and are co-first authors. This work is supported by the Major Science and Technology Project of Hainan Province (ZDKJ202006).

    Conflicts of Interest

    The authors declare no competing interests.

    References

    1. D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, "Optical coherence tomography," Science 254, 1178–1181 (1991).
    2. T. C. Chen, A. Hoguet, A. K. Junk, K. Nouri-Mahdavi, S. Radhakrishnan, H. L. Takusagawa, P. P. Chen, "Spectral-domain OCT: Helping the clinician diagnose glaucoma: A report by the American Academy of Ophthalmology," Ophthalmology 125, 1817–1827 (2018).
    3. A. Pujari, K. Bhaskaran, P. Sharma, P. Singh, S. Phuljhele, R. Saxena, S. V. Azad, "Optical coherence tomography angiography in neuro-ophthalmology: Current clinical role and future perspectives," Surv. Ophthalmol. 66, 471–481 (2021).
    4. H. Li, K. Liu, L. Yao, X. Deng, Z. Zhang, P. Li, "ID-OCTA: OCT angiography based on inverse SNR and decorrelation features," J. Innov. Opt. Health Sci. 14, 2130001 (2021).
    5. U. Schmidt-Erfurth, A. Sadeghipour, B. S. Gerendas, S. M. Waldstein, H. Bogunović, "Artificial intelligence in retina," Prog. Retin. Eye Res. 67, 1–29 (2018).
    6. R. Kapoor, B. T. Whigham, L. A. Al-Aswad, "Artificial intelligence and optical coherence tomography imaging," Asia-Pacific J. Ophthalmol. 8, 187–194 (2019).
    7. D. S. W. Ting, L. R. Pasquale, L. Peng, J. P. Campbell, A. Y. Lee, R. Raman, G. S. W. Tan, L. Schmetterer, P. A. Keane, T. Y. Wong, "Artificial intelligence and deep learning in ophthalmology," Br. J. Ophthalmol. 103, 167–175 (2019).
    8. P. Ongsulee, "Artificial intelligence, machine learning and deep learning," 2017 15th Int. Conf. ICT and Knowledge Engineering (ICT&KE), pp. 1–6, IEEE (2017).
    9. J. Liu, S. Yan, N. Lu, D. Yang, C. Fan, H. Lv, S. Wang, X. Zhu, Y. Zhao, Y. Wang, "Automatic segmentation of foveal avascular zone based on adaptive watershed algorithm in retinal optical coherence tomography angiography images," J. Innov. Opt. Health Sci. 15, 2242001 (2022).
    10. G. Zheng, Y. Jiang, C. Shi, H. Miao, X. Yu, Y. Wang, S. Chen, Z. Lin, W. Wang, F. Lu, "Deep learning algorithms to segment and quantify the choroidal thickness and vasculature in swept-source optical coherence tomography images," J. Innov. Opt. Health Sci. 14, 2140002 (2021).
    11. P. Prahs, V. Radeck, C. Mayer, Y. Cvetkov, N. Cvetkova, H. Helbig, D. Märker, "OCT-based deep learning algorithm for the evaluation of treatment indication with anti-vascular endothelial growth factor medications," Graefe's Arch. Clin. Exp. Ophthalmol. 256, 91–98 (2018).
    12. D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell 172, 1122–1131.e9 (2018).
    13. V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," JAMA 316, 2402–2410 (2016).
    14. Z. Jiang, L. Wang, Q. Wu, Y. Shao, M. Shen, W. Jiang, C. Dai, "Computer-aided diagnosis of retinopathy based on vision transformer," J. Innov. Opt. Health Sci. 15, 2250009 (2022).
    15. J. Kim, L. Tran, "Retinal disease classification from OCT images using deep learning algorithms," 2021 IEEE Conf. Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1–6, IEEE (2021).
    16. Z. Chen, Z. Zeng, H. Shen, X. Zheng, P. Dai, P. Ouyang, "DN-GAN: Denoising generative adversarial networks for speckle noise reduction in optical coherence tomography images," Biomed. Signal Process. Control 55, 101632 (2020).
    17. M. Koziarski, B. Cyganek, "Image recognition with deep neural networks in presence of noise - dealing with and taking advantage of distortions," Integr. Comput.-Aided Eng. 24, 337–349 (2017).
    18. K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Trans. Image Process. 16, 2080–2095 (2007).
    19. B. Qiu, Z. Huang, X. Liu, X. Meng, Y. You, G. Liu, K. Yang, A. Maier, Q. Ren, Y. Lu, "Noise reduction in optical coherence tomography images using a deep neural network with perceptually-sensitive loss function," Biomed. Opt. Exp. 11, 817–830 (2020).
    20. P. Gholami, P. Roy, M. K. Parthasarathy, V. Lakshminarayanan, "OCTID: Optical coherence tomography image database," Comput. Electr. Eng. 81, 106532 (2020).
    21. M. Li, R. Idoughi, B. Choudhury, W. Heidrich, "Statistical model for OCT image denoising," Biomed. Opt. Exp. 8, 3903–3917 (2017).
    22. B. Baumann, C. W. Merkle, R. A. Leitgeb, M. Augustin, A. Wartak, M. Pircher, C. K. Hitzenberger, "Signal averaging improves signal-to-noise in OCT images: But which approach works best, and when?," Biomed. Opt. Exp. 10, 5755–5775 (2019).
    23. A. Zhang, J. Xi, J. Sun, X. Li, "Pixel-based speckle adjustment for noise reduction in Fourier-domain OCT images," Biomed. Opt. Exp. 8, 1721–1730 (2017).
    24. J. Wang, G. Deng, W. Li, Y. Chen, F. Gao, H. Liu, Y. He, G. Shi, "Deep learning for quality assessment of retinal OCT images," Biomed. Opt. Exp. 10, 6057–6072 (2019).
    25. X. Liu, M. Tanaka, M. Okutomi, "Single-image noise level estimation for blind denoising," IEEE Trans. Image Process. 22, 5226–5237 (2013).
    26. K. Weiss, T. M. Khoshgoftaar, D. Wang, "A survey of transfer learning," J. Big Data 3, 1–40 (2016).
    27. L. Torrey, J. Shavlik, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, Information Science Reference, Hershey, PA (2010).
    28. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis. 115, 211–252 (2015).
    29. D. Maji, A. Santara, P. Mitra, D. Sheet, "Ensemble of deep convolutional neural networks for learning to detect retinal vessels in fundus images," arXiv:1603.04833 (2016).
    30. G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, "Densely connected convolutional networks," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 4700–4708, IEEE, Honolulu, HI, USA (2017).
    31. C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," Thirty-First AAAI Conf. Artificial Intelligence, pp. 4278–4284, AAAI Press, San Francisco, CA, USA (2017).
    32. K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 770–778, IEEE, Las Vegas, NV, USA (2016).
    33. J. Hu, L. Shen, G. Sun, "Squeeze-and-excitation networks," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 7132–7141, IEEE, Salt Lake City, UT, USA (2018).
    34. K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556 (2014).
    35. N. Chinchor, "MUC-4 evaluation metrics," Proc. Fourth Message Understanding Conf., pp. 22–29, Association for Computational Linguistics, McLean, VA, USA (1992).
    36. D. D. Lewis, R. E. Schapire, J. P. Callan, R. Papka, "Training algorithms for linear text classifiers," Proc. 19th Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 298–306, ACM, New York, NY, USA (1996).
    37. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," Proc. IEEE Int. Conf. Computer Vision, pp. 618–626, IEEE, Venice, Italy (2017).
    38. C. Tian, L. Fei, W. Zheng, Y. Xu, W. Zuo, C.-W. Lin, "Deep learning on image denoising: An overview," Neural Netw. 131, 251–275 (2020).