Age-related macular degeneration (AMD) and diabetic macular edema (DME) are two common retinal diseases among the elderly that may ultimately cause irreversible blindness. Timely and accurate diagnosis is essential for treating these diseases. In recent years, computer-aided diagnosis (CAD) has been studied extensively and used effectively for rapid, early diagnosis. In this paper, we propose a CAD method that uses a vision transformer to analyze optical coherence tomography (OCT) images and automatically discriminate among AMD, DME, and normal eyes. A classification accuracy of 99.69% was achieved. After model pruning, the recognition time for a single image fell to 0.010 s with no drop in classification accuracy. Compared with Convolutional Neural Network (CNN) image classification models (VGG16, Resnet50, Densenet121, and EfficientNet), the pruned vision transformer exhibited better recognition ability. The results show that the vision transformer is an improved alternative for diagnosing retinal diseases more accurately.

1. Introduction

Age-related macular degeneration (AMD) and diabetic macular edema (DME) are two common retinal diseases among the elderly. AMD mostly emerges in people over 45 years old, and its prevalence increases gradually with age.¹⁻³ DME, which is the direct cause of vision impairment in most diabetic patients, has a high incidence all over the world.⁴,⁵ Timely and accurate diagnosis of AMD and DME is an important prerequisite for effective treatment to prevent irreversible blindness. Noninvasive three-dimensional imaging of tissue at the cellular level can improve early medical diagnosis and contribute to the development of novel therapies. Optical coherence tomography (OCT) is a noninvasive, high-resolution optical medical diagnostic imaging technique that enables in vivo cross-sectional tomographic visualization of the internal microstructure of the eye. In recent years, it has become a gold standard for eye disease diagnosis.⁶⁻¹²

Diagnosing retinal diseases from OCT images is currently the most effective form of computer-aided diagnosis (CAD) for retinopathy. In recent years, deep learning has developed rapidly and has provided many solutions in the field of medical diagnosis. In particular, the Convolutional Neural Network (CNN) has achieved great success in medical image classification, and many researchers have used CNNs to classify OCT images for disease diagnosis. Shih et al. classified four types of OCT images (three retinal diseases and normal retina) using the VGG16 model. The accuracy of the model on the test set reached 99%, demonstrating the effectiveness of deep learning in the diagnosis of retinal diseases.¹³ Zhang et al. proposed a multiscale deep learning model that uses OCT images to identify diabetic macular edema, achieving an accuracy of 94.5%. By comparison with other models, the authors showed that their model was better able to recognize low-quality medical images.¹⁴ Saratxaga et al. used a deep learning algorithm to classify OCT images of mouse colons; the recognition accuracy of their model reached 96.65%, demonstrating the performance of deep learning in the CAD of colonic lesions.¹⁵ Luo et al. combined OCT images with deep learning to assist in the identification of sebaceous glands. The classification accuracy of their model reached 97.9%, which is of great help to medical personnel in the diagnosis and treatment of sebaceous gland problems.¹⁶ Potapenko et al. trained a deep learning model on noisy OCT image data sets to identify AMD. The recognition accuracy of the model reached 90.9%, which showed that the model could assist medical personnel in the clinical diagnosis of AMD.¹⁷

It should be noted that most of the above-mentioned studies are based on CNNs. CNNs extract image features very well, which has been verified by a large number of scholars. However, there is still little research on CAD with the vision transformer. The vision transformer is an image classification model that was proposed in 2020.¹⁸ It does not rely on any CNN; it is built entirely on the transformer structure, whose feature extraction method differs from that of a CNN. Bazi et al. reported that the vision transformer has better classification capabilities than CNNs on image classification problems. They applied the vision transformer to remote sensing image classification and tested it on multiple remote sensing image data sets. Their experimental results showed that the classification accuracy of the vision transformer on remote sensing images exceeded that of the CNN-based methods.¹⁹

In this paper, we combine the vision transformer with OCT images to improve the diagnosis of retinal diseases. Focusing on two common retinal diseases, AMD and DME, we collected OCT images of AMD and DME as well as images of normal ocular fundus. We then trained the vision transformer to classify these three types of OCT images in order to diagnose retinal diseases. In Sec. 2, we introduce the dataset and the vision transformer used in this paper. In Sec. 3, we analyze the experimental results, discuss the impact of model pruning, and compare the results with those obtained using CNNs. The full study is summarized in Sec. 4.

2. Materials and Methods

2.1. Dataset

The dataset used in this paper consists of OCT fundus images of 15 normal subjects, 15 AMD patients, and 15 DME patients collected by Duke University.²⁰ It includes 1407 OCT images of normal retinas, 723 OCT images of AMD, and 1101 OCT images of DME. Examples of these three types of OCT images are shown in Fig. 1.

As shown in Fig. 1(a), normal retina OCT images show the inner retinal layers, including the nerve fiber layer (NFL), ganglion cell layer (GCL), inner plexiform layer (IPL), and inner nuclear layer (INL), and the outer layers, including the outer plexiform layer (OPL), outer nuclear layer (ONL), myoid and ellipsoid zone (MEZ), the outer segment of photoreceptors (OS), and the retinal pigment epithelium (RPE). OCT images of AMD fundus, shown in Fig. 1(b), present a large number of drusen/choroidal neovascularization (CNV) formations, which result in submacular hemorrhage and leakage. As can be seen from Fig. 1(c), OCT images of DME present with retinal edema, hemorrhage, cystic macular edema, and subretinal fluid.

2.2. Data preprocessing

The images must be preprocessed before they are input to the model for recognition. Image resizing and normalization are the main preprocessing steps. In this study, the images were resized to 224 × 224, and the resized images were then normalized.²¹⁻²³
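The two preprocessing steps can be illustrated with a minimal, standard-library sketch. The interpolation method and the normalization constants are not stated in this paper, so nearest-neighbour resizing and a mean and standard deviation of 0.5 are assumptions made only for illustration, as are the function names and the synthetic input image.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, mean=0.5, std=0.5):
    """Scale 8-bit intensities to [0, 1], then standardise (mean/std are assumed values)."""
    return [[(pixel / 255.0 - mean) / std for pixel in row] for row in image]


# Synthetic image of arbitrary size with a horizontal intensity gradient.
raw = [[(c * 255) // 511 for c in range(512)] for _ in range(496)]
resized = resize_nearest(raw, 224, 224)
prepared = normalize(resized)
```

With these assumed constants, every pixel of `prepared` lies in the range [-1, 1], and the image has the 224 × 224 size expected by the model.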

2.3. Vision transformer

Vision transformer is completely implemented based on the transformer structure, which is widely used in the field of Natural Language Processing (NLP). The transformer structure consists of a set of encoder components and a group of decoder components. Vision transformer is an image classification model and does not require decoder components, so there is only an encoder component in the vision transformer.

The encoder component is composed of a stack of six identical encoders. Each encoder consists of a multi-head attention layer and a feed-forward layer, and both layers include a residual connection and layer normalization (LayerNorm). The feed-forward layer is a multilayer perceptron (MLP). The structure of an encoder component is shown in Fig. 2.
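The way a residual connection and LayerNorm wrap each of the two layers can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this paper: the normalization is applied before each layer (the ordering used in Ref. 18, which this paper does not specify), the learned scale and shift of LayerNorm are omitted, and the two placeholder sub-layers act on a single token vector, whereas real attention mixes information across tokens.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalise a token vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_sublayer(x, sublayer):
    """Compute x + sublayer(LayerNorm(x)); the pre-norm ordering is an assumption."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def encoder_block(x, attention, feed_forward):
    """One encoder: an attention sub-layer followed by a feed-forward sub-layer."""
    x = residual_sublayer(x, attention)
    return residual_sublayer(x, feed_forward)


# Placeholder sub-layers (hypothetical); real ones hold learned weights.
def fake_attention(v):
    return list(v)


def fake_feed_forward(v):
    return [0.5 * t for t in v]


token = [1.0, 2.0, 3.0, 4.0]
out = encoder_block(token, fake_attention, fake_feed_forward)
```

Stacking six such blocks, each with its own weights, gives the encoder component described above.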

Fig. 2. The structure of an encoder component.

The multi-head attention in the encoder component structure is a self-attention structure, which allows the model to focus on different aspects of information. Scaled dot-product attention is the attention score calculation method of multi-head attention. The structure of the multi-head attention and the structure of scaled dot-product attention are shown in Fig. 3.

Fig. 3. The structure of the multi-head attention and the structure of scaled dot-product attention.

Multi-head attention is shown in the following equations:

$$Q_i = Q W_i^{Q}, \quad K_i = K W_i^{K}, \quad V_i = V W_i^{V}, \quad i = 1, \dots, 8, \tag{1}$$

$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i), \quad i = 1, \dots, 8, \tag{2}$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_8)\, W^{o}, \tag{3}$$

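where Q represents the query, K represents the key, V represents the value, and each W represents a weight matrix: the matrices with subscript i project the input for the i-th head, and the matrix with superscript o projects the concatenated heads.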
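Equations (1)-(3) can be traced with a small standard-library sketch. The attention function itself is not written out in this paper; the sketch assumes the standard scaled dot-product form, softmax(QKᵀ/√d_k)V, with d_k the width of each head. The helper names, the toy input, and the use of two heads instead of eight are assumptions chosen to keep the example short.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(q, k, v, w_q, w_k, w_v, w_o):
    """Eqs. (1)-(3): project per head, attend, concatenate, project with W^o."""
    heads = [
        attention(matmul(q, wq_i), matmul(k, wk_i), matmul(v, wv_i))
        for wq_i, wk_i, wv_i in zip(w_q, w_k, w_v)
    ]
    concat = [sum((head[t] for head in heads), []) for t in range(len(q))]
    return matmul(concat, w_o)


# Toy self-attention: 2 tokens, model width 4, 2 heads of width 2 (hypothetical weights).
x = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
first_half = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
second_half = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = [first_half, second_half]
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
out = multi_head(x, x, x, w, w, w, identity)
```

Each head sees a different projection of the same tokens, which is what lets the model attend to different aspects of the input in parallel.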

The linear embedding layer is a crucial structure in the vision transformer. After an image is input into the vision transformer, it is divided into multiple patches, and patch embedding is applied to each patch. Embedding is a spatial mapping method commonly used in NLP that maps high-dimensional vectors to a low-dimensional space. In patch embedding, each patch is flattened into a one-dimensional tensor and then embedded. After the patch embedding operation, the positional embedding and the class embedding are added, and the result is fed to the transformer encoder. The output of the transformer encoder then passes through an MLP head, which is composed of a fully connected layer and a Gaussian Error Linear Unit (GELU) activation function. The structure of the MLP is shown in Fig. 4.
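The patch-splitting step can be made concrete with a short sketch. The patch size is not stated in this paper; a side of 16 pixels is assumed here because it is the value suggested by the title of Ref. 18, and a single-channel image is assumed for brevity. The learned linear projection that follows the flattening is omitted.

```python
def split_into_patches(image, patch):
    """Cut a square 2-D image into non-overlapping patch x patch tiles, each flattened."""
    size = len(image)
    assert size % patch == 0, "image side must be divisible by the patch side"
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


# Synthetic 224 x 224 image whose pixel value encodes its position.
image = [[float(r * 224 + c) for c in range(224)] for r in range(224)]
patches = split_into_patches(image, patch=16)
num_patches, patch_dim = len(patches), len(patches[0])
# One extra position in the sequence is occupied by the class embedding.
sequence_length = num_patches + 1
```

Under these assumptions a 224 × 224 image yields 14 × 14 = 196 patches of 256 values each, so the encoder receives a sequence of 197 tokens once the class embedding is included.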

The equation of the GELU activation function is

$$\mathrm{GELU}(x) = 0.5x\left(1 + \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715x^{3}\right)\right)\right). \tag{4}$$
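Equation (4) is the tanh approximation of GELU. As a quick check, the sketch below evaluates it next to the exact definition, x·Φ(x) with Φ the standard normal cumulative distribution function, using only the Python standard library. The function names are chosen for illustration.

```python
import math


def gelu_tanh(x):
    """Tanh approximation of GELU, as written in Eq. (4)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x):
    """Exact GELU, x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  tanh form={gelu_tanh(x):+.5f}  exact={gelu_exact(x):+.5f}")
```

The two forms agree closely over this range; unlike ReLU, GELU is smooth and lets small negative inputs pass with a small negative output.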

After the MLP head, a final fully connected layer performs the classification. In this study, three types of data were classified, so the output of this fully connected layer was set to three categories. The structure of the vision transformer is shown in Fig. 5.

Fig. 5. The structure of vision transformer.

2.4. Symmetric cross-entropy loss function

The loss function used in this paper is the symmetric cross-entropy loss function, which can reduce the impact of noise in the data set on training and can prevent overfitting.²⁴ The symmetric cross-entropy loss function is defined as

$$l_{sce} = l_{ce} + l_{rce}, \tag{5}$$

where $l_{ce}$ is the cross-entropy loss function and $l_{rce}$ is the reverse cross-entropy function. The definitions of $l_{ce}$ and $l_{rce}$ are, respectively,

$$l_{ce} = \frac{1}{N} \sum_{i} -\left[ y_i \cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i) \right], \tag{6}$$

$$l_{rce} = -\sum_{k=1}^{K} p(k \mid x) \log q(k \mid x). \tag{7}$$
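A minimal sketch of the symmetric cross-entropy for a single sample is given below. It makes several assumptions that are not stated in this paper: p is taken to be the predicted class distribution and q the one-hot label distribution (the convention of Ref. 24), the cross-entropy term is written in its multi-class form rather than the binary form of Eq. (6), both terms have unit weight as in Eq. (5), and logarithms of zero are avoided by clamping with illustrative floor values. The class order is also hypothetical.

```python
import math


def cross_entropy(target, predicted, floor=1e-7):
    """-sum_k target_k * log(predicted_k), with predicted_k clamped away from zero."""
    return -sum(t * math.log(max(p, floor)) for t, p in zip(target, predicted))


def symmetric_cross_entropy(label, predicted, label_floor=1e-4):
    """l_sce = l_ce + l_rce for one sample (Eq. (5)), with unit weights on both terms."""
    l_ce = cross_entropy(label, predicted)
    # Reverse term: the roles swap, so the log is taken of the clamped one-hot label.
    l_rce = cross_entropy(predicted, label, floor=label_floor)
    return l_ce + l_rce


# Hypothetical class order: normal, AMD, DME.
amd = [0.0, 1.0, 0.0]
confident_right = symmetric_cross_entropy(amd, [0.02, 0.96, 0.02])
confident_wrong = symmetric_cross_entropy(amd, [0.96, 0.02, 0.02])
```

Because the reverse term is bounded by the clamp on the label, a confidently wrong prediction on a mislabeled sample contributes a limited penalty, which is the mechanism by which this loss tolerates label noise.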

2.5. Experimental environment

The experimental environment of this paper was as follows. The hardware was an Intel Core i7-9700f processor, an NVIDIA RTX2060s graphics card with 8 GB of memory, and 16 GB of system memory. The software was Windows 10 and Python 3.7, with PyTorch as the deep learning framework.²⁵

2.6. Evaluation standard

In this paper, accuracy was adopted as the standard to evaluate the performance of the model. The definition of accuracy is shown in Eq. (8). In addition, we analyzed the recognition ability for each type of OCT image by using the confusion matrix, which compares the predicted results with the real results,

$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{ALL}}, \tag{8}$$

where ACC is the proportion of all samples that are classified correctly, TP represents true positives, TN represents true negatives, and ALL represents all predicted samples.
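For a three-class problem, the numerator of Eq. (8) can be read as the number of correctly classified samples, that is, the sum of the diagonal of the confusion matrix. The sketch below builds a confusion matrix and computes accuracy from it; the labels and predictions are made-up values for illustration and are not results from this paper.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


classes = ["AMD", "DME", "NORMAL"]
truth = ["AMD", "AMD", "DME", "DME", "NORMAL", "NORMAL", "NORMAL", "NORMAL"]
guess = ["AMD", "AMD", "DME", "DME", "NORMAL", "NORMAL", "NORMAL", "DME"]
cm = confusion_matrix(truth, guess, classes)
acc = accuracy(cm)
```

In this toy example one normal image is predicted as DME, so the off-diagonal entry in the last row shows where the error occurred while the accuracy alone (7 of 8) does not.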

2.7. Experimental procedure and hyperparameter settings

The experimental process and parameter settings of this paper were as follows:

(1)	The data set was divided into a training set, a validation set, and a test set in an 8:1:1 ratio. The training set was used for training the model, the validation set was used for checking how well the model had fitted, and the test set was used for testing the classification ability of the model.
(2)	The input data of the model were preprocessed.
(3)	We used Ranger as the optimizer; the learning rate was set to 0.00003, a commonly used learning rate for training classification models; we trained the model for 100 epochs, which is sufficient for a model trained on a small data set.
(4)	The model parameters that achieved the highest accuracy on the validation set were saved.
(5)	The saved model was evaluated on the test set to obtain the test accuracy.
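Steps (1) and (4) above can be sketched with the standard library alone. The paper does not say how the split was drawn, so a seeded random shuffle over individual images is an assumption, as are the function names and the validation accuracies in the usage example.

```python
import random


def split_indices(n_items, ratios=(8, 1, 1), seed=0):
    """Shuffle item indices and cut them into train/validation/test parts."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    n_train = n_items * ratios[0] // sum(ratios)
    n_val = n_items * ratios[1] // sum(ratios)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def best_epoch(validation_accuracies):
    """Return (epoch, accuracy) of the checkpoint that would be kept in step (4)."""
    epoch = max(range(len(validation_accuracies)),
                key=validation_accuracies.__getitem__)
    return epoch + 1, validation_accuracies[epoch]


n_images = 1407 + 723 + 1101          # image counts from Sec. 2.1
train, val, test = split_indices(n_images)
sizes = (len(train), len(val), len(test))
kept = best_epoch([0.90, 0.97, 0.99, 0.98])   # made-up validation accuracies
```

Keeping the checkpoint with the best validation accuracy, rather than the last one, means the test set is only touched once, after model selection is finished.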

3. Results and Discussion

3.1. Experimental results

The experiment was performed according to the experimental process described in Sec. 2.7, and the error on the training set and the accuracy on the validation set were recorded during training as the basis for analyzing the fitting ability of the model. Both curves are shown in Fig. 6.

Fig. 6. (a) The error changes of the training set; (b) the accuracy changes of the validation set.

It can be seen that the error on the training set dropped quickly and stabilized after the 40th epoch. At the same time, the accuracy on the validation set rose quickly and also stabilized after the 40th epoch. Together, the error curve on the training set and the accuracy curve on the validation set indicate that the vision transformer fits normally without overfitting.

The accuracy of the model on the test set is 99.69% and the confusion matrix of the test set is shown in Fig. 7.

The confusion matrix shows that the vision transformer classified the OCT fundus images of AMD and DME in the test set without error, which suggests a low risk of missed diagnosis in practical applications. For the OCT images of normal fundus, there was one classification error. In actual diagnosis, multiple OCT images are acquired for each patient, so the misclassification of a single normal OCT image is unlikely to affect the final diagnosis of retinopathy.
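The reported accuracy is consistent with a single error in a test set of the expected size. The check below assumes that the test set holds one tenth of the images listed in Sec. 2.1, rounded to the nearest integer; the exact test-set size is not stated in this paper.

```python
n_images = 1407 + 723 + 1101      # dataset size from Sec. 2.1
n_test = round(n_images * 0.1)    # assumed test-set size under the 8:1:1 split
errors = 1                        # one misclassified normal image
accuracy = (n_test - errors) / n_test
print(f"{n_test} test images, accuracy = {accuracy:.2%}")
```

Under this assumption, 322 correct predictions out of 323 give 99.69%, which matches the reported test accuracy.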

To verify the effectiveness of the proposed method, the vision transformer was compared with conventional CNN models. We chose VGG16,²⁶ Resnet50,²⁷ Densenet121,²⁸ and EfficientNet²⁹ as the comparison models. The experimental procedures and hyperparameter settings of these models were the same as those of the vision transformer.

Model pruning is a method to reduce the memory demand and calculation demand of the model and improve the calculation speed of the model. By encouraging the sparsity of channels in the vision transformer, the important channels are selected and those channels with zero or small coefficients are discarded to achieve an efficient classification model.³⁰ In this paper, the model pruning is performed on the vision transformer to improve the recognition speed of a single OCT image under the condition that the recognition ability of the model does not decrease significantly.
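The channel-selection idea can be illustrated with a toy sketch: channels whose importance coefficient is zero or small in magnitude are dropped, and the corresponding columns are removed from a weight matrix. The coefficients, the threshold, and the function names are hypothetical, and the sparsity-encouraging training that would produce such coefficients (Ref. 30) is not shown.

```python
def select_channels(scores, threshold):
    """Indices of channels whose importance coefficient magnitude reaches the threshold."""
    return [i for i, s in enumerate(scores) if abs(s) >= threshold]


def prune_columns(weight, keep):
    """Drop the discarded channels (columns) from a weight matrix stored as rows."""
    return [[row[i] for i in keep] for row in weight]


# Hypothetical learned importance coefficients for six channels.
scores = [0.91, 0.00, -0.47, 0.03, 0.66, -0.01]
weight = [[1, 2, 3, 4, 5, 6],
          [7, 8, 9, 10, 11, 12]]

keep = select_channels(scores, threshold=0.1)
pruned = prune_columns(weight, keep)
```

Here half of the channels are removed, so the matrix multiplications that use this weight need roughly half the arithmetic, which is the source of the speed-up that pruning aims for.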

Table 1 compares the test-set accuracy and the single-image recognition time of the vision transformer, the pruned vision transformer, and the four CNN models.

**Table 1. Model comparison.**

| Model | Accuracy | Time per image (s) |
|---|---|---|
| VGG16 | 98.51% | 0.014 |
| Resnet50 | 97.32% | 0.089 |
| Densenet121 | 97.02% | 0.091 |
| EfficientNet | 34.16% | 0.032 |
| Vision transformer | 99.69% | 0.017 |
| Vision transformer after pruning | 99.69% | 0.010 |

As Table 1 shows, among the CNN image classification models, VGG16 has the highest classification accuracy (98.51%), but it is still lower than that of the vision transformer (99.69%). With its attention mechanism, the vision transformer can focus on image regions that are semantically relevant to the classification target, which may explain its higher accuracy. In terms of single-image recognition speed, VGG16 (0.014 s) and the vision transformer (0.017 s) are both faster than the other CNN models. The pruned vision transformer is the fastest of all the models at 0.010 s per image, and its recognition accuracy does not decrease, remaining at 99.69%. Considering both recognition accuracy and recognition speed, the pruned vision transformer is superior to the CNN models in recognizing OCT fundus images.
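The paper does not describe how the single-image recognition time was measured. One common approach, sketched below as an assumption, is to average wall-clock time over many images after a few untimed warm-up calls. The classifier here is a hypothetical stand-in; with a real network running on a GPU, the device would also need to be synchronized before each clock reading.

```python
import time


def mean_time_per_image(classify, images, warmup=2):
    """Average wall-clock seconds spent classifying one image."""
    for image in images[:warmup]:      # warm-up calls are not timed
        classify(image)
    start = time.perf_counter()
    for image in images:
        classify(image)
    return (time.perf_counter() - start) / len(images)


# Stand-in classifier (hypothetical); a real one would run the trained network.
def dummy_classify(image):
    return sum(image) % 3


images = [[i, i + 1, i + 2] for i in range(50)]
seconds = mean_time_per_image(dummy_classify, images)
```

Averaging over many images reduces the influence of one-off costs such as memory allocation on the reported per-image time.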

In order to further verify the validity of the model proposed in this paper, we compare the results of the model in this paper with the research results of related literature, and the comparison is shown in Table 2.

**Table 2. Comparison of classification results with previous studies.**

| Study | Method | Diseases | Accuracy |
|---|---|---|---|
| Ref. 20 | SVM | AMD, DME | 95.56% |
| Ref. 31 | Dictionary learning | AMD, DME | 98.38% |
| Ref. 32 | CNN | AMD, DME | 91.33% |
| Ref. 33 | CNN | AMD, DME | 96.66% |
| Ref. 34 | CNN | AMD, DME | 94.20% |
| Ref. 35 | CNN | DME | 96% |
| This paper | Vision transformer | AMD, DME | 99.69% |

Table 2 shows that Refs. 20 and 31 used machine learning algorithms, and that the dictionary-learning model reached an accuracy of 98.38%, which exceeds the results of the other studies that used CNNs. Among the CNN-based studies, the model with a multiscale and multipath CNN architecture³³ achieved higher recognition accuracy than the general CNN models. The classification accuracy of the model proposed in this paper is the highest among the compared studies, which indicates that the vision transformer recognizes OCT fundus images better than CNN models and traditional machine learning algorithms.

4. Conclusions

AMD and DME are two retinal diseases that seriously harm the eyesight of the elderly, so timely diagnosis and treatment are very important for patients. In this paper, we presented a CAD method that uses the vision transformer to classify OCT fundus images and thereby diagnose AMD and DME. The data set came from the OCT fundus images collected by Duke University. The model was trained with the symmetric cross-entropy loss function and the Ranger optimizer. The classification accuracy of our model for AMD, DME, and normal OCT fundus images reached 99.69%. We then pruned the model and compared the pruned vision transformer with CNN image classification models. The vision transformer showed the best recognition ability, and the pruned vision transformer had the fastest recognition speed without any decrease in recognition accuracy. The model proposed in this paper can therefore better realize the CAD of retinal diseases.

Acknowledgments

This work was supported by the Science and Technology Innovation Project of Shanghai Science and Technology Commission (19441905800), the National Natural Science Foundation of China (62175156, 81827807, 8210041176, 82101177, 61675134), the Project of State Key Laboratory of Ophthalmology, Optometry and Visual Science, Wenzhou Medical University (K181002), and the Key R&D Program Projects in Zhejiang Province (2019C03045).

References

1. J. Z. Nowak, “Age-related macular degeneration (AMD): Pathogenesis and therapy,” Pharmacol. Rep. 58(3), 353 (2006).
2. K. L. Pennington, M. M. DeAngelis, “Epidemiology of age-related macular degeneration (AMD): Associations with cardiovascular disease phenotypes and lipid factors,” Eye Vis. 3(1), 1–20 (2016).
3. K. Kaarniranta, A. Salminen, A. Haapasalo et al., “Age-related macular degeneration (AMD): Alzheimer’s disease in the eye?” J. Alzheimer’s Dis. 24(4), 615–631 (2011).
4. N. Bhagat, R. A. Grigorian, A. Tutela et al., “Diabetic macular edema: Pathogenesis and treatment,” Surv. Ophthalmol. 54(1), 1–32 (2009).
5. J. Ding, T. Y. Wong, “Current epidemiology of diabetic retinopathy and diabetic macular edema,” Curr. Diabetes Rep. 12(4), 346–354 (2012).
6. Y. Lin et al., “A measurement of epidermal thickness of fingertip skin from OCT images using convolutional neural network,” J. Innov. Opt. Health Sci. 14(1), 2140005 (2021).
7. H. Li et al., “ID-OCTA: OCT angiography based on inverse SNR and decorrelation features,” J. Innov. Opt. Health Sci. 14(1), 2130001 (2021).
8. Y. Yu et al., “Monitoring of edema progression in permanent and transient MCAO model using SS-OCT,” J. Innov. Opt. Health Sci. 14(1), 2140006 (2021).
9. N. Qin et al., “Research on optical properties of cardiovascular tissues based on OCT data,” J. Innov. Opt. Health Sci. 14(1), 2140007 (2021).
10. F. Du, L. Gao, L. Li et al., “In vivo evaluation of laser-induced choroidal neovascularization in rats simultaneously using optical coherence tomography and photoacoustic microscopy,” J. Innov. Opt. Health Sci. 14(3), 2140012 (2021).
11. M. R. Ibrahim, K. M. Fathalla, S. M. Youssef, “HyCAD-OCT: A hybrid computer-aided diagnosis of retinopathy by optical coherence tomography integrating machine learning and feature maps localization,” Appl. Sci. 10(14), 4716 (2020).
12. D. S. Kermany, M. Goldbaum, W. Cai et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell 172(5), 1122–1131 (2018).
13. F. Y. Shih, H. Patel, “Deep learning classification on optical coherence tomography retina images,” Int. J. Pattern Recognit. Artif. Intell. 34(8), 2052002 (2020).
14. Q. Zhang, Z. Liu, J. Li et al., “Identifying diabetic macular edema and other retinal diseases by optical coherence tomography image and multiscale deep learning,” Diabetes Metab. Syndr. Obes. Targets Ther. 13, 4787 (2020).
15. C. L. Saratxaga, J. Bote, J. F. Ortega-Morán et al., “Characterization of optical coherence tomography images for colon lesion differentiation under deep learning,” Appl. Sci. 11(7), 3119 (2021).
16. Y. Luo, X. Wang, X. Yu et al., “Imaging sebaceous gland using optical coherence tomography with deep learning assisted automatic identification,” J. Biophoton. 14(6), e202100015 (2021).
17. I. Potapenko, M. Kristensen, B. Thiesson et al., “Detection of oedema on optical coherence tomography images using deep learning model trained on noisy clinical data,” Acta Ophthalmol. (2021).
18. A. Dosovitskiy, L. Beyer, A. Kolesnikov et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv:2010.11929 (2020).
19. Y. Bazi, L. Bashmal, M. M. A. Rahhal et al., “Vision transformers for remote sensing image classification,” Remote Sens. 13(3), 516 (2021).
20. P. P. Srinivasan, L. A. Kim, P. S. Mettu et al., “Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images,” Biomed. Opt. Exp. 5(10), 3568 (2014).
21. Z. Jiang et al., “Recognition of rice leaf diseases and wheat leaf diseases based on multi-task deep transfer learning,” Comput. Electron. Agric. 186, 106184 (2021).
22. J. Zhen-Cun, W. E. N. Xiao-Jing, D. Zheng-Xin et al., “Research on fire detection of improved VGG16 image recognition based on deep learning,” Fire Sci. Technol. 40(3), 375 (2021).
23. J. Wen-Ping, J. Zhen-Cun, “Research on early fire detection of Yolo V5 based on multiple transfer learning,” Fire Sci. Technol. 40(1), 109 (2021).
24. Y. Wang, X. Ma, Z. Chen et al., “Symmetric cross entropy for robust learning with noisy labels,” Proc. IEEE/CVF Int. Conf. Computer Vision, pp. 322–330, Seoul, Korea (2019).
25. A. Paszke, S. Gross, F. Massa et al., “Pytorch: An imperative style, high-performance deep learning library,” arXiv:1912.01703 (2019).
26. K. Simonyan, A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (2014).
27. K. He, X. Zhang, S. Ren et al., “Deep residual learning for image recognition,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, NV, USA (2016).
28. G. Huang, Z. Liu, L. Van Der Maaten et al., “Densely connected convolutional networks,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 4700–4708, Honolulu, HI, USA (2017).
29. M. Tan, Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” Int. Conf. Machine Learning, pp. 6105–6114 (PMLR, 2019).
30. M. Zhu, K. Han, Y. Tang et al., “Visual transformer pruning,” arXiv:2104.08500 (2021).
31. E. Mousavi, R. Kafieh, H. Rabbani, “Classification of dry age-related macular degeneration and diabetic macular oedema from optical coherence tomography images using dictionary learning,” IET Image Process. 14(8), 1571–1579 (2020).
32. S. P. K. Karri, D. Chakraborty, J. Chatterjee, “Transfer learning based classification of optical coherence tomography images with diabetic macular edema and dry age-related macular degeneration,” Biomed. Opt. Exp. 8(2), 579–592 (2017).
33. A. Thomas, P. M. Harikrishnan, R. Ramachandran et al., “A novel multiscale and multipath convolutional neural network based age-related macular degeneration detection using OCT images,” Comput. Meth. Prog. Biomed. 209, 106294 (2021).
34. Y. Luo, Q. Xu, R. Jin et al., “Automatic detection of retinopathy with optical coherence tomography images via a semi-supervised deep learning method,” Biomed. Opt. Exp. 12(5), 2684–2702 (2021).
35. Z. Wang, W. Zhang, Y. Sun et al., “Detection of diabetic macular edema in optical coherence tomography image using an improved level set algorithm,” BioMed Res. Int. 2020, 6974215 (2020).

Vol. 15, No. 02

History

Received 10 August 2021

Accepted 22 November 2021

Published: 17 January 2022

Information

This is an Open Access article. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC-BY) License. Further distribution of this work is permitted, provided the original work is properly cited.
