
A generalized deep neural network approach for improving resolution of fluorescence microscopy images

    https://doi.org/10.1142/S1793545824500111

    Abstract

    Deep learning is capable of greatly promoting the progress of super-resolution imaging technology in terms of imaging and reconstruction speed, imaging resolution, and imaging flux. This paper proposes a deep neural network based on a generative adversarial network (GAN). The generator employs a U-Net-based network, which integrates DenseNet for the downsampling component. The proposed method has several notable properties: the network model is trained with datasets of several different biological structures; the trained model can improve the imaging resolution of different microscopy imaging modalities, such as confocal imaging and wide-field imaging; and the model demonstrates a generalized ability to improve the resolution of biological structures not included in the training datasets. In addition, experimental results showed that the method improved the resolution of caveolin-coated pits (CCPs) from 264nm to 138nm, a 1.91-fold increase, and nearly doubled the resolution of DNA molecules imaged while being transported through microfluidic channels.

    1. Introduction

    The in-depth study of the dynamics of DNA molecules, subcellular structures, and various organelles within the cell can give us a unique insight into the origin of life. The invention of microscopy, especially fluorescence microscopy, laid the foundation for realizing this ideal.1 Since its invention, fluorescence microscopy has played a crucial role in advancing research in cell and molecular biology within the life sciences.2,3 However, due to the optical diffraction limit,4 the lateral resolution of traditional fluorescence microscopy is limited to about 200nm, and the axial resolution to approximately 500nm. This limitation significantly hinders a detailed and accurate understanding of the characteristics of biomolecules and their biological activities within living organisms.

    In the 1990s, super-resolution optical imaging techniques were invented; these break through the resolution limit of the optical microscope and achieve spatial resolutions down to 20nm. Super-resolution imaging methods include stochastic optical reconstruction microscopy (STORM)5,6 and photoactivated localization microscopy (PALM)7 based on single-molecule localization, stimulated emission depletion microscopy (STED)8,9 based on point spread function (PSF) modulation, and structured illumination microscopy (SIM)10,11 based on frequency-domain modulation. Although these methods have succeeded in pushing past the optical diffraction limit, they have some problems. For instance, STED and STORM techniques are slow and susceptible to photobleaching and phototoxicity issues, and SIM techniques are more intricate in both imaging systems and image reconstruction.12

    To address these issues, researchers have been working to develop fluorescent reagents with better photostability and less photobleaching, and to optimize imaging optical paths and reconstruction algorithms; MINFLUX,13 the latest imaging technique, combines STED with single-molecule localization. In addition, deep learning methods have begun to be applied to super-resolution imaging research and have developed rapidly.14,15,16 Deep learning is a widely used method in machine learning that uses deep neural networks for automatic classification, prediction, and learning.17 Traditional learning algorithms typically have only one or two layers of nonlinear transformations with limited learning capacity,18,19,20 making it difficult to solve complex problems. Deep learning uses a deep nonlinear network framework to learn the mapping relationships in the training data, approximate complex functions, and learn the characteristics of the whole dataset by training on sample data.

    Deep learning has shown great strength in image processing.21,22,23,24 The combination of fluorescence microscopy imaging and deep learning presents new opportunities for ultra-high-resolution imaging of cells and organisms, including improving the spatial resolution of imaging, accelerating image reconstruction, and enhancing imaging quality.25,26,27,28 Currently, methods combining deep learning with fluorescence microscopy usually train a separate network model for each biological structure,29,30 which requires a large amount of time to train the models; although transfer learning31 can accelerate training, training a large number of models still consumes considerable computational resources.

    This paper employs a deep learning model based on a generative adversarial network (GAN)32 to construct an image super-resolution network. A Dense-U-Net is used as the generator of the GAN. Compared to other deep learning models for super-resolution imaging, the proposed network requires only a single training run to perform super-resolution transformation of several biological structures. Five types of structures, including microtubules (MTs), are input into the network for training. The trained model accurately maps low-resolution images of MTs and other structures in the test set to super-resolution images. Additionally, the model achieved excellent resolution enhancement for images of DNA molecules traversing microfluidic channels, which lie outside the training dataset, by applying a combination of Gaussian blur denoising and multi-scale enhancement33 pre-processing before inputting them into the neural network for super-resolution transformation.

    2. Materials and Methods

    2.1. Model

    To enhance the resolution of low-resolution images, that is, to extract clear structures from blurry, low-quality images, we developed a framework based on GAN, which is illustrated in Fig. 1. Traditional GAN networks feed random noise z into the generator, which can complicate the direction of its iterative process. In our framework, real samples of low-resolution images are fed into the generator, providing better guidance for its behavior. This network consists of a generative model and a discriminative model. The generative model incorporates a combination of U-Net34 and DenseNet35 architectures, whereas the discriminative model employs a simple convolutional neural network. During network training, a low-resolution image is first input into the generative model to create a high-resolution image; subsequently, the discriminative model receives this generated high-resolution image alongside the original high-resolution label image, compares the two, and estimates the likelihood that the produced image matches the label image. This process is iterated during the training phase until the discriminative model can no longer differentiate between the produced high-resolution image and the label image. At this point, the generative model has determined the optimal parameters for efficiently producing high-resolution images that closely resemble the label images in the training set. The discriminative model plays a pivotal role in assessing the similarity between the generated high-resolution image and the label image.

    Fig. 1. Network framework diagrams. (a) generative model, (b) discriminative model.

    2.1.1. Generator

    A detailed framework of the proposed generative model is shown in Fig. 2. A U-Net containing a DenseNet is used for the generative model, where the downsampling part of the U-Net consists of DenseNet, a deep convolutional neural network that enhances feature reuse and gradient flow by introducing dense connections into the network, which improves the performance and generalization of the model. In DenseNet, each layer takes the outputs of all previous layers as inputs, forming a dense connection. Compared to ResNet, DenseNet focuses more on feature reuse and information sharing and tends to excel in model accuracy and generalization.35

    Fig. 2. Framework of the generative model.

    In the generative model, the input image first passes through a convolutional layer, which operates according to the following equation:

    x_0 = LReLU(BN(Conv(x_i))), (1)
    where x_0 and x_i denote the output and input of the convolution block, respectively. Conv(·) is the convolution operation, BN is the batch normalization layer used to speed up training and improve network generalization, and LReLU is the leaky rectified linear unit activation function, with the definition given by
    LReLU(x, α) = max(0, x) − α × max(0, −x). (2)
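    As an illustration of Eqs. (1) and (2), a minimal PyTorch sketch of such a Conv-BN-LReLU block is given below; the kernel size, channel count, and negative slope are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv -> BatchNorm -> LeakyReLU, as in Eq. (1)."""
    def __init__(self, in_ch, out_ch, negative_slope=0.1):  # slope is an assumption
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        # nn.LeakyReLU implements Eq. (2): LReLU(x, a) = max(0, x) - a * max(0, -x)
        self.act = nn.LeakyReLU(negative_slope)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 1, 256, 256)   # a single-channel 256x256 low-resolution input
y = ConvBlock(1, 64)(x)           # -> feature map of shape (1, 64, 256, 256)
```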
    The Dense-U-Net framework consists of a downsampling section of three dense blocks and four upsampling sections. Layers with the same Height×Width are concatenated, and an additional upsampling block doubles the Height and Width of the input image, enriching the pixel information of the image while increasing the image resolution. Each dense block consists of four dense layers and a transition layer, where the input of each dense layer is concatenated with its output, and the transition layer is used to control the number of channels and perform downsampling. The dense layer can be represented by
    D_k = Concat(BN[LReLU(Conv(D_{k−1}))], D_{k−1}), k = 1, 2, 3, 4, (3)
    where D_k is the output of the kth dense layer, D_0 is the output of the first convolutional layer (which is also the input of the first dense layer), and Concat denotes the concatenation operation.

    After four dense layers, the transition layer performs channel reduction and downsampling, which can be represented by

    T_o = LReLU(BN(Avgpooling[LReLU(BN(Conv(T_i)))])), (4)
    where T_o is the output of the transition layer and T_i is the input of the transition layer. In the transition layer, channel reduction is implemented by convolution and downsampling is implemented by an average pooling layer.
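    A hedged PyTorch sketch of the dense layers (Eq. (3)), the transition layer (Eq. (4)), and their composition into a dense block is given below; the growth rate and channel counts are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """Eq. (3): D_k = Concat(BN[LReLU(Conv(D_{k-1}))], D_{k-1})."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, growth, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.1)
        self.bn = nn.BatchNorm2d(growth)

    def forward(self, x):
        return torch.cat([self.bn(self.act(self.conv(x))), x], dim=1)

class TransitionLayer(nn.Module):
    """Eq. (4): channel reduction by 1x1 convolution, downsampling by average pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(out_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        x = self.act(self.bn1(self.conv(x)))
        return self.act(self.bn2(self.pool(x)))

class DenseBlock(nn.Module):
    """Four dense layers followed by one transition layer, as described above."""
    def __init__(self, in_ch, growth, out_ch):
        super().__init__()
        self.layers = nn.ModuleList(
            [DenseLayer(in_ch + k * growth, growth) for k in range(4)])
        self.transition = TransitionLayer(in_ch + 4 * growth, out_ch)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.transition(x)
```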

    There is a bottleneck layer at the end of the downsampling, which has the effect of increasing the number of channels and performing a downsampling. This can be expressed in the following equation:

    B_o = LReLU(BN(Conv[LReLU(BN(Avgpooling[LReLU(BN(Conv(B_i)))]))])), (5)
    where B_o is the output of the bottleneck layer and B_i is the input of the bottleneck layer.

    Each upsampling block performs convolution, BN, and LReLU operations on its input, which can be represented by

    U_k = LReLU(BN(Conv[LReLU(BN(Conv(Concat{U_{k−1}, T_{4−k}})))])), k = 1, 2, 3, (6)
    where U_0 is the output of the bottleneck layer. U_0 is concatenated with the output T_3 of the transition layer of the third downsampling block, and the result U_1 is used as the input to the second upsampling block. One more upsampling block than downsampling blocks is used, which doubles the Height and Width of the image input to the generator. Spatial upsampling is achieved by bilinear interpolation in each upsampling block. Finally, the result of the four upsampling blocks, an eight-channel feature map, is mapped to a single-channel high-resolution output through a convolutional layer.
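    A corresponding sketch of one upsampling block (Eq. (6)) is shown below. Whether the bilinear interpolation is applied before or after the skip concatenation, and the channel counts, are assumptions rather than the authors' specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """Eq. (6): upsample, concatenate with the matching transition-layer output,
    then apply two Conv-BN-LReLU stages."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(out_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, u_prev, t_skip=None):
        # Bilinear interpolation doubles Height and Width.
        u = F.interpolate(u_prev, scale_factor=2, mode="bilinear", align_corners=False)
        if t_skip is not None:   # the extra final block has no skip; build it with skip_ch=0
            u = torch.cat([u, t_skip], dim=1)
        u = self.act(self.bn1(self.conv1(u)))
        return self.act(self.bn2(self.conv2(u)))

# The final convolution maps the eight-channel feature map of the last
# upsampling block to the single-channel high-resolution output.
to_image = nn.Conv2d(8, 1, kernel_size=3, padding=1)
```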

    2.1.2. Discriminator

    Figure 1(b) shows the framework of the discriminator, which is a simple convolutional neural network framework. It starts with a convolutional layer, followed by four convolutional blocks. Each convolutional block contains a convolutional layer, a batch normalization layer, and an LReLU activation function given by

    z_k = LReLU(BN(Conv(z_{k−1}))), k = 1, 2, 3, 4, (7)
    where z_k is the output of the kth convolutional block and z_0 is the input of the first convolutional block. Next, an average pooling layer is inserted for dimensionality reduction; this layer performs a downsampling operation by taking the spatial average of the feature maps within each 2×2 region while discarding redundant information. Following this, there are three fully connected layers and a sigmoid activation function that reduce the output to a single channel and produce the probability estimated by the discriminator.
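    A minimal PyTorch sketch of such a discriminator is given below; the channel progression, strides, and fully connected layer sizes are assumptions, since the text only specifies the layer types.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Initial convolution, four Conv-BN-LReLU blocks (Eq. (7)), 2x2 average
    pooling, three fully connected layers, and a sigmoid output."""
    def __init__(self, in_ch=1, base=32):
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8, base * 8]
        layers = [nn.Conv2d(in_ch, base, 3, padding=1), nn.LeakyReLU(0.1)]
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.LeakyReLU(0.1)]
        self.features = nn.Sequential(*layers)
        self.pool = nn.AvgPool2d(2)                  # 2x2 spatial averaging
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.LeakyReLU(0.1),   # three fully connected layers
            nn.Linear(256, 64), nn.LeakyReLU(0.1),
            nn.Linear(64, 1), nn.Sigmoid())          # probability output

    def forward(self, x):
        return self.head(self.pool(self.features(x)))

score = Discriminator()(torch.randn(1, 1, 512, 512))   # -> probability in (0, 1)
```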

    2.1.3. Loss function

    The loss function is an operator for measuring the degree of difference between the model’s predicted value f(x) and the true value Y. It is usually denoted by Loss(Y, f(x)), and the smaller the loss function, the better the model’s robustness. In a GAN, the loss function has two parts, the generator loss function and the discriminator loss function, denoted Loss_G and Loss_D, respectively. The loss function of the generator is given by

    Loss_G(X, Y) = α·MSE[G(X), Y] + γ·BCE[D(G(X)) − D(Y), Y_label] + β·[1 − SSIM(G(X), Y)]
                 = (α / (w×h)) Σ_{i=1}^{w×h} [G(X)_i − Y_i]² − γ·log[D(G(X)) − D(Y)]
                   + β·[1 − (2μ_{G(X)}μ_Y + c_1)(2σ_{G(X)Y} + c_2) / ((μ_{G(X)}² + μ_Y² + c_1)(σ_{G(X)}² + σ_Y² + c_2))], (8)
    where X is the low-resolution image input to the generator, G(X) is the output image of the generator, Y is the input label image, and α, β, γ are the equalization coefficients of the three terms, set to α = 1, β = 0.1, and γ = 1, respectively. The constants c_1 and c_2 in the third term are small numbers that prevent the denominator from going to zero. The generator loss function Loss_G is the combination of the mean square error loss (MSE loss), the structural similarity index measure (SSIM)36 and the cross-entropy loss function. The first two evaluate the difference between the generator's output super-resolution image and the label image used for training, and the last is related to the probability output by the discriminator. The discriminator loss function is given by
    Loss_D = BCE[D(G(X)) − D(Y), Y_label] + BCE[D(Y) − D(G(X)), 1 − Y_label]
           = −log[1 − (D(G(X)) − D(Y))] − log[D(Y) − D(G(X))], (9)
    Y_label is set to 1 in the loss function of the generator and to 0 in the loss function of the discriminator. During actual training, the generator and discriminator are trained and optimized in competition with each other.
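    A sketch of Eqs. (8) and (9) in PyTorch is given below. The SSIM term uses the third-party pytorch_msssim package (any differentiable SSIM would do), and clamping the difference of discriminator scores into (0, 1) before the cross-entropy term is our assumption, since the equations do not spell out how out-of-range values are handled.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim            # differentiable SSIM (third-party package)

ALPHA, BETA, GAMMA = 1.0, 0.1, 1.0         # equalization coefficients from Eq. (8)

def generator_loss(g_x, y, d_fake, d_real):
    """Eq. (8); g_x = G(X), y = label image, d_fake = D(G(X)), d_real = D(Y)."""
    mse = F.mse_loss(g_x, y)
    diff = torch.clamp(d_fake - d_real, 1e-7, 1 - 1e-7)   # assumption: clamp into (0, 1)
    bce = F.binary_cross_entropy(diff, torch.ones_like(diff))   # Y_label = 1
    ssim_term = 1.0 - ssim(g_x, y, data_range=1.0)
    return ALPHA * mse + GAMMA * bce + BETA * ssim_term

def discriminator_loss(d_fake, d_real):
    """Eq. (9); Y_label = 0 for the fake-minus-real score and 1 for the reverse."""
    diff_fake = torch.clamp(d_fake - d_real, 1e-7, 1 - 1e-7)
    diff_real = torch.clamp(d_real - d_fake, 1e-7, 1 - 1e-7)
    return (F.binary_cross_entropy(diff_fake, torch.zeros_like(diff_fake))
            + F.binary_cross_entropy(diff_real, torch.ones_like(diff_real)))
```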

    2.2. Experiments

    2.2.1. Datasets and pre-processing

    In practice, the publicly available high-quality image dataset BioSR14 was chosen for training. This dataset contains five biological structures: microtubules (MTs), caveolin-coated pits (CCPs), endoplasmic reticulum (ER), actin filaments (F-actin), and nonlinear actin filaments (F-actin-nonlinear). The original data format is MRC, which is not an image format; the data within it must be processed before it can be displayed in a common image format. In addition, resizing the images so that their dimensions are powers of 2 allows for better downsampling in the network and is more consistent with the network structure. We therefore pre-processed the original MRC-format images. Figure 3 shows the pre-processing steps and the BioSR dataset, including wide-field and SIM images of F-actin converted to image format (Figs. 3(a) and 3(b), respectively). Figures 3(c) and 3(d) show the training datasets obtained after pre-processing of Figs. 3(a) and 3(b), respectively. The dataset was pre-processed through the following steps. First, the original wide-field data and the original SIM data were converted from MRC format to image format. Then, the images were resampled to 2048×2048 pixels using bilinear interpolation, which reduced the pixel spacing from the original 0.0626μm to 0.01534μm. Next, the adjusted wide-field-SIM image pairs were randomly cropped at matched regions; the cropped images were 512×512 pixels and were flipped horizontally and vertically after cropping. The wide-field images in the wide-field-SIM image pairs were then downsampled to 256×256 pixels to match the network's framework. These cropped and flipped image pairs were used for network training, with the cropped wide-field images serving as low-resolution inputs to the network and the bilinearly interpolated SIM images serving as high-resolution label images. Each set of images yielded approximately 10 image pairs of the same region through cropping and flipping (images containing little or none of the region of interest were removed). For each structure, approximately 50 sets of raw data were selected. In total, about 3000 sets of image data were generated, comprising low-resolution (256×256 pixels) and high-resolution (512×512 pixels) label images. Of these, about 2200 sets were used for training, about 700 sets for validation, and about 100 sets for testing. In addition, about 10 sets of raw data without cropping and flipping were selected for each structure to be used as a test set.
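    A hedged sketch of this pre-processing pipeline is given below, assuming a single 2D frame per MRC file; file handling, normalization, and the number of crops per frame are illustrative choices, not the authors' exact script.

```python
import numpy as np
import mrcfile                              # reads the original MRC-format BioSR data
from skimage.transform import resize

def make_pairs(wf_path, sim_path, n_crops=10, seed=0):
    """Produce (256x256 wide-field input, 512x512 SIM label) training pairs."""
    rng = np.random.default_rng(seed)
    wf = mrcfile.open(wf_path).data.astype(np.float32)
    sim = mrcfile.open(sim_path).data.astype(np.float32)
    if wf.ndim == 3:                        # assumption: use the first frame of a stack
        wf, sim = wf[0], sim[0]
    # Bilinear interpolation to 2048x2048 (pixel spacing 0.0626 um -> ~0.01534 um).
    wf = resize(wf, (2048, 2048), order=1)
    sim = resize(sim, (2048, 2048), order=1)
    pairs = []
    for _ in range(n_crops):
        y, x = rng.integers(0, 2048 - 512, size=2)
        wf_crop, sim_crop = wf[y:y + 512, x:x + 512], sim[y:y + 512, x:x + 512]
        if rng.random() < 0.5:              # random horizontal flip
            wf_crop, sim_crop = wf_crop[:, ::-1], sim_crop[:, ::-1]
        if rng.random() < 0.5:              # random vertical flip
            wf_crop, sim_crop = wf_crop[::-1, :], sim_crop[::-1, :]
        # Wide-field input downsampled to 256x256; SIM label kept at 512x512.
        pairs.append((resize(wf_crop, (256, 256), order=1), sim_crop.copy()))
    return pairs
```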

    Fig. 3. The BioSR dataset and pre-processing steps. The upper half represents the low-resolution image and the lower half the high-resolution image. (a) wide-field image of F-actin; (b) SIM image of F-actin; (c) and (d) are the training datasets formed by (a) and (b) that have been cropped and flipped.

    Figure 4 displays the training set images generated after random cropping and flipping for the five structures. Figures 4(a)–4(e) show the low-resolution images of the five structures, respectively, with a pixel resolution of 256×256; Figs. 4(f)–4(j) show the corresponding high-resolution label images, with a pixel resolution of 512×512.

    Fig. 4. The training set images generated from five biostructures: CCPs, ER, F-actin, F-actin-nonlinear, and MTs. (a)–(e) are low-resolution images of the five structures; (f)–(j) are high-resolution label images corresponding to (a)–(e), respectively.

    2.2.2. Training

    A workstation equipped with an Intel Core i9-13900K CPU (5.3GHz) and an NVIDIA GeForce RTX 3090 graphics card was used for training. PyTorch 1.12.1 and Python 3.8.16 were used for the framework, and Microsoft Windows 10 was the operating system. For training, the adaptive moment estimation (Adam) optimizer37 was used for stochastic initialization and optimization, and the initial learning rates of the generator and discriminator were 1e−4 and 5e−5, respectively. To ensure efficient training, the batch size was set to 16 and the number of epochs to 150. Our model was trained from scratch and took approximately 3h to train (depending on the batch size and epoch settings). Throughout the training process, the network gradually learns the super-resolution transformation, resulting in improved image details. It is worth highlighting that, in actual operation, the super-resolution transformation from a low-resolution image to a high-resolution image takes only about 1s.
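    A minimal alternating training loop with the settings reported above might look as follows; the generator, discriminator, data loader, and loss functions are the sketches from Sec. 2.1, passed in as arguments, and the update schedule (one discriminator step then one generator step per batch) is an assumption.

```python
import torch
from torch.optim import Adam

def train(generator, discriminator, train_loader, gen_loss, disc_loss,
          epochs=150, lr_g=1e-4, lr_d=5e-5, device="cuda"):
    """Alternating GAN optimization with the learning rates reported in the text."""
    generator.to(device); discriminator.to(device)
    opt_g = Adam(generator.parameters(), lr=lr_g)
    opt_d = Adam(discriminator.parameters(), lr=lr_d)
    for _ in range(epochs):
        for lr_img, hr_img in train_loader:          # batch size 16 in the paper
            lr_img, hr_img = lr_img.to(device), hr_img.to(device)

            # Discriminator step: learn to separate generated and label images.
            with torch.no_grad():
                fake = generator(lr_img)
            loss_d = disc_loss(discriminator(fake), discriminator(hr_img))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            # Generator step: match the label image and fool the discriminator.
            fake = generator(lr_img)
            loss_g = gen_loss(fake, hr_img, discriminator(fake), discriminator(hr_img))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```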

    3. Results

    Figure 5 shows the results of the network training on the five structures of the test set. The first-row images are the test set images, which have low resolution and poor signal-to-noise ratio and contain some artefacts and blur. The second-row images are the label images corresponding to those in the first row, and the third-row images are the corresponding network output images. As can be clearly seen in Fig. 5, the resolution in the images of all five structures is significantly improved, and some structures that are difficult to distinguish in the low-resolution images become clearly visible in the output images of the network.

    Fig. 5. Network training results. The first row is the wide-field training set image as input to the network; the second row is the SIM label image as input to the network; and the third row is the high-resolution image as output from the network.

    The SSIM is used to evaluate the network's super-resolution transformation capability; a higher SSIM score indicates better super-resolution capability. The network's output super-resolution images were evaluated for the five structures, and the SSIM score between the network output images and the label images reached 85.77%.
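    The SSIM evaluation can be reproduced with a standard library implementation; the sketch below uses scikit-image and assumes single-channel floating-point images.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ssim_percent(output_img, label_img):
    """SSIM between a network output and its label image, as a percentage."""
    output_img = output_img.astype(np.float64)
    label_img = label_img.astype(np.float64)
    score = structural_similarity(output_img, label_img,
                                  data_range=label_img.max() - label_img.min())
    return 100.0 * score
```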

    To analyze the effect of the neural network on the structural information of the images in more detail, we locally zoomed in on the images in Fig. 5 and plotted some of the intensity profiles to evaluate the resolution using the full width at half maximum (FWHM). The results are shown in Fig. 6. In Fig. 6, column (a) shows the network input wide-field images of the five structures, i.e., the locally zoomed low-resolution input images of the test set; column (b) shows the label images, i.e., the locally zoomed high-resolution images of the test set; and column (c) shows the locally zoomed network output images. Figures 6(d)–6(h) are the normalized intensity profiles along the green solid line in each structural image. It is evident that the FWHMs of the network output images of each structure are significantly better than those of the network input wide-field images, and the FWHMs of some of the network output images are even better than those of the label images of the test set.
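    The text does not state how the FWHM values were extracted from the intensity profiles; a common choice, sketched below, is to fit a Gaussian to the profile and take FWHM = 2·sqrt(2·ln 2)·sigma, with the ~15.34 nm pixel spacing used after interpolation.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma, offset):
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) + offset

def fwhm_nm(profile, pixel_size_nm=15.34):
    """Estimate the FWHM of a 1D intensity profile (e.g., along the green lines
    in Fig. 6) by fitting a Gaussian."""
    x = np.arange(len(profile), dtype=float)
    p0 = [profile.max() - profile.min(), float(np.argmax(profile)), 2.0, profile.min()]
    (a, mu, sigma, offset), _ = curve_fit(gaussian, x, profile, p0=p0)
    return 2.0 * np.sqrt(2.0 * np.log(2.0)) * abs(sigma) * pixel_size_nm
```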

    Fig. 6. Zoomed images and intensity profiles of the network training results. Column (a) shows the locally zoomed wide-field images of the test set; (b) shows the locally zoomed label images of the test set; (c) shows the locally zoomed high-resolution images of the network output; and (d)–(h) show the normalized intensity profiles of the five structures along the green solid lines in the corresponding images, respectively.

    Using CCPs as an example, 10 groups of separated structures were selected to measure their FWHM. Table 1 shows the measured FWHM values of the wide-field, label, and network output images of the selected 10 groups of structures, along with their mean and standard deviation. These structures were deliberately selected without a pit in the center to allow for a better comparison of the FWHM. The FWHM for the wide-field image is 264±15nm, while the label and network output images have FWHMs of 137±17nm and 138±11nm, respectively. The network output image shows a resolution increase of about 1.9 times compared to the network input wide-field image, and the network output image shows a relatively stable super-resolution compared to the label image.

    Table 1. FWHM values and their mean and standard deviation (SD) of CCPs.

    Group        FWHM (nm), groups 1-10                                Mean (nm)   SD (nm)
    Widefield    257, 234, 271, 282, 277, 263, 266, 276, 247, 268      264         14
    Label        113, 118, 151, 162, 158, 124, 138, 155, 119, 126      137         17
    Output       127, 131, 155, 147, 153, 133, 134, 146, 121, 130      138         11

    In addition, the super-resolution performance of the network is related to the pixel size of the input image. In deep learning methods, the training data affect the results of the final trained model. Therefore, to obtain good super-resolution results, the pixel size of the input image should match the pixel size of the training data, or the distribution of biological structures contained in the image should match that of the training data at the same pixel resolution. For example, if the pixel size of the input image is adjusted to about 15nm, a super-resolution image with high image quality can be obtained by network processing. However, if it is adjusted to other sizes, such as 30nm or 50nm, the super-resolution effect of the network is not satisfactory.

    4. Discussion

    To ascertain the generalizability of the model, imaging results of different biological structures obtained using different imaging modalities were input into the neural network for super-resolution transformation. The super-resolution results were compared using different image evaluation methods.

    4.1. Enhancing image resolution of MTs

    Figure 7 shows a confocal image of MTs that we prepared, stained, and photographed ourselves, which was fed into the trained network to verify its super-resolution capability. Figure 7(a) shows the MT image input to the network, which has a poor signal-to-noise ratio (SNR), and the MT structures are difficult to distinguish clearly. Figure 7(b) shows the image after deconvolution with a sharpened convolution kernel, and Fig. 7(c) shows the MT image output from the network. Compared to Fig. 7(a), the MT structure in Fig. 7(c) is clearer and easily distinguished; the resolution is also significantly improved compared to the deconvolved image in Fig. 7(b). Figures 7(d)–7(f) are local enlargements of Figs. 7(a)–7(c), respectively, which show the effect of the resolution improvement more clearly. Even for images not included in the training, validation, and test sets, the network is still capable of producing super-resolution images.

    Fig. 7. Super-resolution enhancement of MT images. (a) MT image input to the network; (b) the deconvolution image of (a); (c) high-resolution image output from the network; (d), (e), and (f) are locally zoomed images corresponding to the white boxes in (a), (b), and (c), respectively.

    Figure 8 illustrates the improvement in image quality through normalized image intensity comparison plots along the horizontal direction for Figs. 7(d)–7(f). The inset of Fig. 8 shows a locally zoomed image in the box. The FWHM of the network input image, the deconvolution image of the network input image, and the network output image are 465nm, 430nm, and 374nm, respectively. The resolution of the output image from the network has improved by 1.24 times compared to the input image of the network, and by 1.15 times compared to the deconvolution image of the network input image.

    Fig. 8. Plots of normalized intensity distributions for the network input MT image, the deconvolution image of the MTs and the network output high-resolution image. The inset shows a locally zoomed image of the boxed portion of the figure.

    Our method can further enhance the SNR of images, and the resolution of fluorescence microscopy is significantly correlated with its SNR. We utilize SNR to compare the image quality between the network input confocal image and the neural network output image, with SNR calculated according to the formula in Ref. 14. Figures 9(a) and 9(b), respectively, depict the confocal image of MTs and the network output super-resolution image. Regions containing both structural and background signals were selected in these images, and the peak value of the signal calculated from a Gaussian fit, the average value of the background, and the standard deviation of the background were computed within these regions to derive the SNR. The SNR of the network input image is 29.37, while that of the network output image is 44.91, resulting in a 1.53-fold increase in SNR.
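    The exact SNR formula is the one given in Ref. 14; the sketch below assumes the common form (Gaussian-fit signal peak minus background mean) divided by the background standard deviation, matching the quantities described above.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma, offset):
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) + offset

def estimate_snr(signal_profile, background_region):
    """SNR from a Gaussian-fit peak of a structural profile and the mean/SD of a
    nearby background region."""
    x = np.arange(len(signal_profile), dtype=float)
    p0 = [signal_profile.max() - signal_profile.min(),
          float(np.argmax(signal_profile)), 2.0, signal_profile.min()]
    (a, mu, sigma, offset), _ = curve_fit(gaussian, x, signal_profile, p0=p0)
    peak = a + offset                          # fitted peak value of the signal
    return (peak - background_region.mean()) / background_region.std()
```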

    Fig. 9. (a) Network input confocal image; (b) network output super-resolution image.

    4.2. Enhancing image resolution of DNA traversing through a microfluidic channel

    In the study of single molecules traversing microfluidic channels, images typically taken with CCDs suffer from low signal-to-noise ratio and poor resolution. Figure 10 shows an image of DNA molecules traversing a microfluidic channel taken by wide-field imaging and its step-by-step processing results using the multi-scale enhancement method together with the neural network. We combine the neural network with the multi-scale enhancement method to jointly process the image of DNA molecules traversing a microfluidic channel, which improves the image resolution and signal-to-noise ratio. Figure 10(a) shows an original frame of DNA molecules moving through a microfluidic channel. During processing, the Gaussian blurring method was first applied to eliminate the high background noise from the image; Fig. 10(b) displays the image after the background noise was removed. Figure 10(c) shows the result of image enhancement using the multi-scale enhancement method, an image processing technique that enhances specific features or structures by analyzing and processing the image at different scales. Figure 10(d) shows the network output obtained when the multi-scale enhanced image is input to the neural network. Comparing Figs. 10(a) and 10(d), it is clear that the resolution of the processed image is significantly improved and the noise level is significantly reduced. Figure 10(e) shows the intensity profiles along the yellow solid lines in the white dashed boxes in Figs. 10(a) and 10(d). Compared to the original image, the FWHM value of the network output result is reduced by a factor of 1.9, i.e., the resolution is improved by a factor of 1.9.
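    A hedged sketch of this pre-processing is given below using OpenCV. Whether the Gaussian blur is used directly as denoising or as a background estimate that is subtracted, and the particular difference-of-Gaussian form of the multi-scale enhancement, are our assumptions; Ref. 33 may use a different formulation.

```python
import cv2
import numpy as np

def preprocess_dna_frame(img, bg_sigma=15, scales=(1, 2, 4)):
    """Background suppression via Gaussian blur, then a simple multi-scale
    (difference-of-Gaussian) enhancement before feeding the frame to the network."""
    img = img.astype(np.float32)
    # Background suppression: subtract a heavily blurred copy of the frame.
    background = cv2.GaussianBlur(img, (0, 0), sigmaX=bg_sigma)
    denoised = np.clip(img - background, 0, None)
    # Multi-scale enhancement: add band-pass detail layers at several scales.
    enhanced = denoised.copy()
    for sigma in scales:
        fine = cv2.GaussianBlur(denoised, (0, 0), sigmaX=sigma)
        coarse = cv2.GaussianBlur(denoised, (0, 0), sigmaX=2 * sigma)
        enhanced += fine - coarse
    return cv2.normalize(enhanced, None, 0.0, 1.0, cv2.NORM_MINMAX)
```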

    Fig. 10. Images of DNA molecules traversing a microfluidic channel and their super-resolution results. (a) a frame image of a DNA molecule traversing a microfluidic channel; (b) the image after Gaussian blur noise reduction; (c) the image after processing by the multi-scale enhancement method; (d) the image output from the network; and (e) a plot of the intensity distributions along the yellow solid lines in the white dashed boxes in (a) and (d).

    The method has also been used to process dynamic images; an image sequence of DNA molecules traversing a microfluidic channel is shown in Fig. 11. Figure 11(a) clearly shows the movement of DNA molecules in a microfluidic channel from 0ms to 320ms. Figures 11(b) and 11(c) are locally zoomed images of the DNA molecules traversing the microfluidic channel at 320ms; it can be seen that the resolution of the processed images is significantly improved and the contours of the DNA molecules are clearer. The two indistinguishable DNA molecules framed by solid yellow lines in Fig. 11(b) can be clearly distinguished in Fig. 11(c). For a better view of the continuous movement of the DNA molecules through the microfluidic channel after processing, see Appendix A.

    Fig. 11. Images of DNA molecules traversing a microfluidic channel. The image acquisition frame rate was 25 frames per second, i.e., one image was acquired every 40ms. (a) Output images of the images acquired in the time interval from 0ms to 320ms after pre-processing and super-resolution network processing, respectively. (b), (c) are the image acquired at 320ms and its output image after network processing, respectively.

    5. Conclusions

    This paper proposes a deep neural network based on GAN to address the lack of generality and slow training of traditional deep learning models for super-resolution transformation of biological structure images. The proposed network, which uses a Dense-U-Net as the generative model of the GAN, can achieve super-resolution transformation of a variety of biological structures with only a single training run, greatly improving training efficiency and reducing the consumption of computational resources. The super-resolution capability of the network was verified with test dataset images, improving the resolution of the original wide-field images of CCP structures from 264nm to 138nm. Additionally, when the network was applied to images of MTs outside the dataset and images of DNA molecules traversing microfluidic channels, the resolution of DNA molecules in the microfluidic channels increased by 1.9 times. Moreover, the method was also applied to images of moving DNA samples and clearly distinguished two DNA molecules, originally overlapping, within a microfluidic channel. The method should help facilitate the practical application of microscopic imaging techniques in disciplines such as the life sciences, biology, and chemistry.

    Acknowledgments

    This work was funded by the National Natural Science Foundation of China (Nos. 62275216 and 61775181), the Natural Science Basic Research Program of Shaanxi Province-Major Basic Research Special Project (Nos. S2018-ZC-TD-0061 and TZ0393), and the Special Project for the Development of National Key Scientific Instruments and Equipment (No. 51927804).

    Conflicts of Interest

    The authors declare that there are no conflicts of interest relevant to this paper.

    ORCID

    Zichen Jin  https://orcid.org/0009-0008-9930-7921

    Qing He  https://orcid.org/0009-0002-8088-3909

    Yang Liu  https://orcid.org/0009-0006-6182-0100

    Kaige Wang  https://orcid.org/0000-0001-6221-9954