Research Paper (Open Access)

Enhancing Robustness of Medical Image Segmentation Model with Neural Memory Ordinary Differential Equation

Junjie Hu

https://orcid.org/0000-0002-5750-0511

Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China

E-mail Address: hujunjie@scu.edu.cn


Chengrong Yu

https://orcid.org/0009-0004-1238-7414

Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China

E-mail Address: yuchengrong@stu.scu.edu.cn


Zhang Yi

https://orcid.org/0000-0002-5867-9322

Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China

E-mail Address: zhangyi@scu.edu.cn


Haixian Zhang

https://orcid.org/0000-0002-9821-508X

Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China

E-mail Address: zhanghaixian@scu.edu.cn

Corresponding author.


https://doi.org/10.1142/S0129065723500600

Abstract

Deep neural networks (DNNs) have emerged as a prominent model in medical image segmentation, achieving remarkable advancements in clinical practice. Despite the promising results reported in the literature, the effectiveness of DNNs necessitates substantial quantities of high-quality annotated training data. During experiments, we observe a significant decline in the performance of DNNs on the test set when there exists disruption in the labels of the training dataset, revealing inherent limitations in the robustness of DNNs. In this paper, we find that the neural memory ordinary differential equation (nmODE), a recently proposed model based on ordinary differential equations (ODEs), not only addresses the robustness limitation but also enhances performance when trained by the clean training dataset. However, it is acknowledged that the ODE-based model tends to be less computationally efficient compared to the conventional discrete models due to the multiple function evaluations required by the ODE solver. Recognizing the efficiency limitation of the ODE-based model, we propose a novel approach called the nmODE-based knowledge distillation (nmODE-KD). The proposed method aims to transfer knowledge from the continuous nmODE to a discrete layer, simultaneously enhancing the model’s robustness and efficiency. The core concept of nmODE-KD revolves around enforcing the discrete layer to mimic the continuous nmODE by minimizing the KL divergence between them. Experimental results on 18 organs-at-risk segmentation tasks demonstrate that nmODE-KD exhibits improved robustness compared to ODE-based models while also mitigating the efficiency limitation.

Keywords:

1. Introduction

Segmentation plays a crucial role in clinical practice, e.g. computed tomography (CT)-based organs-at-risk (OARs) and clinical target volume (CTV) segmentation,^{1,2} optical coherence tomography (OCT)-based macular edema segmentation,³ and endoscopy image-based colon gland segmentation.⁴ Precise segmentation of the targets contributes to quantitative diagnosis and treatment and thus has a significant impact on clinical trials. Owing to their powerful learning capability, deep neural networks (DNNs) have been successfully applied in various tasks.^{5,6,7,8,9,10,11,12,13,14} For the medical image segmentation task, both fully convolutional networks (FCNs)¹⁵ and transformer-based networks¹⁶ are commonly used. Among these methods, perhaps the most well-known medical image segmentation model is the U-Net,¹⁷ a succinct segmentation architecture composed of an encoder, a decoder, and multiple shortcut connections between them. The encoder extracts abstract features from the inputs, while the decoder recognizes the target using the extracted features. The shortcut connections facilitate information flow within the network. Numerous variants of the vanilla U-Net architecture have been proposed, showing promising segmentation performance across various tasks. Furthermore, the recently introduced no-new-Net (nnU-Net)¹⁸ has achieved state-of-the-art results on multiple benchmark tasks, further demonstrating the capabilities of the U-Net architecture.

Despite the numerous encouraging results reported in the literature, it is widely acknowledged that the successful application of DNNs relies heavily on large-scale, high-quality datasets. This requirement poses challenges for medical segmentation models, because accurate segmentation annotations are difficult to obtain: the targets have intricate characteristics, and the expertise of annotators varies widely. Many targets (e.g. the CTV of a tumor or the temporal lobes of the brain) lack clear boundaries, leading to high inter- and intra-annotator variation. Given this uncertainty in the annotations, it is crucial for the segmentation model to be robust against potential label noise in the training dataset.

To assess the robustness of the widely used U-Net model, we deliberately disrupt the labels of the training dataset. Specifically, we erase the masks of 25% of the slices near the top and 25% of the slices near the bottom, so that 50% of the slices are masked in total, as depicted in Fig. 1. Note that the labels of the test dataset remain intact, since our objective is to evaluate the performance of U-Net trained on the clean and noisy training datasets separately. When trained on the clean training dataset, U-Net achieves a dice similarity coefficient (DSC) of 0.9023 for the spinal cord on the test dataset. However, when noise is injected into the labels of the training dataset, the DSC drops to 0.8102. This decline in performance is anticipated given the powerful learning capacity of DNNs: previous research has demonstrated that DNNs can memorize the training dataset, even when the labels are shuffled.¹⁹ This decline leads us to the first question: How can we enhance the robustness of DNNs when confronted with a noisy training dataset?

Fig. 1. (Color online) Comparison of coronal view between the whole and masked region of the spinal cord. The red region indicates the slices containing the spinal cord, and the dotted region represents the slices with empty label (i.e. the mask of the spinal cord is erased). The table below the image compares the dice similarity coefficient (DSC) value of the test dataset between the vanilla U-Net and U-Net with nmODE.
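The label disruption described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `mask_top_bottom`, the `(slices, height, width)` layout, and the reading of "25%" as a fraction of the foreground slices at each end of the organ are all assumptions.

```python
import numpy as np

def mask_top_bottom(label, fraction=0.25):
    """Erase the mask on the first and last `fraction` of foreground slices.

    `label` is a binary volume of shape (slices, height, width). The layout
    and the interpretation of the 25% ratio are assumptions of this sketch.
    """
    noisy = label.copy()
    # Indices of slices that contain at least one foreground voxel.
    fg = np.flatnonzero(label.reshape(label.shape[0], -1).any(axis=1))
    k = int(round(fraction * fg.size))
    if k > 0:
        noisy[fg[:k]] = 0    # slices near one end of the organ
        noisy[fg[-k:]] = 0   # slices near the other end
    return noisy
```

Only the training labels would be passed through such a function; the test labels stay untouched, as stated in the text.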

Fortunately, we have discovered a recently proposed ordinary differential equations (ODEs)-based model called neural memory ODE (nmODE),²⁰ demonstrating remarkable robustness against noise. Unlike conventional layer-wise discrete models, the nmODE is a continuous model that incorporates separate learning and memory neurons, resulting in clear dynamic properties. The detailed explanation of the nmODE is given in Sec. 3.1. In our experiments, we regard the nmODE as a special module and insert it into the penultimate layer of the U-Net. The experimental results are presented in the second row of the table shown in Fig. 1. It can be observed that the U-Net with nmODE performs comparably with the vanilla U-Net on the clean training dataset while exhibiting a significant advantage over the vanilla U-Net on the noisy training dataset. The superiority of the U-Net with nmODE can be attributed to the global attractor property inherent in the nmODE, which serves as an efficient memory mechanism within the model. By separating the learning from memory, the nmODE could learn to correct the parameters through the dynamical system, enhancing the robustness of the segmentation model to noise in the training dataset.

Even though the U-Net with nmODE demonstrates impressive performance against perturbations in the training dataset, it has to be acknowledged that the continuous ODE model is less computationally efficient than the conventional discrete layer-wise model. This inefficiency arises from the iterative computation in the ODE solver, which evaluates the ODE multiple times to accomplish the integration. The limitation in efficiency hinders the application of the U-Net with nmODE in scenarios that require timely segmentation results. This leads us to the second question: How can we enhance the segmentation model’s robustness while maintaining its computational efficiency? Our solution to this question is straightforward: let the discrete model mimic the behavior of the continuous model. Suppose the input is denoted as x, and the outputs of the U-Net with nmODE and the vanilla U-Net are represented as $\tilde{F} (x)$ and $F (x)$ , respectively. During the training phase, our objective is to minimize the distance between $\tilde{F} (x)$ and $F (x)$ . During inference, only $F (x)$ is used for prediction, eliminating the computational cost of the nmODE. Specifically, by using the Kullback–Leibler (KL) divergence to measure the distance between $\tilde{F} (x)$ and $F (x)$ , we obtain the knowledge distillation (KD) framework. We refer to the proposed method as nmODE-KD, an architecture that distills knowledge from the nmODE to the discrete model, thereby achieving both robustness and efficiency. Overall, the contributions of the paper can be summarized as follows:

(i)	This paper empirically demonstrates that the nmODE method not only enhances the accuracy of the segmentation network but also promotes its robustness against label noise in the training dataset.
(ii)	A novel architecture called nmODE-KD is proposed, which aims to transfer knowledge from the nmODE to the discrete layer-wise segmentation network.
(iii)	Rigorous experiments on 18 OARs segmentation tasks in the head and neck demonstrate the effectiveness of the proposed nmODE-KD method.

2. Related Works

2.1. Neural ordinary differential equations

The proposal of the neural ordinary differential equation (NODE)²¹ has reshaped the field of DNNs, changing the architecture of networks from a finite stack of discrete layers to a continuous-depth paradigm. Unlike conventional discrete DNNs with a fixed architecture, the NODE is a specialized model that implicitly maps the input to the output. The NODE has higher non-linearity, clearer dynamic behavior, and stronger fitting capacity than the discrete model. Easy-to-use libraries such as torchdiffeq^a implement multiple ODE solvers, making it possible to use the NODE in practical applications. Since the NODE became popular, many variants have been proposed. Dupont et al.²² pointed out that the NODE preserves the topology of the input space, which constrains its approximation capability, and proposed augmenting the NODE with additional dimensions to learn complex mappings. Gholami et al.²³ observed that the adjoint method²¹ is numerically unstable for specific activation functions and that the gradient is inaccurate when the time steps are small. They addressed these problems by incorporating a checkpointing method while keeping the same computational cost as the NODE.

Solving the NODE requires evaluating the differential equation multiple times; this count is referred to as the number of function evaluations (NFE). Reducing the NFE without a significant decline in accuracy is desirable since it improves the efficiency of the NODE in applications. Based on this consideration, Kelly et al.²⁴ introduced a differentiable regularization term based on the Kth-order derivative of the state with respect to time, leading to simpler trajectories that are easier to solve. Kidger et al.²⁵ replaced the commonly used $L^{2}$ norm with a seminorm to judge whether the adaptive step in the ODE solver is accepted or rejected. Experiments on multiple tasks show that this improvement reduces the NFE by 40%. Ghosh et al.²⁶ proposed a regularization method that randomly samples the end time of the integration in the ODE solver. The regularization method decreases training time and increases accuracy compared with the baseline approaches.

One of the most significant benefits of the NODE lies in its robustness, which is a crucial inherent limitation of conventional discrete DNNs. The literature reports that the NODE is robust to random perturbations and adversarial attacks. For example, Hanshu et al.²⁷ empirically verified the robustness of the NODE on multiple benchmark datasets. The authors further analyzed the robustness and proposed a time-invariant steady neural ODE (TisODE) model to improve it. Cui et al.²⁸ proposed an activation function named half-Swish to enhance the stability of the NODE. Experimental results indicate that the proposed activation function outperforms the basic ones in terms of robustness. Unlike prior work that addresses only efficiency or only robustness, this paper strives for a balance between the two to broaden the applications of the NODE.

2.2. Medical image segmentation

Image segmentation plays a vital role in various fields.^{29,30,31} Medical image segmentation,^{32,33,34} i.e. segmentation applied to medical images, is a common requirement in clinical practice. As a biologically inspired model,^{35,36,37,38,39} DNNs have been ubiquitously used in medical image segmentation tasks. The well-known U-Net architecture provides a strong baseline among the numerous types of DNNs. Plenty of U-Net variants deliver promising performance in medical image segmentation tasks. For example, Li et al.⁴⁰ proposed the hybrid densely connected U-Net (H-DenseUNet) that uses 2D DenseUNet⁴¹ to extract the intra-slice feature and a 3D counterpart module to build the inter-slice relationship. The H-DenseUNet achieves promising results on the OARs and tumor segmentation tasks. Oktay et al.⁴² incorporated the attention module⁴³ into the U-Net to enhance the feature extracted by the encoder, obtaining superior performance over the U-Net on multiple OARs segmentation tasks. The nnU-Net¹⁸ leverages a self-configuring pipeline to achieve state-of-the-art results on multiple public medical segmentation datasets without manual intervention. Yu et al.⁴⁴ designed a self-supervised method that uses the distance between two slices as the pretext task. The pre-trained model is later transferred to the downstream OARs segmentation tasks to increase the accuracy. To balance the efficiency and accuracy between 2D and 3D convolutions in medical image segmentation tasks, MixConvNet⁴⁵ uses a mixture of 2D convolutions from different views to replace the 3D convolution. Besides fully supervised learning, semi-supervised learning that uses a large number of unannotated datasets has also been considered in medical image segmentation.⁴⁶ Despite the performance improvement reported in the literature, few studies have paid attention to the robustness of the segmentation model, which plays an important role in clinical practice. This limitation motivates us to leverage a more advanced approach to enhance the robustness of the segmentation model.

Currently, some research attempts to use the NODE to strengthen the performance of the segmentation model. For instance, Pinckaers and Litjens⁴⁷ incorporated the NODE into the U-Net to better exploit the semantic features of the colon glands. Cheng et al.⁴⁸ proposed a second-order NODE-based model and achieved promising results on six benchmark segmentation datasets. Despite the progress achieved by the NODE in the medical image segmentation tasks, only a few works have paid attention to the segmentation efficiency, which determines the usability of the NODE-based segmentation model in clinical practice. This paper empirically verifies the robustness of the ODE-based model for medical image segmentation tasks. Moreover, the ODE-based model’s efficiency is also considered by leveraging the knowledge distillation paradigm.

3. Methodology

In this section, we begin with a brief overview of the NODE to clarify the relationship between the nmODE and conventional discrete DNNs. Suppose the output of layer l is denoted as $a^{l}$ . Then the forward computation between layer l and $l + 1$ in discrete DNNs can be represented as $a^{l + 1} = f (a^{l}; W)$ , where f typically consists of a linear transformation, normalization, and a nonlinear activation function. One of the most well-known structures in DNNs is the residual connection, which is the foundation of residual networks (ResNets).⁴⁹ The residual connection is defined as $a^{l + 1} = a^{l} + f (a^{l}; W)$ , where the shortcut connection $a^{l}$ enables the construction of networks with significantly increased depth. By considering layer l as time t and transforming feature $a^{l}$ into representation $y (t)$ , we obtain the equation $y (t + 1) - y (t) = f (y (t); W)$ . Replacing the unit step with a step size $\Delta t$ gives $y (t + \Delta t) - y (t) = \Delta t \, f (y (t); W)$ . Dividing both sides by $\Delta t$ and taking the limit as $\Delta t$ approaches zero, we derive the NODE, which can be formulated as follows:

$$\dot{y} (t) = f (y (t); W) . \tag{1}$$

By providing the initial internal input $y (0)$ and employing the ODE solver to solve the above differential equation, we can subsequently acquire the solution $y (T)$ . The NODE can be seamlessly incorporated into the architecture of DNNs as a specialized layer, where the feature $a^{l}$ is commonly considered as the initial internal input $y (0)$ . The computation principle of the NODE is illustrated in Fig. 2(a). The optimization of the NODE can be accomplished using the backpropagation or the adjoint²¹ approaches, which are known as discretize-then-optimize and optimize-then-discretize⁵⁰ methods, respectively. The former method can compute precise gradients and has the advantage of speed, but it consumes $𝒪 (T)$ memory since it needs to store all the intermediate variables for backpropagation. On the other hand, the latter method has a constant memory cost of $𝒪 (1)$ , but it is slower compared to the former and introduces numerical discretization errors.
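The link between the residual connection and Eq. (1) can be checked numerically. The sketch below is illustrative only: the toy layer `f` (a linear map followed by tanh) and the helper names are assumptions, and a fixed-step forward Euler scheme stands in for a real ODE solver. With a single step of size 1, the Euler update coincides with one residual block; more steps approximate the continuous solution $y (T)$ more closely.

```python
import numpy as np

def f(y, W):
    """Toy layer function f(y; W): a linear map followed by tanh (assumed)."""
    return np.tanh(W @ y)

def residual_block(a, W):
    """Discrete residual update a^{l+1} = a^l + f(a^l; W)."""
    return a + f(a, W)

def euler_solve(y0, W, T=1.0, steps=1):
    """Integrate dy/dt = f(y; W) from 0 to T with fixed-step forward Euler."""
    y, dt = y0, T / steps
    for _ in range(steps):
        y = y + dt * f(y, W)
    return y
```

Storing every intermediate `y` of the loop for backpropagation is what makes the memory cost of the discretize-then-optimize approach grow with the number of steps.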

Fig. 2. Comparison between the NODE and nmODE.

3.1. Neural memory ODE

The NODE is more powerful than the conventional discrete model, which can be attributed to the embedded dynamical system. However, the literature suggests that the NODE preserves the topology of the input space, so that there exist functions the NODE is incapable of representing.²² For example, the NODE cannot represent a one-dimensional function g with $g (- 1) = 1$ and $g (1) = - 1$ , because the trajectories of an ODE cannot cross each other. Nevertheless, this limitation can be addressed by rearranging the roles of the inputs, i.e. regarding the data as external inputs while keeping the initial internal input fixed.²⁰ The separation of external inputs from internal inputs implies the existence of two distinct types of neurons: learning neurons and memory neurons. Learning only happens in the learning neurons, whereas the memory neurons capture the feature’s characteristics through the ODE. This motivates the architecture of the nmODE, which can be formulated as follows:

$$\begin{cases} \dot{y} (t) = f (y (t), t, γ) \\ γ = g (a^{l}; W) . \end{cases} \tag{2}$$

Here, $γ$ represents the external input, and the initial internal input $y (0)$ can be set arbitrarily (e.g. $y (0) = 0$ ). The computation principle of the nmODE is illustrated in Fig. 2(b). Given the initial internal input $y (0)$ and the external input $γ$ , the nmODE outputs the solution $y (T)$ . Comparing the NODE with the nmODE, both models can be regarded as specialized layers to be integrated within DNNs. The distinction lies in the learning mechanism. For the NODE, learning is integrated into the ODE, where the output of the previous layer $a^{l}$ serves as the initial value $y (0)$ . In contrast, the nmODE separates the learning process from the ODE by placing the learning in the transformation $g (a^{l}; W)$ , which generates the external input $γ$ for the ODE. The nmODE framework offers a versatile architecture for implementing nonlinear mappings. For practical use, Zhang²⁰ introduced the following implementation of Eq. (2):

$$\dot{y} (t) = - y (t) + \sin^{2} (y (t) + γ) . \tag{3}$$

Given the initial value $y (0)$ and the external input $γ$ , the dynamical system described in Eq. (3) converges to a single global attractor.²⁰ The existence of a global attractor ensures that the model possesses enhanced memory properties. The knowledge of a neural network is stored in its learnable parameters, i.e. the connection weights. The model described by Eq. (3) separates the learning ( $γ$ ) from the memory, thus endowing the model with the capability to learn to correct the connection weights through the dynamical system, which also improves its resilience to noise in the training dataset.
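The attractor behavior of Eq. (3) can be observed with a small numerical experiment. The sketch below is a hedged illustration: the fixed-step forward Euler solver, the horizon `T`, the step size `dt`, and the chosen values of $γ$ are all assumptions made for demonstration and are not the settings used in the paper. For each $γ$ , trajectories started from different initial states end at the same value.

```python
import numpy as np

def nmode_rhs(y, gamma):
    """Right-hand side of Eq. (3): -y + sin^2(y + gamma)."""
    return -y + np.sin(y + gamma) ** 2

def nmode_solve(gamma, y0=0.0, T=40.0, dt=0.01):
    """Approximate y(T) with fixed-step forward Euler (solver is an assumption)."""
    y = np.full_like(np.asarray(gamma, dtype=float), y0)
    for _ in range(int(round(T / dt))):
        y = y + dt * nmode_rhs(y, gamma)
    return y

# Illustrative external inputs; each one has its own attractor.
gamma = np.array([-1.0, 0.0, 0.5, 2.0])
# Three different initial internal states for every gamma.
finals = [nmode_solve(gamma, y0=s) for s in (-3.0, 0.0, 3.0)]
```

One way to see why a single equilibrium is plausible: the derivative of the right-hand side with respect to y is $- 1 + \sin (2 (y + γ))$ , which is never positive, so the right-hand side is non-increasing in y and can cross zero only once.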

3.2. Transferring knowledge from nmODE to segmentation model

Despite the superior memory properties of the nmODE compared to the conventional discrete model, one potential limitation lies in its computational efficiency. The nmODE is solved using an ODE solver, which iteratively evaluates the ODE using either a first-order (Euler) or a higher-order (Runge–Kutta) method. It has also been observed that the NFE increases as training progresses.^{21,22} Relaxing the error tolerance could reduce the NFE, but it also decreases accuracy. The limited computational efficiency poses challenges for real-time applications of the nmODE. It is desirable to combine the memory advantages of the nmODE with the speed of the discrete model.
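The cost difference between solvers can be made concrete by counting function evaluations. The following sketch is an illustration under stated assumptions: the `CountedRHS` wrapper and the hand-written fixed-step `euler` and `rk4` routines are hypothetical helpers, not part of any library, and adaptive solvers choose their step counts dynamically rather than using a fixed number of steps.

```python
import numpy as np

class CountedRHS:
    """Wrap an ODE right-hand side and count how often it is evaluated."""
    def __init__(self, fn):
        self.fn, self.nfe = fn, 0
    def __call__(self, t, y):
        self.nfe += 1
        return self.fn(t, y)

def euler(rhs, y, T, steps):
    """First-order method: one evaluation per step."""
    dt = T / steps
    for i in range(steps):
        y = y + dt * rhs(i * dt, y)
    return y

def rk4(rhs, y, T, steps):
    """Classical fourth-order Runge-Kutta: four evaluations per step."""
    dt = T / steps
    for i in range(steps):
        t = i * dt
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1)
        k3 = rhs(t + dt / 2, y + dt / 2 * k2)
        k4 = rhs(t + dt, y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

gamma = 0.5  # illustrative external input
f_euler = CountedRHS(lambda t, y: -y + np.sin(y + gamma) ** 2)
f_rk4 = CountedRHS(lambda t, y: -y + np.sin(y + gamma) ** 2)
y_euler = euler(f_euler, 0.0, T=1.0, steps=10)
y_rk4 = rk4(f_rk4, 0.0, T=1.0, steps=10)
```

After the two calls, `f_euler.nfe` and `f_rk4.nfe` hold the evaluation counts. In a segmentation network each evaluation is a pass through the ODE layer, which is why the NFE translates directly into inference time.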

Taking the aforementioned problem into consideration, we propose an architecture specifically designed to transfer the knowledge from the nmODE to the discrete model, as depicted in Fig. 3. The learning process is composed of two phases. In the first phase, we train a segmentation model with nmODE, where the nmODE is employed in the penultimate layer designed to process the abstract feature before making predictions. During the implementation, the U-Net¹⁷ is used as the segmentation model. The internal input $y (0)$ and external input $γ$ of nmODE depicted in Eq. (3) are set to 0 and the abstract feature from the convolutional layer in U-Net, respectively. The output of nmODE is subsequently passed to a $1 \times 1$ convolutional decision layer to obtain the final prediction. This prediction is regarded as the target for the discrete segmentation model to approximate. In the second phase, we proceed to train a discrete model whose architecture is almost the same as that of the first phase, except the nmODE is removed. For the discrete model, the supervision is composed of two parts. First is the distance between the prediction and the label measured by the cross entropy loss, which is designed to utilize the information contained in the label. The second supervision is derived from the prediction of the segmentation model with nmODE, which aims to transfer the knowledge from the nmODE to the discrete model. The second distance is estimated by the KL divergence between the predictions from nmODE and the discrete model. The two distances are designed to learn the information contained in the label and nmODE. In summary, the complete objective function for the discrete model can be formulated as follows :

$$\mathcal{L} = - (1 - λ) \log p_{y}^{s} - λ \sum_{c = 1}^{C} p_{c}^{t} \log p_{c}^{s} , \tag{4}$$

where the two terms represent the cross entropy and the KL divergence, balanced by the hyper-parameter $λ$ . $p_{y}^{s}$ denotes the prediction of the discrete model for the target class y. $p_{c}^{t}$ and $p_{c}^{s}$ represent the predictions of the nmODE and the discrete model for the class c, respectively. By optimizing the objective shown in Eq. (4), the discrete model attempts to simultaneously approximate the label and the prediction from the segmentation model integrated with nmODE.
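The objective in Eq. (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the `(N, C)` layout with one row per pixel, and the averaging over pixels are assumptions. The teacher logits would come from the frozen U-Net with nmODE, and the student logits from the discrete U-Net.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def nmode_kd_loss(student_logits, teacher_logits, labels, lam=0.1):
    """Eq. (4), averaged over pixels (the averaging is an assumption).

    student_logits, teacher_logits: float arrays of shape (N, C).
    labels: integer array of shape (N,) with values in [0, C).
    """
    log_ps = log_softmax(student_logits)
    pt = np.exp(log_softmax(teacher_logits))      # teacher probabilities p^t
    ce = -log_ps[np.arange(labels.size), labels]  # -log p_y^s
    distill = -(pt * log_ps).sum(axis=-1)         # -sum_c p_c^t log p_c^s
    return float(np.mean((1.0 - lam) * ce + lam * distill))
```

The second term is written as the cross entropy between teacher and student predictions. It differs from the KL divergence between them only by the entropy of the teacher prediction, which does not depend on the student, so both forms yield the same gradients for the student.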

Fig. 3. Learning process of the proposed two-phase method.

In fact, the proposed architecture adheres to the knowledge distillation paradigm, which is specifically designed to transfer knowledge from the teacher to the student. Therefore, the proposed method is named nmODE-KD, representing a model that effectively transfers knowledge from the nmODE to the discrete model. A clear distinction between conventional KD and the proposed nmODE-KD lies in the design of the teacher. In the nmODE-KD approach, the discrete model (student) not only benefits from KD, but also demonstrates robustness against potential noise present in the training dataset. We thoroughly validate the effectiveness of nmODE-KD in Sec. 4.

4. Experiments

In this section, we first describe the used medical image segmentation datasets and experimental setup. Then we present the results obtained from the clean training dataset, the noisy training dataset, and the visualization of features in nmODE.

4.1. Datasets and experimental setup

The UaNet,¹ an open source dataset for head-neck OARs segmentation, is used in the experiments. The UaNet contains various soft and bone tissues that are taken into account during head-neck radiotherapy. These tissues are listed in Table 1. Because most samples contain only a single foreground slice for the optic chiasm and hypophysis, these two OARs are excluded from the experiments. The dataset is split into training, validation, and test sets at a ratio of 0.70:0.15:0.15, which corresponds to approximately 90, 20, and 20 samples, respectively. The OARs that have left and right counterparts (e.g. left eye and right eye) are merged into a single class to eliminate the influence of location.

**Table 1. Comparison of models trained by the clean training dataset.**
ROI	DSC of U-Net	DSC of convODE	DSC of nmODE
Brachial plexus	0.6636	0.6624	0.6703
Brain stem	0.9150	0.9097	0.9184
Constrictor naris	0.7723	0.7712	0.7751
Ear	0.8055	0.8074	0.8081
Eye	0.9181	0.9252	0.9245
Larynx	0.9205	0.9206	0.9231
Lens	0.7717	0.7683	0.7745
Mandible	0.8682	0.8644	0.8720
Optic nerve	0.7159	0.7137	0.7218
Oral cavity	0.8678	0.8722	0.8731
Parotid	0.8480	0.8509	0.8513
SMG	0.8125	0.8105	0.8172
Spinal cord	0.9023	0.9064	0.9049
Sublingual gland	0.6009	0.6044	0.6148
Temporal lobe	0.8865	0.8769	0.8975
Thyroid	0.8169	0.8250	0.8226
TMJ	0.8470	0.8453	0.8505
Trachea	0.9241	0.9262	0.9277
Average	0.8254	0.8256	0.8304

The U-Net¹⁷ serves as the baseline to validate the proposed method. It is possible to substitute U-Net with a more advanced network to further enhance performance. However, our primary focus lies in assessing the robustness of the proposed method rather than breaking state-of-the-art records. The Adam⁵¹ method is used as the optimizer with a learning rate of $10^{- 4}$ . The ODE is solved by the discretize-then-optimize approach with the dopri5 solver in the torchdiffeq package.²¹ $λ$ in Eq. (4) is set to $0.1$ . The model is trained for 200 epochs. The DSC is used as the evaluation metric, which is defined as $DSC = \frac{2 | V_{p} \cap V_{g} |}{| V_{p} | + | V_{g} |}$ , where $V_{p}$ and $V_{g}$ represent the volumes of the prediction and the ground truth, respectively. The DSC quantifies the degree of overlap between the prediction and the label, with values ranging from 0 to 1. A value of 0 indicates no overlap, while a value of 1 signifies a perfect match. The experimental results are reported on the test dataset using the model that achieved the highest DSC on the validation dataset.
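The DSC definition above translates directly into code. The sketch below is a minimal version for binary volumes; the function name `dice_similarity` and the convention of returning 1.0 when both volumes are empty are assumptions, since the text does not specify that edge case.

```python
import numpy as np

def dice_similarity(pred, truth):
    """DSC = 2 |Vp ∩ Vg| / (|Vp| + |Vg|) for binary volumes.

    Returning 1.0 when both volumes are empty is a convention assumed here.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```

For a multi-class segmentation such as the 18 OARs, the function would be applied once per class to the binarized prediction and label.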

4.2. Experiments on clean training dataset

We first carry out experiments using the clean training dataset to evaluate the impact of the ODE on the segmentation tasks. The comparison of the vanilla U-Net,¹⁷ convODE,⁴⁷ and nmODE²⁰ is shown in Table 1. The convODE is the first attempt to incorporate the ODE depicted in Fig. 2(a) into a colon gland segmentation model. The nmODE achieves a higher DSC than the vanilla U-Net for all OARs. Comparing convODE to nmODE, we observe that nmODE outperforms convODE for all OARs except the eye, spinal cord, and thyroid. The performance of nmODE summarized in Table 1 suggests that it can serve as a useful module to enhance segmentation accuracy. However, this improvement comes at the expense of significantly increased computational cost. The training and inference times of U-Net, convODE, and nmODE are summarized in Table 2. The substantially increased inference time of convODE and nmODE hampers the practical deployment of ODE-based models. After the knowledge is transferred from the ODE to the discrete layer, however, the inference time of the student model is the same as that of the vanilla U-Net, which improves the efficiency significantly. Additionally, the presence of potential noise in the training dataset poses challenges to the segmentation model. In the next subsection, we further conduct experiments using the noisy training dataset.

**Table 2. The training and inference time (seconds) per batch of each model.**
	U-Net	convODE	nmODE
Training	0.52	1.43	1.58
Inference	0.33	1.08	1.27

4.3. Experiments on noisy training dataset

The masked training dataset, in which the masks of 25% of the slices near the top and 25% of the slices near the bottom are erased, is used to evaluate the performance of the models. The experimental results are presented in Table 3. Comparing the vanilla U-Net, convODE, and nmODE, the nmODE exhibits a clear advantage among the three methods. The nmODE outperforms the vanilla U-Net on all the OARs, with the most notable improvement observed in the optic nerve, where the DSC increases from 0.4529 to 0.5128. Furthermore, the nmODE is superior to the convODE on all OARs except the lens. This improvement indicates that the nmODE is more robust than the convODE when dealing with noise in the training dataset. We also compare the ODE-based models with the ODE-KD models, namely convODE-KD and nmODE-KD, as shown in the right part of Table 3. The nmODE-KD further enhances the performance compared to the nmODE. For instance, the DSC values of the spinal cord for the U-Net, nmODE, and nmODE-KD are 0.8102, 0.8611, and 0.8665, respectively. Moreover, the nmODE-KD achieves a higher DSC than the convODE-KD for all the OARs. Paired t-tests between the vanilla U-Net and nmODE, as well as between the nmODE and nmODE-KD, are also conducted. Both p-values are very small ( $< 10^{- 4}$ ), indicating that the differences are statistically significant.

**Table 3. Comparison of models trained by the masked training dataset.**
ROI	DSC of U-Net	DSC of convODE	DSC of nmODE	DSC of convODE-KD	DSC of nmODE-KD
Brachial plexus	0.5937	0.5785	0.5985	0.5811	0.6256
Brain stem	0.8693	0.8734	0.8783	0.8770	0.8813
Constrictor naris	0.6848	0.6843	0.7023	0.6910	0.7264
Ear	0.7180	0.7074	0.7331	0.7508	0.7623
Eye	0.8958	0.8884	0.9068	0.8875	0.9102
Larynx	0.8061	0.8005	0.8158	0.8347	0.8672
Lens	0.5301	0.6003	0.5814	0.5972	0.6611
Mandible	0.7652	0.7923	0.7970	0.7851	0.8054
Optic nerve	0.4529	0.4893	0.5128	0.4746	0.5325
Oral cavity	0.8048	0.7993	0.8161	0.8006	0.8212
Parotid	0.8098	0.8068	0.8196	0.8110	0.8274
SMG	0.7597	0.7446	0.7818	0.7750	0.8058
Spinal cord	0.8102	0.8260	0.8611	0.8443	0.8665
Sublingual gland	0.4148	0.4001	0.4239	0.4654	0.4690
Temporal lobe	0.8191	0.8249	0.8350	0.8138	0.8594
Thyroid	0.7443	0.7193	0.7624	0.7448	0.7945
TMJ	0.7582	0.7935	0.8094	0.8020	0.8112
Trachea	0.8274	0.8348	0.8391	0.8456	0.8589
Average	0.7258	0.7313	0.7486	0.7434	0.7714
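The summary statistics of Table 3 can be re-derived from its rows. The sketch below recomputes the column averages, finds the organ with the largest gain of nmODE over U-Net, and computes a paired t statistic. One caveat: this section does not state which samples were paired in the reported t-tests, so treating the 18 per-organ DSC values as the paired samples is an assumption made purely for illustration. All names in the code are hypothetical.

```python
import math
import statistics

COLUMNS = ("U-Net", "convODE", "nmODE", "convODE-KD", "nmODE-KD")

# Per-organ DSC copied from Table 3, in the column order above.
table3 = {
    "Brachial plexus": (0.5937, 0.5785, 0.5985, 0.5811, 0.6256),
    "Brain stem": (0.8693, 0.8734, 0.8783, 0.8770, 0.8813),
    "Constrictor naris": (0.6848, 0.6843, 0.7023, 0.6910, 0.7264),
    "Ear": (0.7180, 0.7074, 0.7331, 0.7508, 0.7623),
    "Eye": (0.8958, 0.8884, 0.9068, 0.8875, 0.9102),
    "Larynx": (0.8061, 0.8005, 0.8158, 0.8347, 0.8672),
    "Lens": (0.5301, 0.6003, 0.5814, 0.5972, 0.6611),
    "Mandible": (0.7652, 0.7923, 0.7970, 0.7851, 0.8054),
    "Optic nerve": (0.4529, 0.4893, 0.5128, 0.4746, 0.5325),
    "Oral cavity": (0.8048, 0.7993, 0.8161, 0.8006, 0.8212),
    "Parotid": (0.8098, 0.8068, 0.8196, 0.8110, 0.8274),
    "SMG": (0.7597, 0.7446, 0.7818, 0.7750, 0.8058),
    "Spinal cord": (0.8102, 0.8260, 0.8611, 0.8443, 0.8665),
    "Sublingual gland": (0.4148, 0.4001, 0.4239, 0.4654, 0.4690),
    "Temporal lobe": (0.8191, 0.8249, 0.8350, 0.8138, 0.8594),
    "Thyroid": (0.7443, 0.7193, 0.7624, 0.7448, 0.7945),
    "TMJ": (0.7582, 0.7935, 0.8094, 0.8020, 0.8112),
    "Trachea": (0.8274, 0.8348, 0.8391, 0.8456, 0.8589),
}


def column(name):
    """Return the per-organ DSC values of one model as a list."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in table3.values()]


def paired_t(a, b):
    """Paired t statistic for mean(b - a) against zero (n - 1 degrees of freedom)."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


averages = {name: round(statistics.mean(column(name)), 4) for name in COLUMNS}
gains = {organ: row[2] - row[0] for organ, row in table3.items()}
largest_gain = max(gains, key=gains.get)
t_unet_vs_nmode = paired_t(column("U-Net"), column("nmODE"))

print(averages)
print(largest_gain, round(gains[largest_gain], 4))
print(round(t_unet_vs_nmode, 2))
```

The recomputed averages agree with the "Average" row of Table 3, and the largest per-organ gain of nmODE over U-Net is indeed on the optic nerve.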

4.4. Visualization of the feature in nmODE

To qualitatively assess the impact of nmODE on the segmentation model, we further compare the external inputs and outputs of nmODE, as shown in Fig. 4. The external input and output of nmODE are $\gamma$ and $y(T)$ shown in Fig. 3, respectively. Taking the brain stem as an example, the nmODE clearly helps to localize the target better: $y(T)$ is closer to the label than $\gamma$. A similar phenomenon can be observed for the eye and spinal cord. The segmentation of the larynx shows an interesting result in which the nmODE helps to rectify the contour of the target. Moreover, $y(T)$ is more confident than $\gamma$, as shown in Fig. 4(d). These visualization results indicate that the nmODE can serve as an effective plug-in module for adjusting the features, resulting in a higher DSC score than the vanilla discrete model.

Fig. 4. Visualization of features in nmODE in terms of the brain stem, eye, larynx, and spinal cord. The four images in each part represent the CT slice, external input of nmODE, output of nmODE, and label, respectively.
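The relation between the external input $\gamma$ and the output $y(T)$ can be illustrated numerically. The sketch below assumes the scalar nmODE dynamics $\dot{y} = -y + \sin^2(y + \gamma)$ from the original nmODE formulation²⁰ with $y(0) = 0$, and integrates them with a forward Euler scheme. The solver, the end time `T = 1.0`, the step count, and the function name are illustrative assumptions and do not reflect the paper's settings, which rely on an ODE solver library.

```python
import math


def nmode_output(gamma: float, T: float = 1.0, steps: int = 100) -> float:
    """Integrate dy/dt = -y + sin^2(y + gamma) from y(0) = 0 with forward Euler.

    Assumptions: scalar state, y(0) = 0, and illustrative values for the end
    time T and the number of Euler steps.
    """
    y = 0.0
    h = T / steps
    for _ in range(steps):
        y += h * (-y + math.sin(y + gamma) ** 2)
    return y


for gamma in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"gamma = {gamma:+.1f}  ->  y(T) = {nmode_output(gamma):.4f}")
```

Since $\sin^2(\cdot)$ lies in $[0, 1]$ and each Euler step with step size below one is a convex combination of the current state and that value, the output stays in $[0, 1]$ whatever the magnitude of $\gamma$.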

4.5. Ablation studies

Ablation studies are also conducted to validate the generalization of nmODE-KD, including experiments with varied mask ratios and different model architectures. All ablation experiments are carried out on the spinal cord. We first inspect the impact of the mask ratio, as shown in Table 4. The DSC of the U-Net decreases as the mask ratio increases. For example, the DSC of the U-Net is 0.8941 when the mask ratio is 5%, and it drops to 0.6332 at a mask ratio of 75%. Nonetheless, the DSC of nmODE is consistently higher than that of the U-Net. Moreover, the gap between the nmODE and U-Net generally widens as the mask ratio increases, indicating the robustness of nmODE against the noise. The proposed nmODE-KD achieves the highest DSC among the three methods at every mask ratio, implying the effectiveness of the knowledge transfer.

**Table 4. Results of the spinal cord with varied mask ratios.**
Mask Ratio	DSC of U-Net	DSC of nmODE	DSC of nmODE-KD
5%	0.8941	0.8989	0.9009
15%	0.8906	0.8983	0.8991
25%	0.8797	0.8872	0.8895
50%	0.8102	0.8611	0.8665
75%	0.6332	0.7445	0.7538
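The widening gap can be read directly from Table 4. The sketch below subtracts the U-Net DSC from that of nmODE and nmODE-KD at each mask ratio; the values are copied from Table 4 and the variable names are illustrative.

```python
# DSC on the spinal cord from Table 4: mask ratio (%) -> (U-Net, nmODE, nmODE-KD).
table4 = {
    5: (0.8941, 0.8989, 0.9009),
    15: (0.8906, 0.8983, 0.8991),
    25: (0.8797, 0.8872, 0.8895),
    50: (0.8102, 0.8611, 0.8665),
    75: (0.6332, 0.7445, 0.7538),
}

# Absolute DSC gain over the U-Net at each mask ratio.
gap_nmode = {r: round(nm - unet, 4) for r, (unet, nm, _) in table4.items()}
gap_kd = {r: round(kd - unet, 4) for r, (unet, _, kd) in table4.items()}

for r in table4:
    print(f"{r:>2d}%  nmODE - U-Net: {gap_nmode[r]:+.4f}   nmODE-KD - U-Net: {gap_kd[r]:+.4f}")
```

The nmODE gain over the U-Net is below 0.01 for mask ratios up to 25%, then rises to about 0.05 at 50% and about 0.11 at 75%.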

We further conduct experiments on clean and noisy datasets by applying nmODE to different network architectures; the results are summarized in Table 5. Three well-known U-Net variants are considered: the Attention U-Net,⁴² U-Net++,⁵² and UNETR.⁵³ The Attention U-Net⁴² integrates the attention mechanism into the U-Net, the U-Net++⁵² contains a series of nested skip connections, and the UNETR⁵³ combines the U-Net with the transformer.¹⁶ The results on the clean dataset, shown in the left part of Table 5, indicate that the nmODE increases the DSC regardless of the model architecture. A similar increase is found on the noisy dataset. These experimental results show that the nmODE can be used as a plug-and-play module that improves model performance on both clean and noisy datasets.

**Table 5. Results of the spinal cord by using models with and without nmODE. The mask ratio of the noisy dataset is 50%.**
	Clean dataset		Noisy dataset
Model	without nmODE	with nmODE	without nmODE	with nmODE
U-Net	0.9023	0.9049	0.8102	0.8611
Attention U-Net	0.9025	0.9051	0.8110	0.8182
U-Net++	0.9033	0.9047	0.8068	0.8134
UNETR	0.8822	0.8872	0.7663	0.7844

5. Conclusion

This paper empirically verifies the effectiveness of nmODE in CT-based OAR segmentation tasks. The nmODE is a recently proposed continuous ODE-based model that, in theory, has only one global attractor. However, its applicability and efficacy in medical image analysis tasks have remained unclear, which is the main motivation of this work. We empirically demonstrate the robustness of nmODE against label noise in the training dataset. In addition, nmODE-KD is proposed to transfer knowledge from the nmODE to a discrete layer, so that the model also benefits from the computational efficiency of the discrete layer. Experimental results show that nmODE-KD further improves the segmentation accuracy. Visualizing and comparing the inputs and outputs of nmODE shows that it helps to rectify the features extracted by the U-Net.

It is also possible to extend the method to other modalities (e.g. magnetic resonance imaging) and 3D analysis tasks.⁵⁴,⁵⁵ Besides the false negative noise introduced in the experiments, it is also desirable to verify the robustness of the method against potential false positive noise. Currently, prompt-guided general segmentation models are prevalent in computer vision and medical image analysis. In future work, we will attempt to leverage the proposed nmODE-KD to develop a general segmentation model with increased robustness and applicability in medical image analysis tasks. Moreover, it is worthwhile to integrate the nmODE into more advanced algorithms, such as the neural dynamic classification algorithm,⁵⁶,⁵⁷ ensemble learning,⁵⁸ and self-supervised learning.⁵⁹

Acknowledgments

This work was supported by the National Major Science and Technology Projects of China under Grant 2018AAA0100201, National Natural Science Foundation of China under Grant 62106162, China Postdoctoral Science Foundation under Grant 2021M692269, and Sichuan University Postdoctoral Science Foundation under Grant 2022SCU12080.

Notes

^a https://github.com/rtqichen/torchdiffeq.

References

1. H. Tang, X. Chen, Y. Liu, Z. Lu, J. You, M. Yang, S. Yao, G. Zhao, Y. Xu, T. Chen et al., Clinically applicable deep learning framework for organs at risk delineation in ct images, Nat. Mach. Intell. 1(10) (2019) 480–491.
2. J. Hu, Y. Song, L. Zhang, S. Bai and Z. Yi, Multi-scale attention u-net for segmenting clinical target volume in graves’ ophthalmopathy, Neurocomputing 427 (2021) 74–83.
3. J. Hu, Y. Chen and Z. Yi, Automated segmentation of macular edema in OCT using deep neural networks, Med. Image Anal. 55 (2019) 216–227.
4. K. Sirinukunwattana, J. P. Pluim, H. Chen, X. Qi, P.-A. Heng, Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez et al., Gland segmentation in colon histology images: The glas challenge contest, Med. Image Anal. 35 (2017) 489–502.
5. N. K. Benamara, M. Val-Calvo, J. R. Alvarez-Sanchez, A. Diaz-Morcillo, J. M. Ferrandez-Vicente, E. Fernandez-Jover and T. B. Stambouli, Real-time facial expression recognition using smoothed deep neural network ensemble, Integr. Comput.-Aided Eng. 28(1) (2021) 97–111.
6. E. Macias-Garcia, D. Galeana-Perez, J. Medrano-Hermosillo and E. Bayro-Corrochano, Multi-stage deep learning perception system for mobile robots, Integr. Comput.-Aided Eng. 28(2) (2021) 191–205.
7. A. Hassanpour, M. Moradikia, H. Adeli, S. R. Khayami and P. Shamsinejadbabaki, A novel end-to-end deep learning scheme for classifying multi-class motor imagery electroencephalography signals, Expert Syst. 36(6) (2019) e12494.
8. G. B. Martins, J. P. Papa and H. Adeli, Deep learning techniques for recommender systems based on collaborative filtering, Expert Syst. 37(6) (2020) e12647.
9. H. S. Nogay and H. Adeli, Machine learning (ml) for the diagnosis of autism spectrum disorder (asd) using brain imaging, Rev. Neurosci. 31(8) (2020) 825–841.
10. H. S. Nogay and H. Adeli, Detection of epileptic seizure using pretrained deep convolutional neural network and transfer learning, Eur. Neurol. 83(6) (2021) 602–614.
11. H. S. Nogay and H. Adeli, Diagnostic of autism spectrum disorder based on structural brain mri images using, grid search optimization, and convolutional neural networks, Biomed. Signal Process. Control 79 (2023) 104234.
12. M. J. Gómez-Silva, A. de la Escalera and J. M. Armingol, Back-propagation of the Mahalanobis distance through a deep triplet learning model for person re-identification, Integr. Comput.-Aided Eng. 28(3) (2021) 277–294.
13. T. Gao, Z. Li, Y. Gao, P. Schonfeld, X. Feng, Q. Wang and Q. He, A deep reinforcement learning approach to mountain railway alignment optimization, Comput.-Aided Civ. Infrastruct. Eng. 37(1) (2022) 73–92.
14. C.-X. Liu and G.-L. Yu, A deep learning model for the topological design of 2d periodic wave barriers, Comput.-Aided Civ. Infrastruct. Eng. 37(4) (2022) 451–469.
15. J. Long, E. Shelhamer and T. Darrell, Fully convolutional networks for semantic segmentation, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 3431–3440.
16. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems, Vol. 30 (2017), pp. 5998–6008.
17. O. Ronneberger, P. Fischer and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.
18. F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen and K. H. Maier-Hein, nnu-net: A self-configuring method for deep learning-based biomedical image segmentation, Nat. Methods 18(2) (2021) 203–211.
19. C. Zhang, S. Bengio, M. Hardt, B. Recht and O. Vinyals, Understanding deep learning requires rethinking generalization, Commun. ACM 64(3) (2021) 107–115.
20. Z. Yi, nmODE: Neural memory ordinary differential equation, Artif. Intell. Rev. (2023). https://doi.org/10.1007/s10462-023-10496-2
21. R. T. Chen, Y. Rubanova, J. Bettencourt and D. K. Duvenaud, Neural ordinary differential equations, in Advances in Neural Information Processing Systems, Vol. 31 (2018), pp. 6571–6583.
22. E. Dupont, A. Doucet and Y. W. Teh, Augmented neural ODEs, in Advances in Neural Information Processing Systems, Vol. 32 (2019), pp. 3140–3150.
23. A. Gholami, K. Keutzer and G. Biros, Anode: Unconditionally accurate memory-efficient gradients for neural ODEs (2019). arXiv:1902.10298
24. J. Kelly, J. Bettencourt, M. J. Johnson and D. K. Duvenaud, Learning differential equations that are easy to solve, in Advances in Neural Information Processing Systems, Vol. 33 (Curran Associates Inc., 2020), pp. 4370–4380.
25. P. Kidger, R. T. Chen and T. J. Lyons, “Hey, that’s not an ode”: Faster ode adjoints via seminorms, International Conference on Machine Learning, 18–24 July 2021, Hawaii, US, pp. 5443–5452.
26. A. Ghosh, H. Behl, E. Dupont, P. Torr and V. Namboodiri, Steer: Simple temporal regularization for neural ODE, in Advances in Neural Information Processing Systems, Vol. 33 (2020), pp. 14831–14843.
27. Y. Hanshu, D. Jiawei, T. Vincent and F. Jiashi, On robustness of neural ordinary differential equations, Int. Conf. Learning Representations, 26 April 2020, Addis Ababa, Ethiopia.
28. W. Cui, H. Zhang, H. Chu, P. Hu and Y. Li, On robustness of neural odes image classifiers, Inf. Sci. 632 (2023) 576–593.
29. J. Lin, L. Ma and Y. Yao, A spectrum-domain instance segmentation model for casting defects, Integr. Comput.-Aided Eng. 29(1) (2022) 63–82.
30. Z. Wang, Y. Zhang, K. M. Mosalam, Y. Gao and S.-L. Huang, Deep semantic segmentation for visual understanding on construction sites, Comput.-Aided Civ. Infrastruct. Eng. 37(2) (2022) 145–162.
31. Y. Wu, Y. Qin, Y. Qian, F. Guo, Z. Wang and L. Jia, Hybrid deep learning architecture for rail surface segmentation and surface defect detection, Comput.-Aided Civ. Infrastruct. Eng. 37(2) (2022) 227–244.
32. G. Mirzaei and H. Adeli, Segmentation and clustering in brain mri imaging, Rev. Neurosci. 30(1) (2018) 31–44.
33. G. Mirzaei, A. Adeli and H. Adeli, Imaging and machine learning techniques for diagnosis of Alzheimer’s disease, Rev. Neurosci. 27(8) (2016) 857–870.
34. G. Mirzaei and H. Adeli, Resting state functional magnetic resonance imaging processing techniques in stroke studies, Rev. Neurosci. 27(8) (2016) 871–885.
35. H. S. Park and H. Adeli, Distributed neural dynamics algorithms for optimization of large steel structures, J. Struct. Eng. 123(7) (1997) 880–888.
36. N. Siddique and H. Adeli, Harmony search algorithm and its variants, Int. J. Pattern Recogn. Artif. Intell. 29(08) (2015) 1539001.
37. J. Wang, D. Zhong, H. Adeli, D. Wang and M. Liu, Smart bacteria-foraging algorithm-based customized kernel support vector regression and enhanced probabilistic neural network for compaction quality assessment and control of earth-rock dam, Expert Syst. 35(6) (2018) e12357.
38. S. I. Hossain, M. Akhand, M. Shuvo, N. Siddique and H. Adeli, Optimization of university course scheduling problem using particle swarm optimization with selective search, Expert Syst. Appl. 127 (2019) 9–24.
39. M. Akhand, S. I. Ayon, S. Shahriyar, N. Siddique and H. Adeli, Discrete spider monkey optimization for travelling salesman problem, Appl. Soft Comput. 86 (2020) 105887.
40. X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu and P.-A. Heng, H-denseunet: Hybrid densely connected unet for liver and tumor segmentation from CT volumes, IEEE Trans. Med. Imaging 37(12) (2018) 2663–2674.
41. G. Huang, Z. Liu, L. Van Der Maaten and K. Q. Weinberger, Densely connected convolutional networks, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 26 July 2017, Hawaii, US, pp. 4700–4708.
42. O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al., Attention u-net: Learning where to look for the pancreas (2018). arXiv:1804.03999
43. D. Bahdanau, K. H. Cho and Y. Bengio, Neural machine translation by jointly learning to align and translate, 3rd Int. Conf. Learning Representations, ICLR 2015, 7 May 2015, San Diego, US.
44. C. Yu, J. Hu, G. Li, S. Zhu, S. Bai and Z. Yi, Segmentation for regions of interest in radiotherapy by self-supervised learning, Knowl.-Based Syst. 256 (2022) 109370.
45. J. Wang, L. Zhang and Y. Zhang, Mixture 2d convolutions for 3d medical image segmentation, Int. J. Neural Syst. 33(01) (2023) 2250059.
46. Y. Hua, X. Shu, Z. Wang and L. Zhang, Uncertainty-guided voxel-level supervised contrastive learning for semi-supervised medical image segmentation, Int. J. Neural Syst. 32(04) (2022) 2250016.
47. H. Pinckaers and G. Litjens, Neural ordinary differential equations for semantic segmentation of individual colon glands (2019). arXiv:1910.10470
48. C.-W. Cheng, C. Runkel, L. Liu, R. H. Chan, C.-B. Schönlieb and A. I. Aviles-Rivero, Continuous u-net: Faster, greater and noiseless (2023). arXiv:2302.00626
49. K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 26 June 2016, Las Vegas, US, pp. 770–778.
50. P. Kidger, On neural differential equations, Ph.D. Thesis, University of Oxford (2021).
51. D. P. Kingma and J. Ba, Adam: A method for stochastic optimization (2014). arXiv:1412.6980
52. Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh and J. Liang, Unet++: A nested u-net architecture for medical image segmentation, in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4 (Springer, 2018), pp. 3–11.
53. A. Hatamizadeh, Y. Tang, V. Nath, D. Yang, A. Myronenko, B. Landman, H. R. Roth and D. Xu, Unetr: Transformers for 3d medical image segmentation, in Proc. IEEE/CVF Winter Conf. Applications of Computer Vision, 4 January 2022, Waikoloa, US, pp. 574–584.
54. Y. Liang, F. He, X. Zeng and J. Luo, An improved loop subdivision to coordinate the smoothness and the number of faces via multi-objective optimization, Integr. Comput.-Aided Eng. 29(1) (2022) 23–41.
55. L. Li, F. He, R. Fan, B. Fan and X. Yan, 3d reconstruction based on hierarchical reinforcement learning with transferability, Integr. Comput.-Aided Eng. 30(4) (2023) 327–339.
56. M. H. Rafiei and H. Adeli, A new neural dynamic classification algorithm, IEEE Trans. Neural Netw. Learn. Syst. 28(12) (2017) 3074–3083.
57. D. R. Pereira, M. A. Piteri, A. N. Souza, J. P. Papa and H. Adeli, Fema: A finite element machine for fast learning, Neural Comput. Appl. 32 (2020) 6393–6404.
58. K. M. R. Alam, N. Siddique and H. Adeli, A dynamic ensemble learning algorithm for neural networks, Neural Comput. Appl. 32 (2020) 8675–8690.
59. M. H. Rafiei, L. V. Gauthier, H. Adeli and D. Takabi, Self-supervised learning for electroencephalography, IEEE Trans. Neural Netw. Learn. Syst. (2022). https://doi.org/10.1109/TNNLS.2022.3190448

Vol. 33, No. 12

History

Received 3 July 2023

Accepted 31 August 2023

Published: 23 September 2023

Information

This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC BY) License which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
