Human life is exposed to many hazards, some of which are deadly and leave lasting effects. Disease is a major driver of lifestyle change; some diseases go unrecognized and some remain untreatable. Cancer is a chronic disease that arises from uncontrolled cell division, and among cancers, breast cancer has the second-lowest survival rate worldwide. Predicting breast cancer at an early stage improves survival, so research on breast cancer prediction techniques has flourished. Although this research has advanced prediction, its outcomes have been limited by several drawbacks. To overcome them, a self-attention-based deep Convolutional Neural Network (Self-attention DCNN) is proposed, and the classifier's performance is enhanced with Smack Echolocation Optimization (SELO). In addition, a segmentation method, the SELO-optimized ensemble masked Region-based CNN (SELO-mask RCNN), is introduced to segment the region of interest, combining fine-grained segmentation with semantic understanding of the image. The approach achieved an F1-measure of 98.27%, a Precision of 99.21%, and a Recall of 98.99%.
The combination of different sources of information is currently one of the most relevant aspects in the diagnostic process of several diseases. In the field of neurological disorders, different imaging modalities providing structural and functional information are frequently available. Those modalities are usually analyzed separately, although jointly analyzing the features extracted from both sources can improve the classification performance of Computer-Aided Diagnosis (CAD) tools. Previous studies have computed independent models from each individual modality and combined them in a subsequent stage, which is not an optimal solution. In this work, we propose a method based on the principles of siamese neural networks to fuse information from Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET). This framework quantifies the similarities between both modalities and relates them to the diagnostic label during the training process. The resulting latent space at the output of this network is then fed into an attention module to evaluate the relevance of each brain region at different stages of the development of Alzheimer's disease. The excellent results obtained and the high flexibility of the proposed method allow more than two modalities to be fused, leading to a scalable methodology that can be used in a wide range of contexts.
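The core of the fusion step, quantifying similarity between modality embeddings produced by weight-shared branches, can be sketched as follows. This is a toy illustration, not the paper's architecture: a single random projection `W` stands in for the learned siamese encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared projection: in a siamese network both branches
# apply the same weights, so W is reused for the MRI and PET inputs.
W = rng.standard_normal((16, 8))

def embed(x):
    """Project a modality feature vector into the shared latent space."""
    return np.tanh(x @ W)

def similarity(mri_feat, pet_feat):
    """Cosine similarity between the two modality embeddings."""
    a, b = embed(mri_feat), embed(pet_feat)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

mri = rng.standard_normal(16)
pet = rng.standard_normal(16)
s = similarity(mri, pet)
assert -1.0 <= s <= 1.0
```

During training, such a similarity score would be related to the diagnostic label; here it only demonstrates the weight sharing and the bounded similarity output.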
Reinforcement learning (RL) is a powerful technique that allows agents to learn optimal decision-making policies through interactions with an environment. However, traditional RL algorithms suffer from several limitations, such as the need for large amounts of data and long-term credit assignment, i.e., the problem of determining which actions actually produce a certain reward. Recently, Transformers have shown that they can address these limitations when applied to RL in an offline setting. This paper proposes a framework that uses Transformers to enhance the training of online off-policy RL agents and address the challenges described above through self-attention. The proposal introduces a hybrid agent with a mixed policy that combines an online off-policy agent with an offline Transformer agent using the Decision Transformer architecture. By sequentially exchanging the experience replay buffer between the agents, the hybrid agent's training efficiency is improved in the first iterations, as is the training of Transformer-based RL agents in situations with limited data availability or unknown environments.
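The buffer exchange between the online agent and the offline Transformer agent can be sketched with a shared replay buffer. This is a simplified illustration; the class and method names are hypothetical, not from the paper.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """A replay buffer exchanged between an online off-policy agent
    and an offline Transformer agent (simplified sketch)."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Both agents draw minibatches from the same pool of experience.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

buf = SharedReplayBuffer()
# The online agent writes transitions ...
for t in range(100):
    buf.push(t, t % 4, float(t % 2), t + 1, t == 99)
# ... and the offline Transformer agent later reads the same experience.
batch = buf.sample(32)
assert len(batch) == 32
```

The point of the design is that the data-hungry Transformer agent gets bootstrapped from transitions the cheaper online agent has already collected, rather than from a fixed offline dataset.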
The global trend of increasing life expectancy introduces new challenges with far-reaching implications. Among these, the risk of falls among older adults is particularly significant, affecting individual health and quality of life and placing an additional burden on healthcare systems. Existing fall detection systems often have limitations, including delays due to continuous server communication, high false-positive rates, low adoption rates due to wearability and comfort issues, and high costs. In response to these challenges, this work presents a reliable, wearable, and cost-effective fall detection system. The proposed system consists of a fit-for-purpose device with an embedded algorithm and an Inertial Measurement Unit (IMU), enabling real-time fall detection. The algorithm combines a Threshold-Based Algorithm (TBA) and a low-parameter neural network based on a Transformer architecture. The system demonstrates notable performance, with 95.29% accuracy, 93.68% specificity, and 96.66% sensitivity, while using only 0.38% of the trainable parameters of the compared approach.
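The threshold-based half of such a TBA-plus-network pipeline can be sketched as follows. The 2.5 g impact threshold is an illustrative assumption, not the paper's calibrated value.

```python
import math

G = 9.81            # gravity, m/s^2
IMPACT_G = 2.5      # hypothetical impact threshold, in multiples of g

def acc_magnitude(ax, ay, az):
    """Magnitude of a triaxial accelerometer sample."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def tba_candidate(samples, threshold=IMPACT_G * G):
    """Flag a fall candidate when any IMU sample exceeds the impact
    threshold; a neural network would then confirm or reject it,
    keeping false positives down."""
    return any(acc_magnitude(*s) > threshold for s in samples)

walking = [(0.1, 0.2, 9.8)] * 50          # magnitude stays near 1 g
fall = walking + [(3.0, 2.0, 30.0)]       # sharp spike from an impact
assert tba_candidate(walking) is False
assert tba_candidate(fall) is True
```

Running the cheap threshold first means the heavier Transformer model only evaluates candidate windows, which is what makes an embedded real-time implementation feasible.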
Since vision transformers excel at establishing global relationships between features, they play an important role in current vision tasks. However, the global attention mechanism restricts the capture of local features, making convolutional assistance necessary. This paper shows that, with a suitable initialization method, transformer-based models can attend to local information much as convolutional kernels do, without using convolutional blocks. Accordingly, this paper proposes a novel hybrid multi-scale model called the Frequency-Assisted Local Attention Transformer (FALAT). FALAT introduces a Frequency-Assisted Window-based Positional Self-Attention (FWPSA) module that limits the attention distance of query tokens, enabling the capture of local content in the early stages. Information from value tokens in the frequency domain enhances information diversity during self-attention computation. Additionally, the traditional convolutional downsampling is replaced with a depth-wise separable convolution in the spatial-reduction attention module for long-distance content in the later stages. Experimental results demonstrate that FALAT-S achieves 83.0% accuracy on IN-1k with an input size of 224×224 using 29.9M parameters and 5.6G FLOPs. The model outperforms Next-ViT-S by 0.9 APb/0.8 APm with Mask R-CNN 1× on COCO and surpasses the recent FastViT-SA36 by 3.1% mIoU with FPN on ADE20k.
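The distance-limited attention idea behind FWPSA can be illustrated with a minimal sketch: plain dot-product self-attention with a hard window mask. Learned projections, positional terms, and the frequency branch are omitted, and the window size is an arbitrary choice.

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Self-attention restricted to a local window: query i may only
    attend to keys j with |i - j| <= window (a simplified sketch of
    distance-limited attention; no learned weights)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf            # out-of-window pairs get zero weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
out, w = local_attention(x, x, x, window=1)
assert out.shape == (6, 4)
assert w[0, 3] == 0.0                 # token 0 cannot see token 3
assert abs(w.sum(axis=-1) - 1).max() < 1e-9
```

Masking distant pairs makes early-stage attention behave like a small convolutional kernel while keeping the attention formulation intact.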
Identifying critical nodes in complex networks is a challenging topic. Various critical node identification methods based on deep learning already exist. However, these methods ignore the interactions between nodes and their neighbors when learning node representations, so the learned node features are insufficient. To solve this problem, we propose a critical node identification model that combines self-attention and ResNet. First, we take degree centrality, closeness centrality, betweenness centrality, and the clustering coefficient as node features and use a novel neighbor feature polymerization approach to generate a feature matrix for each node. Then, the susceptible–infected–recovered (SIR) model is used to simulate the propagation ability of the nodes, and the nodes are categorized by their propagation ability to obtain their labels. Finally, the feature matrices and labels of the nodes are used as inputs to the model to learn the hidden representations of the nodes. We evaluate the model with accuracy, precision, recall, the F1 index, the ROC curve, and the PR curve on five real networks. The results show that the method outperforms benchmark methods and can effectively identify critical nodes in complex networks.
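The SIR-based labeling step, scoring each node by its simulated spreading ability, can be sketched like this. It is a discrete-time SIR variant with illustrative infection and recovery probabilities, not the paper's exact simulation settings.

```python
import random

def sir_spread(adj, seed, beta=0.5, gamma=1.0, trials=200, rng=None):
    """Estimate a node's spreading ability with a discrete SIR process:
    start with `seed` infected, infect each susceptible neighbor with
    probability beta, recover with probability gamma; return the mean
    number of ever-infected nodes over many trials (simplified sketch)."""
    rng = rng or random.Random(42)
    total = 0
    for _ in range(trials):
        infected, recovered = {seed}, set()
        while infected:
            new = set()
            for u in infected:
                for v in adj[u]:
                    if v not in infected and v not in recovered and rng.random() < beta:
                        new.add(v)
            recovered |= {u for u in infected if rng.random() < gamma}
            infected = (infected | new) - recovered
        total += len(recovered)
    return total / trials

# A star graph: the hub should out-spread a leaf.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
assert sir_spread(adj, 0) > sir_spread(adj, 1)
```

Ranking or binning nodes by this score produces the labels that supervise the self-attention/ResNet classifier.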
Grape is an important cash crop that is susceptible to diseases during growth, which lower yield and quality. In recent years, transformers have achieved excellent performance in a variety of natural language processing and image recognition tasks through the self-attention mechanism. This paper therefore proposes a grape leaf disease recognition model named Dense Convolutional Transformer (DensCT). The model uses the compact convolutional transformer (CCT) as its backbone and improves the original convolutional module by introducing densely connected modules, enhancing feature transfer and reuse across the network. It also replaces the original single-scale feature extraction with a multi-scale scheme, improving feature extraction performance. Finally, the model was trained from scratch on two small-scale datasets, and the recognition accuracy of the final model on the test sets reached 89.19% and 93.92%. Compared with CCT, DenseNet121, ResNet50, MobileNetV3, and ViT, recognition accuracy improved by 4.73%, 3.38%, 10.81%, 0.68%, and 18.24% on the first dataset and by 6.08%, 5.41%, 1.35%, 3.38%, and 12.84% on the second. The experimental results show that the proposed model can effectively identify grape leaf diseases and can serve as a reference for building leaf disease recognition models on small-scale datasets.
The rapid development of deep reinforcement learning has made it widely used in multi-agent environments to solve multi-agent cooperation problems. However, due to the instability of multi-agent environments, performance is insufficient when deep reinforcement learning algorithms train each agent independently. In this work, we use the framework of centralized training with decentralized execution to extend the maximum entropy deep reinforcement learning algorithm Soft Actor-Critic (SAC), and we propose the multi-agent deep reinforcement learning algorithm MASAC based on the maximum entropy framework. By treating all other agents as part of the environment, the proposed model effectively addresses the poor convergence caused by environmental instability. At the same time, we note a shortcoming of centralized training: when all agents' information is used as critic input, information related to the current agent is easily lost. Inspired by the application of the self-attention mechanism in machine translation, we use self-attention to improve the critic and propose the ATT-MASAC algorithm. Within the critic networks, each agent can discover its relationships with other agents through encoder operations and attention calculation. Compared with recent multi-agent deep reinforcement learning algorithms, ATT-MASAC converges better and is more stable as the number of agents in the environment increases.
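The idea behind the attention-improved critic, letting the current agent weight the other agents' information instead of concatenating it blindly, can be sketched as follows. The dimensions and the absence of learned query/key projections are simplifications, not the ATT-MASAC implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_over_agents(own, others):
    """Sketch of an attention-based critic input: the current agent's
    encoding queries the other agents' encodings, so relevant agents
    receive larger weights instead of being flattened into one vector."""
    d = own.shape[-1]
    scores = np.array([own @ o / np.sqrt(d) for o in others])
    w = softmax(scores)
    context = sum(wi * o for wi, o in zip(w, others))
    return np.concatenate([own, context]), w

rng = np.random.default_rng(1)
own = rng.standard_normal(8)
others = [rng.standard_normal(8) for _ in range(3)]
feat, w = attend_over_agents(own, others)
assert feat.shape == (16,)
assert abs(w.sum() - 1) < 1e-9
```

Because the context vector is a weighted sum rather than a concatenation, its size does not grow with the number of agents, which is one reason attention helps as agent counts increase.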
In this paper, we propose an enhanced dual path attention network (EDPAN) for image super-resolution. ResNet is good at implicitly reusing extracted features, while DenseNet is good at exploring new features; the Dual Path Network (DPN) combines the two into an architecture more accurate than either alone. We show experimentally that the residual network performs best when each block consists of two convolutions, and the dense network performs best when each micro-block consists of one convolution. Following these observations, our EDPAN exploits the advantages of both the residual and the dense structure. In addition, to allocate computation across features more effectively, we introduce an attention mechanism into EDPAN. Moreover, to reduce the parameter burden, we use recursive learning to build a lightweight model. In the experiments, we demonstrate the effectiveness and robustness of EDPAN under different degradation conditions. The quantitative results and visual comparisons indicate that EDPAN achieves favorable performance over state-of-the-art frameworks.
Image inpainting demands edge consistency and semantic integrity, yet existing methods suffer from imperfect inpainted edges, mismatched content, and slow training caused by large numbers of network parameters. To address these problems, this paper designs a gated convolution and self-attention-based pyramid network (GAP-Net). The network builds on U-Net, integrates gated convolution and a pyramid loss, and changes the feature extraction strategy. In addition, we design a self-attention module and an attention transfer module for the network, and add content and perceptual losses that align the distributions of generated and real images. A comparative analysis against the PEN-Net and Gated methods is conducted in the same experimental environment. The results show that gated convolution and the pyramid loss increase the amount of useful information extracted from damaged image regions; the self-attention and attention transfer modules guide the conversion of high-level semantic features into image information more accurately; and the content and perceptual losses accelerate and improve network learning. Overall, the method improves the repair effect and speeds up network training.
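Gated convolution, the mechanism GAP-Net relies on to down-weight features from damaged regions, can be sketched as follows. The sketch reduces the convolutions to 1×1 for brevity, and `w_feat` and `w_gate` are hypothetical weights, not the network's parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv(x, w_feat, w_gate):
    """Gated convolution sketch (1x1 convolutions for brevity): a learned
    soft gate in (0, 1) scales each feature value, letting the network
    suppress activations coming from damaged (masked) image regions."""
    feat = np.tanh(x @ w_feat)        # candidate features
    gate = sigmoid(x @ w_gate)        # per-pixel, per-channel soft mask
    return feat * gate

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 3))    # H x W x C feature map
w_feat = rng.standard_normal((3, 3))
w_gate = rng.standard_normal((3, 3))
y = gated_conv(x, w_feat, w_gate)
assert y.shape == (4, 4, 3)
assert np.all(np.abs(y) <= 1.0)       # tanh output scaled by a gate in (0, 1)
```

Unlike a hard binary mask, the gate is learned per location and per channel, so partially valid regions near hole boundaries can contribute proportionally.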
Nowadays, there is growing interest in contactless palmprint recognition because of its high recognition rate, efficiency, and convenience. With the development of image acquisition equipment, it is often the case that the palmprint images used for identification and those used for registration are captured by different devices. At the same time, large numbers of well-labeled palmprint images are difficult to collect. As a result, most existing contactless palmprint recognition methods perform poorly in real-life applications. To address these issues, we propose a self-attention CycleGAN for cross-domain semi-supervised palmprint recognition. Based on CycleGAN, the styles of contactless palmprint images in the source and target domains can be swapped. Specifically, spatial features are captured through self-attention modules that model long-range dependencies. In addition, an extra source-domain classifier is trained on labeled source-domain images to assign pseudo-labels to unlabeled target-domain images, so that target-domain images are used efficiently. Experimental results show that our method achieves competitive performance.
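The pseudo-labeling step, keeping only confident predictions of the source-domain classifier for target-domain images, can be sketched as follows. The 0.9 confidence threshold is an assumption, not a value from the paper.

```python
def pseudo_label(probs, threshold=0.9):
    """Assign a pseudo-label to an unlabeled target-domain sample only
    when the source-domain classifier is confident enough; return None
    otherwise so the sample is left out of this training round."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None

# Confident prediction -> pseudo-labeled; flat prediction -> skipped.
assert pseudo_label([0.05, 0.92, 0.03]) == 1
assert pseudo_label([0.40, 0.35, 0.25]) is None
```

Thresholding matters because wrong pseudo-labels are self-reinforcing: training only on confident samples trades coverage for label quality.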
Recent research has identified adversarial examples as a challenge to DNN-based automatic speech recognition (ASR) systems. In this paper, we propose a new model based on a convolutional GRU and a self-attention U-Net, called GRU-U-NetAT, to enhance adversarial speech signals. To represent the correlation between neighboring noisy speech frames, a two-layer GRU is added in the bottleneck of the U-Net, and an attention gate is inserted in the up-sampling units to increase adversarial stability. The GRU combines weight sharing with gates that control the flow of data across multiple feature maps; as a result, it outperforms the original 1D convolution used in U-NetAT. In particular, the model is evaluated with explainable speech recognition metrics, and its behavior is analyzed under improved adversarial training. We used adversarial audio attacks to perform experiments on ASR and found that (i) the robustness of DNN-based ASR models can be improved using the temporal features captured by the attention-based GRU network; and (ii) adversarial training, including additive adversarial data augmentation, improves the generalization of DNN-based ASR models. The word error rate (WER) metric confirmed that the enhancement capability exceeds the state-of-the-art U-NetAT; the reason is the ability of GRU units to extract global information within the feature maps. In the conducted experiments, the proposed GRU-U-NetAT improves the Speech Transmission Index (STI), Perceptual Evaluation of Speech Quality (PESQ), and Short-Term Objective Intelligibility (STOI) scores on adversarial speech examples in speech enhancement.
In current medical image segmentation networks, combining CNNs and Transformers has become a mainstream trend. However, the inherent limitations of the convolution operation in CNNs and the insufficient information interaction in Transformers affect segmentation performance. To solve these problems, an integrated self-attention and convolution medical image segmentation network (ISC-TransUNet) is proposed in this paper. The network consists of an encoder, a decoder, and skip connections. First, the encoder uses a hybrid structure of BoTNet and Transformer to capture more comprehensive image information while reducing additional computing overhead. Then, the decoder uses an upsampler cascaded from multiple DUpsampling blocks to accurately recover pixel-level predictions. Finally, feature fusion between encoder and decoder at different resolutions is realized by ResPath skip connections, which reduce the semantic gap between encoder and decoder. In experiments on the Synapse multi-organ segmentation dataset, compared with the baseline TransUNet, the Dice similarity coefficient of ISC-TransUNet improved by 1.13% and the Hausdorff distance was reduced by 2.38%, while the model size was maintained. The experimental results show that the network can effectively segment tissues and organs in medical images, which has theoretical significance and application value for intelligent clinical diagnosis and treatment.
In current epilepsy research, accurate identification of epileptic electroencephalogram (EEG) signals is crucial for improving diagnostic efficiency and developing personalized treatment plans. This study proposes an innovative epilepsy recognition model, MAC, which combines the advantages of a multilayer perceptron (MLP), a self-attention mechanism, and the cosine distance. The model uses an MLP as its base and effectively reduces individual differences among epilepsy patients through the MLP's strong linear fitting ability. To measure the difference between two EEG signals more accurately, we introduce the cosine distance as a new feature metric: the cosine of the angle between the two signals in vector space precisely assesses the difference between individuals and enhances epilepsy EEG classification. In addition, we introduce a self-attention mechanism into the model to weight the influence of different factors on the final EEG representation. Our experiments used the EEG database of the Epilepsy Research Center of the University of Bonn. Comparative experiments showed that the proposed MAC model achieves a significant performance improvement on the epilepsy EEG signal recognition task. This study fills an existing research gap in the field of epilepsy identification and provides a powerful tool for accurate diagnosis of epilepsy; we believe the introduction of the MAC model will promote new breakthroughs in epilepsy EEG signal recognition and provide an important theoretical and practical reference for the field.
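The cosine-distance feature metric is straightforward to state in code; the definition below is standard and independent of the MAC model itself.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two EEG feature vectors: 1 minus the
    cosine of the angle between them, so identical directions give 0
    and orthogonal directions give 1, regardless of signal amplitude."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

# Parallel vectors: distance 0; orthogonal vectors: distance 1.
assert abs(cosine_distance([1.0, 0.0], [2.0, 0.0])) < 1e-12
assert abs(cosine_distance([1.0, 0.0], [0.0, 3.0]) - 1.0) < 1e-12
```

Because the metric depends only on direction, it is insensitive to amplitude scaling between recordings, which is one plausible reason it helps reduce inter-patient variability.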
In the drug discovery process, accurate prediction of drug–target interactions is crucial to accelerate the development of new drugs. However, existing methods still face many challenges in dealing with complex biomolecular interactions. To this end, we propose a new deep learning framework that combines the structural information and sequence features of proteins to provide comprehensive feature representation through bimodal fusion. This framework not only integrates the topological adaptive graph convolutional network and multi-head attention mechanism, but also introduces a self-masked attention mechanism to ensure that each protein binding site can focus on its own unique features and its interaction with the ligand. Experimental results on multiple public datasets show that our method significantly outperforms traditional machine learning and graph neural network methods in predictive performance. In addition, our method can effectively identify and explain key molecular interactions, providing new insights into understanding the complex relationship between drugs and targets.
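One plausible reading of the self-masked attention mechanism, each binding-site token aggregating every token except itself, can be sketched as follows. This is an interpretation for illustration, not the paper's exact formulation; learned projections are omitted.

```python
import numpy as np

def self_masked_attention(x):
    """Attention where each position is masked from attending to itself,
    forcing every binding-site token to aggregate only the other tokens
    (simplified: queries, keys, and values all equal the input x)."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    np.fill_diagonal(scores, -np.inf)          # block self-attention
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x, w

rng = np.random.default_rng(2)
x = rng.standard_normal((5, 4))
out, w = self_masked_attention(x)
assert out.shape == (5, 4)
assert np.all(np.diag(w) == 0.0)               # no token attends to itself
```

Blocking the diagonal separates a token's own features from the context it gathers, which matches the stated goal of keeping each binding site's unique features distinct from its interactions.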
Forecasting the motion of surrounding vehicles is necessary for a self-driving vehicle to plan a safe and efficient future trajectory. Like an experienced human driver, the self-driving vehicle must perceive the interactions of surrounding vehicles and choose the best trajectory from many candidates. However, previous methods either lack interaction modeling or ignore the multi-modal nature of this problem. In this paper, we focus on two important cues for trajectory prediction, interaction and maneuver, and propose a Maneuver-conditioned Attentional Network (MAN). MAN learns the interactions of all vehicles in a scenario in parallel through self-attention social pooling, and its attentional decoder generates the future trajectory conditioned on the predicted maneuver among three classes: Lane Changing Left (LCL), Lane Changing Right (LCR), and Lane Keeping (LK). Experiments demonstrate our model's improved prediction on the publicly available NGSIM and HighD datasets. We also present a quantitative analysis of the relationship between maneuver prediction accuracy and trajectory error.
End-to-end learning approaches offer high efficiency, speed, and solving precision for combinatorial optimization problems, but their generalization to instances that differ from the training scale remains an open question. In this paper, we focus on the knapsack problem (KP) and employ an end-to-end data-driven approach based on an attention model, combined with different baseline forms of the policy gradient algorithm, to solve KP. We first investigate the generalization performance of the proposed approach on KP instances of various scales and capacities. The experimental results show that the end-to-end model has a certain capacity to learn and generalize, discovering intrinsic characteristics shared across instances and using them to solve instances of other scales.
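For contrast with the learned policy, a classical greedy construction for KP looks like the sketch below; the attention model effectively replaces this fixed value/weight scoring rule with a learned, instance-conditioned one.

```python
def greedy_knapsack(values, weights, capacity):
    """Greedy construction baseline for the knapsack problem: repeatedly
    pick the feasible item with the best value/weight ratio. Returns the
    chosen item indices and the total value achieved."""
    items = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    chosen, total_v, total_w = [], 0, 0
    for i in items:
        if total_w + weights[i] <= capacity:
            chosen.append(i)
            total_w += weights[i]
            total_v += values[i]
    return chosen, total_v

chosen, value = greedy_knapsack([60, 100, 120], [10, 20, 30], 50)
assert value == 160 and sorted(chosen) == [0, 1]
```

Note that the greedy rule is not optimal here (taking items 1 and 2 would yield 220); this gap is exactly what a learned sequential policy has room to close.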
Deep convolutional neural network (CNN) models are typically trained on high-resolution images; when applied directly to low-resolution infrared images, for example, their performance is not always satisfactory. This is because CNN layers operate on local neighborhoods, which are already information-poor in infrared images. To overcome these weaknesses and add information of a global nature, a hybrid architecture based on a CNN with a self-attention mechanism is proposed. The latter provides global context by capturing long-range interactions between different parts of an image. In this paper, we incorporate a convolutional–attentional form in the top layers of two pre-trained networks, VGGNet and ResNet. The convolutional–attentional form concatenates two paths: the original convolutional feature maps of the pre-trained network and the output of a relative multi-head attention block. Extensive experiments are conducted on the FLIR starter thermal dataset, where we achieve 97.07% overall accuracy on the four-class task. Moreover, the proposed architectures exceed the state of the art in target recognition on the two-class FLIR starter thermal dataset with a 3.5% improvement in overall classification accuracy. In addition, a study of the effect of different hyper-parameters and an error analysis are carried out to suggest directions for future research.
Various deep learning models have been developed for different healthcare predictive tasks using Electronic Health Records and have shown promising performance. In these models, medical codes are often aggregated into a visit representation without considering their heterogeneity, e.g., the same diagnosis may imply different healthcare concerns depending on the accompanying procedures or medications. The visits are then often fed into deep learning models, such as recurrent neural networks, sequentially, without considering the irregular temporal information and dependencies among visits. To address these limitations, we developed a Multilevel Self-Attention Model (MSAM) that captures the underlying relationships between medical codes and between medical visits. We compared MSAM with various baseline models on two predictive tasks, future disease prediction and future medical cost prediction, using two large datasets, MIMIC-3 and PFK. In the experiments, MSAM consistently outperformed the baseline models. Additionally, for future medical cost prediction, we used disease prediction as an auxiliary task, which not only guides the model toward a stronger and more stable financial prediction but also allows managed care organizations to provide better care coordination.
Dense video captioning is a high-level visual understanding task dedicated to the semantic interpretation of events in a video. Nowadays, the Transformer is widely employed for this task owing to its high parallelism and its handling of long-term dependencies. However, in current works the relevance of different visual attributes is usually limited by absolute position embedding. To address this problem, a novel position embedding fusion module is proposed for dense video captioning. Experiments on the public ActivityNet Captions dataset demonstrate the effectiveness of the proposed method in enhancing the correlation of individual events, achieving METEOR scores of 10.3635 (2018) and 7.2181 (2019).
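The absolute position embedding that such a fusion module builds on is the standard sinusoidal code, shown below in its textbook form; the paper's actual fusion of embeddings is not reproduced here.

```python
import math

def sinusoidal_embedding(pos, d_model):
    """Standard sinusoidal position embedding: interleaved sin/cos at
    geometrically spaced frequencies, one pair per embedding dimension.
    A fusion module would combine such absolute codes with other cues."""
    emb = []
    for i in range(d_model // 2):
        freq = pos / (10000 ** (2 * i / d_model))
        emb.extend([math.sin(freq), math.cos(freq)])
    return emb

e0 = sinusoidal_embedding(0, 8)
assert e0 == [0.0, 1.0] * 4          # position 0: sin(0)=0, cos(0)=1
assert len(sinusoidal_embedding(5, 8)) == 8
```

Because the code depends only on the absolute index, two event segments with the same relative offset get unrelated embeddings; that limitation is what motivates fusing in additional positional information.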