Player shadows in basketball games are complex and diverse due to factors such as lighting conditions, player movement, and camera position. In shadow removal, the attention mechanism helps a model focus on the key regions of an image, namely the players themselves and the shadows around them, but it may ignore fine details, which degrades shadow removal for embedded basketball game players. To this end, a shadow removal method for embedded basketball game players based on designed attention and multi-scale fusion is proposed. The texture of each local region of the embedded player shadow image is extracted, a gray-value threshold is set at the center point of each region window, and the Local Binary Pattern (LBP) code of the local texture structure is computed; scanning the LBP codes of the shadow image yields its final characteristic parameters. The pixels of the shadow image are then balanced and adjusted, with the image scale determined by the slope of the transformation function so that the image remains within a normal proportion range, and the consistency of multi-scale detail pixels is adjusted to achieve pixel segmentation of the shadow image. Building on the basic principles of the attention mechanism, the shadow images are represented as key-value pairs of action information and a key-value set is constructed; the encoder of the attention mechanism encodes this representation, and the dependencies of the encoded action information are calculated and decoded to identify the shadow image. Shadow recognition is achieved by obtaining the chromaticity-space color vector of each shadow pixel, computing the chromaticity similarity difference, and comparing it with a threshold. On this basis, a multi-scale fusion shadow removal model for embedded basketball game players is designed and implemented. Experimental results show that the proposed method effectively removes the shadows of embedded basketball game players.
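The LBP coding step described above follows the standard Local Binary Patterns construction. Below is a minimal Python sketch of that construction, comparing each 3×3 window's neighbors against the center gray value and scanning the image to collect per-pixel codes; the function names and plain-loop implementation are illustrative, not the paper's code.

```python
import numpy as np

def lbp_code(window: np.ndarray) -> int:
    """8-neighbor LBP code of a 3x3 grayscale window.

    Each neighbor is compared against the gray value of the window's
    center point (the threshold mentioned in the abstract); neighbors
    at or above the center set their bit in the 8-bit code.
    """
    center = window[1, 1]
    # Neighbors in clockwise order, starting at the top-left pixel.
    neighbors = [window[0, 0], window[0, 1], window[0, 2], window[1, 2],
                 window[2, 2], window[2, 1], window[2, 0], window[1, 0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code

def lbp_image(gray: np.ndarray) -> np.ndarray:
    """Scan the image and collect the per-pixel LBP codes."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = lbp_code(gray[i:i + 3, j:j + 3])
    return out
```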
Autism spectrum disorder (ASD) is a developmental disability that poses significant challenges in social interaction, communication, and behavior. Individuals with ASD have unique ways of interacting and communicating, and early prediction is crucial for timely therapy. Because of the disorder's neurological nature, researchers are focusing on predicting ASD with image-processing techniques. The proposed novel Hybrid Convolutional Bilateral filter-based Deep Dual Swin Axial Generator Attention with Gooseneck Barnacle Optimization (FCB-DDSATGA-GBO) accurately predicts ASD. A facial image dataset serves as the input data source. The Hybrid Fast Convolutional Bilateral Filter (HFCBF) is used to pre-process the data. A Dual Deep Autoencoder and Split Generative Adversarial Network (DDASGAN) is used to extract static features, and a Swin-Gated Axial Attention Transformer (SGAAT) is used to segment the images. To forecast ASD, the DDASGAN is optimized with Gooseneck Barnacle Optimization (GBO). The performance of the suggested methodology is assessed using measures such as accuracy, recall, precision, sensitivity, f-score, and error, and compared with existing methods. The suggested FCB-DDSATGA-GBO model outperforms current techniques on facial images, achieving an f1-score of 99.66%, recall of 99.66%, accuracy of 99.67%, specificity of 99.67%, and precision of 99.66%.
Selective Tuning (ST) presents a framework for modeling attention; in this work we show how it performs in covert visual search tasks by comparing its performance to human performance. Two implementations of ST have been developed: the Object Recognition Model recognizes and attends to simple objects formed by the conjunction of various features, and the Motion Model recognizes and attends to motion patterns. The validity of the Object Recognition Model was first tested by successfully duplicating the results of Nagy and Sanchez. A second experiment evaluated the model's performance against the observed continuum of search slopes for feature-conjunction searches of varying difficulty. The Motion Model was tested against two experiments dealing with searches in the visual motion domain. A simple odd-man-out search for counter-clockwise rotating octagons among identical clockwise rotating octagons produced a linear increase in search time with set size. The second experiment was similar to one described by Thornton and Gilden. The results from both implementations agreed with the psychophysical data from the simulated experiments. We conclude that ST provides a valid explanatory mechanism for human covert visual search performance, an explanation going well beyond conventional saliency-map-based explanations.
The majority of proposed NIRS-BCIs have considered binary classification. Studies considering higher-order classification problems have yielded average accuracies that are less than favorable for practical communication, so there is a paucity of evidence supporting online classification of more than two mental states using NIRS. We developed an online ternary NIRS-BCI that supports the verbal fluency task (VFT), the Stroop task, and rest. The system used two sessions dedicated solely to classifier training; additionally, samples were collected prior to each period of online classification to update the classifier. Using a continuous-wave spectrometer, measurements were collected from the prefrontal and parietal cortices while 11 able-bodied adult participants were cued to perform one of the two cognitive tasks or to rest. Each task indicated the desire to select a particular letter on a scanning interface, while rest avoided selection. Classification was performed using 25 iterations of bagging with a linear discriminant base classifier, trained on 10-dimensional feature sets, and the BCI's classification decision was provided as feedback. An average online classification accuracy of 74.2±14.8% was achieved, representing an information transfer rate (ITR) of 1.31±0.86 bits/min. The results demonstrate that online communication can be achieved with a ternary NIRS-BCI that supports the VFT, Stroop task, and rest, and encourage continued efforts to enhance the ITR of NIRS-BCIs.
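The classifier configuration is concrete enough to sketch. Assuming scikit-learn as the toolkit (the paper does not specify one), 25 iterations of bagging over a linear discriminant base classifier on 10-dimensional features would look like the following; the data here are random placeholders:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier

# Placeholder NIRS feature matrix: trials x 10 features, with labels
# 0 = rest, 1 = VFT, 2 = Stroop for the ternary problem.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 10))
y_train = rng.integers(0, 3, size=120)

# 25 iterations of bagging with a linear discriminant base classifier,
# as described in the abstract ("estimator" requires scikit-learn >= 1.2;
# older versions name the argument "base_estimator").
clf = BaggingClassifier(
    estimator=LinearDiscriminantAnalysis(),
    n_estimators=25,
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.predict(X_train[:5]))
```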
We have previously shown that during top-down attentional modulation (stimulus expectation) correlations of the beta signals across the primary visual cortex were uniform, while during bottom-up attentional processing (visual stimulation) their values were heterogeneous. These different patterns of attentional beta modulation may be caused by feed-forward lateral inhibitory interactions in the visual cortex, activated solely during stimulus processing. To test this hypothesis, we developed a large-scale computational model of the cortical network. We first identified the parameter range needed to support beta rhythm generation and then simulated the different activity states corresponding to the experimental paradigms. The model matched our experimental data in terms of the spatial organization of beta correlations during different attentional states and provided computational confirmation of the hypothesis that the paradigm-specific spatial maps of beta activation depend on the lateral inhibitory mechanism. The model also generated testable predictions: cross-correlation values depend on the distance between the activated columns and on their spatial position with respect to the location of the sensory inputs from the thalamus.
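The predicted dependence of correlation on inter-column distance suggests a simple analysis: band-pass each signal to the beta range, compute pairwise correlations, and relate them to spatial separation. A sketch under assumed recording parameters (the sampling rate, channel positions, and data are all placeholders):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def beta_correlations(signals: np.ndarray, fs: float) -> np.ndarray:
    """Pairwise correlations of beta-band (13-30 Hz) filtered signals.

    signals: channels x samples array; fs: sampling rate in Hz.
    """
    b, a = butter(4, [13 / (fs / 2), 30 / (fs / 2)], btype="band")
    beta = filtfilt(b, a, signals, axis=1)
    return np.corrcoef(beta)

# Illustrative data: 16 channels, 10 s at 1 kHz, with hypothetical
# 2D channel positions on a small cortical patch.
rng = np.random.default_rng(1)
sig = rng.normal(size=(16, 10_000))
pos = rng.uniform(0, 4, size=(16, 2))          # positions in mm
corr = beta_correlations(sig, fs=1000.0)
dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
iu = np.triu_indices(16, k=1)
# How strongly does beta correlation track inter-site distance?
print(np.corrcoef(dist[iu], corr[iu])[0, 1])
```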
It has been shown that, compared to the 32-bit floating-point numbers used in the training phase, Deep Convolutional Neural Networks (DCNNs) can operate at low precision during inference, saving memory footprint and power consumption. However, neural network quantization is always accompanied by accuracy degradation. Here, we propose a quantization method called double-stage Squeeze-and-Threshold (double-stage ST) to close the accuracy gap with full-precision models. While accurate colors in pictures can be pleasing to the viewer, they are not necessary for distinguishing objects; the era of black-and-white television illustrates this idea. As long as the limited colors are assigned reasonably to different objects, the objects can be well identified and distinguished. Our method uses the attention mechanism to adjust the activations and learn the thresholds that distinguish objects (features). We then divide the numerically rich activations into intervals (a limited variety of numerical values) using the learned thresholds. The proposed method supports both binarization and multi-bit quantization, and achieves state-of-the-art results. In binarization, ReActNet [Z. Liu, Z. Shen, S. Li, K. Helwegen, D. Huang and K. Cheng, arXiv:abs/2106.11309] trained with our method outperforms the previous state-of-the-art result by 0.2 percentage points. In multi-bit quantization, the top-1 accuracy of the 3-bit ResNet-18 [K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conf. Computer Vision and Pattern Recognition, CVPR 2016, 27–30 June 2016, Las Vegas, NV, USA (IEEE Computer Society, 2016), pp. 770–778] model exceeds that of its full-precision baseline by 0.4 percentage points. The double-stage ST activation quantization method is easy to apply: it is inserted before the convolution. Moreover, double-stage ST is detachable after training and introduces no computational cost at inference.
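As a rough illustration of the thresholding idea, the sketch below uses an SE-style squeeze to produce one learned threshold per channel and binarizes activations against it. This is only our reading of the abstract, not the authors' double-stage design, and training would additionally require a straight-through gradient estimator, omitted here:

```python
import torch
import torch.nn as nn

class SqueezeAndThreshold(nn.Module):
    """SE-style attention that learns per-channel thresholds and
    binarizes activations against them. A sketch of the idea in the
    abstract; the authors' exact double-stage design may differ.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        # Squeeze spatial context into per-channel statistics.
        stats = self.squeeze(x).view(n, c)
        # Map the statistics to one learned threshold per channel.
        thresholds = self.excite(stats).view(n, c, 1, 1)
        # Binarize: +1 where the activation clears its threshold, else -1.
        return torch.where(x >= thresholds,
                           torch.ones_like(x), -torch.ones_like(x))

x = torch.randn(2, 8, 4, 4)
print(SqueezeAndThreshold(8)(x).unique())  # tensor([-1., 1.])
```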
Most existing research on public health events focuses on the number and duration of events in a year or month, analyzed with regression equations. The COVID-19 epidemic, first identified in Wuhan, Hubei Province, quickly spread to the whole country and then became a global public health event. During the epidemic, Chinese netizens queried the dynamics of the COVID-19 epidemic through the Baidu search platform and learned about relevant epidemic prevention information. These search behavior data not only reflect people's attention to the COVID-19 epidemic but also contain the stage characteristics and evolution trend of the epidemic. Therefore, the temporal, spatial, and attribute laws of the propagation of the COVID-19 epidemic can be discovered by deeply mining the time series data of search behavior. In this study, we find that transforming time series data into a visibility network via the visibility algorithm can uncover more hidden information in the time series, which may help us fully understand the attention paid to the COVID-19 epidemic in Chinese provinces and cities and evaluate the deficiencies of early warning and prevention of major epidemics. Moreover, it can improve the ability to cope with public health crises and the level of social decision-making.
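The visibility algorithm referenced here is the natural visibility graph of Lacasa et al. (2008): two time points are linked when the straight line between them clears all intermediate values. A compact sketch, with a toy series standing in for real search-volume data:

```python
import numpy as np

def natural_visibility_graph(series):
    """Map a time series to a visibility graph (Lacasa et al., 2008).

    Nodes are time points; points a < b are connected when every
    intermediate point c lies strictly below the line joining them:
    y_c < y_b + (y_a - y_b) * (b - c) / (b - a).
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            # Adjacent points (empty range) are always mutually visible.
            if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges

# Toy daily search-volume series; real input would be Baidu index data.
print(natural_visibility_graph([3, 1, 4, 1, 5, 9, 2, 6]))
```

Degree distributions and other network statistics of this graph are what carry the "hidden information" the abstract refers to.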
We present a biologically plausible computational model for solving the visual feature binding problem, based on recent results regarding the time course and processing sequence in the primate visual system. The feature binding problem appears due to the distributed nature of visual processing in the primate brain, and the gradual loss of spatial information along the processing hierarchy. This paper puts forward the proposal that by using multiple passes of the visual processing hierarchy, both bottom-up and top-down, and using task information to tune the processing prior to each pass, we can explain the different recognition behaviors that primate vision exhibits. To accomplish this, four different kinds of binding processes are introduced and are tied directly to specific recognition tasks and their time course. The model relies on the reentrant connections so ubiquitous in the primate brain to recover spatial information, and thus allow features represented in different parts of the brain to be integrated in a unitary conscious percept. We show how different tasks and stimuli have different binding requirements, and present a unified framework within the Selective Tuning model of visual attention.
The adoption of deep learning models in healthcare is hindered by their "black box" nature. In this paper, we explore the RETAIN architecture for the task of glucose forecasting for people with diabetes. Thanks to its two-level attention mechanism, the recurrent-neural-network-based RETAIN model is interpretable. We evaluate the RETAIN model on the type-2 IDIAB and the type-1 OhioT1DM datasets by comparing its statistical and clinical performance against two deep models and three models based on decision trees. We show that the RETAIN model offers a very good compromise between accuracy and interpretability, being almost as accurate as the LSTM and FCN models while remaining interpretable. We demonstrate the usefulness of its interpretable nature by analyzing the contribution of each variable to the final prediction. This analysis reveals that signal values older than one hour are not used by the RETAIN model for 30-minute-ahead glucose prediction. We also show how the RETAIN model changes its behavior upon the arrival of an event such as carbohydrate intake or insulin infusion; in particular, the patient's state before the event is particularly important for the prediction. Overall, thanks to its interpretability, the RETAIN model seems very promising for regression or classification tasks in healthcare.
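RETAIN's two-level attention (Choi et al., 2016) is well documented: one attention vector weights time steps, the other weights variables, and the prediction is a linear readout of the attended inputs, which is what makes per-variable contributions readable. A simplified PyTorch sketch (the original also reverses time order and embeds the inputs; both are omitted here):

```python
import torch
import torch.nn as nn

class RETAIN(nn.Module):
    """Sketch of RETAIN's two-level attention: alpha weights time
    steps, beta weights variables, and the context is an interpretable
    weighted sum of the raw inputs.
    """

    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.rnn_alpha = nn.GRU(n_features, hidden, batch_first=True)
        self.rnn_beta = nn.GRU(n_features, hidden, batch_first=True)
        self.alpha = nn.Linear(hidden, 1)          # time-level attention
        self.beta = nn.Linear(hidden, n_features)  # variable-level attention
        self.out = nn.Linear(n_features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch x time x features (e.g. glucose, insulin, carbs history)
        g, _ = self.rnn_alpha(x)
        h, _ = self.rnn_beta(x)
        a = torch.softmax(self.alpha(g), dim=1)   # which time steps matter
        b = torch.tanh(self.beta(h))              # which variables matter
        context = (a * b * x).sum(dim=1)          # readable contributions
        return self.out(context)

y = RETAIN(n_features=3)(torch.randn(4, 12, 3))
print(y.shape)  # torch.Size([4, 1])
```

Because the prediction is linear in `a * b * x`, the contribution of each variable at each time step can be read off directly, which is the property the analysis above exploits.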
Occluded keypoints have been a challenge for human pose estimation, especially under mutual occlusion of human bodies. One possible solution is to utilize multi-scale features, where small-scale features identify keypoints while large-scale features capture the relationships between keypoints. Feature fusion among multi-scale features allows information exchange between keypoints, facilitating the inference of occluded keypoints from the identified ones. However, we find that feature fusion introduces invalid features that interfere with valid ones. In this paper, we propose a multi-scale feature refined network (MSFRNet) based on HRNet and a new attention module, the multi-resolution attention module (MRAM). The proposed MRAM is designed to strengthen effective information while suppressing redundant information; it has multiple inputs and outputs and can learn the relationships between keypoints while retaining detailed information. The proposed MSFRNet outperforms HRNet, achieving a 1.4 AP improvement on the COCO dataset with a marginal computational increase of only 0.35 GFLOPs. It also achieves improvements of 0.9 AP, 0.7 AP, and 1.8 AP on the MPII, CrowdPose, and OCHuman datasets, respectively. Furthermore, compared with the recent attention mechanism PSA, MSFRNet exhibits lower computational cost at the same pose-estimation accuracy.
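The abstract does not detail MRAM, so the following is only a generic sketch of multi-input, multi-output attention across two resolutions: a shared pooled descriptor gates each branch's channels, strengthening informative features and suppressing redundant ones. All module details are assumptions:

```python
import torch
import torch.nn as nn

class MultiResolutionAttention(nn.Module):
    """Generic sketch of attention over two feature resolutions. A
    descriptor pooled from both branches gates each branch, so that
    effective channels are strengthened and redundant ones suppressed.
    The paper's actual MRAM design may differ substantially.
    """

    def __init__(self, c_high: int, c_low: int):
        super().__init__()
        self.gate_high = nn.Sequential(
            nn.Linear(c_high + c_low, c_high), nn.Sigmoid())
        self.gate_low = nn.Sequential(
            nn.Linear(c_high + c_low, c_low), nn.Sigmoid())

    def forward(self, high, low):
        # high: N x c_high x H x W (fine detail);
        # low:  N x c_low x h x w (coarse context).
        d = torch.cat([high.mean(dim=(2, 3)), low.mean(dim=(2, 3))], dim=1)
        gh = self.gate_high(d)[:, :, None, None]
        gl = self.gate_low(d)[:, :, None, None]
        return high * gh, low * gl  # multiple inputs, multiple outputs

h, l = MultiResolutionAttention(32, 64)(torch.randn(2, 32, 16, 16),
                                        torch.randn(2, 64, 8, 8))
print(h.shape, l.shape)
```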
To address the limitations of visible light cameras that cannot function effectively at night, infrared cameras have become the optimal supplement. However, current methods for visible–infrared cross-modality person re-identification focus solely on feature combination and fusion, neglecting the importance of feature alignment. To address this issue, we introduce a novel Hierarchical Feature Fusion (HFF) network, which comprehensively integrates features across various levels through sequential feature extraction. Specifically, we design a pixel-level contrastive loss function that makes pixels in the same region of cross-modality images more similar and distinguishes pixel features at different locations, thereby extracting similar low-frequency information in the shallow network. Furthermore, in the deep network, we extract high-frequency information of the different modalities through the Bi-Transformer Layer and propose Node-level Coupling Attention and Modality-level Decoupling Attention. Coupling attention is used for high-frequency information coupling within the same modality, while decoupling attention is used for high-frequency information decoupling between different modalities, yielding more texture and detail information. A series of experimental results validates the superiority of the proposed HFF network in cross-modality person re-identification: our method achieves 87.16% and 95.23% Rank-1 on the SYSU-MM01 and RegDB datasets, respectively, and extensive experiments validate its effectiveness in feature alignment.
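The exact pixel-level contrastive loss is not given in the abstract; a plausible InfoNCE-style sketch treats features at the same spatial location in a visible/infrared pair as positives and all other locations as negatives. Shapes and the temperature value are illustrative:

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feat_vis, feat_ir, temperature=0.1):
    """InfoNCE-style pixel-level contrast between modalities.

    feat_vis, feat_ir: N x C x H x W feature maps from spatially
    aligned visible/infrared image pairs. Pixels at the same location
    are positives; pixels at other locations serve as negatives.
    """
    n, c, h, w = feat_vis.shape
    v = F.normalize(feat_vis.flatten(2).transpose(1, 2), dim=-1)  # N x HW x C
    r = F.normalize(feat_ir.flatten(2).transpose(1, 2), dim=-1)
    logits = torch.bmm(v, r.transpose(1, 2)) / temperature        # N x HW x HW
    # The matching location is the correct "class" for each pixel.
    targets = torch.arange(h * w, device=v.device).expand(n, -1)
    return F.cross_entropy(logits.reshape(-1, h * w), targets.reshape(-1))

loss = pixel_contrastive_loss(torch.randn(2, 16, 8, 8),
                              torch.randn(2, 16, 8, 8))
print(loss.item())
```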
This paper presents an approach to the registration of individual images to one another to produce a larger composite mosaic. The approach is based on the use of the moments of Zernike orthogonal polynomials to compute the relative scale, rotation and translation between the images. A preliminary stage involves the use of an attention-like operation to estimate potential approximate correspondence points between the images based on extrema of local edge element density. Experimental results illustrate that the technique is effective in a range of environments and over a broad range of image registration parameters. In particular, our method makes few assumptions regarding the image content and yet, unlike several alternative approaches, can perform registration for images with only a limited amount of overlap.
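The registration pipeline itself is not reproduced here, but the Zernike moments it builds on are standard. A self-contained sketch of computing one moment over the unit disk follows; rotation invariance of the magnitude |Z_nm| is what makes these moments suited to recovering relative rotation between images:

```python
import numpy as np
from math import factorial

def zernike_moment(img: np.ndarray, n: int, m: int) -> complex:
    """Zernike moment Z_nm of an image mapped onto the unit disk.

    Requires n >= |m| >= 0 with n - |m| even. |Z_nm| is invariant to
    rotation of the image about the disk center.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = 2 * xs / (w - 1) - 1          # map the pixel grid to [-1, 1]^2
    y = 2 * ys / (h - 1) - 1
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                 # keep only pixels inside the disk
    # Radial polynomial R_nm(rho).
    R = np.zeros_like(rho)
    for k in range((n - abs(m)) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + abs(m)) // 2 - k)
                    * factorial((n - abs(m)) // 2 - k)))
        R += coeff * rho ** (n - 2 * k)
    basis = R * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(img[mask] * basis[mask])

patch = np.random.default_rng(2).random((64, 64))
print(abs(zernike_moment(patch, n=4, m=2)))
```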
Brain tumors are among the most lethal human diseases, and early detection can effectively save patients' lives. Brain tumor image segmentation is of great practical importance for physicians to diagnose brain tumors quickly. Due to the complexity of 3D brain images, segmenting tumor regions manually is impractical, so automatic and reliable methods are needed to achieve accurate segmentation of tumor regions. In this paper, we propose REMU-Net, an end-to-end and more efficient brain tumor MRI segmentation model, to address the problems of multi-scale feature extraction and the difficulty of extracting small-target features in 3D brain tumor image segmentation. First, we design a multi-channel parallel M-RepVGG module and use it as the decoder to achieve multi-scale feature fusion. Second, we embed dilated convolutions with different dilation rates in the DM-RepVGG module of the encoder to better extract features at different scales. Finally, we introduce expectation-maximization attention into the network to better extract the internal details of the tumor. On the BraTS2018 validation dataset, the model achieves Dice scores of 80.93%, 90.13%, and 86.15% on the three tumor subregions; on the BraTS2019 validation dataset, the corresponding Dice scores are 78.29%, 90.65%, and 82.77%.
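The dilated convolution arrangement in the encoder can be illustrated generically: parallel 3D convolution branches with different dilation rates see different receptive fields, and their outputs are fused. The branch count, rates, and summation fusion below are illustrative choices, not the DM-RepVGG specifics:

```python
import torch
import torch.nn as nn

class DilatedBranches(nn.Module):
    """Sketch of parallel 3D convolutions with different dilation
    rates, the mechanism used to capture features at multiple scales.
    """

    def __init__(self, channels: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # padding == dilation keeps the spatial size of each branch
        # identical, so the outputs can be fused by summation.
        return sum(branch(x) for branch in self.branches)

vol = torch.randn(1, 8, 16, 16, 16)  # toy 3D MRI feature volume
print(DilatedBranches(8)(vol).shape)
```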
The usage of the Internet of Things (IoT) in transportation holds immense potential. Intelligent vehicle systems can exchange information seamlessly to help cars ensure better traffic control and road safety. The dynamic topology of this network, which connects a large number of vehicles, makes it vulnerable to threats concerning authentication, data integrity, confidentiality, and more. These threats jeopardize the safety of vehicles, riders, and the entire system. Researchers are developing several approaches to combat security threats in connected and autonomous vehicles, and artificial intelligence is being used by both scientists and hackers, to protect and to attack the networks, respectively. Nevertheless, wirelessly coupled cars on the network are in constant peril. This motivated us to develop an intrusion detection model that can run on low-end devices with low processing and memory capacity, prevent security threats, and protect the connected vehicle network. This paper presents an Attention-enabled Hierarchical Deep Neural Network (AHDNN) to detect intrusions and ensure autonomous vehicles' security at both the node and network levels. The proposed AHDNN framework has a very low false-negative rate of 0.012, ensuring that intrusions in normal communication are rarely missed and enabling enhanced security in vehicular networks.
To address the high complexity and low detection accuracy of single-stage three-dimensional (3D) detection methods, a vehicle object detection algorithm based on the Efficient Channel Attention (ECA) mechanism is proposed, providing a good solution to the problems of low object recognition accuracy and high model complexity in 3D object detection. First, we voxelize the original point cloud data, taking the average coordinates and intensity values as the initial features, and extract the features of each voxel through the Voxel Feature Encoding (VFE) layer. Then, building on the VoxelNet model, the ECA mechanism is introduced, which reduces the complexity of the model while maintaining good performance. Finally, experiments on the widely used KITTI dataset show that the algorithm performs well: the accuracy of the proposed ECA algorithm reaches 87.75%, an increase of 0.21% over SE-SSD, a current mainstream object detection algorithm.
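ECA itself (Wang et al., 2020) is a published module: global average pooling followed by a small 1D convolution across the channel descriptor, avoiding the dimensionality reduction of SE blocks. A PyTorch sketch in 2D form (the paper applies the idea to voxel features):

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: pool each channel to a scalar,
    run a 1D convolution across channels for local cross-channel
    interaction, and reweight the input with the resulting gates.
    """

    def __init__(self, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # N x C x H x W -> N x 1 x C descriptor -> per-channel gate
        y = self.pool(x).squeeze(-1).transpose(1, 2)
        y = self.sigmoid(self.conv(y)).transpose(1, 2).unsqueeze(-1)
        return x * y  # reweight channels, negligible extra cost

print(ECA()(torch.randn(2, 16, 8, 8)).shape)
```

The module adds only a handful of parameters, which is consistent with the complexity reduction claimed relative to heavier attention blocks.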
We review a neural network model based on chaotic dynamics [Babloyantz & Lourenço, 1994, 1996] and provide a detailed discussion of its biological and computational relevance. Chaos can be viewed as a "reservoir" containing an infinite number of unstable periodic orbits. In our approach, the periodic orbits are used as coding devices. By considering a large enough number of them, one can in principle expand the information processing capacity of small or moderate-size networks. The system is most of the time in an undetermined state characterized by a chaotic attractor. Depending on the type of an external stimulus, the dynamics is stabilized into one of the available periodic orbits, and the system is then ready to process information. This corresponds to the system being driven into an "attentive" state. We show that, apart from static pattern processing, the model is capable of dealing with moving stimuli. We especially consider in this paper the case of transient visual stimuli, which has a clear biological relevance. The advantages of chaos over more regular regimes are discussed.
Music content has recently been identified as useful information for improving music recommendation. Existing studies usually feed low-level audio features, such as Mel-frequency cepstral coefficients, into deep learning models for music recommendation. However, such features cannot well characterize music audio, which often contains multiple sound sources. In this paper, we propose to model and fuse chord, melody, and rhythm features to meaningfully characterize the music and thereby improve recommendation. Specifically, we use two user-based attention mechanisms to differentiate the importance of different parts of the audio and chord features, and a Long Short-Term Memory layer to capture sequence characteristics. These features are fused by a multilayer perceptron and then used to make recommendations. We conducted experiments with a subset of the last.fm-1b dataset; the results show that our proposal outperforms the best baseline by 3.52% on HR@10.
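As a sketch of this architecture, the snippet below scores each time step of the audio and chord sequences against a user embedding, reweights them, and passes the fused sequence through an LSTM and an MLP. All dimensions and the concatenation-based fusion are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class UserAttentionFusion(nn.Module):
    """Sketch: user-conditioned attention over per-segment audio and
    chord features, an LSTM over the fused sequence, and an MLP scorer.
    """

    def __init__(self, d_user=16, d_feat=32, hidden=32):
        super().__init__()
        self.attn_audio = nn.Linear(d_user + d_feat, 1)
        self.attn_chord = nn.Linear(d_user + d_feat, 1)
        self.lstm = nn.LSTM(2 * d_feat, hidden, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def attend(self, layer, user, seq):
        # Score each time step against the user embedding, then reweight.
        u = user.unsqueeze(1).expand(-1, seq.size(1), -1)
        w = torch.softmax(layer(torch.cat([u, seq], dim=-1)), dim=1)
        return w * seq

    def forward(self, user, audio, chord):
        fused = torch.cat([self.attend(self.attn_audio, user, audio),
                           self.attend(self.attn_chord, user, chord)],
                          dim=-1)
        _, (h, _) = self.lstm(fused)       # sequence characteristics
        return self.mlp(h[-1])             # recommendation score

score = UserAttentionFusion()(torch.randn(4, 16),
                              torch.randn(4, 10, 32),
                              torch.randn(4, 10, 32))
print(score.shape)  # torch.Size([4, 1])
```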
Computational modeling of the human visual system is of current interest to developers of artificial vision systems, primarily because a biologically-inspired model can offer solutions to otherwise intractable image understanding problems. The purpose of this study is to present a biologically-inspired model of selective perception that augments a stimulus-driven approach with a high-level algorithm that takes into account particularly informative regions in the scene. The representation is compact and given in the form of a topographic map of relative perceptual conspicuity values. Other recent attempts at compact scene representation consider only low-level information that codes salient features such as color, edge, and luminance values. The previous attempts do not correlate well with subjects' fixation locations during viewing of complex images or natural scenes. This study uses high-level information in the form of figure/ground segmentation, potential object detection, and task-specific location bias. The results correlate well with the fixation densities of human viewers of natural scenes, and can be used as a preprocessing module for image understanding or intelligent surveillance applications.
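A minimal sketch of the fusion this describes: normalize the low-level saliency map and the three high-level maps (figure/ground, potential objects, task-specific location bias) and combine them into one topographic conspicuity map. The equal weights are an illustrative placeholder, not the study's calibration:

```python
import numpy as np

def conspicuity_map(saliency, figure_ground, object_mask, task_bias,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Fuse one low-level and three high-level maps into a single
    topographic map of relative perceptual conspicuity values.
    """
    maps = [saliency, figure_ground, object_mask, task_bias]
    # Normalize each cue so no single map dominates by scale alone.
    combined = sum(w * m / (m.max() + 1e-8) for w, m in zip(weights, maps))
    return combined / combined.max()       # final map in [0, 1]

rng = np.random.default_rng(3)
m = conspicuity_map(*(rng.random((48, 64)) for _ in range(4)))
print(m.shape, float(m.max()))
```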
This paper presents a software environment providing human-aware ambient support for a human performing a task that demands substantial attention. The agent obtains human attention-awareness in an adaptive manner through a dynamical model of human attention, gaze sensing by an eye-tracker, and information about features of the objects in the environment, parameterised for the characteristics of the human. The agent uses a built-in adaptation model to adapt the values of these parameters to the personal characteristics of the human on the fly. The software agent has been implemented in a component-based manner within the Adobe® Flex® environment, integrating the Tobii® eye-tracker, and has been applied in a setup for a task in which the human has to identify enemies and allies and eliminate the enemies.
An innovative process that assists users in non-quantitative problem solving is presented. The process, called Ramic, employs psychological projection in a novel way to help users focus on, express, and think through problems. Its applicability ranges from assisting with simple non-analytic decision-making to developing and assessing strategies.
In the virtual realm, harnessing the power of psychological projection for problem solving has been attempted in the form of a process called Sand Tray. Attempts at virtualization have gained little traction, potentially due to the encumbrance of the interface. Ramic, in contrast, is innately suited to digital use through a relatively simple interface.
A key question this paper explores is how to quantitatively measure the value of Ramic in relation to the well-established process of Sand Tray. Even though these processes operate on qualitative problems, a preference analysis tool called conjoint analysis is used to build an experiment and derive specific user utilities for each process.
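For readers unfamiliar with conjoint analysis, part-worth utilities can be recovered from profile ratings by ordinary least squares over dummy-coded attributes. The toy design below is purely illustrative and not the study's actual experiment:

```python
import numpy as np

# Sketch of deriving part-worth utilities from conjoint ratings via
# ordinary least squares. Attributes, levels, and ratings are invented
# placeholders; only the estimation procedure is the point here.
# Each profile row: [is_ramic, attribute_A_level, attribute_B_level]
profiles = np.array([
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
    [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
], dtype=float)
ratings = np.array([6.0, 7.0, 5.5, 6.5, 4.5, 5.5, 4.0, 5.0])

# Add an intercept column and solve for the utilities.
X = np.hstack([np.ones((len(profiles), 1)), profiles])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print("part-worth of process (Ramic vs Sand Tray):", coef[1])
```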
To perform the study, both processes required testing in the physical domain. A 32-person study is presented, indicating that the Ramic projective process has a 23% higher user utility than Sand Tray in the area of problem solving. As such, it presents an opportunity to explore a new way in which individuals can approach non-analytical problem solving and how computers can assist them in the task.