Tactile P300 brain-computer interfaces (BCIs) generally have lower accuracy and information transfer rates (ITRs) than visual BCIs, possibly because human tactile perception is relatively poor. This study investigated the influence of visual attention on the performance of a tactile P300 BCI. We designed our paradigms based on a novel "cheeks-stim" paradigm that attaches the stimulators to the subject's cheeks. Two paradigms were designed: one with no visual attention and one with visual attention to the target position. Eleven subjects performed both paradigms. We also recorded and analyzed eye-movement data during the visual-attention paradigm to explore whether eye movements would affect BCI classification. The average online accuracy was 89.09% for the paradigm with visual attention, significantly higher than that of the paradigm with no visual attention (70.45%). A significant difference in ITR was also found between the two paradigms (p < 0.05). The results demonstrate that visual attention is an effective way to improve the performance of a tactile P300 BCI, and our findings suggest that it may be feasible to build an efficient tactile BCI system by adding visual attention.
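The abstract reports accuracy and ITR but does not state which ITR definition it uses; Wolpaw's formula is the common choice for P300 spellers. A minimal sketch, where the number of targets and the time per selection are illustrative assumptions, not values from the paper:

```python
import math

def wolpaw_itr(n_targets: int, accuracy: float, trial_s: float) -> float:
    """Wolpaw information transfer rate in bits/min.

    n_targets: number of selectable items; accuracy: classification
    accuracy P in (0, 1]; trial_s: seconds per selection."""
    p, n = accuracy, n_targets
    bits = math.log2(n)
    if 0 < p < 1:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * (60.0 / trial_s)

# the two reported accuracies; 4 targets and 12 s/selection are assumptions
print(wolpaw_itr(4, 0.8909, 12.0), wolpaw_itr(4, 0.7045, 12.0))
```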
In this paper, we address how form and motion cues can be integrated to provide suitable information for attentively tracking multiple moving objects. The integration is designed in a Bayesian framework, and a Belief Propagation technique is exploited to perform coherent form/motion labeling of regions of the observed scene. Experiments on both synthetic and real data are presented and discussed.
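The abstract does not detail its Belief Propagation scheme; below is a minimal, generic sum-product loopy-BP sketch for discrete region labeling, with the unary and pairwise potentials as toy stand-ins for the paper's form/motion terms:

```python
import numpy as np

def loopy_bp(unary, edges, pairwise, iters=20):
    """Sum-product loopy belief propagation for discrete region labeling.

    unary:    (n_regions, n_labels) evidence per region
    edges:    list of (i, j) pairs of neighboring regions
    pairwise: (n_labels, n_labels) compatibility shared by all edges
    Returns normalized per-region beliefs."""
    unary = np.asarray(unary, dtype=float)
    n, k = unary.shape
    msgs = {(i, j): np.ones(k) for a, b in edges for i, j in ((a, b), (b, a))}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            prod = unary[i].copy()
            for (a, b) in msgs:            # incoming messages to i, except from j
                if b == i and a != j:
                    prod *= msgs[(a, b)]
            m = pairwise.T @ prod          # marginalize over region i's label
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = unary.copy()
    for (i, j) in msgs:
        beliefs[j] *= msgs[(i, j)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```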
According to human vision theory, an image is conveyed from the human visual system to the brain when a person looks at a scene. Unlike previous work, the study reported in this paper attempts to simulate a more realistic and complex process for region-of-interest (ROI) detection and to quantitatively analyze the correlation between users' visual perception and ROIs. We propose a visual perception model-based ROI detection method that can be realized with an ordinary web camera. The visual perception model employs a combination of a visual attention model and gaze-tracking data to objectively detect ROIs. The work includes pre-ROI estimation using the visual attention model, gaze-data collection, and ROI detection. Pre-ROIs are segmented by the visual attention model. Since eye-feature extraction is critical to the accuracy and performance of gaze tracking, an adaptive eye template and a neural network are employed to predict gaze points. ROIs are then ranked by computing the density of the gaze points. Experimental results show that the accuracy of our ROI detection method reaches as high as 97%, and they also demonstrate that our model can efficiently adapt to users' interests and match the objective ROI.
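The abstract ranks ROIs by gaze-point density without giving the measure; a minimal sketch, assuming rectangular pre-ROIs and a points-per-pixel density:

```python
import numpy as np

def rank_rois(rois, gaze_points):
    """Rank candidate ROIs by gaze-point density.

    rois: list of (x0, y0, x1, y1) boxes from the attention model;
    gaze_points: (n, 2) array of (x, y) predicted fixations.
    Returns the ROIs sorted densest-first."""
    pts = np.asarray(gaze_points, dtype=float)

    def density(box):
        x0, y0, x1, y1 = box
        inside = ((pts[:, 0] >= x0) & (pts[:, 0] <= x1) &
                  (pts[:, 1] >= y0) & (pts[:, 1] <= y1)).sum()
        return inside / max((x1 - x0) * (y1 - y0), 1)

    return sorted(rois, key=density, reverse=True)
```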
Locating text regions in natural scene images is significantly helpful for understanding the semantic meaning of an image, and it plays an important role in many applications such as image retrieval, image categorization, and social media processing. Traditional approaches rely on low-level image features to progressively locate candidate text regions. However, these approaches often fail on cluttered backgrounds, since the adopted low-level features are fairly simple and may not reliably distinguish text regions from clutter. Motivated by recent research on attention models, saliency detection is revisited in this paper. For the case of text detection in natural scene images, the saliency map is further analyzed and adjusted accordingly. Using the adjusted saliency map, the candidate text regions detected with common low-level features are further verified. Moreover, an efficient low-level text feature, the Histogram of Edge-direction (HOE), is adopted; it statistically describes the edge-direction information of a region of interest in the image. Encouraging experimental results have been obtained on natural scene images containing text in various languages.
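The HOE descriptor is described only as a statistic of edge directions; one plausible reading, sketched here with an assumed bin count and gradient-magnitude weighting:

```python
import numpy as np
from scipy import ndimage

def hoe(gray_region, n_bins=8):
    """Histogram of Edge-direction (HOE) for a grayscale region.

    Bins gradient orientations weighted by gradient magnitude; the bin
    count and weighting are assumptions, not the paper's specification."""
    g = gray_region.astype(float)
    gx = ndimage.sobel(g, axis=1)
    gy = ndimage.sobel(g, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)                # normalized descriptor
```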
Saliency detection refers to the segmentation of all visually conspicuous objects from various backgrounds. The purpose is to produce an object-mask that overlaps the salient regions annotated by human vision. In this paper, we propose an efficient bottom-up saliency detection model based on wavelet generalized lifting. It requires no kernels with implicit assumptions and prior knowledge. Multiscale wavelet analysis is performed on broadly tuned color feature channels to include a wide range of spatial-frequency information. A nonlinear wavelet filter bank is designed to emphasize the wavelet coefficients, and then a saliency map is obtained through linear combination of the enhanced wavelet coefficients. This full-resolution saliency map uniformly highlights multiple salient objects of different sizes and shapes. An object-mask is constructed by the adaptive thresholding scheme on the saliency maps. Experimental results show that the proposed model outperforms the existing state-of-the-art competitors on two benchmark datasets.
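The paper's generalized-lifting filter bank is not specified in the abstract; the sketch below substitutes a simple signed power-law enhancement of standard wavelet detail coefficients to illustrate the overall pipeline (multiscale analysis, coefficient enhancement, linear combination across scales, adaptive thresholding):

```python
import numpy as np
import pywt

def wavelet_saliency(channel, wavelet="db2", levels=3):
    """Toy full-resolution saliency from multiscale wavelet detail energy,
    computed on one broadly tuned color channel."""
    h, w = channel.shape
    coeffs = pywt.wavedec2(channel.astype(float), wavelet, level=levels)
    saliency = np.zeros((h, w))
    for i in range(1, len(coeffs)):
        # keep only scale i's (enhanced) detail coefficients, zero the rest
        kept = [np.zeros_like(coeffs[0])]
        for j, details in enumerate(coeffs[1:], start=1):
            kept.append(tuple(np.sign(d) * np.abs(d) ** 1.5 if j == i
                              else np.zeros_like(d) for d in details))
        recon = pywt.waverec2(kept, wavelet)[:h, :w]
        saliency += np.abs(recon)          # linear combination across scales
    return saliency / (saliency.max() + 1e-9)

def object_mask(saliency, k=1.0):
    # mean + k*std is an assumed stand-in for the paper's adaptive threshold
    return saliency > saliency.mean() + k * saliency.std()
```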
Recent related approaches in the areas of vision, motor control and planning are attempting to reduce the computational requirements of each process by restricting the class of problems that can be addressed. Active vision, differential kinematics and reactive planning are all characterized by their minimal use of representations, which simplifies both the required computations and the acquisition of models. This paper describes an approach to visually-guided motor control that is based on active vision and differential kinematics, and is compatible with reactive planning. Active vision depends on an ability to choose a region of the visual environment for task-specific processing. Visual attention provides a mechanism for choosing the region to be processed in a task-specific way. In addition, this attentional mechanism provides the interface between the vision and motor systems by representing visual position information in a 3-D retinocentric coordinate frame. Coordinates in this frame are transformed into eye and arm motor coordinates using kinematic relations expressed differentially. A real-time implementation of these visuomotor mechanisms has been used to develop a number of visually-guided eye and arm movement behaviors.
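A differential-kinematics step of the kind described, mapping a 3-D retinocentric position error to joint velocities through the arm Jacobian, can be sketched as follows; the damped pseudoinverse, gain, and damping value are illustrative choices, not the paper's:

```python
import numpy as np

def arm_velocity(jacobian, retino_error, gain=1.0, damping=0.01):
    """One differential-kinematics step.

    jacobian:     (3, n_joints) arm Jacobian at the current posture
    retino_error: (3,) target position error in retinocentric coordinates
    Returns joint velocities that reduce the error."""
    j = np.asarray(jacobian, dtype=float)
    jjt = j @ j.T + damping * np.eye(j.shape[0])   # damped for singularities
    return gain * j.T @ np.linalg.solve(jjt, retino_error)
```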
In recent years, many studies have indicated that human visual attention is very helpful in research areas related to computer vision, such as object recognition, scene understanding, and object-based image/video retrieval and annotation. This paper presents a visual attention model for natural scenes based on a dynamic feature-combination strategy. The model consists of three parts: feature extraction, dynamic feature combination, and salient-object detection. First, the saliency features of color, information entropy, and salient boundary are extracted from the original color image. Then, two different evaluation measures are proposed for the two categories of feature maps defined in this dynamic combination strategy; they measure the contribution of each feature map to saliency and carry out a dynamic weighting of the individual feature maps. Finally, salient objects are located from an integrated saliency map, and a computational method is given to simulate the location shifts of real human visual attention. Experimental results show that the model is effective and robust for saliency detection in natural scenes and behaves similarly to the real human visual attention mechanism.
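The two category-specific evaluation measures are not given in the abstract; the sketch below uses a single peak-to-mean "peakedness" weight for every map as an assumed stand-in for the dynamic weighting:

```python
import numpy as np

def combine_feature_maps(maps):
    """Dynamically weight feature maps by how peaked each map is:
    maps with a strong, isolated response contribute more to saliency."""
    norm = [(m - m.min()) / (np.ptp(m) + 1e-9) for m in maps]
    weights = np.array([m.max() / (m.mean() + 1e-9) for m in norm])
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, norm))
```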
The inherent value system of a developmental agent enables autonomous mental development to take place right after the agent's "birth." Biologically, it is not clear what basic components constitute a value system. In the computational model introduced here, we propose that an inherent value system should have at least three basic components: punishment, reward, and novelty, with decreasing weights from the first component to the last. Punishments and rewards are temporally sparse, but novelty is temporally dense. We present a biologically inspired computational architecture that guides the development of sensorimotor skills through real-time interactions with the environment, driven by this inborn value system. The value system has been successfully tested on an artificial agent in a simulation environment and on a robot in the real world.
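A minimal sketch of the proposed three-component value signal; the specific weight values, and treating novelty as an externally supplied scalar, are assumptions:

```python
def value_signal(punishment, reward, novelty,
                 w_p=1.0, w_r=0.5, w_n=0.25):
    """Combined inborn value with decreasing weights
    (punishment > reward > novelty), as the abstract proposes.

    Punishment and reward arrive sparsely (usually 0.0); novelty,
    e.g. a sensory prediction error, is available on every step."""
    return -w_p * punishment + w_r * reward + w_n * novelty
```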
A biologically inspired foveated attention system is proposed for an object-detection scenario. Bottom-up attention uses wide-angle stereo camera data to select a sequence of fixation points. Successive high-resolution foveal snapshots from a telephoto camera enable highly accurate object recognition based on the SIFT algorithm. Top-down information is incrementally estimated and integrated using a Kalman filter, enabling parameter adaptation to environments that change as the robot moves. In the experimental evaluation, all target objects were detected against different backgrounds, and significant improvements in flexibility and efficiency were achieved.
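A scalar Kalman update of the kind the abstract describes, for incrementally estimating a top-down parameter from successive snapshots; the choice of state and the noise values are illustrative:

```python
def kalman_update(x, p, z, r, q=1e-3):
    """One scalar Kalman step for a top-down parameter estimate
    (e.g. an object's expected appearance statistic).

    x, p: current estimate and its variance; z: new foveal measurement;
    r: measurement noise; q: process noise letting the estimate drift
    as the robot moves."""
    p = p + q                 # predict: the parameter may change with locomotion
    k = p / (p + r)           # Kalman gain
    x = x + k * (z - x)       # correct with the new snapshot
    p = (1 - k) * p
    return x, p
```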
The Human Visual System (HVS) is imperfect and relies on highly selective visual acquisition sensors; not all useful information presented to the eyes can be perceived. Generally, a technique for determining whether a signal can be perceived, and how well, is called human visual sensitivity analysis. More than one hundred years of psychophysical research on the HVS has revealed that human visual sensitivity is determined not only by local characteristics of the visual content (luminance, contrast, orientation, and spatial and temporal frequency) but also by global modulatory factors (visual attention and motion suppression). In this paper, we review the various factors that affect human visual sensitivity and the various computational models of it. A comparative simulation study on the performance of several visual sensitivity models is also reported. Subjective evaluation of noise-embedded video sequences confirms that introducing global modulatory factors does improve the performance of the Just Noticeable Difference (JND) profile used in noise shaping.
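As an illustration of combining a local factor with a global modulatory factor, here is a toy JND profile in which a luminance-adaptation threshold is scaled by an attention map; the curve shape and constants are illustrative, not any of the reviewed models:

```python
import numpy as np

def jnd_with_attention(luminance, attention):
    """Toy spatial JND map.

    luminance: grayscale image in [0, 255]; attention: map in [0, 1]
    (attended regions get a lower threshold, i.e. less noise is tolerable)."""
    lum = luminance.astype(float)
    base = np.where(lum < 127,
                    17 * (1 - np.sqrt(lum / 127)) + 3,   # dark areas hide more noise
                    3 / 128 * (lum - 127) + 3)
    return base * (1.5 - 0.5 * attention)                # global modulatory factor
```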
Human-computer interaction systems developed in recent years use multimedia techniques to create mixed-reality environments where users can train themselves. Although most of these systems rely strongly on interactivity with users and take users' states into account, they still lack the ability to consider users' preferences when helping them. In this paper, we introduce an Action Support System for Interactive Self-Training (ASSIST) in cooking. ASSIST focuses on recognizing users' cooking actions as well as the real objects related to these actions, in order to provide accurate and useful assistance. Before the recognition and instruction processes, it takes users' cooking preferences and suggests one or more recipes likely to satisfy those preferences via collaborative filtering. When the cooking process starts, ASSIST recognizes users' hand movements using a similarity-measure algorithm called AMSS. When a recognized cooking action is correct, ASSIST instructs the user on the next cooking procedure through virtual objects. When a cooking action is incorrect, the cause of the failure is analyzed, and ASSIST provides the user with support information tailored to that cause so the user can correct the action. Furthermore, we construct parallel transition models from cooking recipes for more flexible instruction, which enables users to perform the necessary cooking actions in any order they want, allowing more flexible learning.
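The abstract names collaborative filtering for the recipe-suggestion step without details; a minimal user-based variant with cosine similarity, sketched under the assumption of a user-by-recipe rating matrix:

```python
import numpy as np

def suggest_recipes(ratings, user, k=3):
    """User-based collaborative filtering over a (users x recipes)
    rating matrix with 0 meaning unrated; a toy stand-in for ASSIST's
    preference step. Returns indices of the top-k unrated recipes."""
    r = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(r, axis=1) + 1e-9
    sims = r @ r[user] / (norms * norms[user])   # cosine similarity to `user`
    sims[user] = 0.0                             # ignore self-similarity
    scores = sims @ r                            # similarity-weighted ratings
    scores[r[user] > 0] = -np.inf                # hide already-rated recipes
    return np.argsort(scores)[::-1][:k]
```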
Processes occurring in brains, a.k.a. biological neural networks, can be, and have been, modeled within artificial neural network architectures. We therefore conducted a review of research on the phenomenon of blindsight in an attempt to generate ideas for artificial intelligence models. Blindsight can be considered a diminished form of visual experience. If we assume that artificial networks have no form of visual experience, then the deficits caused by blindsight give us insights into the processes underlying visual experience that we can incorporate into artificial neural networks. This paper is structured in three parts. Section 2 reviews blindsight research, looking specifically at the errors occurring under this condition compared to normal vision. Section 3 identifies overall patterns from Sec. 2 to generate insights for computational models of vision. Section 4 demonstrates the utility of examining biological research to inform artificial intelligence research by examining computational models of visual attention relevant to one of the insights generated in Sec. 3. The research covered in Sec. 4 shows that incorporating one of our insights into computational vision does benefit those models. Future research will be required to determine whether our other insights are as valuable.
Many previous eye-tracking studies have examined how adult readers process different written languages, but relatively few have observed the reading processes of Arab children. This study investigated the influence of orthographic regularity on Saudi elementary students' recognition of English and Arabic words. The eye movements of 15 fourth-grade students and 15 sixth-grade students were recorded while they read words differing in frequency and regularity. Analysis of the visual information from the word-recognition process shows differences in the students' eye movements between the two languages. There were statistically significant differences in total fixation duration and fixation count between the two languages and between the two groups. All students showed longer processing times for English sentences than for Arabic ones. Moreover, Arabic-speaking students were influenced by English orthography, with greater processing difficulty for English irregular words. The visual information shows that more cross-linguistic differences appear in the fourth-grade students' results. Fourth-grade students transferred their first-language (L1) reading strategies to reading English words; however, Arabic reading methods cannot be effectively applied to irregular orthographies like English. This explains the increased eye-movement measurements of fourth-grade students compared to sixth-grade students, who fixated more on unfamiliar English words. Although orthographic regularity had a major effect on the word-recognition process in this study, the development of the students' Arabic and English orthographic knowledge affected the progress of their visual word recognition across the two grade levels.
Dual-task obstacle-crossing gait paradigms are commonly adopted to examine how attentional demands are associated with sensory-motor processing during obstacle crossing. Various attention tasks have been used, with mixed findings, which raises the question of whether and how different attention tasks affect motor function differently. We therefore examined and compared the effects of two visual attention tasks on obstacle crossing in healthy young adults. In the first experiment, 10 young adults responded to a 200 ms visuospatial attention task during the approach phase of obstacle crossing. In the second experiment, another 10 young adults responded to a visual Stroop task while approaching and crossing an obstacle. In both experiments, subjects completed an obstacle-crossing-only, a visual-attention-only, and a dual-task obstacle-crossing condition in random order. Dual-task costs were calculated for each visual attention task on the accuracy rates, toe-obstacle clearances, and gait velocities. Two tripping incidents occurred, both in the dual-task condition with the visuospatial attention task. Trailing toe-obstacle clearance was reduced in the dual-task condition with the visuospatial attention task, whereas toe-obstacle clearances of both limbs increased in the dual-task condition with the Stroop task. Gait velocities were not affected by the visuospatial attention task but were significantly reduced when the Stroop task was performed concurrently. Dual-task costs differed significantly between the two experiments for gait velocity and toe clearances, but not for accuracy rate. Our findings demonstrate that different visual attention tasks lead to distinct modifications of obstacle-crossing behavior.
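The paper's exact dual-task-cost formula is not stated; the percent-change-from-baseline definition common in this literature can be sketched as:

```python
def dual_task_cost(single, dual):
    """Dual-task cost as percent change from the single-task baseline.

    For measures where higher is better (e.g. gait velocity), negative
    values indicate performance declined under dual-task load."""
    return 100.0 * (dual - single) / single

# illustrative values, not the paper's data: velocity 1.2 -> 1.0 m/s
print(dual_task_cost(1.2, 1.0))   # -16.7% cost
```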
Classically, visual attention is assumed to be influenced by the visual properties of objects, e.g. as assessed in visual search tasks. However, recent experimental evidence suggests that visual attention is also guided by action-related properties of objects ("affordances") [1, 2]; e.g. the handle of a cup affords grasping the cup, so attention is drawn towards the handle. As a first step towards modelling this interaction between attention and action, we implemented the Selective Attention for Action model (SAAM). The design of SAAM is based on the Selective Attention for Identification model (SAIM) [3]; for instance, we also followed a soft-constraint-satisfaction approach in a connectionist framework. However, SAAM's selection process is guided by locations within objects suitable for grasping them, whereas SAIM selects objects based on their visual properties. To implement SAAM's selection mechanism, two sets of constraints were implemented. The first set takes into account the anatomy of the hand, e.g. the maximal possible distances between fingers. The second set (geometrical constraints) considers suitable contact points on objects, found using simple edge detectors. We demonstrate here that SAAM can successfully mimic human behaviour by comparing simulated contact points with experimental data.
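As a toy illustration of the two constraint sets (hand anatomy and geometry), the sketch below scores candidate contact-point pairs on detected edges against an assumed hand span; SAAM itself is a connectionist soft-constraint network, which this does not reproduce:

```python
import numpy as np
from itertools import combinations

def grasp_points(edge_pixels, max_span=120.0, min_sep=15.0):
    """Pick a pair of contact points on detected edges (geometrical
    constraint) whose separation fits an assumed hand span in pixels
    (anatomical constraint). O(n^2) over the candidate edge pixels."""
    best, best_score = None, -np.inf
    for p, q in combinations(edge_pixels, 2):
        d = np.linalg.norm(np.asarray(p, float) - np.asarray(q, float))
        if min_sep <= d <= max_span:
            score = -abs(d - 0.5 * max_span)   # prefer a comfortable grip width
            if score > best_score:
                best, best_score = (p, q), score
    return best
```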
In this paper, a multicore DSP parallel implementation strategy for infrared target detection and tracking is proposed. Tracking-Learning-Detection (TLD) is considered a highly efficient algorithm for tracking a single target, and although it can re-acquire a target after occlusion by other targets, many shortcomings remain. This paper addresses target tracking by fusing local spectrum suppression, based on the human visual attention mechanism, with the Tracking-Learning-Detection algorithm. Specifically, the area of the target is estimated to reduce the detection region and increase the processing speed. Experimental results on the VIVID benchmark video library demonstrate that the proposed method can properly detect and accurately track infrared targets in complex scenes.
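"Local spectrum suppression" is not defined in the abstract; spectral-residual saliency is a related frequency-domain attention technique and is sketched here as a stand-in:

```python
import numpy as np
from scipy import ndimage

def spectral_residual_saliency(gray):
    """Spectral-residual saliency (Hou & Zhang style): suppress the smooth
    part of the log amplitude spectrum, keep the residual, and transform
    back; salient (e.g. small warm) targets pop out of the background."""
    f = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(f) + 1e-9)
    residual = log_amp - ndimage.uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    return ndimage.gaussian_filter(sal, sigma=3)
```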
In this paper, we propose a new way of delivering image advertising in image-search results and on common image-management websites. We integrate visual salience and color information to analyze the context of the image list and retrieve the most relevant image advertisement. We then calculate a suitable placement for the advertisement according to the similarities between the images in the list. In this way, we disturb neither the structure of the webpage nor the content of the images themselves. Thus, we can recommend advertisements the user is likely to be interested in and improve the user's quality of experience.
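The salience/color context measure is unspecified; a simple color-histogram-intersection similarity between an ad image and an image from the list, sketched with an assumed bin count:

```python
import numpy as np

def color_similarity(img_a, img_b, bins=8):
    """Histogram-intersection similarity of two RGB uint8 images,
    a toy stand-in for the paper's unstated color-context measure.
    Returns a value in [0, 1]; higher means more similar palettes."""
    def hist(img):
        h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                              range=((0, 256),) * 3)
        return h.ravel() / h.sum()
    return np.minimum(hist(img_a), hist(img_b)).sum()
```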
Information extraction guided by visual attention can focus processing on the key areas of an image or video, which has important implications for target search, image search, and other image processing algorithms. In this paper, a visual attention method with a task-dependent, scale-variable focus of attention is proposed. The method selects feature channels in a top-down manner, based on the key features of the task target inferred from the current task scenario. Meanwhile, a variable-scale circular region is used as the focus of attention. The proposed method is practical and handles the common situation in which the size of the task target is not fixed, thereby improving the target hit rate.
Computational models can reflect the complexity of human behaviour by implementing multiple constraints within their architecture, and/or by taking into account the variety and richness of the environment to which the human is responding. We explore the second alternative in a model of word recognition that learns to map spoken words to visual and semantic representations of the words' concepts. Critically, we employ a phonological representation utilising coarse-coding of the auditory stream, to mimic early stages of language development that are not dependent on individual phonemes to be isolated in the input, which may be a consequence of literacy development. The model was tested at different stages during training, and was able to simulate key behavioural features of word recognition in children: a developing effect of semantic information as a consequence of language learning, and a small but earlier effect of phonological information on word processing. We additionally tested the role of visual information in word processing, generating predictions for behavioural studies, showing that visual information could have a larger effect than semantics on children's performance, but that again this affects recognition later in word processing than phonological information. The model also provides further predictions for performance of a mature word recognition system in the absence of fine-coding of phonology, such as in adults who have low literacy skills. The model demonstrated that such phonological effects may be reduced but are still evident even when multiple distractors from various modalities are present in the listener's environment. The model demonstrates that complexity in word recognition can emerge from a simple associative system responding to the interactions between multiple sources of information in the language learner's environment.