Introduction: Teaching efficacy refers to a combination of internal characteristics, such as motivation, personality, beliefs, and dispositions, that interacts with external circumstances to affect student outcomes. The English-speaking environment poses a major problem because students learn through interaction, yet they do not acquire the speaking and listening abilities needed to communicate effectively in English.
Objective: This study examines the impact of computer programming courses and learning analytics on students' computer programming skills.
Methods: We propose a novel Dung Beetle Optimized Flexible Random Forest (DBO-FRF) to evaluate teaching effectiveness. Data from 175 students were collected, and a prediction model was constructed from the students' attributes, their past academic records, their interactions with online resources, and their progress in programming laboratory work. The proposed method is compared with traditional algorithms.
Results and conclusion: The proposed method is implemented in Python. The system reports each student's expected performance in the course and, when a submitted program fails to meet the requirements, offers a programming recommendation drawn from one of the class's top students. The results show that the proposed method achieves better performance in terms of accuracy, precision, recall, and F1-score. The approach narrowed the performance gap between lower- and higher-performing students, enabling students who revised their programs to learn more.
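The abstract gives no implementation details, so below is a minimal, hypothetical Python sketch of the kind of pipeline it describes: a random-forest outcome predictor whose two hyperparameters are tuned by a simple population-based search standing in for the Dung Beetle Optimizer, then scored with the four reported metrics. The synthetic data, feature count, and parameter ranges are all illustrative assumptions.

```python
# Hypothetical sketch: a random-forest course-outcome predictor whose
# hyperparameters are tuned by a simple population-based search standing
# in for the Dung Beetle Optimizer (DBO); all data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
# Stand-in for the 175-student dataset: real features would be student
# attributes, past grades, online interactions, and lab progress.
X = rng.normal(size=(175, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=175) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fitness(n_estimators, max_depth):
    clf = RandomForestClassifier(n_estimators=int(n_estimators),
                                 max_depth=int(max_depth), random_state=0)
    return cross_val_score(clf, X_tr, y_tr, cv=3).mean()

# Population-based search over (n_estimators, max_depth); the real DBO
# update rules (rolling, dancing, foraging) are replaced by random jitter.
pop = rng.uniform([50, 2], [300, 12], size=(6, 2))
best = max(pop, key=lambda p: fitness(*p))
best_score = fitness(*best)
for _ in range(5):
    pop = np.clip(best + rng.normal(scale=[30.0, 1.5], size=(6, 2)), [50, 2], [300, 12])
    cand = max(pop, key=lambda p: fitness(*p))
    if fitness(*cand) > best_score:
        best, best_score = cand, fitness(*cand)

model = RandomForestClassifier(n_estimators=int(best[0]),
                               max_depth=int(best[1]), random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print(accuracy_score(y_te, pred), precision_score(y_te, pred),
      recall_score(y_te, pred), f1_score(y_te, pred))
```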
This paper presents a systematic literature review on optimizing feature extraction for palm and wrist multimodal biometrics. Identifying informative features across different modalities can be computationally expensive and time-consuming in such complex systems. Optimization techniques can streamline this process, making it more efficient and thereby improving accuracy and reliability. The paper frames four research questions on input traits, feature extraction approaches, classification approaches, and performance metrics for image data, and generates a search query from these questions to retrieve information on those parameters. The focus of the paper is to provide a comprehensive and exhaustive overview of appropriate input traits for image data, drawn from the retrieved literature, as well as of optimal feature extraction and selection. The paper also highlights the various classification approaches taken and the performance indicators reported for those classifiers. Further, it analyzes the effectiveness of various filtering techniques in eliminating image noise and improving overall system performance, using MATLAB 2018. The paper concludes that a combination of palm and wrist biometrics could be a good input-trait combination. This work is novel in that it covers multi-faceted processing, addressing various aspects of optimizing feature extraction and selection for palm and wrist multimodal biometrics.
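As a companion to the filtering analysis (which the review performs in MATLAB 2018), here is a small Python sketch comparing two common denoising filters on a salt-and-pepper-corrupted image; the toy image and noise level are assumptions, not the review's data.

```python
# Hypothetical denoising comparison (Python/SciPy here; the review used
# MATLAB 2018): median vs. Gaussian filtering on salt-and-pepper noise.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))    # toy stand-in image
noisy = img.copy()
mask = rng.random(img.shape) < 0.05                  # corrupt 5% of pixels
noisy[mask] = rng.choice([0.0, 1.0], size=mask.sum())

median = ndimage.median_filter(noisy, size=3)
gauss = ndimage.gaussian_filter(noisy, sigma=1.0)
# Mean absolute error vs. the clean image; median filtering typically
# wins on impulsive (salt-and-pepper) noise.
print(np.abs(median - img).mean(), np.abs(gauss - img).mean())
```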
During the last few years, many algorithms have been proposed for face recognition using classical 2-D images. However, it is necessary to deal with occlusions when the subject is wearing sunglasses, a scarf, and the like. Similarly, ear recognition is emerging as a promising biometric for people recognition, even though the related literature is still somewhat underdeveloped. In this paper, several hybrid face/ear recognition systems are investigated. The system is based on IFS (Iterated Function Systems) theory, which is applied to both the face and the ear, resulting in a bimodal architecture. One advantage is that the information used for the indexing and recognition of the face/ear can be made local, which makes the method more robust to possible occlusions. The distribution of similarities in the input images is exploited as a signature of the subject's identity. The amount of information provided by each component of the face and ear images is assessed, first independently and then jointly. Finally, the results show that the system significantly outperforms existing state-of-the-art approaches.
Video Multimodal Entity Linking (VMEL) is the task of linking entities mentioned in videos to entities in multimodal knowledge bases. However, current entity linking methods focus primarily on the text and image modalities, neglecting the video modality. To address this challenge, we propose a novel framework, the multi-perspective enhanced Subgraph Contrastive Network (SCMEL), and construct a VMEL dataset named SceneMEL based on the tourism domain. We first integrate the textual, auditory, and visual contexts of videos to generate a comprehensive, high-recall candidate entity set. A semantic-enhanced video description subgraph generation module is then used to convert videos into a multimodal feature graph structure and to perform subgraph sampling on the domain-specific knowledge graph. Lastly, we conduct contrastive learning between the video subgraphs and the knowledge graph subgraphs at both local perspectives (text, audio, visual) and the global perspective, to capture fine-grained semantic information about videos and entities. A series of experimental results on SceneMEL demonstrates the effectiveness of the proposed approach.
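The paper's exact loss is not given in the abstract; the following is a minimal sketch, assuming an InfoNCE-style objective, of contrastive learning between video-subgraph and knowledge-graph-subgraph embeddings, where matched pairs sit on the diagonal of the batch similarity matrix.

```python
# Hypothetical InfoNCE-style subgraph contrastive loss: row i of each
# embedding matrix is a matched (video subgraph, KG subgraph) pair.
import torch
import torch.nn.functional as F

def subgraph_info_nce(video_emb, kg_emb, temperature=0.07):
    """video_emb, kg_emb: (batch, dim) subgraph embeddings."""
    v = F.normalize(video_emb, dim=-1)
    k = F.normalize(kg_emb, dim=-1)
    logits = v @ k.t() / temperature      # (batch, batch) cosine similarities
    targets = torch.arange(v.size(0))     # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# Toy usage: 4 video subgraphs against their 4 candidate-entity subgraphs.
loss = subgraph_info_nce(torch.randn(4, 128), torch.randn(4, 128))
```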
In recent years, because of advances in computer vision research, free-hand gestures have been explored as a means of human-computer interaction (HCI). Gestures in combination with speech can be an important step toward natural, multimodal HCI. However, interpreting gestures in a multimodal setting can be a particularly challenging problem. In this paper, we propose an approach for studying multimodal HCI in the context of a computerized map. An implemented testbed allows us to conduct user studies and address issues in the understanding of hand gestures in a multimodal computer interface.
The absence of an adequate gesture classification in HCI makes gesture interpretation difficult. We formalize a method for bootstrapping the interpretation process through a semantic classification of gesture primitives in the HCI context, distinguishing two main categories of gesture classes based on their spatio-temporal deixis. User studies revealed that gesture primitives, originally extracted from weather-map narration, form patterns of co-occurrence with speech parts in association with their meaning in a visual display control system. The results indicated two levels of gesture meaning: the individual stroke and the motion complex. These findings define a direction for approaching interpretation in natural gesture-speech interfaces.
In the process of learning, learners express their emotions in a variety of forms; facial expressions and voice are the most obvious and the most easily captured by computers. Previous methods are mainly based on a single modality, such as facial expression, speech, or text. Because multimodal input carries more diverse information, multimodal recognition achieves higher accuracy than single-modality recognition. Therefore, this paper proposes a DNN-based multimodal learning-emotion analysis method that combines video and speech to detect students' learning emotions in real time. We use this method to automatically identify learning emotions in a primary school English classroom. The PAD emotion scale is used to map learning emotions to learning states, so teachers can judge students' learning states from changes in their learning emotions and adjust teaching methods and strategies in time.
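As an illustration only, here is a hypothetical PyTorch sketch of DNN-based late fusion of the two modalities: a facial-expression branch and a speech branch each embed their features, and the concatenated embeddings are mapped to shared emotion classes. The dimensions, class count, and architecture are assumptions; the paper's actual network and PAD mapping may differ.

```python
# Hypothetical late-fusion network: video (facial expression) and speech
# branches are embedded separately, concatenated, and classified jointly.
import torch
import torch.nn as nn

class LateFusionEmotionNet(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, n_emotions=6):
        super().__init__()
        self.video_branch = nn.Sequential(nn.Linear(video_dim, 64), nn.ReLU())
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.classifier = nn.Linear(128, n_emotions)  # 64 + 64 fused dims

    def forward(self, video_feat, audio_feat):
        fused = torch.cat([self.video_branch(video_feat),
                           self.audio_branch(audio_feat)], dim=-1)
        return self.classifier(fused)

net = LateFusionEmotionNet()
logits = net(torch.randn(2, 512), torch.randn(2, 128))  # batch of 2 clips
```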
The toroidal tuned liquid column damper (TTLCD) has been known so far only for its multidirectional vibration control capabilities. The details of its liquid motion characteristics and energy dissipation capacity have been missing. To cover this gap, in this paper, experimental studies are carried out by shaking table tests and an effective three-dimensional computational fluid dynamics model of the TTLCD is developed. All parametric effects and the characteristics of the liquid motion in the TTLCD are addressed. The findings of the paper now clearly reveal that the TTLCD encompasses both an oscillatory liquid deflection as the first mode and a sloshing motion as the second mode vibration response. Accordingly, the results prove that the TTLCD operates not only multidirectionally but also as a unique multimodal device, which unites the properties of tuned liquid dampers (TLDs)/tuned sloshing dampers (TSDs) with tuned liquid column dampers (TLCDs).
Subdividing the human brain into functionally distinct and spatially contiguous areas is important for understanding the remarkably complex human cerebral cortex. However, adult aging is associated with differences in the structure, function, and connectivity of brain areas, so a single population-level subdivision does not apply across age groups. Moreover, different modalities provide confirmatory and complementary information for subdividing the human brain. To obtain a more reasonable subdivision of the cerebral cortex, we use multimodal information to subdivide the human cerebral cortex across the lifespan. Specifically, we first construct a population-average functional connectivity matrix for each modality of each age group. Second, we separately calculate population-average similarity matrices for the cortical thickness and myelin modalities of each age group. Finally, we fuse these population-average matrices into a multimodal similarity matrix and feed it into a spectral clustering algorithm to generate a brain parcellation for each age group.
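The fusion rule is not specified beyond fusing the population-average matrices, so the sketch below assumes a simple average of per-modality similarity matrices before spectral clustering with scikit-learn; the vertex count and cluster number are illustrative.

```python
# Hypothetical parcellation step: average per-modality similarity matrices
# for one age group, then spectrally cluster the fused matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_vertices = 200  # stand-in for cortical vertices

def random_similarity(n):
    a = rng.random((n, n))
    s = (a + a.T) / 2.0        # symmetrize so it is a valid similarity matrix
    np.fill_diagonal(s, 1.0)
    return s

# One matrix per modality: functional connectivity, thickness, myelin.
modalities = [random_similarity(n_vertices) for _ in range(3)]
fused = np.mean(modalities, axis=0)  # simple average as the assumed fusion rule

labels = SpectralClustering(n_clusters=7, affinity="precomputed",
                            random_state=0).fit_predict(fused)
```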
Depression is a prevalent mental condition, and it is essential to diagnose and treat patients as soon as possible to maximize their chances of rehabilitation and recovery. To address the difficulties of depression detection, this study proposes an intelligent detection model based on multimodal fusion. The model uses text data and electroencephalogram (EEG) data as representatives of subjective and objective information, respectively, processed by a BERT–TextCNN model and a CNN–LSTM model. The BERT–TextCNN model adequately captures the semantic information in text data, while the CNN–LSTM model handles time-series data effectively, allowing the model to account for the distinct features of each data type. A weighted fusion technique combines the information from the two modalities, assigning a weight to each modality's output according to its contribution to the final depression detection result. Experimental validation on a dataset we constructed ourselves demonstrates that the proposed model achieves strong validity and robustness on the depression identification task. The proposed model offers a viable, intelligent solution for the early identification of depression, one likely to be useful in clinical practice and to provide new ideas and approaches for the growth of precision medicine.
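A minimal sketch of the weighted fusion rule described above, assuming each branch outputs class probabilities and that the weights are fixed scalars (the paper may learn or tune them differently):

```python
# Hypothetical weighted fusion: combine per-branch class probabilities
# with fixed contribution weights (illustrative values, not the paper's).
import numpy as np

def weighted_fusion(p_text, p_eeg, w_text=0.6, w_eeg=0.4):
    """p_text: BERT-TextCNN probabilities; p_eeg: CNN-LSTM probabilities."""
    fused = w_text * np.asarray(p_text) + w_eeg * np.asarray(p_eeg)
    return fused.argmax(axis=-1)   # e.g. 0 = non-depressed, 1 = depressed

label = weighted_fusion([0.3, 0.7], [0.6, 0.4])  # -> 1 with these weights
```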
Information from many different sensory modalities converges on the medial temporal lobe in the mammalian brain, an area that is known to be involved in the formation of episodic memories. Neurons in this region, called place cells, display location-correlated activity. Because it is not feasible to record all neurons using current electrophysiological techniques, it is difficult to address the mechanisms by which different sensory modalities are combined to form place field activity. To address this limitation, this paper presents an embodied neural simulation of the medial temporal lobe and other cortical structures, in which all aspects of the model can be examined during a maze navigation task. The neural simulation has realistic neuroanatomical connectivity. It uses a rate code model where a single neuronal unit represents the local field potential of a pool of neurons. The dynamics of these neuronal units are based on measured neurophysiological parameters. The model is embodied in a mobile device with multiple sensory modalities. Neural activity and behavior are analyzed both in the normal condition and after sensory lesions. Place field activity arose in the model through plasticity, and it continued even when one or more sensory modalities were lesioned. An analysis that traced through all neural circuits in the model revealed that many different pathways led to the same place activity, i.e., these pathways were degenerate. After sensory lesions, the pathways leading to place activity had even greater degeneracy, but more of this variance occurred in entorhinal cortex and sensory areas than in hippocampus. This model predicts that when examining neurons causing place activity in rodents, hippocampal neurons are more likely than entorhinal or sensory neurons to maintain involvement in the circuit after sensory deprivation.
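To make the rate-code idea concrete, here is a small, hypothetical sketch of a single rate-coded unit whose activity relaxes toward a sigmoidal function of its weighted inputs, a common mean-field stand-in for the pooled local field potential the model describes; the time constant and weights are illustrative, not the paper's measured parameters.

```python
# Hypothetical single rate-coded unit: activity relaxes toward a sigmoid
# of the weighted inputs, a mean-field stand-in for a pooled LFP signal.
import numpy as np

def step(rate, inputs, weights, tau=10.0, dt=1.0):
    drive = 1.0 / (1.0 + np.exp(-(weights @ inputs)))  # sigmoid activation
    return rate + (dt / tau) * (drive - rate)          # leaky relaxation

r = 0.0
for _ in range(100):  # settles near the driven steady state
    r = step(r, inputs=np.array([0.5, 1.0]), weights=np.array([0.8, 0.4]))
```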
To be accepted as a part of our everyday lives, companion robots will require the capability to communicate socially, recognizing people's behavior and responding appropriately. In particular, we hypothesized that a humanoid robot should be able to recognize affectionate touches conveying liking or dislike because (a) a humanoid form elicits expectations of a high degree of social intelligence, (b) touch behavior plays a fundamental and crucial role in human bonding, and (c) robotic responses providing affection could contribute to people's quality of life. The hypothesis that people will seek to affectionately touch a robot needed to be verified because robots are typically not soft or warm like humans, and people can communicate through various other modalities such as vision and sound. The main challenge faced was that people's social norms are highly complex, involving behavior in multiple channels. To deal with this challenge, we adopted an approach in which we analyzed free interactions and also asked participants to rate short video-clips depicting human–robot interaction. As a result, we verified that touch plays an important part in the communication of affection from a person to a humanoid robot considered capable of recognizing cues in touch, vision, and sound. Our results suggest that designers of affectionate interactions with a humanoid robot should not ignore the fundamental modality of touch.
Porphyrin-based molecules are actively studied as dual-function theranostics: fluorescence-based imaging for diagnostics and fluorescence-guided therapeutic treatment of cancers. The intrinsic fluorescent and photodynamic properties of these bimodal molecules enable such theranostic approaches. Several porphyrinoids bearing hydrophilic and/or hydrophobic units at their periphery have been developed for these applications, but limited tumor selectivity and efficacy in destroying tumor cells remain key setbacks to their use. Another issue affecting their clinical use is that most of these chromophores form aggregates under physiological conditions. Nanomaterials, which possess properties unattainable in their bulk counterparts, can serve as carriers for these chromophores. When conjugated with nanomaterials, porphyrinoids can perform as multifunctional nanomedicine devices. The integrated properties of these porphyrinoid-nanomaterial conjugated systems make them useful for selective drug delivery, theranostics, and multimodal bioimaging. This review highlights the use of porphyrins, chlorins, bacteriochlorins, phthalocyanines, and naphthalocyanines, as well as their multifunctional nanodevices, in various biomedical theranostic platforms.
With the rise in use of social media to promote branded products, the demand for effective influencer marketing has increased. Brands are looking for improved ways to identify valuable influencers among a vast catalogue; this is even more challenging with micro-influencers, which are more affordable than mainstream ones but difficult to discover. In this paper, we propose a novel multi-task learning framework to improve the state of the art in micro-influencer ranking based on multimedia content. Moreover, since the visual congruence between a brand and influencer has been shown to be a good measure of compatibility, we provide an effective visual method for interpreting our model’s decisions, which can also be used to inform brands’ media strategies. We compare with the current state of the art on a recently constructed public dataset and we show significant improvement both in terms of accuracy and model complexity. We also introduce a methodology for tuning the image and text contribution to the final ranking score. The techniques for ranking and interpretation presented in this work can be generalized to arbitrary multimedia ranking tasks that have datasets with a similar structure.
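The abstract does not detail the tuning methodology, so the following is a toy Python sketch of one plausible reading: sweep a mixing weight over image- and text-based scores on validation data and keep the weight that maximizes a ranking metric. The scores, labels, and metric are all assumptions.

```python
# Hypothetical tuning sweep: find the image/text mixing weight alpha that
# maximizes a toy top-k precision on validation data (all values synthetic).
import numpy as np

rng = np.random.default_rng(0)
img_scores = rng.random(100)   # per-influencer image-based ranking scores
txt_scores = rng.random(100)   # per-influencer text-based ranking scores
relevance = (0.7 * img_scores + 0.3 * txt_scores > 0.5).astype(int)  # toy labels

def top_k_precision(scores, labels, k=10):
    top = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return labels[top].mean()

best_alpha = max(np.linspace(0.0, 1.0, 21),
                 key=lambda a: top_k_precision(a * img_scores +
                                               (1 - a) * txt_scores, relevance))
```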
Early detection of vulnerable plaques is the critical step in the prevention of acute coronary events. Morphology, composition, and mechanical property of a coronary artery have been demonstrated to be the key characteristics for the identification of vulnerable plaques. Several intravascular multimodal imaging technologies providing co-registered simultaneous images have been developed and applied in clinical studies to improve the characterization of atherosclerosis. In this paper, the authors review the present system and probe designs of representative intravascular multimodal techniques. In addition, the scientific innovations, potential limitations, and future directions of these technologies are also discussed.
The Putonghua Proficiency Test (Putonghua Shuiping Ceshi, PSC) is a speaking test in China. The propositional speaking section of the PSC focuses on the ability of speakers to express ideas fluently and accurately without textual reference. However, unlike other sections of the PSC, propositional speaking is still scored manually, which can result in inefficiency, high costs, and subjectivity. To address these issues, an automatic speech fluency evaluation method based on multimodality is proposed. First, different neural networks are used to extract unimodal features. Then, cross-modal attention is applied to achieve multimodal fusion. Finally, fluency evaluation results are obtained using self-attention to reinforce high-contributing information. The proposed method achieves 81.67% accuracy on a self-built dataset, demonstrating that combining textual and acoustic features provides complementary information that improves automatic speech fluency evaluation accuracy.
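As a hypothetical PyTorch sketch of the fusion stage described above: text features attend over acoustic features via cross-modal attention, self-attention then reweights the fused sequence, and a pooled representation feeds the fluency classifier. The dimensions, head counts, and number of fluency grades are assumptions.

```python
# Hypothetical fusion stage: text queries attend over acoustic frames
# (cross-modal attention), then self-attention reweights the fused tokens.
import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
self_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
classifier = nn.Linear(256, 3)  # assumed number of fluency grades

text_feat = torch.randn(2, 50, 256)    # (batch, text tokens, dim)
audio_feat = torch.randn(2, 200, 256)  # (batch, acoustic frames, dim)

fused, _ = cross_attn(query=text_feat, key=audio_feat, value=audio_feat)
refined, _ = self_attn(fused, fused, fused)
scores = classifier(refined.mean(dim=1))  # pool over tokens, then grade
```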