In this paper, a novel speech recognition method is proposed to improve the recognition efficiency and accuracy of English translators in noisy environments. First, an efficient noise suppression algorithm significantly reduces the interference of background noise with the recognition process. Then, deep neural networks are used to enhance adaptability to various noise environments, further improving the stability and accuracy of recognition. In the feature extraction stage, this paper focuses on Mel-Frequency Cepstral Coefficients and Mel filter-bank features, which lay a solid foundation for the application of the non-autoregressive Transformer. Finally, non-autoregressive Transformer technology is adopted, exploiting its advantages in processing speed and efficiency to ensure fast and accurate speech recognition in complex noise environments. Overall, the proposed method not only improves the performance of speech recognition in noisy environments, but also provides a valuable reference for research and applications in related fields.
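To make the two feature families concrete, below is a minimal extraction sketch using the open-source librosa library (our choice of toolkit; the abstract does not name one). The file path, coefficient count and band count are placeholders, not the paper's settings.

```python
# Hedged sketch: MFCC and log-Mel filter-bank extraction with librosa.
# "utterance.wav", 13 coefficients and 40 Mel bands are placeholders.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)

# Mel-Frequency Cepstral Coefficients, one 13-d vector per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Log Mel filter-bank energies, the second feature set named above
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
log_mel = librosa.power_to_db(mel)

print(mfcc.shape, log_mel.shape)  # (13, T) and (40, T)
```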
With the rapid evolution of educational technology, leveraging advanced methodologies for English instruction has become increasingly critical to addressing the growing demand for efficient and engaging language learning. Current approaches in English language teaching often fall short in personalization, adaptability, and learner engagement, primarily due to their static structure and limited integration of cognitive and technological advancements. To bridge this gap, we propose a novel framework grounded in deep neural networks to enhance English Audio Visual Oral (AVO) instruction, aligning with the thematic scope of computational advancements in education. This study introduces the Adaptive Cognitive Learning Model (ACLM), a pedagogical innovation designed to dynamically adjust teaching strategies to individual learner profiles by integrating real-time performance feedback, modular content delivery, and multimedia-assisted learning. The ACLM employs a systematic feedback loop and adaptive mechanisms to personalize learning pathways, addressing domain-specific challenges such as vocabulary acquisition, grammar comprehension, and conversational fluency. Experimental evaluations demonstrate that our method significantly improves learner outcomes in engagement, comprehension, and retention compared to traditional approaches. These findings underscore the potential of combining cognitive alignment with dynamic neural networks to establish scalable, personalized, and effective instructional strategies in English AVO education, contributing to advancements in computational language pedagogy.
A sovereign bond market offers a wide range of opportunities for public and private sector financing and has drawn the interest of both scholars and professionals, as sovereign bonds are the main instrument of most fixed-income asset markets. Numerous works have studied the behavior of sovereign bonds at the microeconomic level, given that a domestic securities market can enhance overall financial stability and improve financial market intermediation. Nevertheless, they do not explore in depth methods for identifying liquidity risks in bond markets. This study introduces a new model for predicting unexpected speculative attacks in the government bond market, applying deep learning neural network methods that proactively identify and quantify financial market risks. Our approach is highly effective in anticipating possible speculative actions against the sovereign bond market and the associated liquidity risks, so its potential effect on systemic risk is of high importance.
Speech enhancement methods differ depending on the degree of degradation and noise in the speech signal, so research in this field remains difficult, especially when dealing with residual and background noise, which is highly transient. Numerous deep learning networks have been developed that provide promising results for improving the perceptual quality and intelligibility of noisy speech. The power of deep learning techniques has opened up innovation and research in speech enhancement, with implications across a wide range of real-time applications. By reviewing the important datasets, feature extraction methods, deep learning models, training algorithms and evaluation metrics for speech enhancement, this paper provides a comprehensive overview. We begin by tracing the evolution of speech enhancement research, from early approaches to recent advances in deep learning architectures. By analyzing and comparing the approaches to solving speech enhancement challenges, we categorize them according to their strengths and weaknesses. Moreover, we discuss the challenges and future directions of deep learning in speech enhancement, including the demand for parameter-efficient models. The purpose of this paper is to examine the development of the field, compare and contrast different approaches, and highlight challenges and directions for further research.
In this paper, we consider nonlinear partial integro-differential equations (PIDEs) with gradient-independent Lipschitz continuous nonlinearities and prove that deep neural networks with ReLU activation function can approximate solutions of such semilinear PIDEs without the curse of dimensionality, in the sense that the required number of parameters in the deep neural networks increases at most polynomially in both the dimension d of the corresponding PIDE and the reciprocal of the prescribed accuracy ε.
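Spelled out, the statement is a polynomial bound on the network size; one way to write it formally (the constants are generic placeholders, not taken from the paper):

```latex
% For each dimension d and accuracy \epsilon there is a ReLU network
% u_\theta approximating the PIDE solution u to error \epsilon whose
% parameter count P satisfies a polynomial bound:
\[
  P(d,\epsilon) \;\le\; C\, d^{\,p}\, \epsilon^{-q},
  \qquad C,\ p,\ q > 0 \ \text{independent of } d \text{ and } \epsilon .
\]
```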
Identifying a core set of features is one of the most important steps in the development of an automated seizure detector. In most published studies describing features and seizure classifiers, the features were hand-engineered, which may not be optimal. The main goal of the present paper is to use deep convolutional neural networks (CNNs) and random forests to automatically optimize feature selection and classification. The input of the proposed classifier is raw multi-channel EEG and the output is the class label: seizure/non-seizure. Training this network optimizes the required features while fitting a nonlinear classifier on them. After training the network with EEG recordings of 26 neonates, the five end layers performing the classification were replaced with a random forest classifier in order to improve the performance. This resulted in a false alarm rate of 0.9 per hour and a seizure detection rate of 77% on a test set of EEG recordings of 22 neonates that also included dubious seizures. The newly proposed CNN classifier outperformed three data-driven feature-based approaches and performed similarly to a previously developed heuristic method.
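A hedged sketch of the two-stage design described above: a small 1-D CNN over raw multi-channel EEG windows, with the dense end layers replaced by a random forest. All sizes (channel count, window length, filter counts) are illustrative, not the paper's.

```python
# Sketch: CNN feature extractor + random forest classifier
# (PyTorch / scikit-learn). Data are random stand-ins for EEG.
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class EEGFeatureExtractor(nn.Module):
    def __init__(self, n_channels=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # one 64-d vector per window
        )

    def forward(self, x):                     # x: (batch, channels, time)
        return self.conv(x).squeeze(-1)

extractor = EEGFeatureExtractor()
windows = torch.randn(100, 8, 512)            # dummy EEG windows
labels = torch.randint(0, 2, (100,))          # 0 = non-seizure, 1 = seizure

with torch.no_grad():
    feats = extractor(windows).numpy()        # learned features

# The random forest stands in for the removed dense end layers
clf = RandomForestClassifier(n_estimators=100).fit(feats, labels.numpy())
```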
Linear assignment problems are well-known combinatorial optimization problems arising in domains such as logistics, robotics and telecommunications. Obtaining an optimal solution becomes computationally demanding in large or time-critical settings, so heuristic algorithms are often used to find near-optimal solutions. To attain the right assignment permutation, this study investigates a general-purpose learning strategy that describes the problem structure with a bipartite graph and learns the correct mapping with a message-passing Graph Neural Network (GNN) model. Simulation results comparing the proposed structure with two existing DNN solutions show that the proposed approach significantly improves classification accuracy and is very efficient in terms of processing time and memory requirements, thanks to its inherent parameter-sharing capability. Among the many practical uses that require solving allocation problems in everyday scenarios, we apply the proposed approach to the scheduling of electric smart-meter access within an electricity-distribution smart-grid infrastructure, since near-real-time energy monitoring is a key element of the green transition and has become increasingly important in recent times. The results show that the proposed graph-based solver, although sub-optimal, exhibits the highest scalability compared with other state-of-the-art heuristic approaches. To foster the reproducibility of the results, we made the code available at https://github.com/aircarlo/GNN_LSAP.
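For reference, the exact LSAP solution that a learned solver is compared against can be computed with SciPy's exact assignment routine; the cost matrix below is random stand-in data.

```python
# Exact linear-assignment baseline via SciPy.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.random.rand(6, 6)                 # toy 6x6 assignment costs
rows, cols = linear_sum_assignment(cost)    # optimal permutation
print(cols, cost[rows, cols].sum())         # assignment and its total cost
```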
In many modern machine learning (ML) models, attention mechanisms (AMs) play a crucial role in processing data and identifying significant parts of the inputs, whether these are text or images. This selective focus enables subsequent stages of the model to achieve improved classification performance. Traditionally, AMs are applied as a preprocessing substructure before a neural network, such as in encoder/decoder architectures. In this paper, we extend the application of AMs to intermediate stages of data propagation within ML models. Specifically, we propose a generalized attention mechanism (GAM), which can be integrated before each layer of a neural network for classification tasks. The proposed GAM allows, at each layer/step of the ML architecture, identification of the most relevant sections of the intermediate results. Our experimental results demonstrate that incorporating the proposed GAM into various ML models consistently enhances their accuracy. This improvement is achieved with only a marginal increase in the number of parameters, which does not significantly affect the training time.
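A minimal sketch of the core idea, placing an attention gate before every layer rather than only at the input. The gating form below (a learned softmax re-weighting of features) is our illustrative choice, not necessarily the paper's exact GAM.

```python
# Sketch: a per-layer attention gate inserted before each Linear layer.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Re-weights an intermediate representation before the next layer."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, dim)

    def forward(self, x):
        weights = torch.softmax(self.score(x), dim=-1)
        return x * weights                    # emphasize relevant sections

class GatedMLP(nn.Module):
    def __init__(self, in_dim=128, dims=(64, 32, 10)):
        super().__init__()
        layers, d_in = [], in_dim
        for d_out in dims:
            layers += [AttentionGate(d_in), nn.Linear(d_in, d_out), nn.ReLU()]
            d_in = d_out
        self.net = nn.Sequential(*layers[:-1])  # drop ReLU after the logits

    def forward(self, x):
        return self.net(x)

model = GatedMLP()
print(model(torch.randn(4, 128)).shape)       # torch.Size([4, 10])
```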
This paper proposes a novel optimization approach, the quantum-behaved binary firefly algorithm with a gravitational search algorithm (QBFA-GSA), for discrete feature optimization, applied here to human activity recognition. The firefly algorithm (FA) and gravitational search algorithm (GSA) are recently introduced meta-heuristic algorithms that are efficient for optimizing continuous solution sets. The binarized version of the proposed approach enables it to optimize discrete features, and the quantum behavior ensures better diversity of the final optimized features. In the proposed QBFA-GSA approach, the features are optimized by combining the advantageous attributes of FA and GSA: the search space is initially explored by firefly agents until the current firefly finds a brighter firefly, after which the agents adopt the attributes of GSA to complete the process. These optimized features are passed to deep neural networks (DNN) for the classification of human activities. Here, DNN models of deep convolutional neural networks (DCNN) and DCNN extended with residual blocks (DCNN-RB) are incorporated. The evaluation experiments for human activity recognition are conducted on the benchmark UCF-101 dataset, which comprises 101 different activities. The experimental results of the proposed QBFA-GSA approach are superior to state-of-the-art techniques, indicating that the proposed approach is effective at optimizing the features.
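As a heavily simplified illustration of the FA half of this hybrid, the sketch below performs one binary firefly move: a dimmer firefly moves toward a brighter one, and a sigmoid transfer function maps the continuous position back to bits. The quantum encoding and the GSA phase are omitted, and all constants are illustrative.

```python
# Sketch: one binary firefly step with a sigmoid transfer function.
import numpy as np

rng = np.random.default_rng(0)

def binary_firefly_step(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2):
    """Move firefly x_i toward brighter firefly x_j (both 0/1 vectors)."""
    r2 = np.sum((x_i - x_j) ** 2)                  # squared distance
    beta = beta0 * np.exp(-gamma * r2)             # attractiveness
    cont = x_i + beta * (x_j - x_i) + alpha * rng.normal(size=x_i.size)
    prob = 1.0 / (1.0 + np.exp(-cont))             # sigmoid transfer
    return (rng.random(x_i.size) < prob).astype(int)

x_i = rng.integers(0, 2, 16)                       # current feature mask
x_j = rng.integers(0, 2, 16)                       # brighter firefly's mask
print(binary_firefly_step(x_i, x_j))
```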
The rapid expansion of artificial intelligence technologies has enabled machines to approach emotional intelligence. Among various indicators, facial expressions serve as an effective medium for understanding emotions. Facial expression recognition (FER) relies heavily on the availability of accurate and robust features. Initially, the method of three-channel convolutional neural networks (TC-CNN) is adapted to extract facial features. However, extracting the features alone is insufficient; optimizing the extracted features is crucial for obtaining precise and robust features. This research work focuses on optimizing the features using the quantum-inspired vortex search algorithm (QVSA). The QVSA integrates the attributes of Q-bits into the vortex search algorithm (VSA), optimizing the features by using the Q-bits to determine the vortex center on the Bloch sphere. The Q-bit attributes also improve the diversity of the features and help avoid premature convergence of the VSA. The final recognition of facial expressions is performed using the deep neural network ResNet101v2. The experiments for facial expression recognition are performed on the RaFD and KDEF datasets, which include different facial positions such as frontal, diagonal and profile poses. Performance comparisons demonstrate the effectiveness of the proposed system over state-of-the-art facial expression techniques.
Using second-order optimization methods for training deep neural networks (DNNs) has attracted many researchers. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC), interprets the natural-gradient update as a diagonal method and corrects the inaccurate re-scaling factor in the KFAC eigenbasis. Moreover, another approximation to the natural gradient, Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), approximates the Fisher information matrix (FIM) as a constant multiplied by the Kronecker product of two matrices, such that the traces are kept equal before and after the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factor under the Kronecker-factored eigenbasis, but also adopts the new approximation scheme and the effective damping technique of TKFAC. We also discuss the differences and relationships among the related Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC and TKFAC on several DNNs.
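One natural way to write the trace restriction referred to above, with F a per-layer Fisher block and A, G its Kronecker factors (the exact scaling is our hedged reconstruction, not quoted from the paper):

```latex
% Scale the Kronecker approximation so its trace matches that of F:
\[
  F \approx \pi \,(A \otimes G),
  \qquad
  \pi = \frac{\operatorname{tr}(F)}{\operatorname{tr}(A)\,\operatorname{tr}(G)},
\]
% so that, using tr(A \otimes G) = tr(A) tr(G), one gets
%   tr(\pi (A \otimes G)) = tr(F).
```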
Domain adaptation aims to generalize a classification model from a source domain to a different but related target domain. Recent studies have revealed the benefit of deep convolutional features trained on a large dataset (e.g. ImageNet) in alleviating domain discrepancy. However, the literature shows that the transferability of features decreases as (i) the difference between the source and target domains increases, or (ii) the layers get closer to the top of the network. Therefore, even with deep features, domain adaptation remains necessary. In this paper, we propose a novel unsupervised domain adaptation (UDA) model for deep neural networks, which is learned with the labeled source samples and the unlabeled target ones simultaneously. Target samples without labels are assigned pseudo labels according to their maximum classification scores during training of the UDA model. However, due to the domain discrepancy, label noise is generally inevitable and degrades the performance of the domain adaptation model. Thus, to effectively utilize the target samples, three specific robust deep softmax regression (RDSR) functions are applied to samples with high, medium and low classification confidence respectively. Extensive experiments show that our method yields state-of-the-art results, demonstrating the effectiveness of the robust deep softmax regression classifier in UDA.
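A short sketch of the confidence-tiering step described above: target samples are split into high/medium/low groups by their maximum softmax score, each group to be handled by its own robust loss. The thresholds below are illustrative, not the paper's.

```python
# Sketch: assign pseudo-labels and confidence tiers to target samples.
import torch

def tier_pseudo_labels(logits, hi=0.9, lo=0.6):
    probs = torch.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)      # max score and its class
    tiers = torch.full_like(pseudo, 2)   # 2 = low confidence
    tiers[conf >= lo] = 1                # 1 = medium confidence
    tiers[conf >= hi] = 0                # 0 = high confidence
    return pseudo, tiers

logits = torch.randn(5, 4)               # dummy target-domain logits
pseudo, tiers = tier_pseudo_labels(logits)
print(pseudo, tiers)
```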
Coronavirus disease 2019 (COVID-19) testing kits are limited, so the development of other diagnostic approaches is desirable. Doctors generally utilize chest X-rays and Computed Tomography (CT) scans to diagnose pneumonia, lung inflammation, abscesses, and/or enlarged lymph nodes. Since COVID-19 attacks the epithelial cells that line the respiratory tract, X-ray images are utilized in this paper to classify patients with infected (COVID-19 +ve) and uninfected (COVID-19 −ve) lungs. Almost all hospitals have X-ray imaging machines, so chest X-ray images can be used to test for COVID-19 without any dedicated test kits. However, chest X-ray-based COVID-19 classification requires a radiology expert and significant time, which is precious when COVID-19 infection is increasing at a rapid rate. Therefore, the development of an automated analysis approach is desirable to save medical professionals' valuable time. In this paper, a deep convolutional neural network (CNN) approach is designed and implemented, and the hyper-parameters of the CNN are tuned using Multi-objective Adaptive Differential Evolution (MADE). Extensive experiments are performed on a benchmark COVID-19 dataset. Comparative analysis reveals that the proposed technique outperforms competitive machine learning models in terms of various performance metrics.
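As a single-objective stand-in for the paper's multi-objective adaptive DE (MADE), the sketch below runs SciPy's differential evolution over a toy surface playing the role of "train the CNN, return its validation error"; the hyper-parameter space and objective are placeholders.

```python
# Sketch: differential evolution over CNN hyper-parameters (toy objective).
import numpy as np
from scipy.optimize import differential_evolution

def validation_error(params):
    lr, dropout = params
    # placeholder for "train CNN with (lr, dropout), return val. error"
    return (np.log10(lr) + 3) ** 2 + (dropout - 0.3) ** 2

bounds = [(1e-5, 1e-1),   # learning rate
          (0.0, 0.8)]     # dropout rate
result = differential_evolution(validation_error, bounds, seed=0)
print(result.x, result.fun)   # best (lr, dropout) and its error
```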
Improving the reliability of nighttime pedestrian detection is a crucial challenge in the design of robust autonomous systems. Not surprisingly, most pedestrian fatalities occur in low-illumination settings, emphasizing the need for new algorithmic advances. This work presents a novel pedestrian detection approach that makes a number of crucial modifications to the state-of-the-art YOLOv5-PANet architecture in order to improve the reliability of features extracted from nighttime images. More specifically, the proposed architecture systematically incorporates powerful shuffle attention mechanisms and a transformer module to improve the feature learning pipeline. Instead of advocating other sensing modalities better suited for nighttime detection, our approach relies only on conventional RGB cameras and is hence broadly applicable. Our empirical studies with nighttime pedestrian detection benchmarks show that, with only a minimal increase in model complexity, our approach provides significant improvements in detection efficacy over existing solutions. Finally, we explore the impact of post-hoc network pruning on the speed-accuracy trade-off of our approach and demonstrate that it is well suited for reduced memory/compute requirements.
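For concreteness, the sketch below shows the channel-shuffle operation that shuffle attention builds on (the full module also adds grouped channel and spatial gating, omitted here for brevity):

```python
# Sketch: channel shuffle, the mixing step underlying shuffle attention.
import torch

def channel_shuffle(x, groups):
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2)               # interleave the groups
             .reshape(b, c, h, w))

x = torch.randn(1, 8, 4, 4)
print(channel_shuffle(x, groups=2).shape)   # torch.Size([1, 8, 4, 4])
```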
In resource-constrained embedded systems, designing efficient deep neural networks is challenging due to the diversity of artificial intelligence applications. Quantization in deep neural networks greatly reduces storage and computation time by reducing the bit-width of the network's encoding. To address the resulting problem of accuracy loss, the quantization levels are automatically discovered using the Policy Feedback-based Reinforcement Learning Method (PF-RELEQ). In this paper, the Proximal Policy Optimization with Policy Feedback (PPO-PF) technique is proposed to determine the best design decisions by choosing the optimum hyper-parameters. To enhance the sensitivity of the value function to changes of policy and to improve the accuracy of value estimation at the early learning stage, a policy update method based on a clipped discount factor is devised. In addition, the policy loss functions satisfy unbiased estimation of the trust region. The proposed PF-RELEQ effectively balances quality and speed compared to other deep learning methods such as ResNet-1202, ResNet-32, ResNet-110, GoogLeNet and AlexNet. The experimental analysis showed that PF-RELEQ achieved a 20% computational workload reduction compared to existing deep learning methods on the ImageNet, CIFAR-10, CIFAR-100 and tomato leaf disease datasets, and approximately a 2% improvement in validation accuracy. Additionally, PF-RELEQ needs only 0.55 of a Graphics Processing Unit (an NVIDIA GTX-1080Ti) to develop DNNs that deliver better accuracy with fewer cycle counts for image classification.
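A minimal sketch of what reducing the bit-width means in practice: uniform k-bit quantization of a weight tensor. The reinforcement-learning component that chooses k per layer is not shown.

```python
# Sketch: uniform k-bit quantization of a weight tensor.
import torch

def quantize_uniform(w, bits):
    levels = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / levels
    q = torch.round((w - w_min) / scale)   # integer codes in [0, levels]
    return q * scale + w_min               # de-quantized weights

w = torch.randn(256, 256)
w4 = quantize_uniform(w, bits=4)
print((w - w4).abs().max())                # worst-case quantization error
```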
With the development of emerging information technology, the traditional management methods for marine fishes, being costly, time-consuming and inaccurate, are slowly being replaced by new methods. Updating marine fish management technology also greatly helps the creation of smart cities. However, some of the new methods that have been studied are too species-specific to apply to other marine fishes, and their identification accuracy is generally low. Therefore, this paper proposes an ecological Internet of Things (IoT) framework in which a lightweight deep neural network, denoted Fish-CNN, is implemented as an image recognition model for marine fishes. In this study, multiple rounds of training and evaluation of Fish-CNN are accomplished, and the accuracy of the final classification ranges from 89.89% to 99.83%. Moreover, the final evaluation, compared with Rem-CNN, Linear Regression and Multilayer Perceptron, also verifies the stability and advantage of our method.
Intelligent forecasting of economic indexes has been an important demand for the sustainable management of smart cities. Existing methods for this purpose are mostly built on economic mechanisms, with econometric models being the most common technical means in this area. However, in the era of the digital economy, the growing volume of big data has brought great change to traditional production, and it is becoming more difficult for conventional technological forecasting methods to deal with multi-dimensional economic indexes. To deal with this challenge, this paper introduces artificial intelligence algorithms for automatic information processing and proposes a deep neural network-based intelligent forecasting method for multi-dimensional economic indexes in smart cities. Specifically, a deep neural network with a three-layer structure is developed as the backbone methodology. For empirical analysis, real-world data from the "Chengdu–Chongqing Economic Circle" in China from 2012 to 2022 are selected as the main simulation scenario. Four major indexes are selected as the main research objects: gross domestic product (GDP), per capita GDP, GDP growth rate and the proportion of tertiary industry in GDP. The experimental results show that the proposal deals well with this forecasting problem from a data-driven perspective, with a proper forecasting effect on historical data.
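A minimal sketch of such a three-layer backbone, using scikit-learn's MLPRegressor on random stand-in data; the layer widths, input dimension and four-index output are illustrative, not the paper's configuration.

```python
# Sketch: three-hidden-layer DNN forecasting four economic indexes.
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(100, 12)   # multi-dimensional economic inputs
y = np.random.rand(100, 4)    # GDP, per-capita GDP, growth, tertiary share

model = MLPRegressor(hidden_layer_sizes=(64, 32, 16),  # three hidden layers
                     max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict(X[:2]))   # forecasts for two samples
```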
Real-time images of faces captured in different spectrum bands are considered heterogeneous images. Heterogeneous Face Recognition (HFR) matches faces across domains and is crucial to public safety. This paper proposes an HFR approach based on Deep Neural Networks (DNN). Feature maps are extracted from two images, such as gallery and sketch images, using the Squirrel Search Heterogeneous Convolutional-Capsule-Bidirectional Gated Recurrent Unit (S2HConv-Caps-BiGRU). To recognize faces efficiently, a coupled representation similarity metric (CRSM) measures the similarity of the two feature maps. The experimental results are evaluated against state-of-the-art (SOTA) methods in terms of accuracy, recall, Jaccard score, Dice score, mean square error (MSE), image similarity, performance and root mean square error (RMSE). Compared to other SOTA approaches, the model produces the best results, with an accuracy of 98.7% on the CUFS dataset.
With cyber-attacks increasing on millions of Internet of Things (IoT) devices, the poor network security measures on those devices are the main source of the problem. This article studies a number of the available machine learning algorithms for their effectiveness in detecting malware on consumer IoT devices. In particular, Support Vector Machines (SVM), Random Forest and Deep Neural Network (DNN) algorithms are benchmarked on a set of test data and compared as tools for safeguarding IoT deployments. Test results on a set of four IoT devices show that all three algorithms detect network anomalies with high accuracy. However, the deep neural network provides the highest coefficient of determination R2 and is hence identified as the most precise of the tested algorithms for IoT device security on the datasets considered.
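A sketch of the benchmark setup described above, with the three model families trained and scored on held-out stand-in data (scoring R2 on the predicted class labels, as the abstract describes, alongside accuracy):

```python
# Sketch: SVM vs. random forest vs. DNN on dummy IoT traffic features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, r2_score

X = np.random.rand(400, 20)              # dummy traffic features
y = np.random.randint(0, 2, 400)         # 0 = benign, 1 = anomaly
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for model in (SVC(), RandomForestClassifier(), MLPClassifier(max_iter=1000)):
    pred = model.fit(Xtr, ytr).predict(Xte)
    print(type(model).__name__,
          accuracy_score(yte, pred), r2_score(yte, pred))
```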
Visualizing the decision-making procedure of a deep neural network is one of the main challenges towards transparent and trustworthy artificial intelligence. This paper presents an approach which extracts latent variables from a trained network and, through clustering, constructs a set of anchors that represent the network's data-driven knowledge. This set is then used to inform users about the features that drive the network's decisions.
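A minimal sketch of the anchor-construction step, assuming k-means as the clustering method (the paper's clustering algorithm and cluster count are not specified here); the latent vectors are random stand-ins for penultimate-layer activations.

```python
# Sketch: cluster latent activations and keep centroids as anchors.
import numpy as np
from sklearn.cluster import KMeans

latents = np.random.rand(500, 64)        # stand-in latent activations
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(latents)
anchors = kmeans.cluster_centers_        # one anchor per cluster
print(anchors.shape)                     # (10, 64)
```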