With the rapid development of computer-embedded network technology, the Chinese word segmentation system, one of the key technologies in natural language processing, has attracted increasing attention for its automatic filling and analysis capability. Implementing this capability is of great significance for improving text processing efficiency and information extraction accuracy. This paper conducts an in-depth study of the automatic filling mechanism of Chinese word segmentation systems based on computer-embedded network analysis. First, we describe the basic principles of Chinese word segmentation systems and their applications in network environments. A word segmentation system divides continuous sequences of Chinese characters into meaningful lexical units, providing the foundation for subsequent natural language processing tasks. Supported by embedded computer networks, segmentation systems can process text efficiently and in real time, meeting the needs of a variety of application scenarios. Next, the paper analyzes how the automatic filling mechanism operates within a Chinese word segmentation system. Automatic filling relies on a large-scale corpus and advanced algorithmic models: by learning and recognizing lexical patterns in text, it predicts and fills in unknown words or phrases automatically. This mechanism not only improves segmentation accuracy but also greatly raises the level of automation in text processing. Finally, we examine the key factors that affect automatic filling, including the size and quality of the corpus, the complexity and generalization ability of the algorithmic model, and the type and domain of the text. In response to these factors, we propose a series of optimization strategies and suggestions to further improve the accuracy and efficiency of automatic filling analysis.
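To make the mechanism concrete, the following is a minimal sketch of segmentation plus auto-fill prediction, assuming the open-source jieba segmenter and a toy bigram model; the two-sentence corpus and the autofill function are illustrative stand-ins, not the system described above.

```python
# Minimal sketch: jieba segmentation plus a toy bigram "auto-fill" predictor.
# The corpus and function names are illustrative, not the described system.
from collections import Counter, defaultdict
import jieba

corpus = ["自然语言处理是人工智能的重要方向", "中文分词是自然语言处理的基础任务"]

# Build bigram counts over the segmented corpus.
bigrams = defaultdict(Counter)
for sentence in corpus:
    tokens = jieba.lcut(sentence)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

def autofill(prefix: str, k: int = 3):
    """Segment the prefix and suggest the k most likely next tokens."""
    tokens = jieba.lcut(prefix)
    if not tokens:
        return []
    return [word for word, _ in bigrams[tokens[-1]].most_common(k)]

print(jieba.lcut("中文分词是自然语言处理的基础任务"))  # segmentation
print(autofill("中文分词是"))                           # auto-fill suggestions
```

A production system would replace the bigram counts with the corpus-trained models discussed above, but the division of labor (segment first, then predict the next lexical unit) is the same.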
Recently, the automatic discovery of information in educational data has been expanding its horizons and opening new possibilities for web portal applications. Recommending university courses to high school students remains an unexplored area. Traditional recommendation systems rely on collaborative filtering, which does not apply here because the numbers of items and users needed to achieve high performance are too small. Natural Language Processing (NLP) is one of the best-known approaches to course recommendation. Association rules have been added using data from alumni and assessments of student recommendations: the rules are created by first mining data about the courses students completed and then comparing the discovered relationships with the existing course tree in order to improve it. Hence, a Natural Language Processing-based Personalized Recommendation System (NLP-PRS) has been established to help students choose courses that will lead them to their desired careers by analyzing data from alumni and job offers. The study uses the alumni web portal and NLP-based content analysis to extract relevant information from course programs, and then presents recommendations based on students' performance, interests, and results in each course of each program. In addition, to obtain the best results in improving their learning skills, students' own decisions should inform the recommended course of study. Because students vary in preferences (prior knowledge, learning style, emotional state, and so on), no fixed timeframe can be recommended for the suggested course sequence. The experimental results suggest that the system is well suited to letting alumni engage, share updates, and interact with current students.
Grammatical errors (GEs) in a text are passages that violate accepted grammar standards. The aim of grammatical error detection is to resolve grammatical faults and inconsistencies in a document. Different strategies can focus on varying textual spans, from individual phrases to whole documents. Common GEs include improper word choice, punctuation, and syntax. The development of natural language processing (NLP) has fundamentally changed the way textual data is analyzed and processed, leading to notable advances in automated grammatical mistake detection. In this study, we propose a novel starling murmuration-optimized dense recurrent neural network (SMO-DRNN) model for detecting English grammar errors. We collected text samples and annotated them with different types of GEs. Automatic reading also involves converting textual data from English compositions into numerical values for further computation. Data pre-processing includes tokenization, stop word removal, stemming, and lemmatization. To extract relevant information from the pre-processed data, term frequency-inverse document frequency (TF-IDF) feature extraction was used for accurate grammar error detection. The proposed approach is compared with traditional algorithms. The overall results show that it outperforms existing methods in terms of accuracy and loss, ROC (0.95), recall (92%), precision (96%), and F-0.5 score (94%) for identifying English grammar errors. The suggested method effectively combines current NLP strategies to offer a highly accurate way of identifying grammar mistakes in English.
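As an illustration of the pre-processing and TF-IDF stages described here, the sketch below uses NLTK and scikit-learn; the sample texts and labels are invented, and a logistic regression stands in for the SMO-DRNN classifier, which the abstract does not specify in code.

```python
# Sketch of the pre-processing + TF-IDF stage; a logistic regression stands in
# for the SMO-DRNN classifier. Texts and labels are illustrative only.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

texts = ["She go to school every day", "He plays football on Sundays"]
labels = [1, 0]  # 1 = contains a grammatical error (illustrative labels)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())           # tokenization
    tokens = [t for t in tokens if t not in stop_words]    # stop word removal
    tokens = [lemmatizer.lemmatize(t) for t in tokens]     # lemmatization
    return " ".join(tokens)

vectorizer = TfidfVectorizer()                              # TF-IDF feature extraction
X = vectorizer.fit_transform([preprocess(t) for t in texts])
clf = LogisticRegression().fit(X, labels)                   # stand-in classifier
print(clf.predict(X))
```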
Automatic abstraction has become an increasingly common technical demand in text analysis. However, due to the complex relationship between sentence structure and semantics, current automatic abstraction models for English texts still face several challenges. This paper therefore explores a new automatic abstraction model for English text that combines sentiment analysis and semantic perception technology. First, English texts are serialized and extracted, producing multi-level sequential text. Then, a hierarchical multi-head attention mechanism is introduced to achieve dynamic word vectorization. Finally, the result is fed into a multi-angle convolutional network, in which each token-level vector of a sentence undergoes multi-angle sentiment analysis and double-layer semantic perception, yielding automatic abstracts for long texts. Experiments on multiple English text datasets compare the proposal with other benchmark methods. The results show that the model achieves significant improvements in summary generation and can more effectively produce text summaries with rich emotion and accurate semantics.
Arabic dialect identification (ADI) is a natural language processing (NLP) task that aims to automatically predict the Arabic dialect of an input text. ADI is a preliminary step toward many NLP applications, including cross-language text generation, multilingual text-to-speech synthesis, and machine translation. Automatic classification of the Arabic dialect is the first step in various dialect-sensitive Arabic NLP tasks. ADI involves predicting the dialects related to the textual input and assigning them their respective labels. As a result, interest in addressing ADI through deep learning (DL) and machine learning (ML) algorithms has grown over the last few decades. This study develops an Arabic multi-class dialect recognition technique using a fast random opposition-based fractals learning aquila optimizer with DL (FROBLAO-DL). The FROBLAO-DL technique utilizes an optimally tuned DL model to identify distinct types of Arabic dialects. In the FROBLAO-DL technique, data preprocessing is used to clean the input Arabic dialect dataset. In addition, RoBERTa word embedding is used to generate word embeddings. The FROBLAO-DL technique uses an attention bidirectional long short-term memory (ABiLSTM) network to identify distinct Arabic dialects. Moreover, hyperparameter tuning of the ABiLSTM model is implemented using the FROBLAO method. The performance of the FROBLAO-DL method is evaluated on an Arabic dialect dataset. The empirical analysis shows the superiority of the FROBLAO-DL technique over recent approaches under various measures.
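The following is a hedged PyTorch sketch of the embedding-plus-attention-BiLSTM design described above, assuming the Hugging Face transformers library; the xlm-roberta-base checkpoint, layer sizes, and the five-class output are stand-ins, and the FROBLAO hyperparameter tuning step is omitted.

```python
# Hedged sketch of a RoBERTa-embedding + attention-BiLSTM dialect classifier.
# "xlm-roberta-base" and the layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class AttentionBiLSTM(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-base", hidden=128, n_classes=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # frozen word embeddings
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)                     # token attention scores
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            emb = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.lstm(emb)                                     # (B, T, 2H)
        scores = self.attn(h).masked_fill(attention_mask.unsqueeze(-1) == 0, -1e9)
        weights = torch.softmax(scores, dim=1)                    # attention over tokens
        context = (weights * h).sum(dim=1)                        # weighted pooling
        return self.out(context)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
batch = tokenizer(["كيف حالك اليوم"], return_tensors="pt", padding=True)
model = AttentionBiLSTM()
print(model(batch["input_ids"], batch["attention_mask"]).shape)  # torch.Size([1, 5])
```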
Sentiment analysis (SA) is an essential application of machine learning (ML) and natural language processing (NLP) that involves automatically extracting the opinions or sentiments expressed in textual data. By identifying the expressive character of written content, SA allows businesses and researchers to gain valuable insights into social media discourse, customer feedback, and public reviews. In SA, the synergy of Applied Linguistics and Artificial Intelligence (AI) has produced a robust approach that goes beyond conventional methods. By incorporating linguistic principles into AI methods, this interdisciplinary collaboration allows a more nuanced understanding of the human sentiments expressed in language. Applied Linguistics offers the theoretical basis for understanding pragmatics, semantics, and linguistic structures, while AI algorithms leverage this knowledge to analyze large datasets with notable accuracy. This study presents an Applied Linguistics-driven Artificial Intelligence Approach for Sentiment Analysis and Classification (ALAIA-SAC) in social media. The primary intention of the ALAIA-SAC technique is to apply an attention mechanism with a fractal hyperparameter-tuned deep learning (DL) method for identifying sentiments. In the ALAIA-SAC technique, data preprocessing takes place in several stages to convert the input data into a compatible format, and the TF-IDF model is employed for word representation. A self-attention bidirectional long short-term memory (SBiLSTM) model is used for sentiment classification. Finally, hyperparameter selection for the SBiLSTM model is performed using a Fractal Pelican Optimization Algorithm (FPOA). The experimental results of the ALAIA-SAC method are assessed on two benchmark datasets. The comparative study of the ALAIA-SAC technique showed superior accuracy values of 99.17% and 99.39% on the Twitter US Airlines and IMDB datasets, respectively.
The Arabic language has different variants that can roughly be classified into three major types: Modern Standard Arabic (MSA), Dialectal Arabic (DA), and Classical Arabic (CA). There are slight variations between CA and MSA in pronunciation, syntax, and terminology. DA, on the other hand, differs considerably from MSA and CA in that it reflects the speaker's country of origin, or at least their geographic location once mobility is taken into account. Deep learning has demonstrated its effectiveness in DA detection tasks, leveraging neural architectures such as transformer models, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). Researchers have built models that automatically discriminate between Arabic dialects by processing the contextual information, linguistic features, and phonetic patterns embedded in large datasets. Using Deep Learning (DL) for Arabic dialect detection contributes to the development of natural language processing in various complex systems. It helps address the problems posed by the wide variety of Arabic dialects, facilitating applications such as machine translation, speech recognition, and sentiment analysis tailored to regional linguistic variation. This study proposes an Enhanced Fractal Cheetah Optimization Algorithm with Deep Learning for Integrated Arabic Dialect Identification (ECOADL-IADI) technique. The ECOADL-IADI technique mainly aims to classify variants of the Arabic language into multiple classes. At the preliminary level, the ECOADL-IADI technique preprocesses the Arabic text. Next, BERT word embeddings are generated. The ECOADL-IADI technique then applies a Bidirectional Recurrent Neural Network (BiRNN) model for Arabic dialect identification. Lastly, ECOA-based hyperparameter selection improves the classification results of the BiRNN model. A wide range of experiments was conducted to examine the detection results of the ECOADL-IADI technique. The extensive results show that the ECOADL-IADI technique performs well compared with other models.
In the new era of digital communications, cyberbullying is a significant concern for society. Cyberbullying can negatively impact stakeholders, with effects ranging from psychological to pathological, such as self-isolation, depression, and anxiety, potentially leading to suicide. Hence, detecting any act of cyberbullying in an automated manner will help stakeholders prevent unfortunate outcomes from the victim's perspective. Data-driven approaches, such as machine learning (ML) and particularly deep learning (DL), have shown promising results. However, meta-analysis shows that ML approaches, particularly DL, have not been extensively studied for Arabic text classification of cyberbullying. Therefore, in this study, we conduct a performance evaluation and comparison of various DL algorithms (LSTM, GRU, LSTM-ATT, CNN-BLSTM, CNN-LSTM, and LSTM-TCN) on different Arabic cyberbullying datasets to obtain more precise and dependable findings. Based on this evaluation, a hybrid DL model is proposed that combines the best characteristics of the baseline models CNN, BLSTM, and GRU for identifying cyberbullying. The proposed hybrid model improves accuracy on all the studied datasets and can be integrated into different social media sites to automatically detect cyberbullying in Arabic social data, with the potential to significantly reduce cyberbullying. Applying DL to cyberbullying detection within Arabic text classification can be considered a novel approach, given the complexity of the problem, the tedious process involved, and the scarcity of relevant research studies.
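A minimal PyTorch sketch of a hybrid CNN + BiLSTM + GRU text classifier in the spirit of the proposed model is shown below; the vocabulary size, layer dimensions, and the fusion head are assumptions rather than the authors' exact configuration.

```python
# Illustrative hybrid CNN + BiLSTM + GRU classifier; sizes are assumptions.
import torch
import torch.nn as nn

class HybridCyberbullyingModel(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)   # CNN branch
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden + 2 * hidden + hidden, n_classes)        # fusion head

    def forward(self, token_ids):
        x = self.embedding(token_ids)                  # (B, T, E)
        c = torch.relu(self.conv(x.transpose(1, 2)))   # (B, H, T)
        c = c.max(dim=2).values                        # global max pooling over time
        l, _ = self.bilstm(x)                          # BiLSTM branch
        g, _ = self.gru(x)                             # GRU branch
        features = torch.cat([c, l[:, -1, :], g[:, -1, :]], dim=1)
        return self.fc(features)

model = HybridCyberbullyingModel()
dummy = torch.randint(1, 30000, (4, 50))               # batch of 4 tokenized comments
print(model(dummy).shape)                               # torch.Size([4, 2])
```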
Misinformation is a pervasive issue in today's society, and the spread of false or misleading information can have far-reaching consequences. In recent years, there has been growing interest in using Artificial Intelligence (AI) technologies, such as Natural Language Processing (NLP) and machine learning, to detect and combat the spread of misinformation. In this study, we compare the performance of Long Short-Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM) networks in misinformation detection. We conducted experiments on two public datasets, including the "ISOT misinformation Dataset". We trained LSTM and Bi-LSTM models on the preprocessed datasets and evaluated their performance using metrics such as accuracy, precision, recall, and F1-score.
Mental health (MH) assessment and prediction have become critical areas of focus in healthcare, leveraging developments in natural language processing (NLP). Recent advances in machine learning have facilitated the exploration of predictive models for MH based on user-generated comments, but these models have overlooked the integration of emotional attention mechanisms. They often struggle with contextual nuances and emotional subtleties, leading to suboptimal predictions. The prevailing challenge lies in accurately understanding the emotional context embedded in textual comments, which is crucial for effective prediction and intervention. In this research, we introduce a novel approach employing contextual emotional transformer-based models (CETM) for comment analysis in MH case prediction. CETM leverages state-of-the-art transformer architectures enhanced with contextual embedding layers and emotional attention mechanisms. By incorporating contextual information and emotional cues, CETM captures the underlying emotional states and MH indicators expressed in user comments. Through extensive experimentation and evaluation, both RoBERTa and bidirectional encoder representations from transformers (BERT) models exhibited better accuracy, precision, recall, and F1 scores than their counterparts lacking emotional attention. Notably, with emotional attention, the RoBERTa model attained an accuracy of 94.5% compared with BERT's 87.6%. By incorporating emotional context into the predictive model, we achieved significant improvements, which offers promising avenues for more precise and personalized MH interventions.
The Sanskrit language holds significant importance in Indian culture because it has been extensively used in religious literature, primarily in Hinduism. Numerous ancient Hindu texts originally composed in Sanskrit have since been translated into various Indian and non-Indian languages by Indian and foreign authors. These translations offer a renewed cultural perspective and broaden the reach of Indian literature to a global audience. However, the manual translations of these religious texts often lack thorough validation. Recent advances in semantic and sentiment analysis, powered by deep learning, provide enhanced tools for understanding language and text. In this paper, we present a framework that uses semantic and sentiment analysis to validate the English translation of the Ramayana against its original Sanskrit version. The Ramayana, which narrates the journey of Rama, the king of Ayodhya, is an ancient Hindu epic written by the sage Valmiki; it has contributed to human values for centuries and has universal relevance. Given the importance of Sanskrit in Indian culture and its influence on literature, understanding the translations of key texts like the Ramayana is essential. The Multilingual Bidirectional Encoder Representations from Transformers (mBERT) model is used to analyze selected chapters of the English and Sanskrit versions of the Ramayana. Our analysis reveals that sentiment and semantic alignment between the original Sanskrit and the English translation remain consistent despite stylistic and vocabulary differences. The study also compares the findings of Bidirectional Encoder Representations from Transformers (BERT) with its other variants to examine which BERT variant is more suitable for validating Sanskrit text. The paper demonstrates the potential of deep learning techniques for cross-lingual validation of ancient texts.
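For illustration, the sketch below shows how mBERT embeddings could be used to score the semantic alignment between a Sanskrit verse and an English rendering, assuming the Hugging Face bert-base-multilingual-cased checkpoint and simple mean pooling; both choices are assumptions, not the paper's exact setup.

```python
# Hedged sketch: cross-lingual similarity via mBERT mean-pooled embeddings.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0)               # mean-pooled sentence vector

sanskrit = "रामो विग्रहवान् धर्मः"
english = "Rama is the embodiment of righteousness"
similarity = torch.cosine_similarity(embed(sanskrit), embed(english), dim=0)
print(f"semantic alignment score: {similarity.item():.3f}")
```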
The integration of Artificial Intelligence (AI) and sign language recognition is a hot topic in the field of AI+Science, aimed at addressing the communication barriers faced by deaf and hard-of-hearing communities. This paper examines that integration and its potential to bridge communication gaps, reviews the evolution of sign language recognition from data gloves to computer vision, and underscores the role of extensive databases. The paper also discusses the benefits of multi-modal AI models in enhancing recognition accuracy, and highlights the importance of government and industry support, ethical data practices, and user-centered design in advancing this technology. The challenges and opportunities of integrating this technology into daily life, including technical, interface, and ethical considerations, are explored, emphasizing the need for user-focused solutions and innovative technical approaches.
The paper formulates the Hays and Gaifman dependency grammar (HGDG) in terms of constraints on a string-based encoding of dependency trees and develops an approach to obtaining a regular approximation for these grammars. Our encoding of dependency trees uses brackets in a novel fashion: pairs of brackets indicate dependencies between pairs of positions rather than boundaries of phrases. This leads to several advantages: (i) HGDG rules over the balanced bracketing can be expressed using regular languages; (ii) a new homomorphic representation for context-free languages is obtained; (iii) a star-free regular approximation of the original projective dependency grammar is obtained by limiting the number of stacked dependencies; (iv) by relaxing certain constraints, the encoding can be extended to non-projective dependency trees and graphs; and (v) the strong generative power of HGDGs can now be characterized through sets of bracketed strings.
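To illustrate the idea, the sketch below encodes a toy dependency tree so that each bracket pair marks an arc between two string positions; this is one plausible reading of the scheme, not the paper's exact definition, and the ordering of multiple brackets at one position is left naive.

```python
# Illustrative bracket encoding: one bracket pair per dependency arc, with the
# opening bracket at the arc's left end and the closing bracket at its right end.
# A plausible reading of the scheme, not the paper's exact definition.
def bracket_encode(tokens, heads):
    """tokens: list of words; heads: 0-based head index per token (-1 for the root)."""
    arcs = sorted((min(i, h), max(i, h)) for i, h in enumerate(heads) if h >= 0)
    out = []
    for i, tok in enumerate(tokens):
        closes = "]" * sum(1 for l, r in arcs if r == i)   # arcs ending at this position
        opens = "[" * sum(1 for l, r in arcs if l == i)    # arcs starting at this position
        out.append(closes + tok + opens)
    return " ".join(out)

# "the dog barks": 'the' depends on 'dog', 'dog' depends on 'barks' (the root).
print(bracket_encode(["the", "dog", "barks"], [1, 2, -1]))
# the[ ]dog[ ]barks
```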
Analyzing and mining the sentiment in comments is of great significance for individuals, enterprises, and government departments. Many deep learning models are used for text sentiment analysis, and the BiTCN model performs well on this task. However, in actual semantic expression, each word contributes differently to the overall sentiment; BiTCN treats all words equally and does not pay more attention to the key sentiment words. To address this problem, a sentiment analysis model based on BiTCN-Attention is proposed in this paper. A Self-Attention mechanism and a Multi-Head Self-Attention mechanism are added to BiTCN, forming BiTCN-SA and BiTCN-MHSA respectively, which raise the weight of sentiment words and improve the accuracy of feature extraction, thereby improving sentiment analysis performance. The experimental results show that the accuracies of BiTCN-SA and BiTCN-MHSA on the JingDong commodity review dataset are 3.96% and 2.41% higher than that of BiTCN, respectively. On the DianPing comment dataset, the accuracies of BiTCN-SA and BiTCN-MHSA improved by 4.62% and 3.49%, respectively, compared with BiTCN.
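A minimal PyTorch sketch of adding multi-head self-attention on top of a temporal convolution block, in the spirit of BiTCN-MHSA, is given below; the single dilated convolution standing in for the full BiTCN and all layer sizes are illustrative assumptions.

```python
# Sketch: multi-head self-attention over temporal-convolution features, so that
# sentiment-bearing tokens are reweighted before classification. Sizes are assumptions.
import torch
import torch.nn as nn

class TCNWithSelfAttention(nn.Module):
    def __init__(self, emb_dim=128, channels=128, heads=4, n_classes=2):
        super().__init__()
        self.tcn = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=2, dilation=2)
        self.mhsa = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fc = nn.Linear(channels, n_classes)

    def forward(self, x):                        # x: (B, T, E) embedded review
        h = torch.relu(self.tcn(x.transpose(1, 2))).transpose(1, 2)  # (B, T, C)
        attended, _ = self.mhsa(h, h, h)         # reweight key sentiment tokens
        return self.fc(attended.mean(dim=1))     # (B, n_classes)

model = TCNWithSelfAttention()
print(model(torch.randn(8, 40, 128)).shape)      # torch.Size([8, 2])
```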
Question answering aims at computing the answer to a question given a context with facts. Many proposals focus on questions whose answer is explicit in the context; lately, there has been increasing interest in questions whose answer is not explicit and requires multi-hop inference to be computed. Our analysis of the literature reveals a seminal proposal with increasingly complex follow-ups. Unfortunately, they were presented without an extensive study of their hyper-parameters, the experimental studies focused exclusively on English, and no statistical analysis to sustain the conclusions was ever performed. In this paper, we report on our experience devising a very simple neural approach to address the problem, on our extensive grid search over the space of hyper-parameters, and on the results attained with English, Spanish, Hindi, and Portuguese, and we sustain our conclusions with statistically sound analyses. Our findings prove that it is possible to beat many of the proposals in the literature with a very simple approach that was likely overlooked due to the difficulty of performing an extensive grid search, that the language does not have a statistically significant impact on the results, and that the empirical differences found among some existing proposals are not statistically significant.
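The exhaustive search reported here can be sketched as a plain grid loop; the hyperparameter grid and the placeholder scoring function below are illustrative, not the paper's actual search space or model.

```python
# Illustrative exhaustive grid search; the grid and scorer are placeholders.
from itertools import product

grid = {
    "hidden_size": [64, 128, 256],
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "dropout": [0.1, 0.3, 0.5],
}

def train_and_evaluate(config):
    """Placeholder scorer: in practice, train the multi-hop QA model with `config`
    and return its development-set accuracy."""
    return -abs(config["learning_rate"] - 5e-4) - abs(config["dropout"] - 0.3)

best_config, best_score = None, float("-inf")
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = train_and_evaluate(config)          # one full training run per grid point
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)
```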
Abstract Meaning Representation (AMR) parsing aims to represent a sentence as a structured directed acyclic graph (DAG) in an attempt to extract meaning from text. This paper extends an existing two-stage pipeline AMR parser with state-of-the-art techniques from dependency parsing. First, Pointer-Generator Networks are used for out-of-vocabulary words in the concept identification stage, with improved initialization via word- and character-level embeddings. Second, the performance of the Relation Identification module is improved by jointly training the Heads Selection and Arcs Labeling components. Last, we underline the difficulty of end-to-end training with recurrent modules in a static deep neural network construction approach and explore a dynamic construction implementation, which continuously adapts the computation graph, thus potentially enabling end-to-end training in the proposed pipeline solution.
Quantum Natural Language Processing (QNLP) encodes text in a semantic space using a distributional semantic classification model combined with tensor operations. Its theoretical results on quantum circuit mapping and quantum semantic coding of text have been tested in practice with the recent development of quantum back-end hardware. Given the small scale of current quantum natural language processing tasks, the single-sentence structure of quantum semantic coding, low text coverage, and the lack of applications, this paper proposes combining syntactic structure to extract text and extend the sentence components covered by quantum semantic coding, which improves the utilization of quantum computing resources for text processing tasks. Because quantum natural language processing has few concrete applications, this paper also studies possible extensions to quantum text matching and question answering. The development path of classical natural language processing is used as a reference to enhance the usefulness of QNLP and explore its practical capabilities under current resource-constrained conditions.
Machine learning (ML) architectures based on neural models have garnered considerable attention in the field of language classification. Code-mixing, the practice of mixing two or more languages, is a common phenomenon on social networking sites for expressing opinions on a topic. This paper describes the application of a code-mixing index to Indian social media texts and compares the complexity of identifying the language at the word level using a Bidirectional Long Short-Term Memory model. The major contribution of the work is a technique for identifying the language of Hindi-English code-mixed data from three social media platforms, namely Facebook, Twitter, and WhatsApp. We demonstrate that a special class of quantum LSTM network models is capable of learning and accurately predicting the languages used in social media texts. Our work paves the way for future applications of machine learning methods in quantum dynamics without relying on the explicit form of the Hamiltonian.
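As a simplified illustration of word-level language identification on code-mixed text, the sketch below uses character n-gram features with a logistic regression in scikit-learn; the tiny training lexicon is invented, and the paper's (Bi-)LSTM tagger would replace this classifier in practice.

```python
# Simplified word-level language identification for Hindi-English code-mixed text.
# The training lexicon is illustrative; a BiLSTM tagger would replace this classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

words = ["khana", "acha", "nahi", "bohot", "good", "movie", "really", "watching"]
langs = ["hi", "hi", "hi", "hi", "en", "en", "en", "en"]

model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # character n-gram features
    LogisticRegression(max_iter=1000),
)
model.fit(words, langs)

sentence = "movie bohot acha tha really"
print(list(zip(sentence.split(), model.predict(sentence.split()))))
```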
Image captioning is a multidisciplinary artificial intelligence (AI) research task that has captured the interest of both image processing and natural language processing experts. Image captioning is a complex problem because it sometimes requires access to information that is not directly visible in a given scene; it may require common-sense interpretation or detailed knowledge about the objects present in the image. In this paper, we present a method that utilizes both visual knowledge and external knowledge from knowledge bases such as ConceptNet for better description of images. We demonstrate the usefulness of the method on two publicly available datasets, Flickr8k and Flickr30k. The results show that the proposed model outperforms state-of-the-art approaches for generating image captions. Finally, we discuss possible future directions in image captioning.
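To show how external knowledge might be pulled in for a detected object, the sketch below queries ConceptNet's public REST API; the endpoint and fields follow ConceptNet's documented interface, while error handling and caching are omitted and the caption generator itself is not shown.

```python
# Sketch: enriching a detected object with external knowledge from ConceptNet.
import requests

def conceptnet_facts(concept: str, limit: int = 5):
    """Return a few related surface texts for an English concept from ConceptNet."""
    url = f"http://api.conceptnet.io/c/en/{concept}"
    edges = requests.get(url, params={"limit": limit}, timeout=10).json()["edges"]
    return [e["surfaceText"] for e in edges if e.get("surfaceText")]

# Enrich a detected object ("dog") before generating the caption.
for fact in conceptnet_facts("dog"):
    print(fact)
```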
Many natural language processing tasks, such as word sense disambiguation, knowledge acquisition, and information retrieval, use semantically tagged corpora. Until recently, these corpus-based systems relied on text manually annotated with semantic tags, but the massive human intervention this requires has become a serious impediment to building robust systems. In this paper, we present AutoASC, a system that automatically acquires sense-tagged corpora. It is based on (1) the information provided in WordNet, particularly the word definitions found within the glosses, and (2) information gathered from the Internet using existing search engines. The system was tested on a set of 46 concepts, for which 2071 example sentences were acquired; on these, a precision of 87% was observed.
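The WordNet side of this approach can be illustrated with NLTK as below: glosses and example sentences are retrieved for a concept, which the system would then pair with sentences gathered from the web (that retrieval step is omitted here).

```python
# Sketch: retrieving WordNet glosses and examples for a concept via NLTK.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

for synset in wn.synsets("interest")[:3]:
    print(synset.name())
    print("  gloss:   ", synset.definition())   # the definition found in the gloss
    print("  examples:", synset.examples())     # usage examples, if any
```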