The Sanskrit language holds significant importance in Indian culture because it has been used extensively in religious literature, primarily in Hinduism. Numerous ancient Hindu texts originally composed in Sanskrit have since been translated into various Indian and non-Indian languages by Indian and foreign authors. These translations offer a renewed cultural perspective and broaden the reach of Indian literature to a global audience. However, the manual translations of these religious texts often lack thorough validation. Recent advances in semantic and sentiment analysis, powered by deep learning, provide enhanced tools for understanding language and text. In this paper, we present a framework that uses semantic and sentiment analysis to validate the English translation of the Ramayana against its original Sanskrit version. The Ramayana, which narrates the journey of Rama, the king of Ayodhya, is an ancient Hindu epic written by the sage Valmiki. It has been valued for centuries for its contribution to human values and has universal relevance. Given the importance of Sanskrit in Indian culture and its influence on literature, understanding the translations of key texts like the Ramayana is essential. A multilingual Bidirectional Encoder Representations from Transformers (mBERT) model is used to analyze selected chapters of the English and Sanskrit versions of the Ramayana. Our analysis reveals that the sentiment and semantic alignment between the original Sanskrit and the English translation remains consistent despite stylistic and vocabulary differences. The study also compares the findings of Bidirectional Encoder Representations from Transformers (BERT) with those of its other variants to examine which BERT variant is more suitable for validating Sanskrit text. The paper demonstrates the potential of deep learning techniques for cross-lingual validation of ancient texts.
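As an illustration of the kind of cross-lingual comparison described above, the following is a minimal sketch that scores the semantic similarity of a Sanskrit verse and its English rendering with mBERT embeddings. It is not the authors' pipeline; the model name, mean-pooling strategy, and example sentences are assumptions made for demonstration.

```python
# Minimal sketch: cross-lingual sentence similarity with mBERT embeddings.
# Mean pooling and cosine similarity are assumptions standing in for the
# paper's (unspecified) alignment procedure.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Return a mean-pooled sentence embedding, masking padded positions."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (1, seq, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)          # (1, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, 768)

sanskrit = "रामो राजमणिः सदा विजयते"                                # example verse
english = "Rama, the jewel among kings, is ever victorious"  # example translation
score = torch.nn.functional.cosine_similarity(embed(sanskrit), embed(english))
print(f"semantic similarity: {score.item():.3f}")
```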
Emotional analysis is, in essence, a form of sentiment polarity classification. Against the background of mass data processing, and in order to increase the accuracy of judging the emotion conveyed by a text, a method for classifying the emotional tendency of a text that combines Latent Semantic Analysis (LSA) and Support Vector Machine (SVM) is proposed herein. In this method, a semantic distance vector space model of "word-document" is developed from the semantic aspect following the LSA approach. Then, with the help of SVM, which is characterized by high classification accuracy and good generalization ability, the emotion is classified. Finally, this paper proposes a parallel implementation of the LSA-SVM algorithm, developed using the Message Passing Interface (MPI) in a parallel environment. Experiments show that the accuracy of this method is higher than that of the conventional SVM method on blog assessments, where sentences are short and emotional tendency is evident; the classification accuracy on a test set approaches 92.2%, and compared with the serial implementation, the parallel LSA-SVM algorithm increases efficiency significantly.
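A minimal sketch of an LSA-plus-SVM text classifier follows, using scikit-learn's TfidfVectorizer, TruncatedSVD, and LinearSVC as stand-ins for the paper's components; the toy corpus, the number of LSA dimensions, and the omission of the MPI parallelization are assumptions made for illustration.

```python
# Minimal sketch: LSA (truncated SVD over a term-document matrix) feeding an SVM.
# The corpus, labels, and 2 LSA components are toy assumptions; the paper's MPI
# parallelization is not reproduced here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "I love this phone, the camera is great",
    "Wonderful service, very happy with the result",
    "Terrible battery life, I regret buying it",
    "Awful support, the product broke in a week",
]
labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(
    TfidfVectorizer(),              # word-document weighting
    TruncatedSVD(n_components=2),   # LSA: project into a latent semantic space
    LinearSVC(),                    # SVM classifier on the LSA features
)
clf.fit(docs, labels)
print(clf.predict(["the camera is wonderful"]))   # expected: ['positive']
```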
This article describes the work on the development of the speech understanding and dialog system EVAR. The relevant knowledge bases containing the raw linguistic knowledge, and the preprocessors converting this knowledge into the specialized form needed by the processing algorithms, are described. Processing so far covers the levels from the speech signal up to pragmatic analysis. Some topics of ongoing and future work are mentioned briefly.
Any system for natural language processing must be based on a lexicon. Once a model has been defined, there is the problem of acquiring and inserting words. This task is tedious for a human operator: on the one hand, he must not forget any of the words, and on the other, the acquisition of a new concept requires the input of a number of parameters.
In view of these difficulties, research work has been undertaken to integrate pre-existing "paper" dictionaries. Nevertheless, these are not faultless and are often incomplete when dealing with a very specialized technical field. We have therefore sought to mitigate these problems by automating the enrichment of an already partially integrated lexicon.
We work in a technical field for which we have gathered different sorts of texts: written texts, specialist interviews, technical reports, etc. These documents are stored in an object-oriented database and form part of a wider project called REX ("Retour d'EXpérience" in French, or "Feedback of Experience" in English).
Our system, called ANA, reads the documents, analyses them, and deduces new knowledge, allowing the enrichment of the lexicon. The set of words already integrated into the lexicon forms the "bootstrap" of the process of discovering new words: it collects the instances of the different concepts thought to be interesting, in order to gather the semantic information. A special module makes it possible to avoid an explosion in the size of the database; it is responsible for forgetting certain instances and maintaining the database in such a way that the order in which the texts are introduced has no influence.
Financial risk warning is a crucial technical issue for enterprises. Traditionally, it has relied on modeling a single data type, which does not fully capture the diverse characteristics of financial risk activities. To address this, this paper introduces a multimodal deep learning framework driven by semantic analysis and image processing for intelligent warning of enterprise financial risks. First, natural language processing algorithms analyze textual data such as financial statements, news reports, and social media comments within the industry. Concurrently, two convolutional neural network models, M-CNN and M-RNN, extract features from images and chart data. These textual and visual feature representations are then fused to form a multimodal deep neural network. Extensive experimental evaluation and comparative analysis of the proposed framework were conducted. The results indicate that the financial risk rate in the consumer fraud graph analysis varies significantly, fluctuating between 2.1% and 16.8%. Compared to other methods, the proposed approach demonstrates superior performance in financial risk warning tasks, with the risk rate increasing from 1.2% to 26.5% as the number of iterations grows from 1 to 6.
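To make the fusion step concrete, here is a minimal sketch of a two-branch network that concatenates a text feature vector with image features extracted by a small CNN; the branch architectures, feature sizes, and class count are assumptions for illustration and do not reproduce the paper's M-CNN/M-RNN models.

```python
# Minimal sketch: late fusion of text and image features for risk classification.
# Branch sizes, the tiny CNN, and the two-class output are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalRiskNet(nn.Module):
    def __init__(self, text_dim=768, fused_dim=128, num_classes=2):
        super().__init__()
        self.text_branch = nn.Sequential(          # maps a precomputed text embedding
            nn.Linear(text_dim, fused_dim), nn.ReLU()
        )
        self.image_branch = nn.Sequential(         # small CNN over chart images
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(16, fused_dim), nn.ReLU()
        )
        self.classifier = nn.Linear(2 * fused_dim, num_classes)

    def forward(self, text_emb, image):
        fused = torch.cat([self.text_branch(text_emb), self.image_branch(image)], dim=1)
        return self.classifier(fused)

model = MultimodalRiskNet()
logits = model(torch.randn(4, 768), torch.randn(4, 3, 64, 64))  # dummy batch
print(logits.shape)  # torch.Size([4, 2])
```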
Tourism recommendation systems have become popular in recent years. Because tourism content generally takes the form of multimodal information, existing studies have mostly ignored the fusion of different feature types. To address this issue, this paper resorts to a multimodal fusion of semantic analysis and image comprehension and proposes a novel deep learning-based recommender system for tourism routes. First, semantic analysis of tourism route searches is conducted in order to complete destination selection and process selection. Then, image comprehension of overall tourism route planning is carried out by establishing an end-to-end object recognition model. Finally, the two sets of characteristics are fused to form an integrated recommender system with multimodal sensing ability, which is expected to strengthen tourism route discovery. Empirically, operational efficiency and stability analyses are carried out on real-world data to evaluate the performance of the proposal. The experimental results show that it achieves a significant improvement in tourism route recommendation, accurately captures user preferences, and provides travel suggestions that meet user requirements.
Medical knowledge is disseminated and shared without boundaries thanks to the free and convenient sharing of medical documents. However, improper vocabulary and colors in medical documents tend to have an adverse impact on the perception and emotions of medical knowledge demanders. Hence, this paper develops a "Medical Documents Rewriting Model based on Medical Knowledge Demanders' Feelings and Emotions" and analyzes the provocative words and negative colors of medical articles. The words and colors of the target medical article are rewritten by computing synonyms and suitable color codes. This paper also builds a web-based medical document rewriting system and conducts a case study to verify the feasibility of the model. The verification results show that when the system maintains about 600 medical documents, the average satisfaction score improves to 3.81 (76.2%). Hence, the developed system performs medical document rewriting stably and at a high level. That is, the model and system can be applied to medical article sharing websites (e.g., A++ medicine and National Taiwan University Hospital), and readers' negative emotions when reading medical documents can be reduced through the rewritten articles. For medical knowledge demanders, the probability of obtaining friendly, high-quality medical knowledge is thereby enhanced.
We develop a default logic for the pragmatic analysis of natural language queries. Ambiguous sentences are handled by assigning each set of meanings an extension of the default system. The approach is applied to natural language question answering, where even a correct semantic representation needs to be modified in accordance with a set of default rules to better match the knowledge domain.
Recent years have been marked by rapid growth in World Wide Web 2.0 applications such as blog posts, forums, mailing lists, and product-review websites. As a result, a dedicated sentiment analysis field has sprung up around the issue of people's responses to the diversity of available subjects. Hence, one might well wonder: how do people feel and react when dealing with certain topics? In this paper, a new automatic sentiment-processing model is advanced, whereby the problems faced by the prevalent existing models can be deciphered and more properly treated. The suggested approach consists of developing a multi-agent system based on a thorough linguistic analysis, while highlighting the major contributions of such a study in combination with syntactic, semantic, and subjective analyses. The newly devised framework makes it possible to resolve the ambiguities and complexities of natural evaluative language and to strengthen and consolidate the results achieved at the various analysis stages.
Many problems in NLP, such as language translation and sentiment analysis, have seen substantial improvement in recent years. As simpler language problems are solved or better understood, the focus shifts to more complex problems such as semantic analysis and understanding. Unfortunately, many studies in the literature suffer from excessive specificity: the algorithms and datasets are too domain-specific. In this study, we analyze and elaborate on this notion of generality. Instead of selecting a highly specialized dataset for semantic analysis, we take a generic, and possibly dry, dataset and study how a plain vanilla Transformer performs in learning higher-level semantic patterns beyond what was obvious or expected. We tune our Transformer model on a classic language task to ensure correct performance. Once it is tuned, the goal is to select sentences with specific keywords and study whether higher-level semantic patterns may have been learned by our model. We believe we obtained promising results. The average BLEU score for sentences of fewer than 25 words is 39.79. Our initial qualitative analysis of possible semantic content of interest shows a 17 percent rate of finding interesting semantic patterns. We discuss data-driven results on unexpectedness as a measure of semantic learning.
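Since the abstract reports an average BLEU score for short sentences, the following is a minimal sketch of how such a score can be computed with sacrebleu after filtering by sentence length; the toy sentence pairs, the 25-word cutoff applied to references, and the library choice are assumptions for illustration.

```python
# Minimal sketch: corpus BLEU over sentence pairs whose reference is under 25 words.
# The toy hypotheses/references and the length filter are illustrative assumptions.
import sacrebleu

hypotheses = [
    "the cat sat on the mat",
    "a quick brown fox jumps over the lazy dog",
]
references = [
    "the cat sat on the mat",
    "the quick brown fox jumped over the lazy dog",
]

# Keep only pairs whose reference is shorter than 25 words.
pairs = [(h, r) for h, r in zip(hypotheses, references) if len(r.split()) < 25]
hyps, refs = zip(*pairs)

bleu = sacrebleu.corpus_bleu(list(hyps), [list(refs)])
print(f"BLEU for sentences under 25 words: {bleu.score:.2f}")
```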
A new approach to machine translation, capable of resolving the different meanings of a verb in sentences of varying context, is described. The design revolves around the Verb Usage Frame (VUF) and the Noun Classification Hierarchy (NCH). The VUF contains context items that embody the different semantic usages of a verb in different contexts. The meaning of the verb is resolved through the classification of its subject and object, obtained from the NCH. The NCH returns not just the basic classification of a noun but also its super-classifications. This allows a thorough semantic analysis of both the verb and the noun. The entire design is implemented using object-oriented techniques, and a prototype English-Japanese machine translator is built to illustrate the merits of the design.
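To picture how a noun hierarchy can drive verb sense resolution, here is a minimal sketch in Python; the class names, the hierarchy, and the example frames for "run" are hypothetical and are not taken from the paper.

```python
# Minimal sketch: resolving a verb sense from the classes of its subject and object.
# The hierarchy, the frames for "run", and the lookup strategy are hypothetical.

# Noun -> basic classification, and class -> super-class (None at the root).
NOUN_CLASS = {"dog": "animal", "company": "organization"}
SUPER = {"animal": "living_thing", "living_thing": "entity",
         "organization": "entity", "entity": None}

def class_chain(noun):
    """Basic classification of a noun followed by all of its super-classes."""
    cls, chain = NOUN_CLASS[noun], []
    while cls is not None:
        chain.append(cls)
        cls = SUPER.get(cls)
    return chain

# Verb Usage Frames: (subject class, object class) -> verb sense.
VUF = {
    "run": {
        ("animal", "entity"): "move quickly on foot",
        ("organization", "entity"): "manage / operate",
    }
}

def resolve(verb, subject, obj):
    """Try increasingly general subject/object classes until a frame matches."""
    frames = VUF[verb]
    for s in class_chain(subject):
        for o in class_chain(obj):
            if (s, o) in frames:
                return frames[(s, o)]
    return "unknown sense"

print(resolve("run", "dog", "company"))      # move quickly on foot
print(resolve("run", "company", "company"))  # manage / operate
```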
Social platforms have become one of the major sources of unstructured text. Investigating unstructured text and interpreting its meaning is a complex job. Sentiment analysis is an emerging approach, as social platforms hold a lot of opinionated data. It uses language processing, text classification, and linguistics to retrieve opinions from text. Twitter is a microblogging site that is popular among social users, as it is a vast open data platform that witnesses a lot of sentiment. Twitter sentiment analysis is the automatic mining of user tweets for opinions, emotions, and attitudes, in order to derive useful insights into community opinions and to classify those opinions. Due to the enormous increase in the number of collaborative tweets, it has become difficult to identify the terms that carry sentiment. Moreover, unstructured tweets may contain irrelevant terms that reduce classification accuracy. To address these issues, we propose a Social-Spider Lex Feature Ensemble Model-Based Syntactic-Senti Rule prediction Recurrent Neural Network Classifier (S2LFEM-S2RRNN) to obtain better classification accuracy. Twitter is used as the data source, and we extracted the tweets using the Twitter API. Initially, data pre-processing removes unwanted data and symbols, and content terms are extracted to improve the dataset. Then, the significant lexical content terms are extracted with the proposed Social Spider Lex Feature Ensemble Model (S2LFEM) based on Syntactic-Senti Rule Prediction. The semantics of the terms are analysed with respect to verbs and the subjectivity of the tweet patterns to compute the overall weightage of tweets. Based on tweet weightage, a Recurrent Neural Network is trained to classify the tweets into positive, negative, and neutral. The experimental results show that the proposed classifier outperforms existing models for sentiment classification, with an accuracy of 94.1%.
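The final classification stage can be pictured with the following minimal sketch of a recurrent three-class tweet classifier in PyTorch; the vocabulary size, embedding dimension, and single-LSTM architecture are assumptions for illustration and do not reproduce the proposed S2LFEM-S2RRNN model.

```python
# Minimal sketch: a recurrent classifier mapping token-id sequences to
# positive / negative / neutral. Sizes and architecture are illustrative only.
import torch
import torch.nn as nn

class TweetRNN(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        _, (h_n, _) = self.rnn(self.embed(token_ids))  # h_n: (1, batch, hidden)
        return self.out(h_n.squeeze(0))                # (batch, num_classes)

model = TweetRNN()
dummy_batch = torch.randint(1, 10_000, (8, 30))        # 8 tweets, 30 tokens each
print(model(dummy_batch).shape)                        # torch.Size([8, 3])
```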
The identifier lexicon may have a direct impact on software understandability and reusability and, thus, on the quality of the final software product. Understandability and reusability are two important characteristics of software quality. The REpresentational State Transfer (REST) style is becoming a de facto standard adopted by software organizations to build their Web applications. Understandable and reusable Uniform Resource Identifiers (URIs) are important for attracting client developers of RESTful APIs, because good URIs help client developers understand and reuse the APIs. Consequently, the use of a proper lexicon in RESTful APIs also has a direct impact on the quality of the Web applications that integrate these APIs. Linguistic antipatterns represent poor practices in the naming, documentation, and choice of identifiers in APIs, as opposed to linguistic patterns, which represent the corresponding best practices. In this paper, we present the Semantic Analysis of RESTful APIs (SARA) approach, which employs both syntactic and semantic analyses to detect linguistic patterns and antipatterns in RESTful APIs. We provide detailed definitions of 12 linguistic patterns and antipatterns and define and apply their detection algorithms on 18 widely-used RESTful APIs, including Facebook, Twitter, and Dropbox. Our detection results show that linguistic patterns and antipatterns do occur in major RESTful APIs, in particular in the form of poor documentation practices. The results also show that SARA detects linguistic patterns and antipatterns with higher accuracy than the state-of-the-art approach DOLAR.
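As a simplified illustration of purely syntactic checks on URIs (not the SARA detection algorithms themselves), the sketch below flags two well-known REST linguistic antipatterns, CRUDy URIs and non-hyphenated path segments; the verb list and the rules are assumptions chosen for demonstration.

```python
# Minimal sketch: flagging two REST linguistic antipatterns in URI paths.
# The CRUD verb list and the hyphenation rule are simplified assumptions;
# SARA's actual detection also relies on semantic analysis.
import re
from urllib.parse import urlparse

CRUD_VERBS = {"get", "create", "add", "update", "delete", "remove"}

def check_uri(uri: str):
    findings = []
    segments = [s for s in urlparse(uri).path.split("/") if s]
    for seg in segments:
        words = [w.lower() for w in re.findall(r"[A-Za-z][a-z]*|\d+", seg)]
        if words and words[0] in CRUD_VERBS:
            findings.append(f"CRUDy URI: segment '{seg}' starts with a CRUD verb")
        if "_" in seg or any(c.isupper() for c in seg):
            findings.append(f"Non-hyphenated segment: '{seg}'")
    return findings

for uri in ["https://api.example.com/getUsers/42",
            "https://api.example.com/users/42/photo_albums"]:
    print(uri, "->", check_uri(uri) or ["no antipattern detected"])
```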
In this paper, we present the major issues in video parsing, abstraction, retrieval, and semantic analysis. We discuss the successes, the difficulties, and the expectations in these areas. In addition, we identify important open problems that can lead to more sophisticated ways of analyzing video content.
For video parsing, we discuss topics in video partitioning, motion characterization, and object segmentation. Success in video parsing will, in general, have a great impact on video representation and retrieval. We present three levels of abstracting video content: scene, keyframe, and key object representations. Overall, these representation schemes serve as a good starting point for video retrieval. We then describe the visual features, in particular motion, and the similarity measures adopted for retrieval. Next, we discuss recent computational approaches to bridging the semantic gap for video content understanding.
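As one concrete example of a visual feature and similarity measure used in keyframe-based retrieval, the sketch below compares intensity histograms with histogram intersection; the bin count and the grayscale simplification are assumptions for illustration, not the survey's prescription.

```python
# Minimal sketch: keyframe similarity via normalized histogram intersection.
# The 32-bin grayscale histogram is a simplification; real systems typically
# use color histograms or richer features.
import numpy as np

def histogram(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    """Normalized intensity histogram of a grayscale frame (values in 0..255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / hist.sum()

def similarity(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return float(np.minimum(histogram(frame_a), histogram(frame_b)).sum())

rng = np.random.default_rng(0)
keyframe = rng.integers(0, 256, size=(120, 160))
query = np.clip(keyframe + rng.integers(-10, 10, size=keyframe.shape), 0, 255)
print(f"similarity(query, keyframe) = {similarity(query, keyframe):.3f}")
```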
This paper presents a new study on the application of the framework of Computational Media Aesthetics to the problem of automated understanding of film. Leveraging Film Grammar as the means to closing the "semantic gap" in media analysis, we examine film rhythm, a powerful narrative concept used to endow the film with structure and form compositionally and to enhance its lyrical quality experientially. The novelty of this paper lies in the specification and investigation of the rhythmic elements present in two cinematic devices, namely motion and editing patterns, and their potential usefulness to automated content annotation and management systems. In our rhythm model, motion behavior is classified as nonexistent, fluid, or staccato for a given shot. Shot neighborhoods in movies are then grouped by the proportional makeup of these motion behavioral classes to yield seven high-level rhythmic arrangements that prove adept at indicating likely scene content (e.g., dialogue or chase sequence) in our experiments. The second part of our investigation presents a computational model to detect editing patterns as metric, accelerated, decelerated, or free. Details of the algorithm for extracting these classes are presented, along with experimental results on real movie data. With an investigation of combined rhythmic patterns, we show that while detailed content identification via rhythm types alone is not possible, since film is not codified to this level in terms of rhythmic elements, analysis of the combined motion/editing rhythms can allow us to determine that the content has changed and to hypothesize why. We present three such categories of change and demonstrate their efficacy for capturing useful film elements (e.g., a scene change precipitated by a plot event), with supporting data from five motion pictures.
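The shot-level motion classification can be pictured with the following minimal sketch, which labels a shot as nonexistent, fluid, or staccato from a sequence of per-frame motion magnitudes; the thresholds and the variation heuristic are assumptions for illustration, not the paper's actual model.

```python
# Minimal sketch: labeling a shot's motion rhythm from per-frame motion magnitudes.
# The thresholds (0.05, 0.5) and the variation heuristic are illustrative assumptions.
import numpy as np

def classify_motion(magnitudes, low=0.05, roughness=0.5):
    """Return 'nonexistent', 'fluid', or 'staccato' for one shot."""
    m = np.asarray(magnitudes, dtype=float)
    if m.mean() < low:                     # hardly any motion in the shot
        return "nonexistent"
    variation = np.abs(np.diff(m)).mean() / (m.mean() + 1e-9)
    return "staccato" if variation > roughness else "fluid"

smooth_pan = 1.0 + 0.05 * np.sin(np.linspace(0, 3, 60))                # steady motion
jerky_action = np.abs(np.random.default_rng(1).normal(1.0, 0.9, 60))   # erratic motion
static_shot = np.full(60, 0.01)

for name, shot in [("static", static_shot), ("pan", smooth_pan), ("action", jerky_action)]:
    print(name, "->", classify_motion(shot))
```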
In the traditional sense, the translation evaluation of complex long English sentences is often limited to whether, or how, the semantic content of the original text is carried over, so many phenomena that have nothing to do with language but directly affect the evaluation fall outside its field of vision and cannot be interpreted. To solve this problem, a multi-label clustering algorithm is proposed to evaluate the translation accuracy of complex long English sentences. The multi-label clustering algorithm is introduced into translation evaluation to describe the translation and detection parameters of complex long sentences comprehensively. The comprehensiveness of the description, the accuracy of generalization, and the rationality of interpretation lay a solid foundation for English translation activities.
Microscopic imaging is one of the most common techniques for investigating biological systems. In recent years there has been a tremendous growth in the volume of biological imaging data owing to rapid advances in optical instrumentation, high-speed cameras and fluorescent probes. Powerful semantic analysis tools are required to exploit the full potential of the information content of these data. Semantic analysis of multi-modality imaging data, however, poses unique challenges. In this paper we outline the state-of-the-art in this area along with the challenges facing this domain. Information extraction from biological imaging data requires modeling at multiple levels of detail. While some applications require only quantitative analysis at the level of cells and subcellular objects, others require modeling of spatial and temporal changes associated with dynamic biological processes. Modeling of biological data at different levels of detail allows not only quantitative analysis but also the extraction of high-level semantics. Development of powerful image interpretation and semantic analysis tools has the potential to significantly help in understanding biological processes, which in turn will result in improvements in drug development and healthcare.
In this article, we survey different types of big data techniques such as Hadoop, NoSQL, and R and how semantic computing can be utilized to improve these methods. We explain the terms big data and semantic analysis. We discuss how big data analysis and semantic analysis are utilized in different domains such as healthcare and business.
In this paper, we review one of the most challenging problems in opinion mining: sarcasm detection. To this end, many researchers have explored properties of sarcasm such as theories of sarcasm, syntactic properties, the psycholinguistics of sarcasm, lexical features, and semantic properties. Studies conducted within the last 15 years have not only made progress on semantic features but have also produced a growing number of analysis methods that use machine learning to process data. This paper therefore explains the most commonly used methods for detecting sarcasm. Lastly, we present our findings, which may help other researchers achieve better results in the future.
As more and more datasets become available, their use in different applications grows in popularity. Their volume and production rate, however, mean that their quality and content control are in most cases non-existent, resulting in many datasets that contain inaccurate, low-quality information. In the field of conversational assistants especially, where datasets come from many heterogeneous sources with no quality assurance, the problem is aggravated. We present here an integrated platform that creates task- and topic-specific conversational datasets to be used for training conversational agents. The platform explores available conversational datasets, extracts information based on semantic similarity and relatedness, and applies a weight-based score function to rank the information by its value for the specific task and topic. The finalized dataset can then be used to train an automated conversational assistant on accurate, high-quality data.
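A minimal sketch of the kind of weight-based ranking described above follows, combining TF-IDF cosine similarity to a task description and to a topic description with fixed weights; the weights, the similarity measure, and the toy utterances are assumptions for illustration, not the platform's actual scoring function.

```python
# Minimal sketch: ranking candidate utterances by a weighted combination of
# similarity to a task description and to a topic description.
# The 0.6/0.4 weights and TF-IDF cosine similarity are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

task = "book a table at a restaurant"
topic = "food and dining"
candidates = [
    "I would like to reserve a table for two tonight",
    "what is the weather like tomorrow",
    "which restaurants near me serve vegetarian food",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([task, topic] + candidates)
task_vec, topic_vec, cand_vecs = matrix[0], matrix[1], matrix[2:]

scores = (0.6 * cosine_similarity(cand_vecs, task_vec).ravel()
          + 0.4 * cosine_similarity(cand_vecs, topic_vec).ravel())

for score, text in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {text}")
```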