Question answering aims at computing the answer to a question given a context with facts. Many proposals focus on questions whose answer is explicit in the context; lately, there has been increasing interest in questions whose answer is not explicit and requires multi-hop inference to be computed. Our analysis of the literature reveals that there is a seminal proposal with increasingly complex follow-ups. Unfortunately, they were presented without an extensive study of their hyper-parameters, their experimental studies focused exclusively on English, and no statistical analysis was ever performed to sustain the conclusions. In this paper, we report on our experience devising a very simple neural approach to address the problem, on our extensive grid search over the space of hyper-parameters, and on the results attained with English, Spanish, Hindi, and Portuguese, and we sustain our conclusions with statistically sound analyses. Our findings show that it is possible to beat many of the proposals in the literature with a very simple approach that was likely overlooked due to the difficulty of performing an extensive grid search, that the language does not have a statistically significant impact on the results, and that the empirical differences found among some existing proposals are not statistically significant.
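The following sketch illustrates, in broad strokes, the kind of protocol this abstract describes: an exhaustive grid search over hyper-parameters followed by a paired, non-parametric significance test. The grid values, the evaluation stub, and the baseline scores are placeholders invented for illustration, not the paper's actual configurations or results.

```python
# Sketch only: exhaustive grid search plus a paired significance test.
# evaluate_model() and baseline_scores are hypothetical placeholders.
from itertools import product
from statistics import mean
import random

from scipy.stats import wilcoxon

grid = {                                  # assumed ranges, not the paper's
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "hidden_size": [64, 128, 256],
    "dropout": [0.0, 0.1, 0.3],
}

def evaluate_model(lr, hidden, dropout, fold):
    # Placeholder: in the real study this would train the simple neural QA
    # model and return dev-set accuracy on one fold. A seeded random value
    # keeps the sketch runnable end to end.
    rng = random.Random(hash((lr, hidden, dropout, fold)))
    return rng.uniform(0.5, 0.8)

best_cfg, best_scores = None, None
for lr, hidden, dropout in product(*grid.values()):
    scores = [evaluate_model(lr, hidden, dropout, fold) for fold in range(5)]
    if best_scores is None or mean(scores) > mean(best_scores):
        best_cfg, best_scores = (lr, hidden, dropout), scores

# Compare the best configuration against a baseline with a paired,
# non-parametric test before claiming a significant difference.
baseline_scores = [0.62, 0.61, 0.63, 0.60, 0.64]   # assumed per-fold baseline
stat, p_value = wilcoxon(best_scores, baseline_scores)
print(best_cfg, round(p_value, 3))
```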
As part of the task of automated question answering from a large collection of text documents, the reduction of the search space to a smaller set of document passages that are actually searched for answers constitutes a difficult but rewarding research issue. We propose a set of precision-enhancing filters for passage retrieval based on semantic constraints detected in the submitted questions. The approach improves the performance of the underlying question answering system in terms of both answer accuracy and response time.
Visual Question Answering (VQA) is one of the attractive topics in the field of multimedia, affective, and empathic computing for garnering user interest. Unlike existing models, which address the challenges of VQA for scene images, this work develops a new model for Personality Trait Question Answering (PQA). It uses Twitter account information, which includes shared images, profile pictures, banners, text in the images, and descriptions of the images. Motivated by the accomplishments of the transformer, a new InfoGain Multi-Axial Wavelet Vision Transformer (IgMaWaViT) is explored here for encoding visual features of the images. For encoding textual features in the images and descriptions, a new Information Gain BERT (InfoBert) method is introduced, which handles variable-length text encoding by choosing the optimal discriminator. Furthermore, the model fuses the image and text encodings according to the questions on different personality traits for question answering. The model is called the InfoGain Multi-Axial Wavelet Vision Transformer for Personality Traits Question Answering (IgMaWaViT-PQA). To validate the efficacy of the proposed model, a dataset has been constructed and used along with standard datasets for experimentation. Comprehensive experiments show that the proposed model outperforms the state-of-the-art models. The code is available at: https://github.com/biswaskunal29/InfoGain_MultiAxial_PQA.
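As a rough illustration of the kind of image-text fusion described above, the sketch below combines a pooled visual encoding and a pooled text encoding with a small classification head. It does not implement IgMaWaViT or InfoBert; the embedding dimensions, the concatenation-based fusion, and the number of answer options are assumptions made only to keep the example concrete.

```python
# Generic image-text fusion head for trait-level question answering.
# This is NOT IgMaWaViT/InfoBert; dimensions and fusion are assumptions.
import torch
import torch.nn as nn

class FusionQAHead(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, hidden=512, num_options=5):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_options),   # one logit per answer option
        )

    def forward(self, img_feat, txt_feat):
        # img_feat: pooled visual encoding (e.g. from a vision transformer)
        # txt_feat: pooled encoding of the text/description and the question
        return self.fuse(torch.cat([img_feat, txt_feat], dim=-1))

head = FusionQAHead()
logits = head(torch.randn(2, 768), torch.randn(2, 768))   # batch of 2 dummy pairs
print(logits.shape)                                        # torch.Size([2, 5])
```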
Scoring subjective question answering requires considerable human labor in realistic applications. To deal with this issue, we leverage the powerful semantic comprehension capability of a transformer-Bidirectional Encoder Representations from Transformers (BERT)-integrated model and propose a large language model-based autonomous scoring method for subjective question answering in the English context. First, a pretrained BERT structure is used to extract features from question sentences, obtaining vector representations of each word. Then, these vectors are input into the transformer encoder to obtain a semantic representation of the whole sentence. Next, the semantic content of subjective questions can be captured in context. Finally, we divide the student's answers into different scoring intervals to obtain the final rating. We compared the scoring results with other methods on the publicly released Kaggle competition dataset, the NLPCC DBQA dataset, the Text Retrieval Conference QA (TrecQA) dataset, and the WikiQA dataset. The experimental results show that our proposal performs well in subjective English question scoring. Compared with traditional manual scoring, this method achieves significant improvements in scoring efficiency and accuracy. In addition, we conducted comparative experiments to demonstrate the effectiveness and applicability of the proposal. Through the powerful capabilities of the transformer-BERT-integrated model, the proposed model can accurately capture students' writing ability and English proficiency.
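A minimal sketch of the pipeline described above, assuming an off-the-shelf pretrained BERT rather than the authors' trained scoring model: encode the reference and student answers, compare them, and map the comparison to a discrete scoring interval. The model name, the mean pooling, and the thresholds are illustrative assumptions.

```python
# Sketch of the scoring pipeline with an off-the-shelf BERT; the thresholds
# that define the scoring intervals are arbitrary assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # per-token vectors
    return hidden.mean(dim=1).squeeze(0)               # mean-pooled sentence vector

def score(reference_answer, student_answer, bins=(0.70, 0.80, 0.90)):
    sim = torch.cosine_similarity(embed(reference_answer),
                                  embed(student_answer), dim=0).item()
    return sum(sim >= b for b in bins)                 # scoring interval 0..3

print(score("The mitochondrion produces energy for the cell.",
            "Mitochondria generate most of the cell's energy."))
```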
The paper addresses the limitations of traditional evaluation metrics for Question Answering (QA) systems that primarily focus on syntax and n-gram similarity. We propose a novel model-based evaluation metric, MQA-metric, and create human-judgment-based datasets, squad-qametric and marco-qametric, to validate our approach. The research aims to solve several key problems: the objectivity of dataset labeling, the effectiveness of metrics when there is no syntactic similarity, the impact of answer length on metric performance, and the influence of reference answer quality on metric results. To tackle these challenges, we designed an interface for dataset labeling and conducted extensive experiments with human reviewers. Our analysis shows that the MQA-metric outperforms traditional metrics such as BLEU, ROUGE, and METEOR. Unlike existing metrics, MQA-metric leverages semantic comprehension through large language models (LLMs), enabling it to capture contextual nuances and synonymous expressions more effectively. This approach sets a standard for evaluating QA systems by prioritizing semantic accuracy over surface-level similarities. The proposed metric correlates better with human judgment, making it a more reliable tool for evaluating QA systems. Our contributions include the development of a robust evaluation workflow, the creation of high-quality datasets, and an extensive comparison with existing evaluation methods. The results indicate that our model-based approach provides a significant improvement in assessing the quality of QA systems, which is crucial for their practical application and trustworthiness.
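The validation protocol mentioned above, correlating a metric's scores with human judgments, can be sketched as follows. The toy ratings and metric outputs are invented placeholders; in the paper, the metric scores would come from MQA-metric and from baselines such as BLEU, ROUGE, and METEOR over the squad-qametric and marco-qametric datasets.

```python
# Sketch of metric validation: correlation with human judgments.
# All numbers below are toy placeholders, not results from the paper.
from scipy.stats import pearsonr, spearmanr

human_judgments = [1.0, 0.0, 0.5, 1.0, 0.0]   # assumed human ratings per QA pair
metric_scores = [0.9, 0.2, 0.6, 0.8, 0.1]     # assumed outputs of some metric

rho, rho_p = spearmanr(metric_scores, human_judgments)
r, r_p = pearsonr(metric_scores, human_judgments)
print(f"Spearman rho={rho:.2f} (p={rho_p:.3f}), Pearson r={r:.2f} (p={r_p:.3f})")
```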
We develop a default logic for the pragmatic analysis of natural language queries. Ambiguous sentences are considered so that each set of meanings is assigned an extension of the default system. The approach is applied to natural language question answering, where even a correct semantic representation needs to be modified in accordance with the set of default rules to better match a knowledge domain.
Recently, Question Answering has been a hot topic in information retrieval research. Question Classification plays a critical role in most Question Answering systems. In this paper, a new approach to classifying questions using Profile Hidden Markov Models (PHMMs) is proposed. Generalization strategies for extracting pattern instances of questions by selective substitution are discussed, and a classification method based on the structural features of pattern instances is then investigated. Experimental results show that the PHMM-based question classifier reaches an accuracy of 92.2% and significantly outperforms most state-of-the-art systems.
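To give a flavor of the classification rule, the sketch below trains one sequence model per question class over generalized tokens and assigns a new question to the class with the highest likelihood. It uses simple per-class bigram statistics rather than a full Profile HMM (which would add match, insert, and delete states), and the toy training patterns are invented for illustration.

```python
# Simplified stand-in for PHMM-based question classification: one per-class
# sequence model, classification by maximum likelihood. Real Profile HMMs add
# match/insert/delete states; the toy patterns below are invented examples.
import math
from collections import defaultdict

class BigramClassModel:
    def __init__(self, vocab_size=50):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab_size = vocab_size

    def train(self, sequences):
        for seq in sequences:
            for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
                self.counts[prev][cur] += 1

    def log_likelihood(self, seq):
        ll = 0.0
        for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
            total = sum(self.counts[prev].values())
            # add-one smoothing with an assumed vocabulary size
            ll += math.log((self.counts[prev][cur] + 1) / (total + self.vocab_size))
        return ll

# Toy pattern instances obtained after selective substitution of content words.
training = {
    "LOCATION": [["where", "is", "ENTITY"], ["where", "can", "i", "find", "ENTITY"]],
    "PERSON": [["who", "is", "ENTITY"], ["who", "wrote", "ENTITY"]],
}
models = {label: BigramClassModel() for label in training}
for label, seqs in training.items():
    models[label].train(seqs)

question = ["who", "is", "ENTITY"]
print(max(models, key=lambda c: models[c].log_likelihood(question)))   # PERSON
```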
In recent decades, several research areas have experienced key improvements due to the appearance of numerous tools made available to the scientific community. For instance, Moses plays an important role in recent developments in machine translation, and Lucene is, without doubt, a widespread tool in information retrieval. The existence of these systems allows for the easy development of baselines and, therefore, researchers can focus on improving preliminary results instead of spending time developing software from scratch. In addition, the existence of appropriate test collections leads to a straightforward comparison of systems and of their specific components.
In this paper we describe Just.Ask, a multi-pronged approach to open-domain question answering. Just.Ask combines rule-based with machine-learning-based components and implements several state-of-the-art strategies in question answering. It also has a flexible architecture that allows for further extensions. Moreover, in this paper we report a detailed evaluation of each of Just.Ask's components. The evaluation is split into two parts: in the first, we use a manually built test collection, the GoldWebQA, intended to evaluate Just.Ask's performance when the information source in use is the Web, without having to deal with its constant changes; in the second, we use a set of questions gathered from the TREC evaluation forum, with a closed text collection, locally indexed and stored, as the information source. Therefore, this paper contributes a benchmark for research on question answering, since both Just.Ask and the GoldWebQA corpus are freely available to the scientific community.
We describe a system that performs semantic Question Answering based on the combination of classic Information Retrieval methods with semantic ones. First, we use a search engine to gather web pages and then apply a noun phrase extractor to extract all the candidate answer entities from them. Candidate entities are ranked using a linear combination of two IR measures to pick the most relevant ones. For each of the top-ranked candidate entities, we find the corresponding Wikipedia page. We then propose a novel way to exploit the semantic information contained in the structure of Wikipedia. A vector is built for every entity from Wikipedia category names by splitting and lemmatizing the words that form them. These vectors maintain semantic information in the sense that they give us the ability to measure semantic closeness between entities. Based on this, we apply an intelligent clustering method to the candidate entities and show that the candidate entities in the biggest cluster are the most semantically related to the ideal answers to the query. Results on the topics of the TREC 2009 Related Entity Finding task dataset show promising performance.
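The category-vector idea can be sketched roughly as follows: represent each candidate entity by the words of its Wikipedia category names, cluster the resulting vectors, and keep the entities in the largest cluster. The example categories and the use of KMeans are assumptions for illustration; the paper's lemmatization and clustering method may differ.

```python
# Sketch: bag-of-words vectors over Wikipedia category names, clustered, with
# the largest cluster retained. Categories and KMeans are illustrative choices.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

candidate_categories = {                      # assumed category-name words
    "Boeing": "aircraft manufacturers of the united states aerospace companies",
    "Airbus": "aircraft manufacturers multinational aerospace companies",
    "Lockheed Martin": "aerospace companies defense companies of the united states",
    "Paris": "capitals in europe cities in france",
}

entities = list(candidate_categories)
vectors = CountVectorizer().fit_transform(candidate_categories.values())
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

biggest = Counter(labels).most_common(1)[0][0]
print([e for e, lab in zip(entities, labels) if lab == biggest])
# expected: the aerospace-related entities end up together in the largest cluster
```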
Question Answering is most likely one of the toughest tasks in the field of Natural Language Processing. It aims at directly returning accurate, short answers to questions asked by users in human language over a huge collection of documents or a database. Recently, the exponential growth of digital information has imposed the need for more direct access to relevant answers. Thus, question answering has received widespread attention and has been extensively explored over the last few years. Retrieving passages remains a crucial yet challenging task in question answering. Although there has been an abundance of work on this task, it still demands non-trivial effort. In this paper, we propose an ad-hoc passage retrieval approach for Question Answering using n-grams. This approach relies on a new measure of similarity between a passage and a question, based on n-gram overlap, for extracting and ranking the different passages. More concretely, our measure is based on the dependency degree of the question's n-gram words in the passage. We validate our approach through the development of the "SysPex" system, which automatically returns the most relevant passages for a given question.
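As a rough illustration of n-gram-based passage scoring, the sketch below ranks passages by the fraction of question n-grams they contain, weighting longer n-grams more. This simple overlap ratio is not the paper's dependency-degree measure or the SysPex implementation; it only shows how such a similarity can drive passage ranking.

```python
# Sketch: rank passages by overlap with the question's n-grams (n = 1..3).
# This is not the paper's dependency-degree measure, only a simple stand-in.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_score(question, passage, max_n=3):
    q, p = question.lower().split(), passage.lower().split()
    score = 0.0
    for n in range(1, max_n + 1):
        q_ngrams = ngrams(q, n)
        if q_ngrams:
            # weight longer n-grams more, since they are stronger evidence
            score += n * len(q_ngrams & ngrams(p, n)) / len(q_ngrams)
    return score

question = "who invented the telephone"
passages = [
    "Alexander Graham Bell is credited with inventing the telephone in 1876.",
    "Telephone networks expanded rapidly across the world during the twentieth century.",
]
print(max(passages, key=lambda p: overlap_score(question, p)))
```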
This paper briefly describes Toshiba Knowledge Media Laboratory's recent research efforts for effective information retrieval and access. Firstly, I will mention the main research topics that are being tackled by our information access group, including document retrieval, speech-input/multimedia question answering, and evaluation metrics. Secondly, I will focus on the problem of cross-language information retrieval and access, and describe a system called BRIDJE (Bi-directional Retriever/Information Distiller for Japanese and English), which achieved many gold-medal performances at the recent NTCIR (a.k.a. "Asian TREC") workshop. Finally, I will conclude the paper by mentioning some unsolved problems and suggesting possible directions for future Information Access research.
Equipment fault information is typically fragmented and structurally diverse. Existing fault diagnosis methods are incapable of fully exploiting the prior knowledge and expert knowledge within the field, and their diagnosis results are overly one-sided. Given that it is challenging to obtain effective fault diagnosis knowledge for complex equipment and complex faults, this paper proposes applying a domain knowledge graph (KG) to fault diagnosis. Unlike existing KG-based fault diagnosis methods, our framework consists of two major parts. The first part constructs an equipment fault KG from semi-structured and unstructured text. We further enrich the graph through knowledge completion, which furnishes high-quality knowledge sources for downstream applications. The second part employs the built fault KG, either online or offline, for fault diagnosis. We offer two approaches: a deep learning plus KG approach and a question-answering approach. The former not only guarantees diagnostic accuracy but also provides more comprehensive diagnostic information; this constitutes the online use of the KG. The latter realizes the offline use of the KG, providing users with a natural and user-friendly way to retrieve fault diagnosis information. We demonstrate and verify the proposed framework in the context of bearing fault diagnosis.
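The offline, question-answering use of the fault KG can be pictured with a toy example: a handful of triples and a template-based lookup that maps a symptom mentioned in the question to possible causes and recommended checks. The triples and templates below are invented for illustration; the paper's KG is built from real semi-structured and unstructured fault text.

```python
# Toy, invented fault triples and a template-based lookup illustrating the
# offline question-answering use of a fault knowledge graph.
triples = [
    ("inner-race defect", "causes", "bearing vibration anomaly"),
    ("insufficient lubrication", "causes", "bearing overheating"),
    ("bearing overheating", "suggests_check", "lubricant level"),
    ("bearing vibration anomaly", "suggests_check", "raceway surface"),
]

def answer(question):
    question = question.lower()
    for subj, rel, obj in triples:
        if rel == "causes" and obj in question:
            yield f"possible cause: {subj}"
    for subj, rel, obj in triples:
        if rel == "suggests_check" and subj in question:
            yield f"recommended check: {obj}"

for line in answer("What should I do about a bearing overheating alarm?"):
    print(line)
```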
In this paper, we present the results of employing DBpedia and YAGO as lexical databases for answering questions formulated in natural language. The proposed solution has been evaluated for answering class 1 and class 2 questions (out of the 5 classes defined by Moldovan for the TREC conference). Our method uses dependency trees generated from the user query. The trees are browsed for paths leading from the root of the tree to the question subject. We call those paths fibers, and they represent the user intention. The question analysis consists of three stages: query analysis, query breakdown, and information retrieval. The aim of those stages is to detect the entities of interest and their attributes, indicating the user's domain of interest. The user query is then converted into a SPARQL query and sent to the DBpedia and YAGO databases. The baseline and the extended methods are presented, and the quality of the solution is evaluated and discussed.
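The last stage described above, issuing a SPARQL query once the entity of interest and its attribute have been detected, might look roughly like the following. The hand-written query stands in for the output of the dependency-tree (fiber) analysis; the mapping from the question to this particular query is an assumption, not the paper's code.

```python
# Sketch: querying DBpedia for an attribute of the detected entity of interest.
# The hand-written SPARQL stands in for the output of the fiber analysis.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
# e.g. for the question "Where was Marie Curie born?"
sparql.setQuery("""
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?place WHERE { dbr:Marie_Curie dbo:birthPlace ?place . }
""")
sparql.setReturnFormat(JSON)

for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["place"]["value"])
```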
In this paper, we propose a novel approach to improving the performance of a multiple-choice question answering (MCQA) system using distributional semantic similarity and a classification approach. We mainly focus on science-based MCQs, which are particularly difficult to handle. Our method is based on the hypothesis that, in a distributional semantic model, the correct answer will be more strongly related to the question than the other options. We use the IJCNLP Shared Task 5 and SciQ datasets for our experiments. We built three models (Model 1, Model 2, and Model 3) based on the dataset formats. The basic difference between the IJCNLP Task 5 and SciQ datasets is that the SciQ dataset contains supporting text with the questions, whereas the IJCNLP Task 5 dataset does not. Model 1 and Model 2 are mainly built to deal with the IJCNLP Task 5 dataset, whereas Model 3 is mainly built for the SciQ dataset. Model 2 is designed to deal with dependencies between options (e.g., "all of these", "two of them", "none of them"), whereas Model 1 is the basic model for MCQA and cannot capture these dependencies. We also compare the results on the SciQ dataset with supporting text (i.e., using Model 3) and without supporting text (i.e., using Model 1), and we compare our system with other existing methods. Though in some cases the performance of our method is not satisfactory, our approach is simple and robust, which allows it to be more easily integrated into complex applications. This work investigates different techniques for choosing the correct answer to a given question in an MCQA system. These experiments may therefore be useful for improving the performance of current science-based question answering (QA) systems. For the IJCNLP Task 5 dataset, we achieved 44.5% using Model 2 and the PubMed dataset. Similarly, for the SciQ dataset, we achieved 82.25% using Model 3 and the PubMed dataset.
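The selection rule behind the hypothesis above can be sketched as follows: embed the question and each option in a distributional space and choose the option most similar to the question. The three-dimensional toy vectors are placeholders; the paper relies on vectors trained on a large corpus such as PubMed, and Models 2 and 3 add option dependencies and supporting text on top of this basic rule.

```python
# Sketch of the selection rule: pick the option most similar to the question
# in a distributional space. The 3-d vectors below are toy placeholders.
import numpy as np

word_vectors = {
    "plants": np.array([0.9, 0.1, 0.0]),
    "sunlight": np.array([0.85, 0.15, 0.05]),
    "energy": np.array([0.8, 0.2, 0.1]),
    "photosynthesis": np.array([0.9, 0.2, 0.0]),
    "respiration": np.array([0.2, 0.8, 0.1]),
    "digestion": np.array([0.1, 0.2, 0.9]),
}

def embed(text):
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

question = "How do plants convert sunlight into energy"
options = ["photosynthesis", "respiration", "digestion"]
q = embed(question)
print(max(options, key=lambda o: cosine(q, embed(o))))   # photosynthesis
```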