
  • Article (Open Access)

    A Resource-Sharing Method for College English Translation Corpora Under the Background of Informatization

    In today's information age, there is a need to improve the intelligent sharing and scheduling of college English translation corpus resources. To this end, a method based on fuzzy autocorrelation statistical feature analysis is proposed. First, a model is constructed to detect the semantically relevant dimensional features of college English translation corpus resources in an informatized context, and the essential attributes of translation activities are analyzed with a hierarchical parameter detection method applied to the narrative structure of translated texts. Next, a quantitative difference coverage model is established for word clusters of different lengths, and the lexical attributes of these resources are extracted and statistically examined using a similarity-based attribute extraction technique for high-frequency word clusters. A semantic dynamic attribute analysis model is then developed to derive the statistical attributes of the corpus resources. Based on the extracted attributes, a fuzzy autocorrelation statistical attribute analysis method is used to cluster the large-scale data, and an intelligent particle swarm optimization algorithm is applied to extract and share the lexical attributes, so that the corpus resources can be optimized in the informatized context. Simulation results show that the method extracts and shares the lexical features of translated texts with high accuracy and good feature discrimination, improving the extraction, sharing, and detection of lexical features from college English translation corpus resources.
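The abstract does not specify how the high-frequency word clusters are identified. As a minimal sketch, assuming that a "word cluster" means a contiguous n-gram, the first step (counting clusters of several lengths and keeping the frequent ones) could look like the following. The function name `high_frequency_clusters` and the `min_count` threshold are hypothetical, and the sketch does not reproduce the authors' similarity attribute extraction, fuzzy autocorrelation analysis, or particle swarm optimization.

```python
from collections import Counter

def high_frequency_clusters(tokens, lengths=(2, 3, 4), min_count=2):
    """Count contiguous word clusters (n-grams) of each length in `lengths`
    and keep those occurring at least `min_count` times.

    Hypothetical helper for illustration; not the paper's method.
    """
    result = {}
    for n in lengths:
        # Slide a window of n tokens over the text and count each cluster.
        counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        # Keep only the high-frequency clusters of this length.
        result[n] = {cluster: c for cluster, c in counts.items() if c >= min_count}
    return result

tokens = "in order to improve the ability in order to share".split()
clusters = high_frequency_clusters(tokens)
print(clusters[3])  # {('in', 'order', 'to'): 2}
```

Thresholding per length keeps short, very common clusters from crowding out longer ones; a real system would tune `min_count` (or use a normalized frequency) for each length.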

  • Article (Open Access)

    IMPROVING THE INTER-CORPORA COMPATIBILITY FOR PROTEIN ANNOTATIONS

    Although several corpora with protein annotation exist, incompatibility between the annotations in different corpora remains a problem that hinders progress in the automatic recognition of protein names in biomedical literature. Here, we report on our efforts to resolve this incompatibility and to improve the compatibility between two representative protein-annotated corpora: the GENIA corpus and the GENETAG corpus. Through a comparative study, we gain insight into the two corpora, and a series of experimental results shows that most of the incompatibility can be removed.

  • Article (No Access)

    MULTIMODAL COMPLEX EMOTIONS: GESTURE EXPRESSIVITY AND BLENDED FACIAL EXPRESSIONS

    One of the challenges in designing virtual humans is defining appropriate models of the relation between realistic emotions and the coordination of behaviors across several modalities. In this paper, we present the annotation, representation, and modeling of multimodal visual behaviors occurring during complex emotions. We illustrate our work using a corpus of TV interviews that has been annotated at several levels of information: communicative acts, emotion labels, and multimodal signs. We have defined a copy-synthesis approach to drive an Embodied Conversational Agent from these different levels of information. The second part of the paper focuses on a model of complex emotions (superposition and masking) in the agent's facial expressions. We explain how the complementary aspects of our work on the corpus and on the computational model are used to specify complex emotional behaviors.

  • Article (No Access)

    Multi-Stream Graph Convolutional Networks for Text Classification via Representative-Word Document Mining

    Recently, graph convolutional networks (GCNs) for text classification have received considerable attention in natural language processing. However, most current methods use only the original documents and words in the corpus to construct the graph topology, which may lose some useful information. In this paper, we propose a Multi-Stream Graph Convolutional Network (MS-GCN) for text classification via Representative-Word Document (RWD) mining, implemented in PyTorch. In the proposed method, we first introduce temporary labels and mine RWDs, which are treated as additional documents in the corpus. Then, we build a heterogeneous graph based on the relations among a Group of RWDs (GRWDs), words, and original documents. Furthermore, we construct the MS-GCN from multiple heterogeneous graphs built on different GRWDs. Finally, we optimize the MS-GCN model through an update mechanism for the GRWDs. We evaluate the proposed approach on six text classification datasets: 20NG, R8, R52, Ohsumed, MR, and Pheme. Extensive experiments on these datasets show that the proposed approach outperforms state-of-the-art text classification methods.
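The method builds on graph convolution over heterogeneous word-document graphs. As a rough illustration, the sketch below uses NumPy (not the PyTorch implementation the authors mention) to show the standard two-layer GCN propagation softmax(A_n ReLU(A_n X W1) W2), with A_n = D^{-1/2}(A + I)D^{-1/2}, run over several graphs. The abstract does not say how MS-GCN fuses its streams; averaging the per-stream class probabilities in `multi_stream_forward` is purely an assumption, as are the toy graph, the identity node features, and the random weights.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual symmetric GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1 thanks to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(adj, features, w1, w2):
    """Two-layer GCN: row-wise softmax(A_n ReLU(A_n X W1) W2)."""
    a_n = normalize_adjacency(adj)
    hidden = np.maximum(a_n @ features @ w1, 0.0)  # first graph convolution + ReLU
    logits = a_n @ hidden @ w2                     # second graph convolution
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability for softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def multi_stream_forward(adjs, features, weights):
    """Hypothetical stream fusion: average class probabilities over several graphs."""
    outputs = [gcn_forward(a, features, w1, w2) for a, (w1, w2) in zip(adjs, weights)]
    return np.mean(outputs, axis=0)

# Toy heterogeneous graph: nodes 0-1 are documents, nodes 2-4 are words.
adj = np.zeros((5, 5))
for doc, word in [(0, 2), (0, 3), (1, 3), (1, 4)]:
    adj[doc, word] = adj[word, doc] = 1.0
# A second, hypothetical stream adds an edge linking the two documents.
adj2 = adj.copy()
adj2[0, 1] = adj2[1, 0] = 1.0

rng = np.random.default_rng(0)
features = np.eye(5)  # one-hot node features
weights = [(rng.normal(size=(5, 4)), rng.normal(size=(4, 2))) for _ in range(2)]
probs = multi_stream_forward([adj, adj2], features, weights)  # shape (5, 2)
```

In a trained model, only the document rows of `probs` would be scored against labels, and the weights would be learned by gradient descent rather than sampled.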

  • Article (No Access)

    Collocation, Semantic Prosody, and Near-synonymy: The HELP Verbs in Mandarin Chinese

    In this study, we aimed to find out how the verbs 幫忙 bāngmáng and 幫助 bāngzhù differ in usage. In particular, we examined the co-occurrence characteristics of the two verbs to determine whether 幫忙 bāngmáng typically takes an event as its object with positive or neutral collocates, and whether 幫助 bāngzhù typically takes a person or an organization as its object with negative collocates. Data for the two near-synonymous verbs were collected from the Chinese GigaWord Corpus using Chinese Word Sketch. As predicted, the corpus data showed that 幫忙 bāngmáng typically took an event as its object and 幫助 bāngzhù typically took a person or an organization as its object, and that 幫助 bāngzhù was more often linked with lexical items that had negative meanings. These results have practical implications for second language acquisition, translation, and dictionary compilation.
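The comparison of the two verbs rests on simple proportions: how often each verb takes an event rather than a person or organization as its object, and how often its collocates carry negative meaning. The sketch below computes those two shares from (verb, collocate, object type) records. The records and the polarity lexicon are invented toy inputs for illustration only, not data from the Chinese GigaWord Corpus or output from Chinese Word Sketch, and `collocation_profile` is a hypothetical name.

```python
from collections import Counter, defaultdict

def collocation_profile(records, polarity):
    """records: iterable of (verb, collocate, object_type) tuples.
    polarity: hypothetical lexicon mapping collocate -> 'pos' / 'neg' / 'neu'.
    Returns, per verb, the share of event objects and of negative collocates."""
    obj_types = defaultdict(Counter)
    prosody = defaultdict(Counter)
    for verb, collocate, obj_type in records:
        obj_types[verb][obj_type] += 1
        # Collocates missing from the lexicon are treated as neutral.
        prosody[verb][polarity.get(collocate, "neu")] += 1
    summary = {}
    for verb, types in obj_types.items():
        total = sum(types.values())
        summary[verb] = {
            "event_share": types["event"] / total,
            "negative_share": prosody[verb]["neg"] / total,
        }
    return summary

# Invented toy records (not corpus data).
records = [
    ("幫忙", "搬家", "event"),   # help (with) moving house
    ("幫忙", "準備", "event"),   # help prepare
    ("幫助", "災民", "person"),  # help disaster victims
    ("幫助", "窮人", "person"),  # help the poor
    ("幫助", "學生", "person"),  # help students
]
polarity = {"災民": "neg", "窮人": "neg"}
summary = collocation_profile(records, polarity)
```

With real data, the raw shares would normally be complemented by a significance test (for example, a chi-square test on the verb-by-category counts) before claiming a difference in semantic prosody.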

  • Chapter (No Access)

    FROM TEXT TO PATHWAY: CORPUS ANNOTATION FOR KNOWLEDGE ACQUISITION FROM BIOMEDICAL LITERATURE

    We present a new research direction that deploys text mining technologies to construct and maintain databases organized in the form of pathways, by associating parts of papers with the relevant portions of a pathway and vice versa. To realize this scenario, we present two annotated corpora. The first, Event Annotation, identifies the spans of text in which biological events are reported; the second, Pathway Annotation, associates portions of papers with specific parts of a pathway.

  • Chapter (No Access)

    Pattern Based Bootstrapping Method for Named Entity Recognition

    This paper reports on the development of a Named Entity Recognition (NER) system for Bengali. A pattern-directed bootstrapping method has been used to develop the NER system from a tagged Bengali news corpus built from web sources. The different tags of the tagged news corpus help to identify the seed data for the system. The training corpus is initially tagged against the different seed data, and a lexical contextual seed pattern is generated for each tag. The entire training corpus is shallow parsed to identify occurrences of these initial seed patterns, and further patterns are generated through bootstrapping. Patterns that occur in the training corpus above a certain threshold frequency are taken as the final set of patterns learnt from the training corpus. The test corpus is shallow parsed to identify occurrences of these patterns and to estimate the named entities. The system has been tested on four manually tagged Bengali news corpora (Gold Standard Test Sets) and has demonstrated highest Recall, Precision, and F-Score values of 63.3%, 84.8%, and 73.2%, respectively.
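The bootstrapping loop described above (seed entities, context patterns, a frequency threshold, and re-application to the corpus) can be sketched in a few lines. This is a simplified illustration under stated assumptions: entities are single tokens, a pattern is only the word immediately to the left and right of an entity, and no shallow parsing is performed. `learn_patterns`, the toy English sentences, and the PER seed set are all hypothetical.

```python
from collections import Counter

def learn_patterns(sentences, seeds, threshold, iterations=2):
    """Pattern-based bootstrapping sketch.

    sentences: list of token lists; seeds: dict tag -> set of seed entities.
    A pattern is a (left_word, right_word) context around a single-token entity.
    """
    entities = {tag: set(words) for tag, words in seeds.items()}
    patterns = {tag: set() for tag in entities}
    for _ in range(iterations):
        # 1. Count the contexts in which currently known entities occur.
        counts = {tag: Counter() for tag in entities}
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                for tag, known in entities.items():
                    if tokens[i] in known:
                        counts[tag][(tokens[i - 1], tokens[i + 1])] += 1
        # 2. Keep only patterns at or above the threshold frequency.
        patterns = {tag: {p for p, c in cnt.items() if c >= threshold}
                    for tag, cnt in counts.items()}
        # 3. Apply the patterns to harvest new candidate entities.
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                for tag, pats in patterns.items():
                    if (tokens[i - 1], tokens[i + 1]) in pats:
                        entities[tag].add(tokens[i])
    return patterns, entities

sentences = [
    "yesterday minister Rahim said that".split(),
    "yesterday minister Karim said that".split(),
    "the minister Salma said nothing".split(),
]
patterns, entities = learn_patterns(sentences, {"PER": {"Rahim", "Karim"}}, threshold=2)
# The pattern ("minister", "said") is learnt from the two seeds and then tags "Salma".
```

The threshold is what keeps the loop from drifting: a context seen only once is not trusted as a pattern, which trades recall for precision.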

  • Chapter (No Access)

    HMM Based POS Tagger and Rule-Based Chunker for Bengali

    The present work describes a Part of Speech (POS) tagger based on the Hidden Markov Model and a rule-based chunker for Bengali. The POS tagger has been trained on a manually tagged corpus and has demonstrated 87.26% accuracy. Its performance has been compared with that of the popular language-independent TnT tagger using the same training and test sets; the TnT tagger achieved 85.85% accuracy. The chunker was developed with a rule-based approach because adequate training data were not available. It has been tested on the same test set and demonstrated 97.52% accuracy in chunk boundary identification alone and 96.9% accuracy in chunk boundary identification and chunk labelling.
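An HMM POS tagger of this kind combines tag-transition and word-emission probabilities estimated from a tagged corpus with Viterbi decoding. The following minimal sketch assumes a bigram model with add-one smoothing (the abstract states neither the model order nor the smoothing; TnT, for comparison, uses trigrams) and uses a tiny invented English training set rather than Bengali data. `train_hmm` and `viterbi` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Collect bigram transition and emission counts from [(word, tag), ...] sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram HMM."""
    tags = list(emit)
    n_tags = len(tags)
    v_size = len({w for c in emit.values() for w in c}) + 1  # +1 slot for unknown words

    def log_p(counter, key, size):
        # Add-one (Laplace) smoothing keeps unseen events at nonzero probability.
        return math.log((counter[key] + 1) / (sum(counter.values()) + size))

    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{t: (log_p(trans["<s>"], t, n_tags) + log_p(emit[t], words[0], v_size), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_p(trans[p], t, n_tags), p) for p in tags)
            column[t] = (score + log_p(emit[t], word, v_size), prev)
        best.append(column)
    # Backtrace from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

train = [
    [("the", "DET"), ("dog", "NN"), ("runs", "VB")],
    [("a", "DET"), ("cat", "NN"), ("sleeps", "VB")],
    [("the", "DET"), ("cat", "NN"), ("runs", "VB")],
]
trans, emit = train_hmm(train)
print(viterbi(["a", "dog", "sleeps"], trans, emit))  # ['DET', 'NN', 'VB']
```

Working in log space avoids numerical underflow on long sentences, since the path score becomes a sum rather than a product of small probabilities.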