Transliterated Named Entity Recognition Based on Chinese Word Sketch
Abstract
One challenge unique to Chinese language processing is cross-strait named entity recognition. Because the PRC and Taiwan adopt different transliteration strategies, transliterations of foreign names can vary greatly between the two, creating difficulties in NLP tasks including data mining, translation, and information retrieval. In this paper, we introduce a novel approach to the automatic extraction of divergent transliterations of foreign named entities. The approach bootstraps co-occurrence statistics from tagged Chinese corpora and thereby achieves higher precision.