One challenge unique to Chinese language processing is cross-strait named entity recognition. Because the PRC and Taiwan adopt different transliteration strategies, transliterations of the same foreign name can vary greatly between the two, creating difficulties in NLP tasks such as data mining, translation, and information retrieval. In this paper, we introduce a novel approach to the automatic extraction of divergent transliterations of foreign named entities that bootstraps co-occurrence statistics from tagged Chinese corpora, thereby producing higher precision.
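To make the idea of co-occurrence statistics concrete, the following is a minimal sketch and not the method described in this paper. It assumes two small tagged corpora that have already been normalized to one script, a hypothetical proper-noun tag `NR`, and a hypothetical similarity threshold. It pairs a name from one corpus with a name from the other when the words they co-occur with overlap strongly, and it performs only a single pass, without any bootstrapping loop.

```python
from collections import Counter, defaultdict
from math import sqrt

NAME_TAG = "NR"  # assumption: proper nouns carry this tag in both corpora


def context_vectors(tagged_sentences):
    """Map each name-tagged token to counts of the words it co-occurs with."""
    vectors = defaultdict(Counter)
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        for word, tag in sentence:
            if tag == NAME_TAG:
                vectors[word].update(w for w in words if w != word)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidate_pairs(prc_sentences, tw_sentences, threshold=0.5):
    """Return (prc_name, tw_name, score) for differing names with similar contexts."""
    prc = context_vectors(prc_sentences)
    tw = context_vectors(tw_sentences)
    pairs = []
    for p_name, p_vec in prc.items():
        for t_name, t_vec in tw.items():
            if p_name == t_name:
                continue  # identical strings are not divergent transliterations
            score = cosine(p_vec, t_vec)
            if score >= threshold:
                pairs.append((p_name, t_name, score))
    return sorted(pairs, key=lambda item: -item[2])


# Toy data (illustrative only): Bush and Sydney as written on each side.
prc_corpus = [
    [("布什", "NR"), ("总统", "NN"), ("发表", "VV"), ("演说", "NN")],
    [("悉尼", "NR"), ("举办", "VV"), ("奥运会", "NN")],
]
tw_corpus = [
    [("布希", "NR"), ("总统", "NN"), ("发表", "VV"), ("声明", "NN")],
    [("雪梨", "NR"), ("举办", "VV"), ("奥运会", "NN")],
]

pairs = candidate_pairs(prc_corpus, tw_corpus)
for prc_name, tw_name, score in pairs:
    print(prc_name, tw_name, round(score, 3))
```

On real corpora, context overlap alone would likely produce many false pairs, which is presumably where iterative bootstrapping and additional filtering would matter.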