Word ordering is among the most important problems in machine translation. In this paper, we describe a general approach to this problem in English-Vietnamese-English statistical machine translation. Our model automatically extracts short-range and long-range reordering rules based on part-of-speech tags and alignment information. Our method therefore covers both local and global word order, and is more versatile than other methods. To obtain a better set of reordering rules, we omit any generated rule whose weight is lower than a threshold α. Experimental results show that translation quality improves significantly over the distance-based reordering model and is comparable to that of the lexicalized model. Our approach is suitable not only for English-Vietnamese but also for other language pairs with large syntactic differences, such as English-Chinese and Chinese-Vietnamese.
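As an illustration of the extract-then-prune idea, here is a minimal sketch. It is not the authors' implementation. It assumes a one-to-one word alignment and takes a rule's weight to be the relative frequency with which a POS sequence is realized in a given target order; the helper names (`order_from_alignment`, `extract_rules`, `prune_rules`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def order_from_alignment(n_src, alignment):
    """Return source positions sorted by their aligned target position.

    Hypothetical helper: assumes a one-to-one alignment given as
    (source_index, target_index) pairs covering every source word.
    """
    tgt_of = dict(alignment)
    return tuple(sorted(range(n_src), key=lambda i: tgt_of[i]))

def extract_rules(examples):
    """Count how often each POS sequence is realized with each target order."""
    counts = defaultdict(Counter)
    for tags, alignment in examples:
        order = order_from_alignment(len(tags), alignment)
        counts[tuple(tags)][order] += 1
    return counts

def prune_rules(counts, alpha):
    """Keep a rule only if its weight (assumed: relative frequency) is >= alpha."""
    rules = {}
    for lhs, orders in counts.items():
        total = sum(orders.values())
        for order, c in orders.items():
            weight = c / total
            if weight >= alpha:
                rules[(lhs, order)] = weight
    return rules

# Toy data: English "JJ NN" ("red car") usually becomes Vietnamese "NN JJ" ("xe đỏ").
examples = [
    (["JJ", "NN"], [(0, 1), (1, 0)]),
    (["JJ", "NN"], [(0, 1), (1, 0)]),
    (["JJ", "NN"], [(0, 1), (1, 0)]),
    (["JJ", "NN"], [(0, 0), (1, 1)]),
]
rules = prune_rules(extract_rules(examples), alpha=0.5)
print(rules)  # {(('JJ', 'NN'), (1, 0)): 0.75}
```

With α = 0.5, the noun-before-adjective rule (weight 0.75) survives. The monotone order seen in only one of four examples (weight 0.25) is discarded.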
In this paper, we propose a mathematical framework for studying word order optimization. The framework relies on the well-known positive correlation between cognitive cost and the Euclidean distance between the elements (e.g. words) involved in a syntactic link. Using this approach, we study the conditions under which a certain word order is more economical than an alternative order. We apply our methodology to two different cases: (a) the ordering of subject (S), verb (V) and object (O), and (b) the covering of a root word by a syntactic link. For the former, we find that SVO and its symmetric counterpart, OVS, are more economical than OSV, SOV, VOS and VSO at least 2/3 of the time. For the latter, we find that uncovering the root word is more economical than covering it at least 1/2 of the time. With the help of our framework, one can explain some Greenbergian universals. Our findings provide further theoretical support for the hypothesis that the limited resources of the brain introduce biases toward certain word orders. Our theoretical findings could inspire or illuminate future psycholinguistic or corpus-linguistic studies.
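The intuition behind the cost comparison can be seen in a deliberately simplified, one-dimensional setting. This is a sketch under stated assumptions, not the paper's model. The three words sit at consecutive integer positions, the verb is assumed to be linked to both subject and object, and link cost is the absolute distance between positions. The paper's probabilistic result ("at least 2/3 of the time") is not reproduced here. The helper names are hypothetical. `covers` encodes the usual sense of a link covering a word: the word lies strictly between the link's two ends.

```python
from itertools import permutations

# Assumed links: the verb heads both the subject and the object.
LINKS = [("S", "V"), ("O", "V")]

def total_link_length(order):
    """Sum of |pos(a) - pos(b)| over LINKS, with words at positions 0, 1, 2."""
    pos = {w: i for i, w in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in LINKS)

def covers(i, j, r):
    """True if a link between positions i and j covers the word at position r."""
    return min(i, j) < r < max(i, j)

costs = {"".join(p): total_link_length(p) for p in permutations("SVO")}
print(costs)
# {'SVO': 2, 'SOV': 3, 'VSO': 3, 'VOS': 3, 'OSV': 3, 'OVS': 2}
print(covers(0, 2, 1), covers(0, 1, 2))  # True False
```

Under these assumptions, only the verb-medial orders SVO and OVS reach the minimum total cost of 2. This matches the direction, though not the strength, of the paper's result.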
This is a reply to Ramon Ferrer-i-Cancho's paper in this issue, "Some Word Order Biases from Limited Brain Resources: A Mathematical Approach." In this reply, I challenge the Euclidean distance model proposed in that paper with a simple alternative model based on linear ordering.
This article is a critical analysis of Michael Cysouw's comment "Linear Order as a Predictor of Word Order Regularities."