Computing Reviews

Syntactic discriminative language model rerankers for statistical machine translation
Carter S., Monz C. Machine Translation 25(4): 317-339, 2011. Type: Article
Date Reviewed: 04/05/12

Machine translation technology has reached a level of maturity that allows for its use in a variety of applications. However, in many cases, the quality of machine translation output still leaves a lot to be desired. In the case of statistical machine translation (SMT) systems, better translations can often be found in the system’s n-best list, ranked below the translation chosen for presentation to the user. Carter and Monz investigate approaches that attempt to discover such better translations by discriminative reranking methods. These methods draw on research on probabilistic parsing inspired by earlier speech recognition research.

The paper compares reranking methods based on n-gram language models with an approach developed by the authors that uses shallow syntactic features, such as part-of-speech tags, as well as deeper features extracted from parse trees. The perceptron learning algorithm is used to optimize the feature weights on training data obtained from the n-best lists, assuming an oracle that chooses the best translation according to the bilingual evaluation understudy (BLEU) metric. In addition to evaluating the proposed model on the reranking task, the authors analyze how useful a parser is in discriminating between SMT output and human-produced translations.
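The training setup described above can be illustrated with a minimal sketch of a perceptron reranker. This is a plain structured perceptron written for illustration, not the authors' implementation: the feature names, the toy n-best lists, the quality scores standing in for sentence-level BLEU, and the function names are all assumptions.

```python
from collections import defaultdict

def score(weights, features):
    """Linear model score: dot product of weights and sparse feature values."""
    return sum(weights[name] * value for name, value in features.items())

def train_perceptron_reranker(nbest_lists, epochs=5):
    """Train feature weights so the oracle candidate outranks the others.

    nbest_lists: list of n-best lists; each candidate is a
    (features, quality) pair, where quality stands in for a
    precomputed sentence-level BLEU score (an assumption here).
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates in nbest_lists:
            # Oracle: the candidate with the best quality score.
            oracle = max(candidates, key=lambda c: c[1])
            # Prediction: the candidate the current model ranks first.
            predicted = max(candidates, key=lambda c: score(weights, c[0]))
            if predicted is not oracle:
                # Standard perceptron update: move toward the oracle's
                # features and away from the wrongly preferred candidate's.
                for name, value in oracle[0].items():
                    weights[name] += value
                for name, value in predicted[0].items():
                    weights[name] -= value
    return weights

def rerank(weights, candidates):
    """Return the candidate with the highest model score."""
    return max(candidates, key=lambda c: score(weights, c[0]))

# Hypothetical toy data: features are counts of part-of-speech bigrams.
TOY_NBEST = [
    [({"pos:NN_DT": 1.0}, 0.3), ({"pos:DT_NN": 1.0}, 0.8)],
    [({"pos:NN_DT": 1.0, "pos:VB_NN": 1.0}, 0.2),
     ({"pos:DT_NN": 1.0, "pos:VB_NN": 1.0}, 0.9)],
]

if __name__ == "__main__":
    learned = train_perceptron_reranker(TOY_NBEST)
    for candidates in TOY_NBEST:
        print(rerank(learned, candidates))
```

In each toy list the lower-quality candidate is placed first, mimicking a baseline system whose top-ranked output is not the best one available in the n-best list.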

The analysis is facilitated by the existence of a number of open-source tools for SMT, part-of-speech tagging, and language modeling. The authors conclude that the use of syntactic features leads to an improvement, albeit a modest one. Given the size of the parameter space and the large number of possible representation variants, one can probably expect to see other studies in this area in the near future.

Reviewer: Saturnino Luz
Review #: CR140039 (1208-0852)
