TechTalks from event: NAACL 2015

5B: Machine Translation

  • Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs. Authors: Katrin Kirchhoff, Yik-Cheung Tam, Colleen Richey, Wen Wang
    This paper addresses the problem of morphological modeling in statistical speech-to-speech translation for English to Iraqi Arabic. An analysis of user data from a real-time MT-based dialog system showed that generating correct verbal inflections is a key problem for this language pair. We approach this problem by enriching the training data with morphological information derived from source-side dependency parses (a sketch of this idea appears after this list). We analyze the performance of several parsers as well as the effect on different types of translation models. Our method achieves an improvement of more than a full BLEU point and a significant increase in verbal inflection accuracy; at the same time, it is computationally inexpensive and does not rely on target-language linguistic tools.
  • Continuous Adaptation to User Feedback for Statistical Machine Translation. Authors: Frédéric Blain, Fethi Bougares, Amir Hazem, Loïc Barrault, Holger Schwenk
    This paper gives detailed experimental feedback on different approaches for adapting a statistical machine translation system to a targeted translation project, using only small amounts of parallel in-domain data. The experiments were performed by professional translators under realistic working conditions, using a computer-assisted translation tool. We analyze the influence of these adaptations on translator productivity and on the overall post-editing effort. We show that significant improvements can be obtained by using the presented adaptation techniques.
  • Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation. Authors: Chao Xing, Dong Wang, Chao Liu, Yiye Lin
    Word embeddings have been found to be highly effective for translating words from one language to another via a simple linear transform. However, we found inconsistencies among the objective functions used for embedding learning, transform learning, and distance measurement. This paper proposes a solution that normalizes the word vectors on a hypersphere and constrains the linear transform to be orthogonal (see the sketch after this list). The experimental results confirm that the proposed solution offers better performance on a word similarity task and an English-to-Spanish word translation task.
  • Fast and Accurate Preordering for SMT using Neural Networks. Authors: Adrià de Gispert, Gonzalo Iglesias, Bill Byrne
    We propose the use of neural networks to model source-side preordering for faster and better statistical machine translation. The neural network is trained as a logistic regression classifier to predict whether two sibling nodes of the source-side parse tree should be swapped in order to obtain a more monotonic parallel corpus, using samples extracted from the word-aligned parallel corpus (see the sketch after this list). For multiple language pairs and domains, we show that this yields better reordering performance than other state-of-the-art techniques, resulting in improved translation quality and very fast decoding.
  • APRO: All-Pairs Ranking Optimization for MT Tuning. Authors: Markus Dreyer, Yuanzhe Dong
    We present APRO, a new method for machine translation tuning that can handle large feature sets. As opposed to other popular methods (e.g., MERT, MIRA, PRO), which involve randomness and require multiple runs to obtain a reliable result, APRO gives the same result on any run, given initial feature weights. APRO follows the pairwise ranking approach of PRO (Hopkins and May, 2011), but instead of ranking a small sampled subset of pairs from the k-best list, APRO efficiently ranks all pairs (see the sketch after this list). By obviating the need for manually determined sampling settings, we obtain more reliable results. APRO converges more quickly than PRO and gives similar or better translation results.
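
For the Kirchhoff et al. talk, a minimal sketch of the general idea of source-side morphological enrichment: annotate English verbs with person/number features of their syntactic subject, read off a dependency parse, so the translation model can learn the matching Arabic verbal inflection. The use of spaCy and the exact feature scheme are assumptions for illustration, not the paper's actual pipeline.

```python
# Sketch: enrich English source tokens with subject person/number features
# taken from a dependency parse (spaCy is an assumed stand-in parser;
# the feature scheme is illustrative, not the paper's).
import spacy

nlp = spacy.load("en_core_web_sm")

def enrich(sentence: str) -> str:
    doc = nlp(sentence)
    out = []
    for tok in doc:
        tag = tok.text
        if tok.pos_ in ("VERB", "AUX"):
            subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
            if subjects:
                subj = subjects[0]
                person = subj.morph.get("Person") or ["3"]
                number = subj.morph.get("Number") or ["Sing"]
                tag = f"{tok.text}|P{person[0]}|{number[0]}"
        out.append(tag)
    return " ".join(out)

print(enrich("she writes the reports"))
# e.g. "she writes|P3|Sing the reports"
```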
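
For the Xing et al. talk, a minimal numpy sketch of the two ingredients named in the abstract: length-normalizing the embeddings and constraining the translation matrix to be orthogonal. Here the orthogonal map is obtained with the closed-form orthogonal Procrustes solution (an SVD), which is one standard way to enforce the constraint; the paper optimizes its constrained objective directly, so treat this as an illustration rather than the authors' algorithm.

```python
# Sketch: bilingual word translation with normalized embeddings and an
# orthogonal transform, using the orthogonal Procrustes (SVD) solution
# as a stand-in for the paper's constrained optimization.
import numpy as np

def normalize_rows(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def learn_orthogonal_map(X, Y):
    """X, Y: (n, d) source/target embeddings of a seed dictionary, rows aligned."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt          # W with W.T @ W = I, minimizing ||X @ W - Y||_F

rng = np.random.default_rng(0)
X = normalize_rows(rng.normal(size=(1000, 50)))   # toy English vectors
Y = normalize_rows(rng.normal(size=(1000, 50)))   # toy Spanish vectors
W = learn_orthogonal_map(X, Y)

# Translate a source word: map it and take the nearest target row by
# cosine similarity (a dot product, since all vectors are unit length).
query = X[0] @ W
best = int(np.argmax(Y @ query))
print("nearest target index:", best)
```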
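
For the de Gispert et al. talk, a minimal sketch of the classifier at the core of the approach: given features of two sibling nodes in the source parse tree, predict whether they should be swapped. scikit-learn, the feature template, and the toy labels are assumptions for illustration; in the paper a neural network is trained on samples whose labels come from whether swapping makes the word-aligned sentence pair more monotonic.

```python
# Sketch: swap / no-swap classifier for sibling nodes of a source parse tree.
# scikit-learn and the feature template are illustrative assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training samples: features of (left sibling, right sibling) plus a label.
samples = [
    ({"left_label": "NN", "right_label": "JJ", "head_pos": "NOUN"}, 1),  # swap
    ({"left_label": "DT", "right_label": "NN", "head_pos": "NOUN"}, 0),  # keep
    ({"left_label": "VB", "right_label": "RB", "head_pos": "VERB"}, 0),
    ({"left_label": "NN", "right_label": "IN", "head_pos": "NOUN"}, 1),
]
X_dicts, y = zip(*samples)

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = LogisticRegression().fit(X, y)

# At preordering time, walk the parse tree and swap sibling pairs whose
# predicted swap probability exceeds 0.5.
p_swap = clf.predict_proba(vec.transform([{"left_label": "NN",
                                            "right_label": "JJ",
                                            "head_pos": "NOUN"}]))[0, 1]
print(f"swap probability: {p_swap:.2f}")
```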
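
For the APRO talk, a minimal numpy sketch of the all-pairs ranking idea: whenever one hypothesis in the k-best list has a higher metric score than another, the tuned weights should also rank it higher, which can be written as a hinge loss over all such pairs. The naive O(k^2) enumeration and the plain gradient step below are for illustration only; APRO's contribution is handling all pairs efficiently rather than enumerating or sampling them.

```python
# Sketch: all-pairs ranking loss over one k-best list.  The quadratic
# enumeration here is purely illustrative; APRO avoids it.
import numpy as np

def all_pairs_rank_step(F, metric, w, lr=0.1, margin=1.0):
    """F: (k, d) feature vectors of k-best hypotheses; metric: (k,) sentence-level
    metric scores; w: (d,) current weights.  One hinge-loss gradient step."""
    scores = F @ w
    grad = np.zeros_like(w)
    k = len(metric)
    for i in range(k):
        for j in range(k):
            if metric[i] > metric[j] and scores[i] - scores[j] < margin:
                # Hypothesis i should outrank j but does not (by the margin):
                # push w toward the features of i and away from those of j.
                grad += F[j] - F[i]
    return w - lr * grad / (k * k)

rng = np.random.default_rng(0)
F = rng.normal(size=(100, 8))            # toy k-best feature vectors
metric = rng.uniform(size=100)           # toy sentence-level metric scores
w = np.zeros(8)
for _ in range(50):
    w = all_pairs_rank_step(F, metric, w)
print("tuned weights:", np.round(w, 3))
```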