
Details of Grant 

EPSRC Reference: EP/L027623/1
Title: Improving Target Language Fluency in Statistical Machine Translation
Principal Investigator: Byrne, Professor WJ
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Engineering
Organisation: University of Cambridge
Scheme: Standard Research
Starts: 15 October 2014
Ends: 30 June 2018
Value (£): 390,378
EPSRC Research Topic Classifications:
Artificial Intelligence
Comput./Corpus Linguistics
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 09 Apr 2014
Panel Name: EPSRC ICT Responsive Mode - Apr 2014
Outcome: Announced
Summary on Grant Application Form
Recent years have seen great improvement in the quality of statistical machine translation (SMT). Automatic translation has benefitted from increasing amounts of monolingual and translated data, from advancements in core modelling algorithms, and from a growing understanding of how best to integrate automatic translation into large-scale language processing systems. Despite these improvements, even the best SMT output is rarely of human quality. Any casual inspection of MT output will quickly find syntactic and semantic errors that only a machine would make. New modelling techniques, capable of extracting the best possible models from all available data, are needed.

This proposal aims to overcome one of the technical barriers to delivering 'human quality' statistical machine translation (SMT): the production of grammatical output. We propose to use multiple grammars in SMT. One grammar is focused on translation of the source language, as in current practice. The second grammar is focused on production, with the aim of producing fluent and grammatical sentences in the target language. We will develop a decoding framework in which translation and production are closely linked but independent processes driven by these two grammars. Our systems will be based on state-of-the-art syntactic SMT, and our aim will be to dramatically improve the fluency of the translation output, particularly in situations where the original source language text is noisy and difficult to translate fluently.

This work will be of value to UK industry. The UK translation and interpretation market was estimated at €290M to €434M in 2009, and UK localisation and language service providers are strong competitors in the worldwide language industry, which is forecast to grow to €16B by 2015. Reducing the cost of high-quality translation is a concern for this industry, and one we will address directly: improving target language fluency is a key factor in translation post-editing efficiency.

In academia, SMT is now used to build larger systems incorporating speech recognition, speech synthesis, and dialogue systems. Edinburgh University, Heriot-Watt University, Oxford University, and Sheffield University are among the universities with groups working on these problems. Our project will enable SMT researchers to apply their expertise in translation grammar induction, large-scale language modelling, and parameterisation to target language production.

Motivated by these needs, our research hypotheses are that: (1) modelling techniques from syntax-based SMT can be used to build stochastic production systems; (2) production quality can be improved using 'Big Data' and machine learning statistical modelling techniques; and (3) target language production systems can be integrated into syntax-based statistical machine translation systems using risk-based decoding procedures, yielding improvements in translation quality, robustness, and fluency. The novelty of this proposal lies in: (1) the use of separate grammars for syntax-based statistical machine translation, one for translation and a second for production; (2) the coupling of these grammars in a risk-based consensus decoding procedure; (3) the incorporation of phrase-based production grammars and search procedures; and (4) an explicit focus on fluency.
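
The summary does not specify the risk-based consensus decoding procedure. As a rough illustration of the general idea only, the following is a minimal sketch of minimum Bayes-risk selection over an n-best list: each hypothesis is scored by its expected n-gram overlap with all other hypotheses, weighted by their posterior probabilities, rather than by its own model score alone. The function names, the n-best format (token list plus log-score), and the simple clipped-overlap gain are all assumptions made for this example and are not taken from the project.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_gain(hyp, ref, max_n=2):
    """Hypothetical gain: clipped n-gram matches between hyp and ref, n = 1..max_n.

    This raw count favours longer hypotheses; a real system would use a
    normalised metric. It is kept simple here for illustration.
    """
    gain = 0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        gain += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return gain


def mbr_select(nbest):
    """Return the token list with the highest expected gain.

    nbest: list of (tokens, log_score) pairs (assumed format).
    """
    # Turn log-scores into posteriors over the n-best list (stable softmax).
    top = max(score for _, score in nbest)
    weights = [math.exp(score - top) for _, score in nbest]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    best, best_gain = None, float("-inf")
    for hyp, _ in nbest:
        # Expected gain of hyp against every hypothesis, weighted by posterior.
        expected = sum(
            p * overlap_gain(hyp, other)
            for (other, _), p in zip(nbest, posteriors)
        )
        if expected > best_gain:
            best, best_gain = hyp, expected
    return best


# Toy n-best list: the top-scoring hypothesis shares nothing with the others.
nbest = [
    (["x", "y"], math.log(0.40)),
    (["a", "b", "c"], math.log(0.35)),
    (["a", "b", "d"], math.log(0.25)),
]
consensus = mbr_select(nbest)
```

In this toy list the highest-scoring hypothesis is not selected, because the two lower-scoring hypotheses agree with each other and so reinforce one another under the expected-gain criterion.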

Our research will yield new models and algorithms in the form of open source software and systems. We take the view that the best pathway to economic impact for this type of research is by: publishing research results; releasing software and data under generous Open Source licences for unconstrained use by industry; and training students and PDRAs who can take their skills and knowledge from the university to industry. We believe this is the broadest and surest way to enhance the research capacity, knowledge and skills of businesses and organisations. All results of this research project will be distributed in the public domain.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.cam.ac.uk