Details of Grant 

EPSRC Reference: EP/L010291/1
Title: Adaptive Context-Dependent Machine Translation for Heterogeneous Text
Principal Investigator: Cohn, Dr TA
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Alpha CRC Ltd, Carnegie Mellon University, Microsoft
Department: Computer Science
Organisation: University of Sheffield
Scheme: EPSRC Fellowship
Starts: 01 October 2013 Ends: 30 September 2018 Value (£): 907,946
EPSRC Research Topic Classifications:
Artificial Intelligence; Comput./Corpus Linguistics
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date | Panel Name | Outcome
17 Jul 2013 | EPSRC ICT Responsive Mode - July 2013 | Announced
03 Sep 2013 | ICT Fellowships Interviews Meeting - Sept 13 | Announced
Summary on Grant Application Form
While automatic machine translation technologies are undoubtedly useful to a wide range of users, they have many shortcomings. Notably, they often produce incoherent output when translating many types of input text, e.g., medical texts, literature, or even conversational text. This project aims to develop new machine translation (MT) systems that can be adapted more efficiently to new domains and text styles, and that can handle heterogeneous, mixed-domain inputs. This is framed as a multi-task machine learning problem in which a collection of domain-specific translation systems is learned jointly, leveraging correlations between related domains. This approach will help to reduce the large data requirements of current translation systems, while also improving translation quality across a wide range of language pairs and application domains.

Existing research has tended to focus on a narrow interpretation of adaptability, namely domain adaptation, in which there is a single target domain and the challenge is to produce good translations using parallel data drawn from other domains. This project will address the more general setting in which there can be many target domains, or in which the test domain is not known in advance. This setting is considerably more challenging, but also far more useful, than the single-target-domain setting studied in the domain-adaptation literature; progress on it will improve overall translation quality and make it easier to port systems to new language pairs and new domains.

This work will create new evaluation resources to supplement the standard evaluation setting, which uses text from only one or two domains. Specifically, this project will create a comprehensive evaluation set covering a wide range of topics, drawn from many different media sources (including user-generated content from blogs and wikis) and spanning multiple challenging language pairs. This evaluation set will highlight the shortcomings of existing machine translation research in handling heterogeneous inputs and challenging translation domains, and will contribute a critically important dataset to the research community.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.shef.ac.uk