EPSRC Reference: |
EP/S001271/1 |
Title: |
MTStretch: Low-resource Machine Translation |
Principal Investigator: |
Birch, Dr A |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Sch of Informatics |
Organisation: |
University of Edinburgh |
Scheme: |
EPSRC Fellowship - NHFP |
Starts: |
29 June 2018 |
Ends: |
28 December 2021 |
Value (£): |
517,456
|
EPSRC Research Topic Classifications: |
Artificial Intelligence |
Computational Linguistics |
|
EPSRC Industrial Sector Classifications: |
Creative Industries |
Information Technologies |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
Neural machine translation (NMT) has recently made major advances in translation quality, and this technology has been rapidly adopted by industry leaders, such as Google and Amazon, and international organisations, such as the UN and the EU. However, high-performing neural models require many millions of human-translated sentences for training. For many real-world applications, there is not enough data to build useful MT systems. In this project I plan to stretch the resources and capabilities that we have, in order to develop robust MT technologies which are capable of being deployed for low-resource language pairs and for highly specialised low-resource domains.
I will investigate making translation significantly more robust by using the intuition that translated (or parallel) corpora contain enormous redundancies and are an inefficient way to learn to translate. Inspired by human learning, we will study Bayesian models which build up meaning compositionally and are able to learn to learn, thus creating models which need only a few training examples. We will also develop machine learning techniques, such as transfer learning and data augmentation, to extract knowledge from monolingual and parallel resources from other languages and domains. This proposal combines fundamental research in rapid deep learning with lower-risk data-driven machine learning research in order to deliver useful products to our industry partners.
My team will provide translations for language pairs which were not previously well served by automatic machine translation. This will allow our partners, BBC World Service and BBC Monitoring, to cover under-resourced languages. Building on an existing scalable platform, created within the EU project called Scalable Understanding of Multilingual MediA (SUMMA), we can already deploy multilingual capabilities in the newsroom. The innovation fellowship will contribute to the commercialisation and sustainability of SUMMA translation components, but crucially it will allow us to cover a wider range of topical and strategic languages. Access to a high-quality translation platform for low-resource languages will help the BBC deliver impartial reporting across the world. Collaboration with our industry partner Quorate will demonstrate the commercial potential of our research in the highly specialised domain of financial trading.
In the long term, this project will have a wider impact on British industry by breaking down language barriers affecting international trade, and by significantly improving the quality and resilience of transformative AI language technologies.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.ed.ac.uk |