Details of Grant 

EPSRC Reference: EP/K036580/1
Title: Bayesian Models of Grammar Induction and Translation
Principal Investigator: Blunsom, Dr P
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: University of Oxford
Scheme: EPSRC Fellowship
Starts: 31 March 2014
Ends: 07 June 2019
Value (£): 939,908
EPSRC Research Topic Classifications:
Artificial Intelligence
Comput./Corpus Linguistics
Psychology
Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date     Panel Name                                      Outcome
17 Jul 2013    EPSRC ICT Responsive Mode - July 2013           Announced
03 Sep 2013    ICT Fellowships Interviews Meeting - Sept 13    Announced
Summary on Grant Application Form
The processes by which humans learn the rules that govern what is and is not a valid sentence in a language constitute an enduring theme of linguistic research. Computational models able to reproduce these processes hold great promise both for increasing our understanding of how children learn languages and for developing advanced language technologies for processing online data.

The fields of Computational Linguistics and Machine Learning seek to provide the technologies necessary to enable people to interact seamlessly with the vast quantities of multilingual text published each day on the world wide web. Core amongst these technologies are those that assign syntactic structure to text (parsing) and those that automatically translate between languages (machine translation). Traditionally, researchers have relied upon supervised machine learning techniques to build their systems, first annotating data by hand with the desired output of the system, then training the system to replicate these annotations and generalise from them to new data. However, this process of hand annotation is both time-consuming and expensive, and as a result such data exists only for dominant languages (e.g. English).

The research programme set out for this fellowship will provide a solution to the problem of obtaining syntactic analyses for large quantities of real-world language data. The overarching aim of this project is to develop large-scale, language-independent algorithms for learning syntactic structure from unannotated text using techniques from non-parametric Bayesian probability. In tandem, a new syntactic model of machine translation will be developed to evaluate and validate these algorithms.

The project will consist of two major components:

1. Develop scalable models of unsupervised grammar induction suitable for use with languages exhibiting a wide range of morphological and syntactic phenomena.

2. Develop a syntactic model of translation based upon the analyses produced by the unsupervised model. This system will be composed of a source reordering model which reorders the input conditioned on the induced syntactic structure, and a phrasal translation model which maps the reordered source to a translation.

The specific scientific contributions of this project are:

1. The first accurate large-scale grammar induction algorithms applicable to a wide range of language processing tasks.

2. New advanced machine learning algorithms for latent variable induction and approximate inference within Bayesian non-parametric models.

3. An investigation of the cognitive implications of the developed grammar induction algorithms, which are considerably more powerful than those previously used for language acquisition simulations within computational cognitive science.

4. An extrinsic evaluation of the induced grammars within a novel machine translation system.

5. A state-of-the-art open-source machine translation system capable of producing high-quality translations for a much larger range of languages than those handled by current systems.

The success of this research programme will have wide-ranging impacts beyond the core contribution of a large-scale syntactic induction system, from advanced new algorithms for machine learning and machine translation to a powerful new tool for simulating child language acquisition within cognitive science.

The aims of this project are adventurous, but the contribution of an effective and scalable model for grammar induction will be transformative for a wide range of text processing applications.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ox.ac.uk