EPSRC Reference: EP/L02411X/1
Title: Large-scale Unsupervised Parsing for Resource-Poor Languages
Principal Investigator: Cohen, Dr S
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Sch of Informatics
Organisation: University of Edinburgh
Scheme: First Grant - Revised 2009
Starts: 11 November 2014
Ends: 10 February 2016
Value (£): 100,651

EPSRC Research Topic Classifications:
Artificial Intelligence
Comput./Corpus Linguistics

EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors

Related Grants:

Panel History:
Panel Date | Panel Name | Outcome
04 Feb 2014 | EPSRC ICT Responsive Mode - Feb 2014 | Announced
Summary on Grant Application Form
This project focuses on the automatic induction of grammatical structure from raw text. Automatically inferring the syntax of sentences is a long-standing problem in natural language processing, originating in studies that sought to build computational models of how humans learn language.
The problem is still far from solved. There is as yet no fully fledged computer program that takes raw text and returns a computational representation of its syntax (for example, identifying the noun phrases, the verb phrases and the prepositional phrases, and how they relate to each other in the text).
This research aims to make a major step toward building such a system. The goal is to derive a new algorithm that recovers, at least partially, the syntax of raw text. The algorithm is based on the assumption that words which frequently co-occur are usually linked not just semantically but also syntactically. For example, if the word "deep" often co-occurs with the word "puddle", the algorithm will assume that "deep" tends to modify "puddle".
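As a rough illustration of that intuition only (not the project's actual algorithm), the Python sketch below scores word pairs in a tiny toy corpus by pointwise mutual information (PMI) and proposes a candidate syntactic link when the score is high; the corpus, window size and threshold are all invented for the example.

import math
from collections import Counter

# Toy corpus and settings -- all invented for this illustration.
corpus = [
    "the deep puddle blocked the narrow road",
    "a deep puddle formed after the heavy rain",
    "the child jumped over the deep puddle",
]
WINDOW = 3           # assumed co-occurrence window, in tokens
PMI_THRESHOLD = 1.0  # assumed score above which a link is proposed

word_counts, pair_counts, total_tokens = Counter(), Counter(), 0
for sentence in corpus:
    tokens = sentence.split()
    total_tokens += len(tokens)
    word_counts.update(tokens)
    for i, w in enumerate(tokens):                 # count co-occurrences
        for v in tokens[i + 1 : i + 1 + WINDOW]:   # within the window
            pair_counts[frozenset((w, v))] += 1

def pmi(w1, w2):
    """Pointwise mutual information of a word pair under simple count estimates."""
    p_pair = pair_counts[frozenset((w1, w2))] / sum(pair_counts.values())
    p1, p2 = word_counts[w1] / total_tokens, word_counts[w2] / total_tokens
    return math.log(p_pair / (p1 * p2)) if p_pair else float("-inf")

# "deep" and "puddle" co-occur often, so they become a candidate
# modifier-head attachment for a later parsing stage.
score = pmi("deep", "puddle")
if score > PMI_THRESHOLD:
    print(f'propose syntactic link: "deep" modifies "puddle" (PMI={score:.2f})')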
The algorithm is based on a new learning paradigm developed in the machine learning community called "spectral learning". This paradigm has many advantages, most notably its firm mathematical foundation: it allows us to derive proofs guaranteeing that the algorithm will learn the syntax of a language if it is exposed to sufficiently large amounts of raw text.
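Many spectral methods share a common computational core: a singular value decomposition (SVD) of a co-occurrence or moment matrix, which recovers low-rank latent structure in closed form, without local optima, and with estimation error that can be bounded as more text is observed. The minimal sketch below shows only that core step on a made-up count matrix; it is not the project's specific algorithm, and the vocabulary, counts and rank are assumptions for illustration.

import numpy as np

# Assumed vocabulary and left/right co-occurrence counts (rows: left word,
# columns: right word) -- purely illustrative numbers.
vocab = ["the", "deep", "puddle", "road", "rain"]
counts = np.array([
    [0, 4, 1, 3, 2],
    [0, 0, 5, 0, 0],
    [1, 0, 0, 1, 1],
    [2, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float)

P = counts / counts.sum()    # joint co-occurrence probabilities
k = 2                        # assumed number of latent dimensions
U, S, Vt = np.linalg.svd(P)  # closed-form decomposition, no local optima
reps = U[:, :k] * S[:k]      # k-dimensional word representations

# In a full spectral parser, low-rank projections like these would parameterise
# the grammar model whose recovery the learning guarantees refer to.
for word, vec in zip(vocab, reps):
    print(word, np.round(vec, 3))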
Such proofs are important partly because of what they say about the learnability of language by humans: if they show that not much data is required to learn syntax, they can shed light on humans' ability to acquire language from (relatively) brief exposure during childhood.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:

Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:
Further Information:
Organisation Website: http://www.ed.ac.uk