EPSRC Reference: |
EP/F055765/1 |
Title: |
Global Inference for Summarization Using Integer Linear Programming |
Principal Investigator: |
Lapata, Professor M |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Sch of Informatics |
Organisation: |
University of Edinburgh |
Scheme: |
Standard Research |
Starts: |
26 January 2009 |
Ends: |
25 January 2012 |
Value (£): |
269,809 |
EPSRC Research Topic Classifications: |
Comput./Corpus Linguistics |
Information & Knowledge Mgmt |
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
21 Apr 2008 | ICT Prioritisation Panel (April 2008) | Announced |
|
Summary on Grant Application Form |
Summarization is the process of condensing a source text into a shorter version while preserving its information content. Its applications are many and varied, ranging from quick access to news and scientific articles to meeting browsers and systems that help physicians gather patient information. Humans summarize effortlessly on a daily basis (e.g., when describing the contents of a lecture, a meeting or a movie), but producing high-quality summaries automatically remains a challenge. The difficulty lies primarily in the nature of the task: it is complex, must satisfy many constraints (e.g., summary length, informativeness, coherence, grammaticality) and ultimately requires large-scale text understanding.

Since robust text understanding is beyond the capabilities of current NLP technology, most work today focuses on extractive summarization, where a summary is created simply by identifying the most important sentences in a document and concatenating them. Because this requires little linguistic analysis, it can produce summaries for a wide range of documents, independently of style, text type, and subject matter. Unfortunately, the resulting extracts are often of low readability and poor text quality.

In this project we will develop novel models for single-document summarization that break away from the sentence extraction paradigm. We will model summarization as an optimization problem and use integer linear programming (ILP) to find the summary that is best for the application, task, or user at hand. The ILP formulation is advantageous for two reasons. First, it allows us to explicitly encode the constraints our output summaries must meet. Second, ILP is a well-studied optimization problem with efficient algorithms for finding a globally optimal solution in the presence of many conflicting constraints.
This proposal aims to shift the summarization paradigm by developing novel, unified models based on the ILP framework that can identify what is important in a document and express it appropriately. The success of this research will have a significant and far-reaching impact on summarization and related areas (e.g., information retrieval), one that could not be achieved by incrementally extending conventional models.
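To make the ILP formulation concrete, here is a minimal sketch of the extractive baseline written as a 0-1 integer linear program: one binary variable x_i per sentence, an objective that sums importance scores, a linear length budget, and linear constraints x_i + x_j <= 1 that forbid selecting both sentences of a redundant pair. The function name `ilp_extract`, the scores, lengths, budget and redundancy pairs are all hypothetical illustrations, not values or code from the project. To stay self-contained, the sketch finds the global optimum by exhaustive enumeration, which is only feasible for a handful of sentences; a real system would pass the same objective and constraints to an ILP solver. It also covers only sentence extraction, which the project itself aims to move beyond.

```python
from itertools import product


def ilp_extract(scores, lengths, budget, conflicts=()):
    """Solve a tiny 0-1 ILP for extractive summarization by enumeration.

    maximize    sum_i scores[i] * x[i]
    subject to  sum_i lengths[i] * x[i] <= budget        (length constraint)
                x[i] + x[j] <= 1 for (i, j) in conflicts  (redundancy constraint)
                x[i] in {0, 1}
    Returns (x, objective value) for a globally optimal assignment.
    Hypothetical helper: enumeration stands in for a real ILP solver.
    """
    best_x, best_value = None, float("-inf")
    # Enumerate every 0/1 assignment; exponential, so only for toy inputs.
    for x in product((0, 1), repeat=len(scores)):
        if sum(length * xi for length, xi in zip(lengths, x)) > budget:
            continue  # violates the summary-length budget
        if any(x[i] + x[j] > 1 for i, j in conflicts):
            continue  # selects both sentences of a redundant pair
        value = sum(score * xi for score, xi in zip(scores, x))
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Hypothetical document: four sentences with importance scores and word counts.
sentences = ["S0", "S1", "S2", "S3"]
scores = [5, 4, 3, 2]
lengths = [20, 15, 10, 5]

x, value = ilp_extract(scores, lengths, budget=30)
print([s for s, xi in zip(sentences, x) if xi], value)  # ['S1', 'S2', 'S3'] 9

# Declaring S1 and S2 redundant changes which summary is optimal.
x, value = ilp_extract(scores, lengths, budget=30, conflicts=[(1, 2)])
print([s for s, xi in zip(sentences, x) if xi], value)  # ['S0', 'S2'] 8
```

The length budget and redundancy pairs are the kind of explicit constraints the summary refers to: adding one changes the globally optimal summary without touching the scoring function.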
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
|
Impacts |
Description |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.ed.ac.uk |