Details of Grant

EPSRC Reference:

EP/F055765/1

Title:

Global Inference for Summarization Using Integer Linear Programming

Principal Investigator:

Lapata, Professor M

Other Investigators:

Grothey, Dr A

Researcher Co-Investigators:

Project Partners:

Department:

School of Informatics

Organisation:

University of Edinburgh

Scheme:

Standard Research

Starts:

26 January 2009

Ends:

25 January 2012

Value (£):

269,809

EPSRC Research Topic Classifications:

Comput./Corpus Linguistics

Information & Knowledge Mgmt

EPSRC Industrial Sector Classifications:

No relevance to Underpinning Sectors

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
21 Apr 2008	ICT Prioritisation Panel (April 2008)	Announced

Summary on Grant Application Form

Summarization is the process of condensing a source text into a shorter version while preserving its information content. Its applications are many and varied, ranging from quick access to news and scientific articles to meeting browsers and systems that help physicians gather patient information. Humans summarize effortlessly on a daily basis (e.g., by describing the contents of a lecture, a meeting or a movie), but producing high-quality summaries automatically remains a challenge. The difficulty lies primarily in the nature of the task, which is complex, must satisfy many constraints (e.g., summary length, informativeness, coherence, grammaticality) and ultimately requires large-scale text understanding.
Since robust text understanding is beyond the capabilities of current NLP technology, most work today focuses on extractive summarization. The idea is to create a summary simply by identifying and then concatenating the most important sentences in a document. Because this requires little linguistic analysis, summaries can be created for a wide range of documents, independently of style, text type, and subject matter. Unfortunately, the resulting extracts often suffer from low readability and poor text quality.
In this project we will develop novel models for single-document summarization that break away from the sentence extraction paradigm. We will model summarization as an optimization problem and use integer linear programming (ILP) to find the summary that is best for the application, task, or user at hand. The ILP formulation is advantageous for two reasons. First, it allows us to encode explicitly the constraints that our output summaries must meet. Second, ILP is a well-studied optimization problem with efficient algorithms for finding a globally optimal solution in the presence of many conflicting constraints.
This proposal aims to shift the summarization paradigm by developing novel, unified models based on the ILP framework that are able to identify what is important in a document and express it appropriately. If successful, this research will have a significant and far-reaching impact on summarization and related areas (e.g., information retrieval) that could not be brought about by incrementally extending conventional models.
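
As an illustration only, and not the project's actual model, the idea of casting summarization as a constrained optimization problem can be sketched in Python. Everything in the sketch is an assumption made for the example: the function name select_units, the per-unit importance scores and lengths, the length budget, and the pairs of redundant units that may not appear together. It also simplifies the task to selecting text units, whereas the project aims to go beyond extraction. The 0-1 program is solved by exhaustive enumeration rather than by an ILP solver, which is practical only for a handful of units but does return a globally optimal selection.

```python
from itertools import product


def select_units(scores, lengths, max_length, exclusive_pairs=()):
    """Solve a tiny 0-1 program by exhaustive enumeration (hypothetical helper).

    maximise    sum_i scores[i] * x[i]
    subject to  sum_i lengths[i] * x[i] <= max_length      (length budget)
                x[i] + x[j] <= 1 for (i, j) in exclusive_pairs  (no redundancy)
                x[i] in {0, 1}
    """
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=len(scores)):
        # Constraint: the summary must fit within the length budget.
        if sum(l * xi for l, xi in zip(lengths, x)) > max_length:
            continue
        # Constraint: redundant units must not both be selected.
        if any(x[i] + x[j] > 1 for i, j in exclusive_pairs):
            continue
        # Objective: total importance of the selected units.
        value = sum(s * xi for s, xi in zip(scores, x))
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Made-up toy data: four units, with units 0 and 1 treated as redundant.
scores = [5, 4, 3, 2]      # assumed importance of each unit
lengths = [10, 8, 6, 4]    # assumed length of each unit in words
best_x, best_value = select_units(scores, lengths, max_length=18,
                                  exclusive_pairs=[(0, 1)])
print(best_x, best_value)
```

With these toy numbers the optimum drops the highest-scoring unit in favour of three shorter ones, which a greedy ranking by score would not find. A realistic system would hand the same objective and constraints to a dedicated ILP solver instead of enumerating all selections.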

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

Description:

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Summary:

Date Materialised:

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.ed.ac.uk