Details of Grant

EPSRC Reference:

EP/C537262/1

Title:

Ranking Word Senses for Disambiguation: Models and Applications

Principal Investigator:

McCarthy, Dr DF

Other Investigators:

Carroll, Professor J

Researcher Co-Investigators:

Project Partners:

Department:

Sch of Engineering and Informatics

Organisation:

University of Sussex

Scheme:

Standard Research (Pre-FEC)

Starts:

01 September 2005

Ends:

30 September 2008

Value (£):

225,499

EPSRC Research Topic Classifications:

Comput./Corpus Linguistics

Human Communication in ICT

EPSRC Industrial Sector Classifications:

Creative Industries

Related Grants:

EP/C538447/1

Panel History:

Summary on Grant Application Form

When faced with the question 'Which plants thrive in chalky soil?', humans have no trouble understanding that the plants are floral rather than industrial. Furthermore, humans recognise that the answers 'Sweetcorn and cabbage family vegetables do well on chalky soil', 'Sweetcorn and cabbage grow well on chalky ground', and 'Maize and cabbage-like vegetables grow well on chalky soil' are all paraphrases that mean more or less the same thing. Humans perform semantic interpretation and disambiguation effortlessly, but these tasks pose great difficulties for computer-based applications that extract, filter and manipulate information from textual data, such as Question Answering and Information Retrieval. With the rapidly growing amounts of text stored by businesses and available over the Internet, such applications are increasingly important and timely, and improved methods for identifying the intended meaning of words (word senses) are a key technology for them.

The most accurate techniques for word sense disambiguation (WSD) to date are those trained on text in which each word has been manually annotated with its intended sense. A major shortcoming of these methods, though, is that their accuracy is strongly correlated with the quantity of training data available, and such data is in short supply because producing it is very labour-intensive. For many words the distribution of their senses is highly skewed, and WSD systems work best when they take the most frequent sense into account. However, the most frequent sense of a word is often not known, particularly in domains (subject areas) in which no text has ever been manually annotated. In this project we will develop novel ways of estimating the frequency distributions of senses of words from raw (unannotated) text.
We will exploit these distributions in WSD systems which do not rely on the availability of hand-labelled resources and will demonstrate the benefits of our methods in application to Question Answering.
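The summary does not say how sense frequencies will be estimated from raw text. As an illustration only, the sketch below shows one plausible approach of this kind, not necessarily the method the project developed: each sense of a target word is scored by how strongly the word's distributional neighbours (words that occur in similar contexts in unannotated text) relate to that sense, and the top-ranked sense then serves as a most-frequent-sense fallback. The functions `rank_senses` and `predominant_sense`, the neighbour list, the sense labels and all similarity values are hypothetical; in practice the neighbours would come from a corpus and the sense-to-neighbour similarities from a lexical resource.

```python
from typing import Dict, List, Tuple


def rank_senses(
    neighbours: List[Tuple[str, float]],
    sense_sim: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Estimate a sense distribution for one target word (hypothetical helper).

    neighbours: (neighbour_word, distributional_similarity) pairs, assumed to
        be derived from raw (unannotated) text.
    sense_sim: sense_sim[sense][neighbour] = assumed similarity between a sense
        and a neighbour word; a missing entry counts as 0.
    Returns each sense's estimated share of occurrences (summing to 1).
    """
    if not sense_sim:
        return {}
    scores = {sense: 0.0 for sense in sense_sim}
    for word, dss in neighbours:
        # Total relatedness of this neighbour to all senses of the target word.
        total = sum(sims.get(word, 0.0) for sims in sense_sim.values())
        if total == 0.0:
            continue  # neighbour unrelated to every sense: it casts no vote
        for sense, sims in sense_sim.items():
            # Split the neighbour's weight across senses in proportion to its
            # similarity with each one, so no single neighbour dominates.
            scores[sense] += dss * sims.get(word, 0.0) / total
    norm = sum(scores.values())
    if norm == 0.0:
        # No evidence at all: fall back to a uniform distribution.
        return {sense: 1.0 / len(scores) for sense in scores}
    return {sense: s / norm for sense, s in scores.items()}


def predominant_sense(distribution: Dict[str, float]) -> str:
    """Most-frequent-sense heuristic: return the highest-ranked sense."""
    return max(distribution, key=distribution.get)


if __name__ == "__main__":
    # Made-up numbers for the ambiguous word "plant"; not project data.
    neighbours = [("shrub", 0.30), ("flower", 0.25), ("factory", 0.10)]
    sense_sim = {
        "plant#flora": {"shrub": 0.9, "flower": 0.8, "factory": 0.1},
        "plant#industrial": {"shrub": 0.1, "flower": 0.0, "factory": 0.9},
    }
    dist = rank_senses(neighbours, sense_sim)
    print(dist)
    print(predominant_sense(dist))
```

With these invented numbers the floral sense receives about 0.82 of the estimated probability mass, so a WSD system with no annotated data for this domain would default to it for 'plant', matching the chalky-soil reading above. A corpus whose neighbours were dominated by words like 'factory' would instead shift the ranking towards the industrial sense.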

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.sussex.ac.uk