Details of Grant 

EPSRC Reference: EP/C537262/1
Title: Ranking Word Senses for Disambiguation: Models and Applications
Principal Investigator: McCarthy, Dr DF
Other Investigators:
Carroll, Professor J
Researcher Co-Investigators:
Project Partners:
Department: Sch of Engineering and Informatics
Organisation: University of Sussex
Scheme: Standard Research (Pre-FEC)
Starts: 01 September 2005
Ends: 30 September 2008
Value (£): 225,499
EPSRC Research Topic Classifications:
Comput./Corpus Linguistics
Human Communication in ICT
EPSRC Industrial Sector Classifications:
Creative Industries
Related Grants:
EP/C538447/1
Panel History:  
Summary on Grant Application Form
When faced with the question 'Which plants thrive in chalky soil?', humans have no trouble understanding that the plants in question are botanical rather than industrial. Furthermore, humans recognise that the answers 'Sweetcorn and cabbage family vegetables do well on chalky soil', 'Sweetcorn and cabbage grow well on chalky ground' and 'Maize and cabbage-like vegetables grow well on chalky soil' are paraphrases that mean more or less the same thing. Humans perform semantic interpretation and disambiguation effortlessly, but these tasks pose great difficulties for computer-based applications that extract, filter and manipulate information from textual data, such as Question Answering and Information Retrieval. With the rapidly growing amount of text stored by businesses and available over the Internet, such applications are becoming increasingly important, and improved methods for identifying the intended meaning of words (word senses) are a key technology for them.
The most accurate techniques for word sense disambiguation (WSD) to date are those trained on text in which each word has been manually annotated with its intended sense. A major shortcoming of these methods, though, is that their accuracy is strongly correlated with the quantity of training data available, and such data is in short supply because producing it is very labour-intensive. For many words the distribution of senses is highly skewed, and WSD systems work best when they take the most frequent sense into account. However, the most frequent sense of a word is often not known, particularly in domains (subject areas) for which no text has ever been manually annotated. In this project we will develop novel ways of estimating the frequency distributions of word senses from raw (unannotated) text.
We will exploit these distributions in WSD systems that do not rely on the availability of hand-labelled resources, and will demonstrate the benefits of our methods by applying them to Question Answering.
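The role of the most frequent sense described in the summary can be illustrated with a small sketch. It assumes that per-domain sense frequency estimates are already available; how they are estimated from raw text is the subject of the project and is not shown here. The function names and the sense labels and scores for "plant" are hypothetical, chosen for illustration only, and are not project data. Senses are ranked by normalised prevalence, and the top-ranked sense serves as the default answer.

```python
def rank_senses(sense_scores):
    """Rank senses by estimated prevalence, highest first.

    sense_scores maps a sense label to a non-negative score, e.g. a
    frequency estimated from raw text. Scores are normalised into a
    probability distribution; ties are broken alphabetically by label
    so that the ranking is deterministic.
    """
    total = sum(sense_scores.values())
    if total <= 0:
        raise ValueError("sense scores must sum to a positive value")
    distribution = {sense: score / total for sense, score in sense_scores.items()}
    return sorted(distribution.items(), key=lambda item: (-item[1], item[0]))


def most_frequent_sense(sense_scores):
    """Return the top-ranked sense, used as a default when context is uninformative."""
    return rank_senses(sense_scores)[0][0]


# Hypothetical scores for "plant" in two domains (illustrative only).
gardening = {"plant (living organism)": 42.0, "plant (factory)": 3.0}
manufacturing = {"plant (living organism)": 5.0, "plant (factory)": 30.0}

print(most_frequent_sense(gardening))      # plant (living organism)
print(most_frequent_sense(manufacturing))  # plant (factory)
```

Because the same word can have a different most frequent sense in different domains, any single fixed default would be wrong for one of the two toy domains above. This is why the summary proposes estimating sense distributions per domain rather than relying on one global ranking.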
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.sussex.ac.uk