EPSRC Reference: |
EP/C537262/1 |
Title: |
Ranking Word Senses for Disambiguation: Models and Applications |
Principal Investigator: |
McCarthy, Dr DF |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Sch of Engineering and Informatics |
Organisation: |
University of Sussex |
Scheme: |
Standard Research (Pre-FEC) |
Starts: |
01 September 2005 |
Ends: |
30 September 2008 |
Value (£): |
225,499
|
EPSRC Research Topic Classifications: |
Comput./Corpus Linguistics |
Human Communication in ICT |
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
When faced with the question 'Which plants thrive in chalky soil?', humans have no trouble understanding that the plants are floral rather than industrial. Furthermore, humans recognise that the answers 'Sweetcorn and cabbage family vegetables do well on chalky soil', 'Sweetcorn and cabbage grow well on chalky ground', and 'Maize and cabbage-like vegetables grow well on chalky soil' are all paraphrases and mean more or less the same thing. Semantic interpretation and disambiguation are performed effortlessly by humans but pose great difficulties for computer-based applications that extract, filter and manipulate information from textual data. Examples include Question Answering and Information Retrieval. With the rapidly growing amounts of text being stored by businesses and available over the Internet, such applications become increasingly important and timely, and the development of improved methods for identifying the intended meaning of words (word senses) is a key technology for them.
The most accurate techniques for word sense disambiguation (WSD) to date are those which are trained on text in which each word has been manually annotated with its intended sense. A major shortcoming of these methods, though, is that accuracy is strongly correlated with the quantity of training data available, and this is in short supply because its production is very labour-intensive. For many words the distribution of their senses is highly skewed, and WSD systems work best when they take the most frequent sense into account. However, the most frequent sense of a word is often not known, particularly in domains (subject areas) in which no text has ever been manually annotated. In this project we will develop novel ways of estimating the frequency distributions of senses of words from raw (unannotated) text.
We will exploit these distributions in WSD systems which do not rely on the availability of hand-labelled resources, and will demonstrate the benefits of our methods by applying them to Question Answering.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.sussex.ac.uk |