EPSRC Reference: |
EP/F00575X/1 |
Title: |
Creating anaphorically annotated resources through semantic wikis (AnaWiki) |
Principal Investigator: |
Poesio, Professor M |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Computer Sci and Electronic Engineering |
Organisation: |
University of Essex |
Scheme: |
Standard Research |
Starts: |
01 November 2007 |
Ends: |
30 September 2009 |
Value (£): |
143,320 |
EPSRC Research Topic Classifications: |
Comput./Corpus Linguistics |
Information & Knowledge Mgmt |
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
07 Jun 2007 | ICT Prioritisation Panel (Technology) | Announced |
|
Summary on Grant Application Form |
The ability to make progress in Natural Language Processing (NLP), both to develop better NLP systems and to develop better theories of how humans process language, depends on the availability of large annotated corpora. These are collections of documents annotated with human judgments about, for example, the interpretation of ambiguous words such as 'bank' or 'stock' in a particular context, or the interpretation of anaphoric expressions like 'the corpus'. The fact that current corpora annotated for semantic information are not large enough, and do not collect the judgments of a large enough number of subjects, is therefore a major obstacle for NLP. Creating larger hand-annotated corpora with current methods, however, is very expensive and time-consuming; in practice, annotating more than 1M words is infeasible. A variety of semi-automatic annotation techniques, such as bootstrapping and active learning, have been proposed in the literature to address the problem, but their usefulness has not yet been convincingly demonstrated. The success of Wikipedia suggests that another approach may be possible: taking advantage of the willingness of the Web population to contribute to collaborative resource creation efforts. This willingness has already been harnessed to tag images through the ESP game. We propose to develop tools that will make it possible for large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (specifically, a corpus annotated with coreference information). In doing so, we will build on existing efforts to develop versions of MediaWiki that support work on the Semantic Web, and on our own work on developing reliable, easy-to-follow instructions for marking semantic judgments about anaphora. At the very least, these tools will make it possible for the NLP research community itself to collaborate in the creation of an Anaphoric Bank.
However, we will also run a pilot to develop methods for attracting the interest of the wider Web community; if these tests are successful, we may be able to harness collaborative effort over the Web to create very large annotated corpora. A distinctive feature of our approach is that volunteers will be able to mark differences in semantic judgments and to comment on previously expressed judgments, so that we can identify both the judgments on which there is wide agreement and those on which there is disagreement.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.sx.ac.uk |