Details of Grant

EPSRC Reference:

EP/F00575X/1

Title:

Creating anaphorically annotated resources through semantic wikis (AnaWiki)

Principal Investigator:

Poesio, Professor M

Other Investigators:

Kruschwitz, Professor U

Researcher Co-Investigators:

Project Partners:

Department:

Computer Sci and Electronic Engineering

Organisation:

University of Essex

Scheme:

Standard Research

Starts:

01 November 2007

Ends:

30 September 2009

Value (£):

143,320

EPSRC Research Topic Classifications:

Comput./Corpus Linguistics

Information & Knowledge Mgmt

EPSRC Industrial Sector Classifications:

No relevance to Underpinning Sectors

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
07 Jun 2007	ICT Prioritisation Panel (Technology)	Announced

Summary on Grant Application Form

Progress in Natural Language Processing (NLP), both in developing better NLP systems and in developing better theories of how humans process language, depends on the availability of large annotated corpora: collections of documents annotated with human judgments about, say, the interpretation of ambiguous words such as 'bank' or 'stock' in a particular context, or the interpretation of anaphoric expressions like 'the corpus'. The fact that current corpora annotated for semantic information are not large enough, and do not collect the judgments of a large enough number of subjects, is therefore a major obstacle for NLP. Creating larger hand-annotated corpora with current methods, however, is very expensive and time-consuming; in practice, it is infeasible to annotate more than 1M words. A variety of semi-automatic annotation techniques, such as bootstrapping and active learning, have been proposed in the literature to address the problem, but their usefulness has not yet been convincingly demonstrated. The success of Wikipedia shows that another approach might be possible: taking advantage of the willingness of the Web population to contribute to collaborative resource creation efforts. This willingness has already been harnessed to tag images through the ESP game; we propose to develop tools that will make it possible for large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (specifically, a corpus annotated with coreference information). In this, we will build on existing efforts to develop versions of MediaWiki that support work on the Semantic Web, and on our own efforts to develop reliable and easy-to-follow instructions for marking semantic judgments about anaphora. At the very least, these tools will make it possible for the community of NLP researchers themselves to collaborate in the creation of an Anaphoric Bank.
We will, however, also run a pilot to develop methods for attracting the interest of the Web community at large; if these tests are successful, we may be able to use the power of collaborative effort through the Web to create really large annotated corpora. A distinctive feature of our approach is that we will allow volunteers to mark differences in semantic judgments, and to comment on previously expressed semantic judgments, so as to distinguish the judgments on which there is wide agreement from those on which there is disagreement.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

Description	This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.sx.ac.uk