
Details of Grant 

EPSRC Reference: EP/J020583/1
Title: RAnDMS (Real time Analysis of Digital Media Streams)
Principal Investigator: Ciravegna, Professor F
Other Investigators:
Tucker, Dr S
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: University of Sheffield
Scheme: Standard Research
Starts: 01 May 2012
Ends: 31 August 2013
Value (£): 208,723
EPSRC Research Topic Classifications:
Artificial Intelligence
Computer Graphics & Visual.
Information & Knowledge Mgmt
EPSRC Industrial Sector Classifications:
Aerospace, Defence and Marine
Information Technologies
Related Grants:
Panel History:
Panel Date: 09 Feb 2012
Panel Name: Data Intensive Systems (DaISy)
Outcome: Announced
Summary on Grant Application Form
RAnDMS will study, implement and evaluate Real-time Data and Visual Analytic techniques to enable intelligence agencies, the MoD, the police and emergency responders to monitor and make sense of local, regional and global events using web-scale data from social and traditional media streams. The intelligence gathering task will be defined as identifying, correlating, integrating and presenting data and information, in order to understand situations as they arise. Current technology does not provide efficient and effective solutions, as it mainly focuses on detecting trends in the use of keywords and tags. While this can spot overall patterns in the data, it only enables the retrieval of relevant documents, without any correlation or integration of the information they contain. Moreover, information concerning local situations and events, which may only be discussed within a handful of documents, is ignored.

Within RAnDMS, data analytics will focus on capturing information from media streams in order to illuminate situations at all levels, from global to local. This information will support decision making for the intelligence community, which is expected to increase its ability to monitor events and situations relevant to homeland security and to peace-keeping efforts. The scientific challenge is that data and information in these streams are: (i) high in volume and constantly increasing; (ii) often duplicated, incomplete, imprecise and incorrect; (iii) written in an informal style (i.e. short, unedited and conversational); and (iv) generally concerned with the short-term zeitgeist. These characteristics make analysis very hard, especially given two major requirements of the intelligence community: (i) documents must be processed in real time and (ii) the relevant information may be in the long tail of the distribution, i.e. it may be mentioned very infrequently.

We will provide highly efficient and effective technologies able to associate each document with its context. A document's context is provided by four dimensions: (who) the author of the document, (when) the time it was sent, (where) the location referred to in the document and (what) other documents with similar content. This information is either provided by the media stream or extracted from the document's content using efficient statistical text-mining techniques. By interpreting documents in terms of these four dimensions we enable: (i) the detection of events, i.e. documents and their content (what) that cluster around a time and place; (ii) the profiling of authors from the content (what and where) of the documents they have created; and (iii) the determination of information that is missing or ambiguous in a document, using information present in the documents within its context.
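
As an illustration only, the event-detection idea in (i) can be sketched in a few lines of Python. This is not the project's method: the `Document` record, the `detect_events` function, the one-hour window and the `min_docs` threshold are all hypothetical names and values chosen for the example, the location is assumed to have been resolved to a place name already, and the content-similarity (what) dimension is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Document:
    """One stream item described by the four context dimensions (hypothetical record)."""
    author: str         # who
    sent_at: datetime   # when (timezone-aware)
    place: str          # where (assumed already resolved to a place name)
    text: str           # what


def detect_events(docs, window=timedelta(hours=1), min_docs=3):
    """Bucket documents by (time window, place) and keep buckets with enough documents."""
    buckets = defaultdict(list)
    for doc in docs:
        # Index of the fixed-size time window that contains this document.
        slot = int(doc.sent_at.timestamp() // window.total_seconds())
        buckets[(slot, doc.place)].append(doc)
    # A bucket counts as a candidate event once it holds at least min_docs documents.
    return {key: group for key, group in buckets.items() if len(group) >= min_docs}


def at(hour, minute):
    """Helper for the sample data: a UTC timestamp on 1 May 2012."""
    return datetime(2012, 5, 1, hour, minute, tzinfo=timezone.utc)


sample = [
    Document("a", at(10, 5), "Sheffield", "road closed near station"),
    Document("b", at(10, 20), "Sheffield", "station road blocked"),
    Document("c", at(10, 40), "Sheffield", "traffic stopped by station"),
    Document("d", at(10, 30), "Leeds", "nice weather today"),
]
events = detect_events(sample)
for (slot, place), group in events.items():
    print(place, len(group))
```

Fixed, non-overlapping windows are the simplest possible choice here; an event that straddles a window boundary would be split, which a sliding window or a proper spatio-temporal clustering step would avoid.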

Visual analytics will facilitate the exploration of the information by providing multiple views, enabling focused investigation and trend visualisations across the four dimensions. We will devise methods to (i) suggest the right level of detail (granularity) for the user's focus in rapidly changing environments; (ii) alert users to any significant development outside of their current viewpoint; and (iii) enable users to understand how the current state of affairs came into being by browsing all the information along the time dimension. These methods will enable users to see through the irrelevant banter (noise) that often surrounds events in social media and go directly to the relevant information that can be hidden in the long tail of the distribution.

RAnDMS will be tested on the task of supporting intelligence operators during relevant events happening during 2012/13. We will publish the research and its findings in international journals and conferences. Subject to MoD agreement, we will also create public research resources by generating one publicly available task (inclusive of corpora, resources, etc.) to enable comparison of research results by other researchers.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.shef.ac.uk