EPSRC Reference: EP/F069421/1
Title: Adaptive cognition for automated sports video annotation (ACASVA)
Principal Investigator: Kittler, Professor J
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Vision, Speech and Signal Processing (CVSSP)
Organisation: University of Surrey
Scheme: Standard Research
Starts: 05 May 2009
Ends: 30 September 2013
Value (£): 1,415,482
EPSRC Research Topic Classifications:
Cognitive Science Appl. in ICT
Human Communication in ICT
Vision & Senses - ICT appl.

EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:

Panel History:
Panel Date  | Panel Name                          | Outcome
18 Jul 2008 | ICT Large Grants Panel (18 July 08) | Announced
Summary on Grant Application Form
The development of a machine that can autonomously understand and interpret patterns of real-world events remains a challenging goal in AI. Humans achieve this by developing sophisticated internal representational structures for objects and events, and the grammars that connect them. ACASVA aims to investigate the interaction between visual and linguistic grammars during learning by developing both in a scenario where a set of rules constrains the number of distinct events to be small: a sport. We will analyse video footage of a game (e.g. tennis) and use computer vision techniques to progressively understand it as a sequence of (possibly overlapping) events, building a grammar of those events. We will perform a similar audio/linguistic analysis of the commentary on the game. Both grammars will then be used to build a representational structure for understanding the game. Visual representations are additionally constrained by the inference of game rules, so that object-classification mechanisms are preferentially tuned to game-relevant entities such as 'player' rather than game-irrelevant entities such as 'crowd member'.

We will also investigate how the two modes, sight and sound, can influence each other in the learning process: interpretation of the video is affected by the linguistic grammar and vice versa. This coupling of modes should improve recognition of both audio and video events, since the grammar learned from the video mode can be used to influence audio recognition, and vice versa.

The psychological component of ACASVA correspondingly investigates how these capabilities develop in humans: how visual grammars are organised and employed in the learning problem, how these grammars are modified by prior linguistic knowledge of the domain, how visual grammars map onto linguistic grammars, and how inferences about game rules influence lower-level visual learning (determined via gaze behaviour). These results will feed back into the machine-learning problem and vice versa, as well as providing a performance benchmark for the system. Potential beneficiaries of ACASVA (in addition to the knowledge beneficiaries within the fields of science and engineering) include the broadcasting and online video search industries.
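As a purely illustrative sketch of the cross-modal coupling described above (not the project's actual model), the toy Python below reduces a learned event grammar to bigram transition probabilities over invented tennis events, decodes the most probable event sequence from noisy per-segment visual scores with Viterbi, and lets commentary keywords boost the visual posterior of cued events. All event labels, probabilities, cue words, and function names are assumptions made up for this sketch.

# A minimal sketch (assumptions throughout): a bigram "event grammar" over
# invented tennis events re-scores noisy per-segment visual detections via
# Viterbi decoding, while commentary keywords bias the visual posteriors.
import math

# Assumed bigram grammar: TRANS[prev][next] = P(next event | prev event).
TRANS = {
    "serve":        {"serve": 0.02, "rally_shot": 0.75, "net_approach": 0.13, "point_end": 0.10},
    "rally_shot":   {"serve": 0.02, "rally_shot": 0.63, "net_approach": 0.15, "point_end": 0.20},
    "net_approach": {"serve": 0.02, "rally_shot": 0.38, "net_approach": 0.10, "point_end": 0.50},
    "point_end":    {"serve": 0.85, "rally_shot": 0.05, "net_approach": 0.05, "point_end": 0.05},
}

# Hypothetical commentary cues: hearing a word makes its event more plausible.
CUES = {"ace": "serve", "volley": "net_approach", "winner": "point_end"}

def bias_with_commentary(visual_probs, words, boost=3.0):
    """Cross-modal coupling in miniature: multiply the visual posterior of
    any event cued by the commentary, then renormalise."""
    biased = dict(visual_probs)
    for word in words:
        event = CUES.get(word)
        if event is not None:
            biased[event] *= boost
    total = sum(biased.values())
    return {event: p / total for event, p in biased.items()}

def viterbi(segment_probs, trans):
    """Most probable event sequence under the bigram grammar, given
    per-segment visual posteriors (standard Viterbi in log space)."""
    # scores[event] = (log-prob of best path ending in event, that path)
    scores = {e: (math.log(p), [e]) for e, p in segment_probs[0].items()}
    for probs in segment_probs[1:]:
        new_scores = {}
        for event, p in probs.items():
            prev = max(scores, key=lambda e: scores[e][0] + math.log(trans[e][event]))
            score, path = scores[prev]
            new_scores[event] = (score + math.log(trans[prev][event]) + math.log(p),
                                 path + [event])
        scores = new_scores
    return max(scores.values(), key=lambda sp: sp[0])[1]

if __name__ == "__main__":
    # Three noisy visual segments; the middle one is ambiguous on its own.
    segments = [
        {"serve": 0.70, "rally_shot": 0.15, "net_approach": 0.10, "point_end": 0.05},
        {"serve": 0.10, "rally_shot": 0.40, "net_approach": 0.40, "point_end": 0.10},
        {"serve": 0.05, "rally_shot": 0.15, "net_approach": 0.10, "point_end": 0.70},
    ]
    print(viterbi(segments, TRANS))   # visual only: ['serve', 'rally_shot', 'point_end']
    segments[1] = bias_with_commentary(segments[1], ["volley"])
    print(viterbi(segments, TRANS))   # with commentary: ['serve', 'net_approach', 'point_end']

Running the sketch shows the coupling at work: on visual evidence alone the ambiguous middle segment decodes as 'rally_shot', but hearing the commentary word 'volley' flips it to 'net_approach'.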
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:

Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL: http://cvssp.org/acasva/
Further Information:
Organisation Website: http://www.surrey.ac.uk