
Details of Grant 

EPSRC Reference: EP/I011811/1
Title: Learning to Recognise Dynamic Visual Content from Broadcast Footage
Principal Investigator: Bowden, Professor R
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Vision, Speech and Signal Processing (CVSSP)
Organisation: University of Surrey
Scheme: Standard Research
Starts: 01 September 2011 Ends: 29 February 2016 Value (£): 489,782
EPSRC Research Topic Classifications:
Image & Vision Computing
EPSRC Industrial Sector Classifications:
Communications; Creative Industries
Related Grants:
EP/I01229X/1, EP/I012001/1
Panel History:
Panel Date    Panel Name    Outcome
13 Oct 2010    ICT Prioritisation Panel (Oct 2010)    Announced
Summary on Grant Application Form
This research is in the area of computer vision: making computers which can understand what is happening in photographs and video. As humans we are fascinated by other humans, and we capture endless images of their activities, for example home movies of our family on holiday, video of sports events, or CCTV footage of people in a town centre. A computer capable of understanding what people are doing in such images could do many jobs for us, for example finding clips of our children waving, fast-forwarding to a goal in a football game, or spotting when someone starts a fight in the street. For Deaf people, who use a language combining hand gestures with facial expression and body language, a computer which could visually understand their actions would allow them to communicate in their native language. While humans are very good at understanding what people are doing (and can learn to understand specialised actions such as sign language), this has proved extremely challenging for computers.

Much work has tried to solve this problem, and it works well in particular settings: for example, the computer can tell if a person is walking so long as they do it clearly and face to the side, or can understand a few sign language gestures as long as the signer cooperates and signs slowly. We will investigate better models for recognising activities by teaching the computer with many example videos. To make sure our method works well in all kinds of settings we will use real-world video from movies and TV. For each video we would have to tell the computer what it represents, for example "throwing a ball" or "a man hugging a woman". It would be expensive to collect and label lots of videos in this way, so instead we will extract approximate labels automatically from the subtitle text and scripts which are available for TV. Our new methods will combine learning from lots of approximately labelled video (cheap because we get the labels automatically), the use of contextual information such as which actions people do at the same time, or how one action leads to another ("he hits the man, who falls to the floor"), and computer vision methods for understanding the pose of a person (how they are standing), how they are moving, and the objects which they are using.

By having lots of video to learn from, and methods for making use of approximate labels, we will be able to build stronger and more flexible models of human activities. This will lead to recognition methods which work better in the real world and contribute to applications such as interpreting sign language and automatically tagging video with its content.
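To make the idea of "approximate labels from subtitles" concrete, here is a minimal, hypothetical sketch in Python of that kind of weak supervision: scanning subtitle text for action keywords and attaching the resulting noisy labels to the video interval each subtitle covers. The file format handling, keyword list, and all names below are illustrative assumptions, not the project's actual pipeline.

# Minimal sketch: mine approximate action labels from an SRT subtitle stream.
# Everything here (keyword list, SRT parsing, WeakLabel) is an assumption for
# illustration; the labels produced are deliberately noisy "weak" supervision.
import re
from dataclasses import dataclass

# Illustrative action vocabulary; a real system would curate or learn this.
ACTION_KEYWORDS = {"wave", "hug", "hit", "throw", "fall", "run"}

SRT_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

@dataclass
class WeakLabel:
    start_s: float   # clip start in seconds
    end_s: float     # clip end in seconds
    action: str      # approximate action label mined from the subtitle text

def _to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def weak_labels_from_srt(srt_text: str) -> list[WeakLabel]:
    """Return approximate (time interval, action) labels mined from subtitles."""
    labels = []
    # SRT blocks are separated by blank lines: index, timing line, text lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = SRT_TIME.search(lines[1])
        if not match:
            continue
        start = _to_seconds(*match.groups()[:4])
        end = _to_seconds(*match.groups()[4:])
        text = " ".join(lines[2:]).lower()
        for keyword in ACTION_KEYWORDS:
            # Crude stem matching ("hit" also matches "hits"); the learning
            # method must tolerate this kind of approximate labelling.
            if re.search(rf"\b{keyword}\w*\b", text):
                labels.append(WeakLabel(start, end, keyword))
    return labels

if __name__ == "__main__":
    sample = "1\n00:01:02,000 --> 00:01:04,500\nHe hits the man, who falls to the floor.\n"
    for label in weak_labels_from_srt(sample):
        print(f"{label.start_s:.1f}-{label.end_s:.1f}s: {label.action}")

Running the sketch on the example subtitle yields two weak labels ("hit" and "fall") for the 62.0-64.5 second interval, illustrating how cheap but imprecise supervision can be harvested at scale from broadcast footage.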
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.surrey.ac.uk