The notion of video visualization was coined by the PI and his postgraduate student in their 2003 IEEE VIS paper. It is a technology that draws on concepts and methodologies from volume and flow visualization, image and video processing, and vision science. It extracts meaningful information from a video data set and conveys the extracted information to users in appropriate visual representations. It is not intended to provide fully automatic solutions to the traditional problems in video processing; rather, it keeps humans in the loop of intelligent reasoning while reducing the burden of viewing videos. In subsequent work in collaboration with Stuttgart, the PI and CI introduced the concept of visual signatures in video visualization, and reported a major user study conducted at Swansea involving some 92 subjects [IEEE TVCG 2006]. This work offered an important scientific insight into how human observers may learn to recognize visual signatures of events depicted in an abstract visual representation of a video.
[Tsotsos01] stated that a bounded visual search (e.g., looking for all moving pixel clusters with 20-60 pixels) can be achieved in linear time, whilst an unbounded visual search (e.g., looking for something abnormal in a video) is NP-complete. For most practical problems in video processing and computer vision, we rarely have a perfectly bounded visual search. We often search simultaneously for entities (e.g., objects, motions or events) in different classes. The models that are used to guide a search are usually incomplete and may lead to uncertainty or errors in detection, segmentation and classification. The dynamic and unpredictable nature of the input videos necessitates mechanisms for heuristic reasoning and iterative decision optimization, which depart further from linear or polynomial performance.
In contrast, the human eye-brain system is undeniably more powerful than any current vision system in performing visual searches, especially unbounded visual searches. Even if we suppose that the human eye-brain system is a Turing machine, its 100 billion neurons and 100-500 trillion synaptic connections between neurons are unlikely to be matched by computers in the near future. This raises the possibility that using video visualization to aid unbounded visual search may provide a more scalable means of dealing with large volumes of video data.
Video visualization can be deployed in many application areas, such as scientific experimentation and computation, the security industry, and the media and entertainment industry. However, in traditional visualization (e.g., medical visualization), the users are normally familiar with the 3D objects (e.g., bones or organs) depicted in a visual representation. In contrast, human observers are not familiar with the 3D objects depicted in a visual representation of a video, because one spatial dimension of these objects represents the temporal dimension. The problem is further complicated by the fact that, in most videos, each 2D frame is a projective view of a 3D scene. Hence, a visual representation of a video on a computer display is, in effect, a 2D projective view of a 4D spatiotemporal domain. In order for us to see 'time' without using 'time', we need to address a range of challenges in science, technology, visual perception and applications. This project is intended to continue the UK's leadership in tackling these challenges by building on the existing expertise and excellence in video visualization.