Human-machine interfaces, video surveillance, sport performance enhancement, physical therapy and smart environments, to name a few, are important societal challenges that require better automatic behaviour analysis to be fully addressed. To approach the level of human proficiency, fully automatic understanding of a scene requires a whole range of capabilities: reliable extraction of each actor involved, together with their pose and activities. This involves the combined application of pose estimation, multi-target tracking and activity recognition. While impressive progress has been made in each of those fields in isolation, reliable methods that can be applied to real-world, unconstrained environments remain a challenge. In this project we will focus on the intermediate components of behaviour analysis. We set aside the traditional cascade pipeline, in which pose estimation frequently plays a secondary role or is omitted entirely because of its complexity, and propose a novel architecture that has 3D pose estimation as its central component, with feedback to and from each of the other components.
In this project, we propose to investigate the automated 3D pose estimation and tracking of multiple people in realistic scenarios. This research is motivated by the fact that all current methods operate under strong limitations and assumptions that preclude their application to real-world situations. Some methods require multiple high-resolution sensors, which rules out the use of current and near-future sensor network infrastructures. Others struggle with scenes containing multiple persons, or succeed only if the subjects do not interact and the activity performed is known beforehand. This last assumption reduces the practical applicability of pose estimation and prevents its use for activity recognition and/or behavioural analysis.
To address this limitation, we propose to relax the assumption of a single known activity as prior model to one where a class of multiple activities is assumed, e.g., walking, running, fighting, shaking hands, etc. This requires us to develop a novel multi-activity model that can be used as prior information to estimate the 3D pose accurately and robustly under complex, real-world conditions. The multi-activity model removes the need to presume which activity, among the given set, each subject in the scene is performing. The development and use of such a model is the key novel contribution of this proposal, and is a first step towards fully activity-agnostic 3D pose estimation in real environments.
Furthermore, we propose a paradigm change to the conventional behaviour analysis chain, in which pose estimation becomes the cornerstone of the system and is connected to the other components by two feedback loops: one with tracking, to address occlusions and interactions, and one with activity recognition, to switch between a set of plausible activities during the estimation. Together, these loops allow us to deal with the aforementioned issues. By modelling transitions between this set of activities, and observing how predicted poses propagate in time through the activity space, the current activity can be recognised and used as feedback for refining the pose estimation. This is the second novelty of this proposal. Lastly, inaccuracies in the pose estimation, caused by occlusion and by multiple persons interacting, can be overcome by using information from the tracker to determine which image regions provide reliable pose estimation information. Similarly, knowing the pose and activity of the subjects in the scene can improve the tracking performance. This is the third novel aspect of the proposal.
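To illustrate the idea of recognising the current activity from modelled transitions, the sketch below shows one possible formulation: a recursive Bayesian filter over a small activity set. It is only an illustration under stated assumptions, not the model the project will develop. The activity names, the transition probabilities, the `update_belief` and `most_likely` helpers, and the per-activity pose likelihoods are all hypothetical placeholders.

```python
# Hypothetical sketch: recursive Bayesian filter over a small set of activities.
# All names and numbers below are illustrative assumptions, not project results.

ACTIVITIES = ["walking", "running", "shaking_hands"]

# Assumed transition probabilities P(next activity | current activity).
# Each row sums to 1.
TRANSITIONS = {
    "walking": {"walking": 0.90, "running": 0.05, "shaking_hands": 0.05},
    "running": {"walking": 0.10, "running": 0.90, "shaking_hands": 0.00},
    "shaking_hands": {"walking": 0.10, "running": 0.00, "shaking_hands": 0.90},
}


def update_belief(belief, likelihood):
    """Perform one predict/update step of the activity filter.

    belief:     dict mapping activity -> current probability (sums to 1)
    likelihood: dict mapping activity -> p(observed pose | activity),
                assumed to come from a per-activity pose prior
    """
    # Predict: propagate the belief through the transition model.
    predicted = {
        nxt: sum(belief[cur] * TRANSITIONS[cur][nxt] for cur in ACTIVITIES)
        for nxt in ACTIVITIES
    }
    # Update: weight each activity by how well it explains the observed pose.
    unnormalised = {a: predicted[a] * likelihood[a] for a in ACTIVITIES}
    total = sum(unnormalised.values())
    if total == 0.0:
        # The observation is incompatible with every activity;
        # fall back to the prediction alone.
        return predicted
    return {a: p / total for a, p in unnormalised.items()}


def most_likely(belief):
    """Return the activity with the highest probability."""
    return max(belief, key=belief.get)


# Usage: start from a uniform belief and feed in pose likelihoods per frame.
belief = {a: 1.0 / len(ACTIVITIES) for a in ACTIVITIES}
frame_likelihoods = [
    {"walking": 0.2, "running": 0.7, "shaking_hands": 0.1},
    {"walking": 0.1, "running": 0.8, "shaking_hands": 0.1},
]
for likelihood in frame_likelihoods:
    belief = update_belief(belief, likelihood)
current_activity = most_likely(belief)
```

In a full system, the recognised activity would in turn select or re-weight the pose prior used in the next frame, closing the feedback loop described above.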