EPSRC Reference: |
EP/C535308/1 |
Title: |
Multi-Modal Blind Source Separation Algorithms |
Principal Investigator: |
Chambers, Professor J |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Sch of Engineering |
Organisation: |
Cardiff University |
Scheme: |
Standard Research (Pre-FEC) |
Starts: |
16 September 2005 |
Ends: |
31 August 2007 |
Value (£): |
278,988
|
EPSRC Research Topic Classifications: |
Digital Signal Processing |
|
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
This project concerns emulating the human ability to separate one speech source from a background of other speakers and possibly noise sources, such as an air-conditioning unit, within an office environment. This is termed the cocktail party problem. Using a number of microphones together with a computer to process the recordings, and thereby extract the speaker of interest, is a very challenging task. As humans, we use much more than the sound perceived by our two ears to address this problem. Our eyes, for example, also provide visual cues which help in the process. The focus of this work is therefore to integrate both audio and visual measurements, obtained from microphones and cameras within the office, to aid the separation process. Humans are also likely to exploit knowledge of language when separating speech. We therefore plan to use mathematical models of the audio and visual speech recordings in the separation process, called coupled (or fused) hidden Markov models. When a word is uttered within a room, the sound wave propagates to the microphone along many paths, due to reflections from the walls, ceiling, floor or other objects in the room, such as a table. This so-called multipath propagation is modelled by a convolutive mixture. A convolutive model describes the relationship between the input and output of a linear, possibly multichannel, system which has memory of past inputs and possibly past outputs (only inputs in this project). Separation therefore requires a convolutive model. Performing separation directly with such a model in the time domain requires many calculations. The task becomes much easier in the frequency domain, where convolution becomes multiplication and the convolutive mixture reduces to a separate instantaneous mixture in each frequency bin. Separation in the frequency domain is, we believe, the way forward for this problem, but some problems remain to be solved. 
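The frequency-domain simplification described above can be illustrated with a short sketch. It assumes NumPy, FIR room responses stored as an array `filters[i, j, k]` (microphone `i`, source `j`, tap `k`), and hypothetical function names; it is not code from the project. When the DFT length is at least the length of the full linear convolution, the time-domain convolutive mixture and the per-bin matrix product X(f) = H(f) S(f) give identical results.

```python
import numpy as np


def convolutive_mix(sources, filters):
    """Time-domain convolutive mixture (hypothetical helper).

    sources: array (n_src, T); filters: array (n_mic, n_src, L) of FIR responses.
    Returns x with x[i, t] = sum_j sum_k filters[i, j, k] * sources[j, t - k].
    """
    n_mic, n_src, L = filters.shape
    T = sources.shape[1]
    x = np.zeros((n_mic, T + L - 1))
    for i in range(n_mic):
        for j in range(n_src):
            # Each microphone sums every source filtered by its own room path.
            x[i] += np.convolve(filters[i, j], sources[j])
    return x


def frequency_domain_mix(sources, filters, n_fft):
    """The same mixture computed as one matrix product per frequency bin."""
    S = np.fft.rfft(sources, n=n_fft, axis=-1)   # (n_src, n_bins)
    H = np.fft.rfft(filters, n=n_fft, axis=-1)   # (n_mic, n_src, n_bins)
    X = np.einsum("ijf,jf->if", H, S)            # X(f) = H(f) S(f) in every bin f
    return np.fft.irfft(X, n=n_fft, axis=-1)
```

For example, with two 256-sample sources and 32-tap filters, `convolutive_mix` returns 287 samples per microphone, and `frequency_domain_mix(..., n_fft=512)` matches them to floating-point precision. Practical systems use short-time Fourier transforms, where the per-bin product is only an approximation when room responses are longer than the analysis frame.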
In particular, three problems remain. The first is how to resolve the arbitrary ordering of the separated signals in each frequency bin, so that each extracted speech signal can be correctly reconstructed in the time domain (the so-called permutation problem). The second is how to deal with the case of more than two speakers in the room, and the third is how to handle moving speakers. Our approach in this work is to use additional visual information to overcome these problems. We therefore wish to equip an intelligent office within the Centre of Digital Signal Processing at the Cardiff School of Engineering with microphones and cameras, together with the necessary computing facilities, to record examples of audio and visual signals for testing. The recordings will initially use two well-positioned speakers uttering distinct sounds, such as vowels and consonants. They will then move on to more speakers and movement, and ultimately to natural continuous speech. The overall goal is to demonstrate the ability to separate any one of the speakers' utterances within the intelligent office. This would then facilitate interaction, for example with a voice recogniser or with a third party at a remote location, as in teleconferencing.
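The following sketch shows one way visual information might help with the permutation problem. It is illustrative only and is not the method proposed in the project. It assumes that per-bin separated outputs are already available and that a hypothetical per-speaker visual activity signal (for example, a mouth-opening measurement from the cameras) is sampled at the STFT frame rate. In each bin, the ordering whose output magnitude envelopes correlate best with the visual references is chosen.

```python
from itertools import permutations

import numpy as np


def align_permutations(Y, visual_ref):
    """Reorder per-bin separated outputs using visual references (hypothetical helper).

    Y: complex array (n_freq, n_src, n_frames); the source order may differ per bin.
    visual_ref: real array (n_src, n_frames), an assumed visual activity signal per speaker.
    Returns a copy of Y in which output k of every bin corresponds to speaker k.
    """
    n_freq, n_src, _ = Y.shape
    # Zero-mean, unit-norm visual references so that the dot products below are correlations.
    ref = visual_ref - visual_ref.mean(axis=1, keepdims=True)
    ref = ref / (np.linalg.norm(ref, axis=1, keepdims=True) + 1e-12)
    aligned = np.empty_like(Y)
    for f in range(n_freq):
        env = np.abs(Y[f])                       # magnitude envelope of each output
        env = env - env.mean(axis=1, keepdims=True)
        env = env / (np.linalg.norm(env, axis=1, keepdims=True) + 1e-12)
        corr = env @ ref.T                       # corr[a, b]: output a versus speaker b
        # p[k] is the output index assigned to speaker k; exhaustive search suits few speakers.
        best = max(permutations(range(n_src)),
                   key=lambda p: sum(corr[p[k], k] for k in range(n_src)))
        aligned[f] = Y[f, list(best)]
    return aligned
```

The exhaustive search over orderings grows as n! with the number of speakers, so it is only reasonable for small numbers of speakers. Larger cases would need an assignment algorithm instead.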
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.cf.ac.uk |