
Details of Grant 

EPSRC Reference: EP/G001960/1
Title: Corpus-Based Speech Separation
Principal Investigator: Ji, Professor M
Other Investigators:
Crookes, Professor D
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: Queen's University of Belfast
Scheme: Standard Research
Starts: 01 October 2008
Ends: 30 September 2011
Value (£): 312,394
EPSRC Research Topic Classifications:
Comput./Corpus Linguistics
Digital Signal Processing
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 21 Apr 2008
Panel Name: ICT Prioritisation Panel (April 2008)
Outcome: Announced
Summary on Grant Application Form
In this project, we will develop new techniques for restoring clear speech from noisy recordings. We will focus on two problems: (1) retrieving speech from background noise, and (2) separating speech sentences spoken by different speakers. For convenience, we refer to both problems as speech separation.
Over the past decades, many techniques have been developed for speech separation. While appearing in different forms, most techniques can be viewed as a filter, which aims to pass the frequencies of the targeted speech with minimum distortion and, at the same time, block the frequencies of the noise. To build the filter, one thus needs knowledge about the frequency structure of the noise. For certain applications in which the noise remains relatively constant, one may obtain an estimate of the noise structure using the data observed at a time without speech, and then use it to predict the noise structure in the data containing mixed speech and noise. Based on the prediction, a filter can be formed to remove the noise and hence restore the speech. Unfortunately, this strategy does not work if the noise changes fast and is thus unpredictable. Examples of fast-varying noises include crosstalk speech and the background noises in mobile/Internet communications, which are often complex, highly dynamic, and thus difficult to predict.
In this research, we will investigate a new approach to speech separation, aiming for the capability of handling unpredictable noise. We will use a pre-recorded speech corpus, consisting of clean speech sentences by various speakers, to help remove the requirement for information about the noise. The new method consists of four major components. First, we compare the noisy sentence, containing mixed speech and noise, with each corpus sentence to find all their matching parts. Second, we combine the longest matching parts from the clean corpus sentences to form a new sentence, as a reconstruction of the target speech. Because of their richer and more distinct contexts, longer speech utterances are less confused by noise, and thus can be recognised with fewer errors than shorter utterances. This explains why we synthesise the target speech using the longest recognised speech parts, which minimises the effect of noise on the restoration. The third component of our method is a novel technique to reduce the sensitivity to noise when finding the matching speech parts between the noisy and corpus sentences. The last component uses the speaker characteristics associated with the individual corpus sentences to help separate mixed sentences spoken by different speakers. Combining these components, the new method offers the capability to separate speech from noise, and to separate mixed speech sentences, without having to predict the noise/crosstalk.
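The first two components described above (finding matching parts, then combining the longest ones) can be illustrated with a toy sketch. This is not the project's method: it assumes each sentence is a list of per-frame feature tuples, uses a hypothetical fixed tolerance `tol` in place of the noise-robust matching of the third component, and ignores speaker characteristics entirely. The names `frames_match`, `longest_match_at`, and `reconstruct` are made up for illustration.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frames_match(a: Frame, b: Frame, tol: float) -> bool:
    """Hypothetical frame test: largest per-dimension difference is within tol."""
    return max(abs(x - y) for x, y in zip(a, b)) <= tol


def longest_match_at(noisy: List[Frame], start: int,
                     corpus: List[List[Frame]], tol: float) -> Tuple[int, int, int]:
    """Return (length, sentence_index, corpus_start) of the longest run of
    consecutive corpus frames that matches noisy[start:], frame by frame."""
    best = (0, -1, -1)
    for s_idx, sent in enumerate(corpus):
        for c_start in range(len(sent)):
            length = 0
            while (start + length < len(noisy)
                   and c_start + length < len(sent)
                   and frames_match(noisy[start + length],
                                    sent[c_start + length], tol)):
                length += 1
            if length > best[0]:
                best = (length, s_idx, c_start)
    return best


def reconstruct(noisy: List[Frame], corpus: List[List[Frame]],
                tol: float = 0.5) -> List[Frame]:
    """Greedily cover the noisy sentence, left to right, with the longest
    matching clean corpus segments (an assumed, simplified combination rule)."""
    out: List[Frame] = []
    t = 0
    while t < len(noisy):
        length, s_idx, c_start = longest_match_at(noisy, t, corpus, tol)
        if length == 0:
            out.append(noisy[t])  # no corpus match: keep the noisy frame as-is
            t += 1
        else:
            out.extend(corpus[s_idx][c_start:c_start + length])
            t += length
    return out


if __name__ == "__main__":
    corpus = [[(1.0,), (2.0,), (3.0,)], [(5.0,), (6.0,)]]
    noisy = [(1.1,), (2.2,), (2.9,), (5.3,), (6.1,)]
    print(reconstruct(noisy, corpus, tol=0.5))
```

In this toy example the noisy frames are replaced by a three-frame segment from the first corpus sentence followed by a two-frame segment from the second. Preferring the longest run at each position mirrors the summary's argument that longer segments carry more context and are less easily confused by noise.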
Key Findings, Potential use in non-academic contexts, Impacts, and Sectors submitted by the Researcher: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.qub.ac.uk