EPSRC Reference: |
EP/D053005/1 |
Title: |
Robust Syllable Recognition in the Acoustic-Waveform Domain |
Principal Investigator: |
Cvetkovic, Professor Z |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Electronic Engineering |
Organisation: |
King's College London |
Scheme: |
Standard Research (Pre-FEC) |
Starts: |
01 October 2006 |
Ends: |
31 March 2010 |
Value (£): |
207,533 |
EPSRC Research Topic Classifications: |
Human Communication in ICT |
Music & Acoustic Technology |
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
This proposal is concerned with robust classification/recognition of speech units (phonemes and consonant-vowel syllables) in the domain of acoustic waveforms. The motivation for this research comes from the idea that speech units should be much better separated in the high-dimensional spaces formed by acoustic waveforms than in the smaller representation spaces which are used in state-of-the-art speech recognition systems and which involve significant compression and dimension reduction. Hence, recognition/classification in the acoustic waveform domain should exhibit a higher level of robustness to additive noise than classification in low-dimensional feature spaces.

In the first phase of the project we will investigate classification of speech units in the acoustic waveform domain under severe noise conditions, around 0 dB signal-to-noise ratio and below, while in the second phase we will study techniques which would make classification robust also to linear filtering. The particular tasks that will be tackled in the first phase can be summarized as follows:

1. Study the detailed structure of the sets of acoustic waveforms of individual speech units; in particular their intrinsic dimensions and the existence of possible nonlinear surfaces on which the data are concentrated.

2. Guided by the findings from item 1, estimate statistical models of the distribution of speech units in the acoustic waveform domain. We will then design and systematically assess so-called generative classifiers, whose defining property is that they are based on such statistical models.

3. Investigate classification of speech units in the acoustic waveform domain using discriminative classification techniques (artificial neural networks, support vector machines, and relevance vector machines). These can be a useful alternative to generative techniques because they focus directly on the classification problem without building explicit models of waveform distributions for each speech unit.

4. Construct classifiers by grouping speech units hierarchically. Top-level classifiers will be constructed to distinguish between a small number of groups of similar speech units, followed by classifiers separating groups into subgroups, and so on. Different methods for defining subgroups will be explored, including confusion matrices of the classifiers from item 3, appropriate distance measures between the statistical models obtained in item 2, and possibly perceptual experiments.

A potential argument against our approach is that classification in the acoustic waveform domain will break down in the presence of linear filtering. However, this can be avoided by considering narrow-band signals: for these, the effect of linear filtering is approximately equivalent to amplitude scaling and time delay. In the second phase of the project, we will therefore consider speech classification using narrow-band components of acoustic waveforms. For classification of signals in individual sub-bands, the techniques investigated in the first phase of the project will be considered. A new issue is then how to combine the results of the sub-band classifiers to minimize the overall classification error. Here, recently developed machine learning techniques will be used, as specified in the case for support.

As explained, individual sub-band classifiers should be robust to linear filtering because the latter does not significantly alter the shape of narrow-band signals. On the other hand, the dimension of the spaces of sub-band waveforms will still be high enough to facilitate classification that is robust to additive noise. Hence, the overall scheme is expected to be robust to both additive noise and linear filtering.
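
The following is a minimal, hypothetical sketch (not part of the proposal) of the contrast drawn in items 2 and 3 above: a generative classifier that fits a simple statistical model per speech unit and classifies by likelihood, versus a discriminative classifier (a linear support vector machine) trained directly on waveform vectors. The synthetic data, segment length, and class count are assumptions made purely for illustration.

    # Illustrative sketch only: toy comparison of a generative classifier
    # (class-conditional diagonal Gaussians on raw waveform vectors) and a
    # discriminative classifier (linear SVM). Shapes and parameters are
    # hypothetical; the proposal does not specify them.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n_per_class, dim = 200, 160        # e.g. 160 samples ~ 20 ms at 8 kHz (assumed)
    means = [rng.normal(0, 1, dim) for _ in range(3)]
    X = np.vstack([m + 0.5 * rng.normal(size=(n_per_class, dim)) for m in means])
    y = np.repeat(np.arange(3), n_per_class)

    # Generative route: fit a diagonal Gaussian per class, classify by likelihood.
    class_means = np.array([X[y == c].mean(axis=0) for c in range(3)])
    class_vars = np.array([X[y == c].var(axis=0) + 1e-6 for c in range(3)])

    def gaussian_log_likelihoods(x):
        # log N(x; mu_c, diag(var_c)) up to an additive constant, per class c
        return -0.5 * np.sum((x - class_means) ** 2 / class_vars
                             + np.log(class_vars), axis=1)

    gen_pred = np.array([np.argmax(gaussian_log_likelihoods(x)) for x in X])

    # Discriminative route: a linear SVM trained directly on the waveform vectors.
    svm = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
    disc_pred = svm.predict(X)

    print("generative accuracy:    ", np.mean(gen_pred == y))
    print("discriminative accuracy:", np.mean(disc_pred == y))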
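
Similarly, the sketch below illustrates the second-phase idea of classifying narrow-band components: a waveform is split with band-pass filters and the per-band classifier scores are combined, here by a simple sum. The sampling rate, band edges, filter order, placeholder per-band classifiers, and the summation rule are illustrative assumptions; the proposal leaves the actual combination method to the machine learning techniques described in the case for support.

    # Illustrative sketch only: sub-band decomposition with band-pass filters
    # and a simple score-sum combination of per-band classifier outputs.
    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 8000                                              # assumed sampling rate (Hz)
    band_edges = [(100, 500), (500, 1500), (1500, 3500)]   # hypothetical bands

    def subband_signals(x):
        """Return one band-passed copy of x per (low, high) band edge pair."""
        bands = []
        for low, high in band_edges:
            b, a = butter(4, [low, high], btype="bandpass", fs=fs)
            bands.append(lfilter(b, a, x))
        return bands

    def combined_scores(x, band_classifiers):
        """Sum per-band class scores; band_classifiers[i] maps a band signal
        to a vector of class scores (any per-band classifier could be used)."""
        bands = subband_signals(x)
        return sum(clf(band) for clf, band in zip(band_classifiers, bands))

    # Toy usage with placeholder per-band "classifiers" (energy-based scores).
    x = np.random.default_rng(1).normal(size=fs)           # 1 s of noise
    dummy = lambda band: np.array([np.mean(band ** 2), np.std(band)])
    scores = combined_scores(x, [dummy] * len(band_edges))
    print("combined class scores:", scores)
    print("predicted class index:", int(np.argmax(scores)))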
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
|