EPSRC Reference: GR/J10426/01
Title: ENHANCEMENT OF THE INTELLIGIBILITY OF NATURAL AND SYNTHETIC SPEECH
Principal Investigator: Hazan, Professor VL
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Phonetics and Linguistics
Organisation: UCL
Scheme: Standard Research (Pre-FEC)
Starts: 25 September 1993
Ends: 24 September 1996
Value (£): 138,785

EPSRC Research Topic Classifications: Human Communication in ICT
EPSRC Industrial Sector Classifications:
Related Grants:
Panel History:

Summary on Grant Application Form
To investigate techniques of speech pattern enhancement in natural and synthetic speech. These techniques exploit knowledge of which acoustic cues determine the perception of phonetic features, and are designed to compensate for some of the intrinsic weaknesses of normal speech production and perception. To implement the most successful enhancements in a text-to-speech system in order to evaluate their effect on intelligibility at the suprasegmental level.

Progress:

Phase 1: Patterns of confusions in synthesised and degraded natural VCVs
To investigate which features appear to be the least robust in natural speech degraded by noise and in synthesised speech, perceptual tests have been run using nonsense vowel-consonant-vowel (VCV) stimuli (e.g. apa, imi) produced by a male speaker and by three different synthesisers. For both natural and synthetic speech, the most common errors were plosive/fricative confusions and place-of-articulation errors.

Phase 2: Investigations of different cue-enhancements for natural speech in noise
This phase investigated how increasing the relative amplitude of information-rich portions of nonsense VCV utterances affects the intelligibility of natural speech in noise. Various degrees of enhancement were applied to the consonant portion (burst, friction or nasal portion), to the first few cycles of the vowel in order to make the formant transitions more salient, or to both simultaneously. All cue-enhanced conditions led to higher intelligibility scores, mainly because of a reduction in errors in the perception of manner of articulation. Spectral and temporal enhancements are being investigated using filtering and waveform-editing techniques.

Phase 3: Investigations of cue-enhancements for synthetic speech in noise
Full manipulation of the spectral and temporal characteristics of the signal requires high-quality copy-syntheses of natural utterances. An X-Windows graphical interface (KPE) has been developed to facilitate the preparation of these copy-syntheses using the Klatt synthesiser. The resulting synthesised VCVs give error rates in quiet similar to those obtained for natural speech. Various degrees of spectral enhancement are being applied to these copy-syntheses, including increasing the extent of formant transitions at vowel onset and making spectral changes to the burst and friction portions of plosives and fricatives.

Phase 4: Choice of text-to-speech system for implementation
To test the benefits of acoustic cue-enhancements at the suprasegmental level, we plan to incorporate the refined enhancement techniques in a text-to-speech system. The availability and relative benefits of diphone-based and synthesis-by-rule systems are currently being explored.
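The feature-level error analysis described under Phase 1 (and the manner-of-articulation result reported in Phase 2) can be illustrated with a minimal Python sketch. It assumes a hypothetical, simplified manner/place table for a handful of English consonants and hypothetical (presented, response) pairs, where "apa" heard as "afa" is recorded as `("p", "f")`; the function name `feature_errors` is illustrative and this is not the project's actual scoring procedure.

```python
from collections import Counter

# Hypothetical, simplified feature table: consonant -> (manner, place).
# Only a few English consonants are listed, for illustration only.
FEATURES = {
    "p": ("plosive", "bilabial"), "b": ("plosive", "bilabial"),
    "t": ("plosive", "alveolar"), "d": ("plosive", "alveolar"),
    "k": ("plosive", "velar"), "g": ("plosive", "velar"),
    "f": ("fricative", "labiodental"), "v": ("fricative", "labiodental"),
    "s": ("fricative", "alveolar"), "z": ("fricative", "alveolar"),
    "m": ("nasal", "bilabial"), "n": ("nasal", "alveolar"),
}


def feature_errors(trials):
    """Count manner and place errors over (presented, response) consonant pairs.

    Plosive/fricative confusions (in either direction) are also counted
    separately, since they are the manner error singled out in Phase 1.
    """
    counts = Counter()
    for presented, response in trials:
        manner_p, place_p = FEATURES[presented]
        manner_r, place_r = FEATURES[response]
        counts["trials"] += 1
        if manner_p != manner_r:
            counts["manner"] += 1
            if {manner_p, manner_r} == {"plosive", "fricative"}:
                counts["plosive/fricative"] += 1
        if place_p != place_r:
            counts["place"] += 1
    return counts
```

For example, the responses `[("p", "f"), ("t", "k"), ("m", "m"), ("s", "t")]` yield two manner errors (both plosive/fricative confusions) and two place errors out of four trials. Scoring by feature rather than by whole-consonant accuracy is what lets a result such as "fewer manner-of-articulation errors" be read off directly.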
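The Phase 2 manipulation, which raises the relative amplitude of an information-rich portion of a VCV token (for example the burst, friction or nasal portion, or the first few vowel cycles) before presenting it in noise, can be sketched in Python as follows. The helper names `boost_segment` and `add_noise`, the rectangular gain window, the white Gaussian masker, and the choice to set the SNR against the whole enhanced token are all assumptions for illustration. The summary does not specify the noise type, the gain values, or how presentation levels were equalised.

```python
import math
import random


def boost_segment(samples, start, end, gain_db):
    """Return a copy of `samples` with samples[start:end] scaled by gain_db.

    Hypothetical helper: applies a plain rectangular gain. A real
    implementation would probably taper the edges to avoid clicks.
    """
    factor = 10 ** (gain_db / 20.0)  # dB -> linear amplitude ratio
    out = list(samples)
    for i in range(max(0, start), min(end, len(out))):
        out[i] *= factor
    return out


def rms(x):
    """Root-mean-square level of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def add_noise(samples, snr_db, seed=0):
    """Add white Gaussian noise at snr_db, referenced to the token's overall RMS.

    Hypothetical helper: the masker type and SNR reference are assumptions.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in samples]
    target_rms = rms(samples) / (10 ** (snr_db / 20.0))
    scale = target_rms / rms(noise)
    return [s + scale * n for s, n in zip(samples, noise)]
```

For example, `add_noise(boost_segment(token, c_start, c_end, 6.0), 0.0)` would raise a hypothetical consonant segment by 6 dB and then mask the whole token at 0 dB SNR. Because the SNR here is referenced to the enhanced token, boosting a segment also raises the noise level slightly; an experiment comparing enhanced and unenhanced tokens would need to decide how to control for that.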
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts
Description
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary

Date Materialised

Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website: