
Details of Grant 

EPSRC Reference: EP/M018946/1
Title: Open Domain Statistical Spoken Dialogue Systems
Principal Investigator: Gasic, Dr M
Other Investigators:
Researcher Co-Investigators:
Project Partners:
VocalIQ Limited
Department: Engineering
Organisation: University of Cambridge
Scheme: Standard Research
Starts: 01 April 2015
Ends: 30 September 2018
Value (£): 603,425
EPSRC Research Topic Classifications:
Artificial Intelligence
Comput./Corpus Linguistics
Human Communication in ICT
EPSRC Industrial Sector Classifications:
Communications
Information Technologies
Related Grants:
Panel History:
Panel Date: 02 Dec 2014
Panel Name: EPSRC ICT Prioritisation Panel - Dec 2014
Outcome: Announced
Summary on Grant Application Form
Spoken Dialogue Systems (SDS) encompass the technologies required to build effective man-machine interfaces which depend primarily on voice. To date they have mostly been deployed in telephone-based call centre applications such as banking, billing queries and travel information and they are built using hand-crafted rules.

The recent introduction of Apple Siri and Google Now has moved voice-based interfaces into the mainstream. These virtual personal assistants (VPAs) offer the potential to revolutionise the way we interact with machines, and they open the way to properly control and manage the emerging Internet of Things - the rapidly growing network of smart devices which lack any form of conventional user interface. However, current personal assistants are built using the same technology as limited domain spoken dialogue systems. They are not capable of sustaining conversational dialogues except within the selected limited domains which they have been explicitly programmed to handle.

Very recent work on statistical SDS has demonstrated that it is not only possible for such a system to adapt and improve performance within the domain for which it has been designed but it is also possible for the system to automatically extend its coverage to include new, hitherto unseen concepts. This suggests that it should be possible to build on the progress achieved in the development of limited domain statistical SDS to design a radically new form of spoken dialogue system (and hence VPA) which is able to extend and adapt with use to cover an ever-wider range of conversational topics. The design of such a system is the focus of this research proposal.

The key idea is to integrate the latest statistical dialogue technology into a wide coverage knowledge graph (such as Freebase) which contains not only ontological information about entities but also the operations that can be applied to those entities (e.g. find flight information, book a hotel room, buy an ebook, etc.).
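
As an illustration only, a knowledge graph of this kind could be represented as below. This is a minimal sketch and not the project's implementation: the `Entity` class, the `is_a` field, the `can_apply` helper and the entity and operation names are all hypothetical, chosen to mirror the examples in the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    """A graph node holding ontological information and applicable operations."""
    name: str
    is_a: Optional[str] = None  # hypothetical link to a super-class node
    operations: set = field(default_factory=set)  # operations applicable to this entity


# Toy graph; the entries are illustrative assumptions, not Freebase content.
graph: Dict[str, Entity] = {
    "Thing": Entity("Thing"),
    "Flight": Entity("Flight", is_a="Thing", operations={"find_flight_information"}),
    "HotelRoom": Entity("HotelRoom", is_a="Thing", operations={"book"}),
    "EBook": Entity("EBook", is_a="Thing", operations={"buy"}),
}


def can_apply(graph: Dict[str, Entity], entity_name: str, operation: str) -> bool:
    """Check whether an operation is registered for the named entity."""
    return operation in graph[entity_name].operations
```

With this representation, `can_apply(graph, "HotelRoom", "book")` is true, while `can_apply(graph, "EBook", "book")` is false.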

The implementation of a single monolithic spoken dialogue system capable of interpreting and responding to every conceivable user request is not practicable. Hence, rather than simply trying to broaden the coverage of existing SDS, a novel distributed system architecture is proposed with three key features:

1. The three essential components of an SDS (semantic decoder, dialogue manager and response generator) are distributed across the knowledge graph. In essence, every node in the graph has the capability to recognise when it is being referred to and to respond appropriately.

2. When the user speaks, all semantic decoders are listening. Based on the activation levels of the decoder outputs, a topic tracker identifies which concept is in focus and activates its dialogue policy.

3. All components are statistical, enabling them to be adapted automatically on-line using unsupervised adaptation. Data sparsity is managed by ensuring that the top level nodes in the class hierarchy have well-trained components. Initially, lower level more specialised concepts simply inherit the required statistical models from their super-classes. As the system interacts with users and more data is collected, lower level components acquire sufficient data to train their own dedicated statistical models.
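
The topic tracking in feature 2 and the model inheritance in feature 3 can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the proposed system: `ConceptNode`, `track_topic` and the policy names are hypothetical, and a simple keyword-overlap score stands in for the output of a trained statistical semantic decoder.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ConceptNode:
    name: str
    keywords: Set[str] = field(default_factory=set)  # toy stand-in for a trained decoder
    parent: Optional["ConceptNode"] = None           # super-class in the hierarchy
    policy: Optional[str] = None                     # None: no dedicated model trained yet

    def activation(self, utterance: str) -> float:
        """Toy decoder score: fraction of this node's keywords found in the utterance."""
        if not self.keywords:
            return 0.0
        words = set(utterance.lower().split())
        return len(words & self.keywords) / len(self.keywords)

    def effective_policy(self) -> str:
        """Use the node's own policy if it has one, else the nearest super-class's."""
        node: Optional[ConceptNode] = self
        while node is not None:
            if node.policy is not None:
                return node.policy
            node = node.parent
        raise LookupError(f"no policy available for {self.name}")


def track_topic(nodes: List[ConceptNode], utterance: str) -> ConceptNode:
    """Every node scores the utterance; the most activated node is taken as in focus."""
    return max(nodes, key=lambda n: n.activation(utterance))


# Only the top-level node starts with a trained policy (hypothetical names).
venue = ConceptNode("Venue", {"place"}, policy="generic-venue-policy")
hotel = ConceptNode("Hotel", {"hotel", "room"}, parent=venue)
restaurant = ConceptNode("Restaurant", {"restaurant", "food"}, parent=venue)

focus = track_topic([venue, hotel, restaurant], "book me a hotel room")
inherited = focus.effective_policy()  # falls back to the Venue policy

# Once enough data has been collected, the node gets its own dedicated model.
hotel.policy = "hotel-policy"
dedicated = hotel.effective_policy()
```

In this sketch the Hotel node is selected as the topic in focus and initially uses the policy inherited from Venue. After a dedicated policy is assigned, the lookup stops at the Hotel node itself, while Restaurant continues to inherit.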

The end result is a system that continually learns on-line. It starts with a limited and stilted conversational style, but the more it is used, the more fluent it becomes, and as users explore new topics, the system learns to adapt and extend its capability to handle those new topics. Since many users can be using the system simultaneously, learning can be fast and capable of accommodating live updates of the underlying data, all of which are characteristics that a virtual personal assistant must have to be genuinely useful.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL: http://mi.eng.cam.ac.uk/research/dialogue/EPSRCProj
Further Information:  
Organisation Website: http://www.cam.ac.uk