EPSRC logo

Details of Grant 

EPSRC Reference: EP/R018537/1
Title: Big Hypotheses: A Fully Parallelised Bayesian Inference Solution
Principal Investigator: Maskell, Professor S
Other Investigators:
Jones, Professor AR Chong, Dr S Thiyagalingam, Dr J
Pinning, Dr RL Langfeld, Professor KA Alison, Professor L
Green, Dr PL Diaz De la O, Dr F
Researcher Co-Investigators:
Project Partners:
AstraZeneca Atos Origin IT Services UK Ltd AWE
Defence Science & Tech Lab DSTL IBM Intel Corporation Ltd
Knowledge Transfer Network Limited nVIDIA Unilever
Department: Electrical Engineering and Electronics
Organisation: University of Liverpool
Scheme: Standard Research
Starts: 01 April 2018 Ends: 31 March 2023 Value (£): 2,557,654
EPSRC Research Topic Classifications:
Fundamentals of Computing Statistics & Appl. Probability
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel DatePanel NameOutcome
09 Oct 2017 New Approaches to Data Science Interviews 9 and 10 October 2017 Announced
Summary on Grant Application Form
Bayesian inference is a process which allows us to extract information from data. The process uses prior knowledge articulated as statistical models for the data. We are focused on developing a transformational solution to Data Science problems that can be posed as such Bayesian inference tasks.

An existing family of algorithms, called Markov chain Monte Carlo (MCMC) algorithms, offer a family of solutions that offer impressive accuracy but demand significant computational load. For a significant subset of the users of Data Science that we interact with, while the accuracy offered by MCMC is recognised as potentially transformational, the computational load is just too great for MCMC to be a practical alternative to existing approaches. These users include academics working in science (e.g., Physics, Chemistry, Biology and the social sciences) as well as government and industry (e.g., in the pharmaceutical, defence and manufacturing sectors). The problem is then how to make the accuracy offered by MCMC accessible at a fraction of the computational cost.

The solution we propose is based on replacing MCMC with a more recently developed family of algorithms, Sequential Monte Carlo (SMC) samplers. While MCMC, at its heart, manipulates a single sampling process, SMC samplers are an inherently population-based algorithm that manipulates a population of samples. This makes SMC samplers well suited to the task of being implemented in a way that exploits parallel computational resources. It is therefore possible to use emerging hardware (e.g., Graphics Processor Units (GPUs), Field Programmable Gate Arrays (FPGAs) and Intel's Xeon Phis as well as High Performance Computing (HPC) clusters) to make SMC samplers run faster. Indeed, our recent work (which has had to remove some algorithmic bottlenecks before making the progress we have achieved) has shown that SMC samplers can offer accuracy similar to MCMC but with implementations that are better suited to such emerging hardware.

The benefits of using an SMC sampler in place of MCMC go beyond those made possible by simply posing a (tough) parallel computing challenge. The parameters of an MCMC algorithm necessarily differ from those related to a SMC sampler. These differences offer opportunities for SMC samplers to be developed in directions that are not possible with MCMC. For example, SMC samplers, in contrast to MCMC algorithms, can be configured to exploit a memory of their historic behaviour and can be designed to smoothly transition between problems. It seems likely that by exploiting such opportunities, we will generate SMC samplers that can outperform MCMC even more than is possible by using parallelised implementations alone.

Our interactions with users, our experience of parallelising SMC samplers and the preliminary results we have obtained when comparing SMC samplers and MCMC make us excited about the potential that SMC samplers offer as a "New Approach for Data Science".

Our current work has only begun to explore the potential offered by SMC samplers. We perceive significant benefit could result from a larger programme of work that helps us understand the extent to which users will benefit from replacing MCMC with SMC samplers. We propose a programme of work that combines a focus on users' problems with a systematic investigation into the opportunities offered by SMC samplers.

Our strategy for achieving impact comprises multiple tactics. Specifically, we will: use identified users to act as "evangelists" in each of their domains; work with our hardware-oriented partners to produce high-performance reference implementations; engage with the developer team for Stan (the most widely-used generic MCMC implementation); work with the Industrial Mathematics Knowledge Transfer Network and the Alan Turing Institute to engage with both users and other algorithmic developers.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.liv.ac.uk