Details of Grant

EPSRC Reference:

EP/T030526/1

Title:

CITCoM: Causal Inference for Testing of Computational Models

Principal Investigator:

Walkinshaw, Dr N

Other Investigators:

Latimer, Dr N

Hierons, Professor R

Wagg, Professor DJ

Researcher Co-Investigators:

Project Partners:

Case Western Reserve University

Chalmers University of Technology

Defence Science & Tech Lab DSTL

STFC Laboratories (Grouped)

Department:

Computer Science

Organisation:

University of Sheffield

Scheme:

Standard Research

Starts:

01 January 2021

Ends:

30 June 2024

Value (£):

670,838

EPSRC Research Topic Classifications:

Software Engineering

EPSRC Industrial Sector Classifications:

Aerospace, Defence and Marine

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
20 May 2020	EPSRC ICT Prioritisation Panel May 2020	Announced

Summary on Grant Application Form

Computational models are increasingly used to answer important questions that affect us all. Scientists rely on them to simulate phenomena as diverse as the effects of drugs on physiology, the transmission of diseases through a society, or the flow of blood through an artery. Within the public sector, computational models are fundamental to predicting weather patterns in the short term and the impact of global warming in the longer term. They are also increasingly vital for supporting decisions on infrastructure spending; our project partners in the DAFNI project are developing computational modelling infrastructure to support the investment of £460bn over the coming decade.

Given the high-stakes decisions that are usually involved, mistakes or "bugs" in a model can lead (and have led) to disastrous consequences. It is critical that these systems are rigorously tested to minimise this risk.

Computational models are, however, not amenable to traditional software testing and debugging techniques. They can include large numbers of parameters and configuration options. A single test run can take a very long time and require substantial computational resources, which makes it infeasible to run large numbers of tests. The data structures that they operate on can be particularly complex (e.g. 3D models of cities or coronary arteries), which makes them difficult to synthesise and inspect. Finally, if a test run produces an incorrect result, these factors can make it very difficult to locate the bug in the model's source code.

CITCoM is based on the observation that the challenge is in many ways rooted in data analysis. With large numbers of input variables, there is the challenge of analysing the tested behaviour and ensuring that the observed behaviour is caused by the parameters that are the focus of the test (and not accidentally by other, incidental parameters). There is the converse challenge of selecting which inputs need to be varied and which need to be controlled to demonstrate that a given combination of inputs causes a particular behaviour, whilst keeping the number of test cases required to a minimum. If a fault occurs, there is the challenge of interrogating the data to locate the fault in the code.

Similar problems arise in a wide range of disciplines, and especially in epidemiology, where population data are scrutinised to determine the effects of drug treatments or medical interventions. Again, there are many variables at play (lifestyle, cultural background, genetic traits, habits). Collecting data can be expensive and time-consuming. Outcomes can be difficult to measure and complex to scrutinise. For such situations, the last decade has seen the rapid rise of a family of statistical analysis approaches called Causal Inference. This has enabled statisticians to design and reason about epidemiological trials and data in new and powerful ways: to sample data efficiently, handle missing data attributes, and use existing data to answer "what-if" questions, even if the data in question has not been collected yet.

CITCoM will use these powerful Causal Inference analysis capabilities to address the problems that arise when testing computational models. We will develop Causal Inference-driven automated test-generation techniques, test oracles, and debugging techniques. These will be trialled and honed on a set of large case-study models in collaboration with our partners on the DAFNI project at STFC, at DSTL, and within The University of Sheffield.
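As a rough illustration of the kind of analysis described above (not the project's actual method or tooling), the sketch below uses an invented toy model in which the input under test, x, happens to be correlated with an incidental input, z, in the recorded runs. The function names, coefficients, and sample size are all assumptions made for the example. A naive regression of the output on x alone attributes part of z's influence to x, whereas adjusting for z recovers the effect of x that the toy model actually implements.

```python
import numpy as np


def toy_model(x, z):
    """Hypothetical stand-in for a computational model.

    x is the input under test; z is an incidental input.
    The true effect of x on the output is 2.0 by construction.
    """
    return 2.0 * x + 5.0 * z


def estimate_effects(n=500, seed=0):
    """Return (naive, adjusted) estimates of the effect of x on the output."""
    rng = np.random.default_rng(seed)

    # Recorded runs in which x and z are correlated (assumed for illustration).
    z = rng.normal(size=n)
    x = 0.8 * z + rng.normal(size=n)
    y = toy_model(x, z) + rng.normal(scale=0.1, size=n)

    # Naive estimate: regress the output on x alone, ignoring z.
    naive = np.polyfit(x, y, 1)[0]

    # Adjusted estimate: include z in the regression so that its
    # influence is not attributed to x.
    design = np.column_stack([x, z, np.ones(n)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    adjusted = coef[0]

    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = estimate_effects()
    print(f"naive estimate:    {naive:.2f}")
    print(f"adjusted estimate: {adjusted:.2f}")
```

In this toy setting the adjusted estimate should lie close to 2.0, while the naive estimate is inflated by the influence of z. A causal test oracle in the spirit of the summary might compare such an adjusted estimate against the effect that the modeller expects.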

Ultimately, CITCoM will enable us to generate, collect, and analyse evidence from computational models to ensure that they do not contain faults, so that any decisions that they feed into are well-founded and trustworthy.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.shef.ac.uk