Details of Grant

EPSRC Reference:

EP/V002694/1

Title:

New challenges in robust statistical learning

Principal Investigator:

Cannings, Dr T I

Other Investigators:

Researcher Co-Investigators:

Project Partners:

BIOS Health Ltd

Cambridge Cancer Genomics

Department:

Sch of Mathematics

Organisation:

University of Edinburgh

Scheme:

New Investigator Award

Starts:

01 May 2021

Ends:

30 April 2024

Value (£):

266,366

EPSRC Research Topic Classifications:

Statistics & Appl. Probability

EPSRC Industrial Sector Classifications:

No relevance to Underpinning Sectors

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
24 Nov 2020	EPSRC Mathematical Sciences Prioritisation Panel November 2020	Announced

Summary on Grant Application Form

In recent years, our ability to collect, store and process vast amounts of data, coupled with rapid advances in technology, has led to the widespread adoption of data-driven decision-making. This includes new application areas, such as precision medicine, where doctors are using data to inform their diagnoses and treatment recommendations. In other areas, such as finance, banks use huge amounts of historical data in order to decide whether a new customer is likely (or not) to default on their loan repayments. It is often the case that we are required to make a discrete prediction about some future patient or customer, based on some (training) data relating to existing patients or customers. In statistics, problems of this type are called classification problems.

Many methods for classification are built on the assumption that any future data we may encounter has the same distribution as our training data. Of course, this assumption is not always valid -- data relating to one set of patients or customers will not necessarily follow the same distribution as data from a new set of people. In this research, we will develop new robust classification algorithms that can deal with noisy and incomplete data. In particular, we will develop methodology that enables practitioners to combine multiple sources of noisy data, propose modifications to existing methods that guarantee they are robust to corruptions in the data, and introduce novel ways of overcoming the issues caused by missing data. We will also provide new theoretical understanding of the limitations of decision-making algorithms when faced with noisy, corrupted and incomplete data.

There are a number of scenarios where our new approaches will be applicable:

- We may have data collected from patients in a particular location (lab or hospital) but wish to make predictions in a different location.

- We may not have access to the full dataset. For example, for privacy reasons, users may not disclose some of their personal information. In other settings, we may be required to anonymise the data by removing some identifying covariates.

- Often the complexity of the data involved will mean that we don't observe the true data. Instead, we only have access to an approximation of the data. This typically occurs in modern settings, where practitioners use crowd-sourcing services such as Amazon Mechanical Turk to label their data -- such services are rarely perfectly accurate.

- It may be that an adversary is able to arbitrarily contaminate a small proportion of the data (for instance by performing artificial activity online).

Our work will enable practitioners to utilise data that is currently not appropriate for use. We will also provide new insight into the kinds of data that are most useful for a particular purpose.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

Description	This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.ed.ac.uk