EPSRC Reference: |
EP/V049208/1 |
Title: |
Mathematical foundations of non-reversible MCMC for genome-scale inference |
Principal Investigator: |
Koskela, Dr J |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Statistics |
Organisation: |
University of Warwick |
Scheme: |
Standard Research - NR1 |
Starts: |
01 February 2021 |
Ends: |
31 January 2022 |
Value (£): |
76,222
|
EPSRC Research Topic Classifications: |
Statistics & Appl. Probability |
|
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
|
Summary on Grant Application Form |
The world is undergoing an explosion of DNA sequence data. Patterns within data sets carry information about the unobservable biological and demographic histories of populations, and this information is in turn fueling discoveries in areas such as medicine, demography, and conservation. A central tool connecting observed patterns to testable predictions and inference is the Ancestral Recombination Graph, which models the patterns of common ancestry along sampled DNA sequences. Since common ancestry is typically not observable directly, inferences are made by averaging over possible ancestries.
In very simple cases the averaging can be carried out exactly, but in biologically relevant settings it typically has to be approximated. Typical approximation methods create an ensemble of candidate ancestries, and use the ensemble average as a proxy for the true average. The accuracy of this procedure depends on the degree to which the ensemble is representative of the set of all possible ancestries. The computational time required to guarantee a representative ensemble grows rapidly as the size of a data set increases, so such ensemble-based methods can only be applied to data sets that are small by modern standards. As a result, researchers often resort to computationally faster methods, the theoretical performance of which is less well understood. The lack of theoretical foundations can complicate the interpretation of findings, and makes it difficult to accurately quantify their associated uncertainty.
A new class of methods for building representative ensembles, called zig-zag algorithms, has been developed and has become increasingly widespread over the last several years. These methods have also shown promise in pilot applications in genetics, but an effective data structure for applying them to genome-scale data is an essential ingredient that remains unknown. This project aims to develop and test suitable data structures, making possible the engineering of software packages which combine feasible run times with efficient use of statistical signal in data sets.
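As a rough illustration of how a zig-zag algorithm builds an ensemble and averages over it, the following minimal sketch runs a one-dimensional zig-zag process whose target is a standard normal distribution. This is a toy stand-in, not the project's method: the genetics application and its data structures are not shown, and the function name `zigzag_gaussian` and its parameters are hypothetical choices made for this example.

```python
import math
import random


def zigzag_gaussian(n_events, seed=0):
    """Toy 1D zig-zag sampler targeting a standard normal (illustrative only).

    The state is a position x and a velocity v in {-1, +1}. The particle moves
    in a straight line and flips its velocity at random event times whose rate
    is max(0, v * x), the rate implied by the potential U(x) = x**2 / 2.
    Returns time averages of x and x**2 along the continuous trajectory.
    """
    rng = random.Random(seed)
    x, v = 0.0, 1.0
    total_time = 0.0
    int_x = 0.0   # integral of x(t) over the trajectory
    int_x2 = 0.0  # integral of x(t)**2 over the trajectory
    for _ in range(n_events):
        a = v * x
        e = rng.expovariate(1.0)
        # Exact time to the next flip: solve integral_0^tau max(0, a + s) ds = e.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + v * tau
        # Closed-form integrals along the linear segment from x to x_new.
        int_x += (x_new ** 2 - x ** 2) / (2.0 * v)
        int_x2 += (x_new ** 3 - x ** 3) / (3.0 * v)
        total_time += tau
        x, v = x_new, -v  # non-reversible move: flip the velocity and carry on
    return int_x / total_time, int_x2 / total_time


if __name__ == "__main__":
    mean, second_moment = zigzag_gaussian(200_000)
    print(mean, second_moment)
```

For a long enough run the two returned averages should lie close to 0 and 1, the mean and second moment of the standard normal target. The averages are taken over the whole piecewise-linear path rather than over the flip points alone, since the flip points by themselves are not distributed according to the target.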
|
Key Findings, Potential use in non-academic contexts, Impacts, and Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.warwick.ac.uk |