
Details of Grant 

EPSRC Reference: EP/L021749/1
Title: Sublinear Algorithms for Approximating Probability Distributions
Principal Investigator: Diakonikolas, Dr I
Other Investigators:
Researcher Co-Investigators:
Project Partners: IBM Corporation (International)
Department: Sch of Informatics
Organisation: University of Edinburgh
Scheme: First Grant - Revised 2009
Starts: 01 September 2014
Ends: 31 August 2015
Value (£): 98,776
EPSRC Research Topic Classifications: Fundamentals of Computing
EPSRC Industrial Sector Classifications: Information Technologies
Related Grants:
Panel History:
Panel Date: 04 Feb 2014
Panel Name: EPSRC ICT Responsive Mode - Feb 2014
Outcome: Announced
Summary on Grant Application Form
The goal of this proposal is to advance a research program of developing sublinear-time algorithms for estimating a wide range of natural and important classes of probability distributions.

We live in an era of "big data," where the amount of data that can be brought to bear on questions of biology, climate, economics, etc., is vast and expanding rapidly. Much of this raw data consists of example points without corresponding labels. The challenge of how to make sense of this unlabeled data has immediate relevance and has rapidly become a bottleneck in scientific understanding across many disciplines.

An important class of big data is most naturally modeled as samples from a probability distribution over a very large domain. The difficulty is that the domains of these distributions are immense, which typically makes standard algorithms unacceptably slow. Scaling up a computational framework to deal comfortably with ever-larger data therefore presents a series of algorithmic challenges.

This prompts the basic question: given samples from some unknown distribution, what can we infer? While this question has been studied for several decades by various research communities, both the number of samples and the running time required for such estimation tasks are not yet well understood, even for some surprisingly simple types of discrete distributions.

The proposed research focuses on sublinear-time algorithms, that is, algorithms whose running time is significantly smaller than the size of the domain of the underlying distribution. In this project we will develop sublinear-time algorithms for estimating various classes of discrete distributions over very large domains.

Specific problems we will address include:

(1) Developing sublinear algorithms to estimate probability distributions that satisfy various natural types of "shape restrictions" on the underlying probability density function.

(2) Developing sublinear algorithms for estimating complex distributions that result from the aggregation of many independent simple sources of randomness.

We believe that highly efficient algorithms for these estimation tasks may play an important role in the next generation of large-scale machine learning applications.
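
As an illustration of problem (1) above, the following is a minimal Python sketch of one well-known idea for a single shape restriction, a monotone non-increasing distribution over {0, ..., n-1}. It is not taken from the proposal and is not claimed to be the project's method. It follows the spirit of Birgé's decomposition: split the domain into intervals whose lengths grow geometrically, estimate only the mass of each interval from samples, and spread that mass uniformly within the interval. The function names, the parameter eps, and the exponential test source are assumptions made for this example.

```python
import math
import random
from bisect import bisect_right


def geometric_intervals(n, eps):
    """Split {0, ..., n-1} into intervals whose lengths grow by a factor (1 + eps)."""
    starts, start, length = [], 0, 1.0
    while start < n:
        starts.append(start)
        start += max(1, math.floor(length))
        length *= 1.0 + eps
    return starts  # starts[j] is the left endpoint of interval j


def learn_flattened(samples, n, eps):
    """Estimate each interval's mass from samples and spread it uniformly inside."""
    starts = geometric_intervals(n, eps)
    counts = [0] * len(starts)
    for x in samples:
        counts[bisect_right(starts, x) - 1] += 1
    ends = starts[1:] + [n]
    # Per-point density on interval j: (empirical mass) / (interval length).
    heights = [c / len(samples) / (e - s) for c, s, e in zip(counts, starts, ends)]
    return starts, heights


def density_at(model, x):
    """Look up the estimated probability of the single point x."""
    starts, heights = model
    return heights[bisect_right(starts, x) - 1]


# Hypothetical usage: a domain of one million points, but only 5000 samples.
rng = random.Random(0)
n = 1_000_000
samples = [min(n - 1, int(rng.expovariate(1 / 1000))) for _ in range(5000)]
model = learn_flattened(samples, n, eps=0.1)
print(len(model[0]), "intervals describe a domain of size", n)
```

The point of the sketch is that the work done depends on the number of samples and the number of intervals (roughly logarithmic in n for fixed eps), not on n itself, which is what "sublinear" means in the summary above.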
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary:
Date Materialised:
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.ed.ac.uk